Query the same compound in Materials Project and AFLOW and you will often get different formation energies. Both values are correct for the settings that produced them: the two databases use different DFT functionals, different pseudopotentials, and different reference states. That’s not an error. That’s the thing you need to understand before you build a training set from any of them.

Materials Project, AFLOW, OQMD, and JARVIS-DFT each make different computational choices. Here’s what those choices mean for your workflow.

What each database actually offers

Materials Project

~155,000 inorganic compounds (check materialsproject.org for the current count; it grows regularly). The mp-api Python client is the best-designed API in this space: clean, typed, easy to filter on any property. Calculations are PBE-based: plain GGA for most compounds, GGA+U for selected transition-metal compounds (mainly oxides and fluorides), with r2SCAN available for a growing subset.

The main strength is breadth: formation energy, bandgap (GGA and HSE06 for a subset), elastic properties, phonons, surface energies, defect calculations, and phase diagrams. No other open database covers as many property types with as much tooling built around them. Pymatgen, emmet, and the phase diagram app are all built against MP’s data model. If you’re doing thermodynamic analysis and want Python tools that already work, start here.

The tradeoff: MP has added new calculation workflows over time, so not all entries use the same functional. Track which have r2SCAN vs GGA+U if formation energy consistency across large datasets matters to your work.
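One way to keep track of this is to split records by functional before any cross-entry comparison. The sketch below is a minimal illustration on hand-written records; the `run_type` key, the IDs, and the energies are assumptions for the example, not the field names or values of an actual MP export.

```python
from collections import Counter

# Hypothetical records. "run_type" is an assumed field name; check what your
# own export calls the functional label before reusing this.
entries = [
    {"id": "example-1", "run_type": "GGA+U", "formation_energy_per_atom": -1.52},
    {"id": "example-2", "run_type": "r2SCAN", "formation_energy_per_atom": -1.47},
    {"id": "example-3", "run_type": "GGA+U", "formation_energy_per_atom": -0.88},
]


def split_by_functional(records, key="run_type"):
    """Group records by functional label so each group can be handled separately."""
    groups = {}
    for rec in records:
        groups.setdefault(rec.get(key, "unknown"), []).append(rec)
    return groups


groups = split_by_functional(entries)
counts = dict(Counter(rec.get("run_type", "unknown") for rec in entries))
print(counts)
```

Training on one group at a time, or carrying the label as a feature, keeps the functional mix visible instead of silently averaging over it.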

AFLOW (Automatic FLOW)

~3.5 million entries, the largest of the four by a significant margin. AFLOW generates structures systematically from prototype crystal structures, which gives it exceptional coverage for alloys and intermetallics. Calculations are primarily PBE, applied consistently across entries.

For screening studies that need breadth across composition space, AFLOW’s catalog covers regions that MP simply doesn’t reach.

The AFLOWLIB REST API works but is verbose, so plan on writing wrapper code. Per-entry property coverage is narrower than MP: primarily formation energy and a handful of electronic descriptors. If you need phonons, surfaces, or defect properties at scale, AFLOW doesn’t have them.

OQMD (Open Quantum Materials Database)

~1.4 million entries. OQMD’s defining feature is computational consistency: the entire database uses the same VASP settings, the same pseudopotentials, the same PBE-PAW setup. That makes it the most defensible choice for formation energy benchmarking.

If your research depends on comparing formation energies across many compounds (convex hull construction, thermodynamic stability screening, phase boundary analysis), OQMD is the right database. You’re comparing equivalent calculations, not correcting for functional differences between entries.
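To make the convex hull point concrete, here is a minimal pure-Python sketch of a binary (A-B) hull and the energy above hull. It is an illustration of the technique, not OQMD's or pymatgen's implementation, and the phases and energies are invented.

```python
def lower_hull(points):
    """Lower convex hull of (x_B, formation energy) points, left to right."""
    # Keep only the lowest-energy phase at each composition.
    best = {}
    for x, e in points:
        best[x] = min(e, best.get(x, e))
    hull = []
    for p in sorted(best.items()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:  # middle point lies on or above the chord: drop it
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def energy_above_hull(x, energy, hull):
    """Distance of a phase above the hull at composition x (eV/atom)."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            t = (x - x1) / (x2 - x1)
            return energy - (y1 + t * (y2 - y1))
    raise ValueError("composition outside the hull range")


# Invented binary system: x is the fraction of B, energies in eV/atom.
phases = {"A": (0.0, 0.0), "A3B": (0.25, -0.10), "AB": (0.5, -0.40),
          "AB3": (0.75, -0.30), "B": (1.0, 0.0)}
hull = lower_hull(phases.values())
e_hull = {name: energy_above_hull(x, e, hull) for name, (x, e) in phases.items()}
print(hull)
print(e_hull)
```

Every energy feeding the hull has to come from equivalent calculations. A 0.1 eV/atom shift from mixed settings is enough to move a phase on or off the hull in this toy system.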

The API (Django REST) has limited filtering compared to mp-api. For exports above a few thousand entries, use the bulk download.

JARVIS-DFT (NIST)

~41,000 bulk crystalline materials plus ~1,000 low-dimensional structures, calculated with OptB88vdW and TBmBJ functionals. It’s the only major open database to include van der Waals corrections by default, which makes it specifically useful for 2D materials, layered compounds, and systems where dispersion interactions matter.

JARVIS also includes properties that don’t exist at this scale anywhere else: topological invariants, solar cell efficiency predictions, piezoelectric tensors, and exfoliation energies for 2D materials. For functional materials screening in those domains, it’s a distinct resource, not a substitute for the others.

The catalog size rules it out for bulk alloy or standard thermodynamic benchmarking. Use it to supplement, not replace.

Side-by-side comparison

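| Database | Entries | Primary functional | Bandgap | Phonons | Elastic | OPTIMADE |
|---|---|---|---|---|---|---|
| Materials Project | ~155k | GGA+U / r2SCAN | GGA + HSE06 (subset) | Yes (subset) | Yes | Yes |
| AFLOW | ~3.5M | PBE | PBE | Limited | Limited | Yes |
| OQMD | ~1.4M | PBE-PAW | PBE | No | No | Yes |
| JARVIS-DFT | ~41k | OptB88vdW + TBmBJ | TBmBJ | Yes (subset) | Yes | Yes |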

All four expose OPTIMADE endpoints. But OPTIMADE standardizes structure queries, not computed properties. You can filter on elements, space group, and lattice parameters across providers in one request. Formation energy, bandgap, and phonon data still require the native API for each database.

OPTIMADE in practice: useful but limited

Cross-provider OPTIMADE queries are genuinely useful for candidate screening. Run a single filter (all compounds with a specific element set and space group) and get back unified structure data from multiple databases without writing four separate API calls.
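As a rough sketch of what "a single filter" looks like, the snippet below builds one OPTIMADE filter string and the matching `/v1/structures` URL for several providers. The `elements HAS ALL` and `nelements` syntax follows the OPTIMADE filter grammar; the base URLs are placeholders, not real endpoints, and no request is sent.

```python
from urllib.parse import quote, urlencode


def build_filter(elements):
    """OPTIMADE filter: structures containing exactly this element set."""
    quoted = ",".join(f'"{el}"' for el in sorted(elements))
    return f"elements HAS ALL {quoted} AND nelements={len(elements)}"


def structures_url(base_url, elements, page_limit=20):
    """Full query URL for a provider's /v1/structures endpoint."""
    params = {"filter": build_filter(elements), "page_limit": page_limit}
    query = urlencode(params, quote_via=quote)  # encode spaces as %20
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# Placeholder base URLs (assumptions): substitute each provider's real
# OPTIMADE endpoint from the official providers list.
PROVIDERS = {
    "provider_x": "https://optimade.example.org/x",
    "provider_y": "https://optimade.example.org/y/",
}

urls = {name: structures_url(base, ["O", "Fe", "Li"]) for name, base in PROVIDERS.items()}
for name, url in urls.items():
    print(name, url)
```

The same filter string goes to every provider; only the base URL changes. Pagination and response parsing are left out here.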

The limits show up when you need computed properties. A formation energy from AFLOW and a formation energy from Materials Project are not directly comparable. Different reference states, different functionals. OPTIMADE doesn’t account for that. A 2023 Phys. Rev. Materials study comparing AFLOW, MP, and OQMD on shared structures found meaningful disagreement in reported properties across all three. Cross-provider property comparison requires you to align those conventions yourself and document that alignment in your methods section so reviewers can assess it.
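A small worked example shows why reference states alone can shift a formation energy. This sketch uses the standard definition (total energy minus the composition-weighted elemental references, per atom); the compound, total energy, and reference energies are invented, and in practice the total energies differ between providers as well.

```python
def formation_energy_per_atom(total_energy, composition, refs):
    """E_f = (E_total - sum_i n_i * mu_i) / sum_i n_i, in eV/atom."""
    n_atoms = sum(composition.values())
    ref_sum = sum(n * refs[el] for el, n in composition.items())
    return (total_energy - ref_sum) / n_atoms


# Invented numbers for a hypothetical compound AB2 (eV per formula unit).
composition = {"A": 1, "B": 2}
total_energy = -15.0

# Two hypothetical providers that disagree only on the reference energy of A.
refs_x = {"A": -3.0, "B": -4.0}
refs_y = {"A": -3.3, "B": -4.0}

ef_x = formation_energy_per_atom(total_energy, composition, refs_x)
ef_y = formation_energy_per_atom(total_energy, composition, refs_y)
print(round(ef_x, 4), round(ef_y, 4), round(ef_y - ef_x, 4))
```

Aligning conventions means recomputing every formation energy against one shared set of references, then stating which set you used.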

For ML training sets above ~50,000 entries, paginated OPTIMADE queries are slow and fragile. Bulk downloads (MP’s NDJSON exports, OQMD’s database dumps, JARVIS figshare releases) are more reliable at scale. Tools like Alloybase, a materials-data workspace, handle cross-provider OPTIMADE queries in a single request and persist results with full source attribution on every row: provider, OPTIMADE ID, query string, fetch timestamp. That solves the session-state and attribution problem, but doesn’t replace native APIs for deep property access.
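If you handle attribution yourself, the per-row fields listed above map onto a small record type. This is a sketch of one possible layout, not Alloybase's schema; the class name, function name, and sample IDs are hypothetical.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRow:
    """Source attribution stored alongside every fetched structure."""
    provider: str
    optimade_id: str
    query: str
    fetched_at: str  # ISO 8601 timestamp, UTC


def tag_rows(provider, query, ids, now=None):
    """Attach the same provider, query string, and timestamp to each ID."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return [ProvenanceRow(provider, entry_id, query, stamp) for entry_id in ids]


rows = tag_rows("provider_x", 'elements HAS ALL "Fe","O"', ["example-1", "example-2"])
print(asdict(rows[0]))
```

Storing the exact query string and fetch time is what lets you explain later why a rerun returned a different set of entries.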

A note on API access and bulk data

mp-api requires an API key and is rate-limited; AFLOW’s REST API is open with no auth. Both are worth knowing if you’re automating queries. For ML pipelines at meaningful scale, all four databases offer bulk downloads. Use them. Paginated API calls are the wrong tool above ~50,000 entries regardless of which database you’re querying.
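For bulk exports in newline-delimited JSON, a streaming reader keeps memory flat no matter how large the file is. The sketch below reads from an in-memory sample; the field names and values are assumptions, so adapt them to the export you actually download.

```python
import io
import json


def iter_ndjson(lines):
    """Yield one parsed record per non-empty line of an NDJSON stream."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"bad JSON on line {line_number}") from exc


# In-memory stand-in for a bulk export; replace with open(path) in practice.
sample = io.StringIO(
    '{"id": "example-1", "formula": "AB2", "formation_energy_per_atom": -1.2}\n'
    "\n"
    '{"id": "example-2", "formula": "A3B", "formation_energy_per_atom": 0.3}\n'
)

negative_ef = [
    rec["id"] for rec in iter_ndjson(sample) if rec["formation_energy_per_atom"] < 0
]
print(negative_ef)
```

Because the function is a generator, filtering happens record by record and nothing beyond the current line has to fit in memory.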

Which database for which workflow

ML training sets: Use OQMD for formation energy prediction. Consistency is the point. Supplement with MP for broader property coverage per entry. Pull from AFLOW when you need alloy and intermetallic coverage at scale. For 2D materials or vdW-sensitive properties, JARVIS is the only open option at meaningful scale.

Thermodynamics and phase diagrams: MP’s phase diagram tooling (pymatgen’s PhaseDiagram, the convex hull utilities) is the most complete. OQMD’s formation energy consistency makes it the better ground truth for cross-database benchmarking.

Functional materials screening: JARVIS for topological properties, solar cell efficiency predictions, and 2D materials. MP for defects, surfaces, and a wider general property space. AFLOW for systematic coverage of prototype structures across large composition spaces.

Multi-database queries: Use OPTIMADE for structure matching and candidate filtering across providers. Pull properties from whichever database has the best coverage for your target property. Document which database and which version you used. Providers update datasets, and a query run today may return different results than the same query run six months from now.

The bottom line

Default to Materials Project. Add OQMD when formation energy consistency matters more than property variety. Pull from JARVIS for vdW systems, 2D materials, or topological properties. Use AFLOW for composition space coverage that MP doesn’t reach.

Provider disagreements on computed properties aren’t bugs. They’re information about where the underlying science is still unsettled. Build pipelines that surface those disagreements rather than hide them.
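One simple way to surface disagreements is to compute the spread of a property across providers and flag anything above a tolerance. This is a minimal sketch with invented values and placeholder provider names; the 0.05 eV/atom threshold is an arbitrary choice for illustration, not a recommended cutoff.

```python
from collections import defaultdict

# Invented (formula, provider, formation energy in eV/atom) rows.
reported = [
    ("AB2", "provider_x", -1.33),
    ("AB2", "provider_y", -1.23),
    ("A3B", "provider_x", -0.41),
    ("A3B", "provider_y", -0.40),
]


def flag_disagreements(rows, threshold=0.05):
    """Return formulas whose max-min spread across providers exceeds threshold."""
    by_formula = defaultdict(dict)
    for formula, provider, value in rows:
        by_formula[formula][provider] = value
    flagged = {}
    for formula, values in by_formula.items():
        if len(values) < 2:
            continue  # a single provider cannot disagree with itself
        spread = max(values.values()) - min(values.values())
        if spread > threshold:
            flagged[formula] = {"spread": round(spread, 4), "values": values}
    return flagged


flagged = flag_disagreements(reported)
print(flagged)
```

Keeping the per-provider values next to the spread, rather than collapsing them to a mean, preserves the information the disagreement carries.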