The OPTIMADE ecosystem looks different in 2026 than it did two years ago. Materials Project finished its r2SCAN migration. Alexandria added millions of new structures. Several smaller providers went offline; new ones came online. Most comparison resources date from 2022–2023. The r2SCAN migration alone changes which MP values you should trust. Follow an old tutorial and you may end up filtering on the wrong thermo endpoint.
This directory covers every active OPTIMADE provider as of early 2026: entry counts, functionals, primary data types, compliance level, and the specific workflows each one handles well.
What OPTIMADE Standardizes (and What It Doesn’t)
OPTIMADE standardizes a REST API for querying crystal structures and calculated properties (filter syntax, response format, structure endpoints). Property namespaces are not standardized. Formation energy is _mp_formation_energy_per_atom in Materials Project and _aflow_energy_formation_atom in AFLOW. Cross-provider queries and aggregated results need a mapping layer to reconcile those differences.
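As a minimal sketch of that split, the snippet below builds a structures query whose filter uses only standardized properties, while the requested formation-energy field is provider-specific. The base URL is a placeholder and the helper name is made up for illustration; `filter`, `response_fields`, and `page_limit` are standard OPTIMADE query parameters.

```python
from urllib.parse import quote, urlencode


def build_structures_url(base_url, filter_expr, fields, page_limit=20):
    """Compose a /v1/structures query URL from an OPTIMADE filter string."""
    params = {
        "filter": filter_expr,
        "response_fields": ",".join(fields),
        "page_limit": page_limit,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base_url.rstrip('/')}/v1/structures?{urlencode(params, quote_via=quote)}"


# Standardized properties only, so this filter is portable across providers.
portable_filter = 'elements HAS ALL "Si","O" AND nelements=2'

# Placeholder base URL (assumption); substitute one from the providers list.
url = build_structures_url(
    "https://example.org/optimade",
    portable_filter,
    # The last field only exists on Materials Project; other providers name it differently.
    ["id", "chemical_formula_reduced", "_mp_formation_energy_per_atom"],
)
print(url)
```

The filter string can be reused unchanged against another provider; the `response_fields` list cannot.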
Complete Provider Directory
Materials Project (MP)
~155K inorganic structures as of early 2026. The r2SCAN migration is complete for the majority of entries. MP now serves both GGA+U and r2SCAN results, and the default thermo endpoint returns r2SCAN. If your workflow was pinned to legacy GGA values, the formation energies you’re getting now are systematically different. The thermo_type field in the API lets you filter explicitly (most code examples online predate the migration and will return mixed values without it).
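One way to guard against mixed values is to check thermo_type on whatever records you already hold before using them. This is a sketch on plain dictionaries, not the mp-api client itself. The record shape, the material IDs, the energy values, and the thermo_type strings are invented placeholders, so check the values your client actually returns.

```python
from collections import Counter


def select_thermo_type(records, wanted):
    """Keep records with one thermo_type and report how many of each other type were dropped."""
    counts = Counter(r.get("thermo_type") for r in records)
    kept = [r for r in records if r.get("thermo_type") == wanted]
    dropped = {t: n for t, n in counts.items() if t != wanted}
    return kept, dropped


# Invented placeholder records (assumption): not real MP entries or energies.
records = [
    {"material_id": "mp-0001", "thermo_type": "R2SCAN", "formation_energy_per_atom": -1.92},
    {"material_id": "mp-0001", "thermo_type": "GGA_GGA+U", "formation_energy_per_atom": -1.78},
    {"material_id": "mp-0002", "thermo_type": "R2SCAN", "formation_energy_per_atom": -0.41},
]

kept, dropped = select_thermo_type(records, "R2SCAN")
print(len(kept), dropped)
```

A non-empty `dropped` dictionary is the signal that the query returned mixed functionals and needs an explicit thermo_type filter upstream.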
MP has the strongest developer ecosystem. The mp-api Python client handles pagination, field selection, and summary endpoints cleanly. Best fit: thermodynamic stability, phase diagrams, benchmarks where broad community adoption matters.
See also: Materials Project vs AFLOW vs OQMD vs JARVIS-DFT
AFLOW
~3.6 million entries, the largest OPTIMADE-accessible database by entry count. GGA-PBE throughout, with AFLUX filter syntax for direct REST queries. The AUID (AFLOW Unique Identifier) system provides persistent entry-level identifiers, but AUIDs don’t map to identifiers in other databases. Cross-referencing requires matching on structure fingerprints or reduced formulas.
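The reduced-formula half of that matching can be sketched as follows. The entry shape and the IDs are hypothetical. The formula convention (elements in alphabetical order, a count of 1 omitted) follows OPTIMADE's chemical_formula_reduced. A shared reduced formula does not distinguish polymorphs, so treat a match as a candidate pair to confirm with a structure fingerprint.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def reduced_formula(composition):
    """Reduce integer element counts by their GCD; alphabetical order, count of 1 omitted."""
    divisor = reduce(gcd, composition.values())
    parts = []
    for element, count in sorted(composition.items()):
        n = count // divisor
        parts.append(f"{element}{n if n > 1 else ''}")
    return "".join(parts)


def cross_reference(entries):
    """Group entries by reduced formula and keep formulas seen in more than one provider."""
    groups = defaultdict(list)
    for entry in entries:
        groups[reduced_formula(entry["composition"])].append((entry["provider"], entry["id"]))
    return {
        formula: ids
        for formula, ids in groups.items()
        if len({provider for provider, _ in ids}) > 1
    }


# Hypothetical entries (assumption): the IDs are placeholders, not real identifiers.
entries = [
    {"provider": "aflow", "id": "hypothetical-auid-1", "composition": {"Sr": 2, "Ti": 2, "O": 6}},
    {"provider": "mp", "id": "hypothetical-mp-id", "composition": {"Sr": 1, "Ti": 1, "O": 3}},
    {"provider": "aflow", "id": "hypothetical-auid-2", "composition": {"Na": 1, "Cl": 1}},
]
matches = cross_reference(entries)
print(matches)
```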
Use AFLOW when you need volume: ternary and quaternary systems, broad elemental coverage, or training set size. The AFLUX endpoint handles bulk downloads efficiently once you learn the filter grammar.
OQMD
More than 1M entries, all computed with a single consistent GGA-PBE workflow. That methodological uniformity is the defining feature. Formation energies across OQMD entries are directly comparable in a way that mixed-functional databases can’t match. No functional migration history, no legacy calculation splits to filter around.
The REST API works but is dated: filtering is limited, and there is no Python client with the polish of mp-api. Even so, if you’re building a stability benchmark or need consistent thermochemistry at scale, OQMD is the right source.
JARVIS-DFT
~80K structures, but the coverage goes deeper than the entry count suggests. JARVIS includes OptB88vdW calculations (van der Waals corrected) and, for a significant subset, TBmBJ bandgaps, which are generally closer to experiment for semiconductors than standard GGA gaps. Elastic tensors, optical spectra, and solar efficiency descriptors are available for many entries.
The jarvis-tools Python package doubles as an API client and analysis toolkit. Pre-split train/val/test sets are available for ML benchmarking (ALIGNN and similar graph neural network papers use them). If bandgap accuracy matters more than coverage, use JARVIS-DFT for that property.
Crystallography Open Database (COD)
More than 500K experimental crystal structures determined by X-ray and neutron diffraction. COD is not a DFT database. Use it to check whether DFT-predicted structures have experimental analogs, or to anchor a multi-database dataset with measured geometry. OPTIMADE compliance is strong.
NOMAD
NOMAD ingests raw calculation outputs from multiple codes (VASP, Quantum ESPRESSO, FHI-aims, others) and makes them searchable. Entry counts are in the millions, but the data is heterogeneous: different codes, different settings, different completeness levels. The value is provenance. You can retrieve input files, raw output, and full calculation context, not just derived properties.
Use NOMAD when you need to verify calculation parameters, access full calculation context, or find community-contributed calculations that didn’t make it into the curated databases.
Alexandria
Millions of structures generated through ML-driven exploration using PBEsol. Alexandria covers novel crystal structure candidates that don’t appear in the major databases, weighted toward materials not yet synthesized rather than known compounds. Strongest for identifying candidates in under-explored compositional spaces.
PBEsol and GGA-PBE energies are not directly comparable without corrections. Factor that in if you’re combining Alexandria results with OQMD or AFLOW data.
MC3D
A curated dataset of ~10K experimentally known 3D crystal structures from the Materials Cloud platform. Smaller than the others but well-curated and fully OPTIMADE-compliant. Useful for benchmarking and cross-validation against curated experimental references.
ODBX
Cambridge-hosted datasets with full OPTIMADE compliance. Primarily useful for researchers already in the Cambridge structural ecosystem or working with specific curated collections.
Other Active Providers
The official OPTIMADE providers list at optimade.org/providers includes additional entries: 2D materials databases (C2DB), specialized collections (Carolina Materials Database), and institutional providers. Check the live provider list before assuming this directory is complete.
Cross-Provider Gotchas You’ll Hit in Practice
Schema and Property Differences
Property names are provider-specific. _mp_formation_energy_per_atom (MP), _aflow_energy_formation_atom (AFLOW), and delta_e (OQMD) all mean formation energy, but they are not interchangeable fields. A filter that works against one provider won’t automatically apply to others.
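A mapping layer for this can be as small as a lookup table plus a function that renames the field and keeps track of where it came from. The sketch below uses the three field names quoted above. The provider keys, the output key names, and the sample values are illustrative assumptions, and it does not check that units or reference states agree between providers.

```python
# Field names as quoted in this article; the short provider keys are illustrative.
FORMATION_ENERGY_FIELD = {
    "mp": "_mp_formation_energy_per_atom",
    "aflow": "_aflow_energy_formation_atom",
    "oqmd": "delta_e",
}


def normalize_formation_energy(provider, attributes):
    """Map a provider-specific formation-energy field onto one common key.

    Raises KeyError for a provider with no mapping, so unmapped sources fail loudly.
    """
    source_field = FORMATION_ENERGY_FIELD[provider]
    return {
        "provider": provider,
        "source_field": source_field,
        "formation_energy_per_atom": attributes.get(source_field),
    }


# Invented placeholder values (assumption), only to show the shape of the output.
rows = [
    normalize_formation_energy("mp", {"_mp_formation_energy_per_atom": -1.5}),
    normalize_formation_energy("aflow", {"_aflow_energy_formation_atom": -1.4}),
    normalize_formation_energy("oqmd", {"delta_e": -1.45}),
]
for row in rows:
    print(row)
```

Keeping `source_field` in every row makes it possible to trace a normalized value back to the exact provider property it came from.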
Default page sizes and maximum page limits differ from provider to provider, so a single request may return only the first page of matches. Check each provider’s documentation before assuming bulk queries are returning complete results.
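A provider-agnostic way to avoid truncated results is to follow the `links.next` value in each OPTIMADE response until it is empty. The sketch below takes the HTTP call as a parameter so it can run without network access. The function name, the page cap, and the fake two-page response are assumptions for illustration.

```python
def iter_entries(first_url, fetch_json, max_pages=1000):
    """Yield every entry across pages by following links.next until it is empty."""
    url = first_url
    for _ in range(max_pages):
        page = fetch_json(url)
        yield from page.get("data", [])
        next_link = (page.get("links") or {}).get("next")
        # links.next may be a plain URL string or an object with an "href" key.
        if isinstance(next_link, dict):
            next_link = next_link.get("href")
        if not next_link:
            return
        url = next_link
    raise RuntimeError("max_pages reached before the provider stopped returning a next link")


# Fake two-page response standing in for a real provider (assumption).
fake_pages = {
    "page-1": {"data": [{"id": "a"}, {"id": "b"}], "links": {"next": {"href": "page-2"}}},
    "page-2": {"data": [{"id": "c"}], "links": {"next": None}},
}
ids = [entry["id"] for entry in iter_entries("page-1", fake_pages.__getitem__)]
print(ids)
```

Against a live provider, `fetch_json` could be something like `lambda u: requests.get(u, timeout=30).json()`, with whatever retry and rate-limit handling the provider asks for.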
Methodology Mixing Traps
Mixing GGA and r2SCAN formation energies from MP introduces significant systematic offsets for transition metal oxides and correlated systems. Most code examples online predate the migration. If you’re not filtering by thermo_type, you’re likely pulling mixed data.
Combining OQMD formation energies with AFLOW’s without energy corrections introduces noise from different pseudopotentials and k-point convergence settings. Cross-database energy comparisons need explicit methodology matching or correction schemes.
For workflows that query several providers at once, Alloybase maintains provider-specific adapters and flags entries where the same compound appears in multiple databases with diverging property values. Those disagreements are real information about where computational settings matter most.
Decision Guide: Match Your Workflow to the Right Provider
By research task:
Thermodynamic stability screening → MP (r2SCAN phase diagrams) or OQMD (consistent GGA-PBE)
Large-scale ML training sets → AFLOW (volume) or OQMD (methodological consistency)
Accurate semiconductor bandgaps → JARVIS-DFT (TBmBJ)
ML benchmarks with pre-built train/val/test splits → JARVIS-DFT
Experimental structure validation → COD
Novel structure candidates → Alexandria
Full calculation provenance → NOMAD
Best developer experience and Python ecosystem → MP
By data need:
Most entries → AFLOW
Most internally consistent energies → OQMD
Properties beyond formation energies and bandgaps → JARVIS-DFT
Experimental ground truth → COD
Reproduce a published calculation → NOMAD
If you’re querying across multiple providers, Alloybase runs cross-database OPTIMADE queries with a normalized schema and per-row source attribution. Provider name, OPTIMADE ID, query, and fetch timestamp are recorded for every entry, which covers the methods section requirement without manual reconciliation.
The Provider You Query Today May Not Return the Same Data Tomorrow
MP’s r2SCAN migration is the clearest example. During 2024–2025, the default thermo endpoint shifted to returning r2SCAN values instead of GGA+U. Some researchers had built workflows against the old defaults without pinning thermo_type or recording a query date. When the default changed, fresh queries no longer matched their cached data. Some didn’t catch the change until peer review.
The lesson isn’t that MP did something wrong. Database updates are how science improves. But the data layer moves, and your workflow needs to account for that. Record the provider, the version, the query date, and the filters every time. Or use a tool that does it automatically. Alloybase stores immutable versioned snapshots of every result, with provider, query, and timestamp recorded for each entry.
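For a do-it-yourself version of that record-keeping, a small immutable record per query is enough to start. This is a sketch using only the standard library, not a description of how Alloybase stores snapshots. The field names, the example provider version string, and the payload are assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class QueryRecord:
    provider: str
    provider_version: str
    query: str
    fetched_at: str
    payload_sha256: str


def record_query(provider, provider_version, query, payload, now=None):
    """Capture provider, version, filters, UTC fetch time, and a hash of the response."""
    now = now or datetime.now(timezone.utc)
    # Canonical JSON (sorted keys, fixed separators) so key order does not change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return QueryRecord(
        provider=provider,
        provider_version=provider_version,
        query=query,
        fetched_at=now.isoformat(),
        payload_sha256=hashlib.sha256(canonical).hexdigest(),
    )


# Placeholder version string and payload (assumption).
record = record_query(
    "mp",
    "example-db-version",
    'elements HAS ALL "Si","O" AND nelements=2',
    {"data": [{"id": "example-id", "attributes": {"nelements": 2}}]},
)
print(json.dumps(asdict(record), indent=2))
```

Storing the hash alongside the cached payload lets a later run detect that the same query now returns different data, which is the situation the r2SCAN migration created.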