The Problem With How We Cite Materials Data

Most methods sections that reference Materials Project cite Jain et al. 2013 and stop. No database version. No property-specific methodology paper. No access date. A reviewer reading that citation cannot determine which release of the database was queried, which DFT methodology produced the values, or whether the data would still be retrievable today.

This is not a failure of researcher diligence. Each provider publishes different citation requirements on different pages, in different formats. AFLOW’s citation guidance lives on its documentation page. Materials Project’s lives on a legacy URL. OQMD and JARVIS-DFT list theirs elsewhere. No standard exists, and journal guidelines have not caught up.

Alloybase records source attribution on every row you add to a dataset: provider name, OPTIMADE ID, query text, fetch timestamp, and source URL. When you export, the citation metadata comes with it. Free at alloybase.app.

What Does a Complete Materials Data Citation Include?

Six elements. Most published methods sections include the first one only.

  • Foundational database paper. The paper that describes the database itself (e.g., Jain et al. 2013 for Materials Project).

  • Property-specific methodology paper(s). The paper describing how that specific property was computed. Elastic constants, formation enthalpies, and piezoelectric tensors each have separate methodology papers.

  • Database version or release identifier. Materials Project, AFLOW, and OQMD all update their computed values over time. Without a version, your citation points at a moving target.

  • Access date. When you actually retrieved the data.

  • Analysis software and version. If you used pymatgen, ASE, or another tool to retrieve or post-process the data, cite it separately.

  • Access method. Direct API, OPTIMADE federation, or a client tool. This matters for reproducibility: the same query can return different results depending on the API version and endpoint.

Elements 2 through 4 (property methodology, database version, and access date) are where reproducibility breaks down. A citation missing the database version and property methodology is not reproducible, regardless of how precisely the rest of the methods section is written.
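One way to keep the six elements honest is to treat them as a checklist in code. The sketch below is a minimal illustration, not a standard schema: the DataCitation class, its field names, and the missing_elements helper are all assumptions made up for this example.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataCitation:
    """One record per database queried. Field names are illustrative only."""
    database_paper: Optional[str] = None            # 1. foundational database paper
    methodology_papers: Optional[List[str]] = None  # 2. property-specific methodology paper(s)
    database_version: Optional[str] = None          # 3. version or release identifier
    access_date: Optional[str] = None               # 4. date of retrieval (ISO format)
    software: Optional[str] = None                  # 5. analysis software and version
    access_method: Optional[str] = None             # 6. direct API, OPTIMADE, or client tool


def missing_elements(citation: DataCitation) -> List[str]:
    """Return the names of the elements that are absent or empty."""
    return [f.name for f in fields(citation) if not getattr(citation, f.name)]


# The typical methods-section citation: element 1 and nothing else.
typical = DataCitation(database_paper="Jain et al. 2013")
print(missing_elements(typical))
```

Run against the typical citation, the helper reports the other five elements as missing. Anything it reports is something a reviewer cannot recover from the paper.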

What Does Materials Project Require You to Cite?

Materials Project publishes explicit citation guidance at legacy.materialsproject.org/citing. The requirements go deeper than most researchers expect.

Primary citation (required for all uses): Jain, A. et al., “The Materials Project: A materials genome approach to accelerating materials innovation.” APL Materials, 2013, 1(1), 011002. doi:10.1063/1.4812323

Property-specific citations (required in addition to the primary): Materials Project’s citation page lists over a dozen additional property-specific papers. For elastic constants, for example, that is de Jong et al. 2015.

Database version: MP explicitly recommends citing a version string such as “Database version 2020_09_08”, along with the pymatgen version used.

License: All content is under Creative Commons Attribution 4.0.

A complete Materials Project citation for elastic constant data needs at minimum two papers (Jain 2013 and de Jong 2015), plus Ong 2013 if you used pymatgen, along with a version string and an access date.

What Does AFLOW Require?

AFLOW’s documentation page (aflow.org/documentation) specifies that users downloading via the REST API must cite: Taylor, R. H. et al., “A RESTful API for exchanging materials data in the AFLOWLIB.org consortium.” Comput. Mater. Sci. 93, 178–192 (2014). doi:10.1016/j.commatsci.2014.05.014

AFLOW identifies entries using Aflowlib Unique Identifiers (AUIDs). Include the AUID in your methods text when referencing specific entries. Additional foundational papers for AFLOW and AFLOWLIB are listed on aflow.org. Check the current list before submitting; the required citations have expanded as the project has grown.

At minimum, cite Taylor et al. 2014 for any data downloaded via the AFLOW API, plus the AUID for each specific entry you reference.

What About OQMD and JARVIS-DFT?

Both databases publish citation requirements on their respective sites: oqmd.org for OQMD (Wolverton group, Northwestern) and jarvis.nist.gov for JARVIS-DFT (Choudhary, NIST).

The required papers change as new methodology releases supersede earlier ones. A paper written in 2024 citing a 2019 version of the OQMD citation list may reference a superseded methodology. Always check the current citation page before submitting, not the one you bookmarked when you first downloaded the data.

Check citation pages at submission time, not at data-download time. Requirements evolve.

What Do You Cite When You Query Through OPTIMADE?

OPTIMADE is a standardized REST API for querying computational materials databases. It federates queries across providers but does not own the data. This distinction has specific citation implications.

When you access data through OPTIMADE, cite three things:

  • The OPTIMADE specification paper: Andersen, C. W. et al., “OPTIMADE, an API for exchanging materials data.” Sci. Data 8, 217 (2021). doi:10.1038/s41597-021-00974-z. For recent developments and applications, see also Evans et al., Digital Discovery 3, 1509 (2024). doi:10.1039/D4DD00039K.

  • Each source database whose data you used. OPTIMADE responses include provider metadata for each entry. If your query returned results from Materials Project and AFLOW, cite both foundational papers.

  • Access date and API version. The current specification is v1.2, with 22 registered providers serving 25 databases as of early 2026.

The common mistake: citing OPTIMADE alone without the source databases. OPTIMADE is the access layer. The data authority is the provider. OPTIMADE does not replace source-database citations. It adds a citation requirement on top of them.
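The rule "OPTIMADE paper plus every source database" is easy to mechanize. The sketch below assumes you have one JSON response per provider, each carrying the provider prefix under meta.provider as the OPTIMADE response format describes. The sample responses, the prefixes, and the FOUNDATIONAL_PAPERS lookup table are hypothetical placeholders; fill the table from each provider's current citation page.

```python
OPTIMADE_PAPER = "Andersen et al., Sci. Data 8, 217 (2021)"

# Hypothetical lookup table keyed by provider prefix (prefixes assumed here).
FOUNDATIONAL_PAPERS = {
    "mp": "Jain et al., APL Materials 1, 011002 (2013)",
    # REST API paper named in the AFLOW section; check aflow.org for the full list.
    "aflow": "Taylor et al., Comput. Mater. Sci. 93, 178 (2014)",
}


def required_citations(responses):
    """Return (citations, unresolved_prefixes) for a set of provider responses."""
    citations = [OPTIMADE_PAPER]  # the access layer is always cited
    unresolved = []
    seen = set()
    for response in responses:
        if not response.get("data"):
            continue  # provider returned no entries, so there is nothing to cite
        prefix = response["meta"]["provider"]["prefix"]
        if prefix in seen:
            continue
        seen.add(prefix)
        paper = FOUNDATIONAL_PAPERS.get(prefix)
        if paper:
            citations.append(paper)
        else:
            unresolved.append(prefix)  # look this one up by hand
    return citations, unresolved


# Trimmed, made-up responses: two providers with hits, one empty, one unknown.
responses = [
    {"meta": {"provider": {"prefix": "mp"}}, "data": [{"id": "mp-5229"}]},
    {"meta": {"provider": {"prefix": "aflow"}}, "data": [{"id": "placeholder-id"}]},
    {"meta": {"provider": {"prefix": "oqmd"}}, "data": []},
    {"meta": {"provider": {"prefix": "jarvis"}}, "data": [{"id": "placeholder-id"}]},
]

citations, unresolved = required_citations(responses)
print(citations)
print(unresolved)
```

The unresolved list is the useful part: it names the providers whose data you used but whose citation requirements you still need to look up, rather than silently dropping them.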

Five Citation Gaps the Current Infrastructure Leaves Open

1. No standard citation format across databases. Materials Project uses one format, AFLOW another. No equivalent of BibTeX-for-databases exists. Every researcher assembles citations manually from scattered documentation pages.

2. No per-entry DOIs for computed data. Unlike experimental datasets deposited on Zenodo or Figshare, individual computed entries in Materials Project, AFLOW, OQMD, and JARVIS-DFT do not have DOIs. You cite the database paper and version, not a resolvable identifier for mp-5229 specifically.

3. Property-specific citations are buried. Materials Project lists them on a citation page. That page does not appear in most tutorials or quickstart guides. The foundational paper shows up everywhere. The property papers do not.

4. Database versions are not prominently surfaced. Most API responses do not include a version identifier in the payload. You have to know to look for it, and where.

5. Journal guidelines have not caught up. The Joint Declaration of Data Citation Principles (2014, endorsed by 125 publishers and institutions) established that data citations belong in the Reference Section with machine-actionable metadata. Most materials science journals still treat database citations as informal in-text references.

Until databases surface citation metadata in their API responses (not just on documentation pages), the burden of assembling a complete citation falls on the researcher.
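Until that happens, a small script can at least make the manual assembly consistent. The sketch below builds a BibTeX @misc entry for a database release from a version string and an access date. No agreed format for database citations exists, so the function name, the citation key, and the choice of fields are assumptions, not a convention any provider endorses.

```python
def database_bibtex(key, database, version, accessed, url):
    """Build a BibTeX @misc entry for one database release (illustrative format)."""

    def field(name, value):
        # Doubled braces are literal braces in an f-string.
        return f"  {name} = {{{value}}},"

    lines = [
        f"@misc{{{key},",
        field("title", database),
        field("note", f"Database version {version}, accessed {accessed}"),
        field("url", url),
        "}",
    ]
    return "\n".join(lines)


entry = database_bibtex(
    key="mp_2023_11_1",
    database="Materials Project",
    version="2023.11.1",
    accessed="2026-03-15",
    url="https://materialsproject.org",
)
print(entry)
```

An entry like this supplements the database and methodology papers in the reference list; it does not replace them.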

FAQ

Do any materials databases issue DOIs for individual computed entries?

Not for individual computed entries in the major databases (Materials Project, AFLOW, OQMD, JARVIS-DFT). NOMAD and Materials Cloud mint DOIs for uploaded datasets, but those are user-deposited collections, not per-entry identifiers. The standard practice is citing the database paper plus a version identifier.

Do I need to cite pymatgen separately from Materials Project?

Yes. Pymatgen is analysis software maintained by the same group but published as a separate tool: Ong et al., Comput. Mater. Sci. 68, 314 (2013). doi:10.1016/j.commatsci.2012.10.028. If you used pymatgen to retrieve or post-process Materials Project data, it gets its own reference entry.

How do I cite a specific material entry like mp-5229?

Include the identifier in your methods text: “SrTiO3 (mp-5229), retrieved from Materials Project database version 2023.11.1 on 2026-03-15.” No DOI exists for mp-5229 itself. The identifier plus version plus access date is the current best practice for traceability.
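If you reference many entries, generating the sentence from stored metadata avoids typos in identifiers and dates. This is a minimal sketch; the cite_entry function and its parameters are made up for illustration.

```python
from datetime import date


def cite_entry(formula, entry_id, database, version, accessed):
    """Format a methods-text reference: identifier, version, and access date."""
    return (
        f"{formula} ({entry_id}), retrieved from {database} "
        f"database version {version} on {accessed.isoformat()}."
    )


sentence = cite_entry(
    "SrTiO3", "mp-5229", "Materials Project", "2023.11.1", date(2026, 3, 15)
)
print(sentence)
# → SrTiO3 (mp-5229), retrieved from Materials Project database version 2023.11.1 on 2026-03-15.
```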

What if my OPTIMADE query returned results from five different providers?

Cite all five source databases. Each provider’s foundational paper belongs in your reference list. The OPTIMADE specification paper (Andersen et al. 2021) goes in as well, since it describes your access method. Five providers means six citations minimum.