“PBE functional, PAW pseudopotentials, 520 eV cutoff.” That sentence appears in thousands of computational materials papers. It is not a methods section. It is a caption.

Lejaeghere et al. compared 71 elemental crystals across 15 DFT codes using 40 different potentials or basis set types (Science, 2016). Most modern codes agreed, but only when parameters were carefully controlled. Hegde et al. later quantified what happens when they are not: comparing the same materials across AFLOW, Materials Project, and OQMD, the spread in formation energies reached 0.105 eV/atom (Phys. Rev. Materials, 2023). The databases disagreed on whether a compound was metallic or insulating for up to 7% of materials, and on its magnetic state for 15%. The three root causes: pseudopotential choice, DFT+U implementation, and elemental reference states. All three are routinely underreported in methods sections.

The problem is not that researchers withhold information. It is that most of us underestimate what “complete” means when describing a DFT calculation.

What Does “Reproducible” Actually Mean for a DFT Calculation?

Bosoni et al. tested conceptual reproducibility systematically (Nature Reviews Physics, 2024): 960 equations of state, cross-checked between two all-electron codes and used to verify nine pseudopotential-based approaches, covering elements Z=1 through 96. They used AiiDA automated workflows to eliminate human setup variation entirely. The result: when codes use well-verified, converged settings, agreement is excellent. The discrepancies come from unreported or underconverged parameters, not from fundamental differences between codes.

Exact reproducibility is the practical bar, and the one reviewers test against: take your input files, run the same code version, get the same numbers.

Coudert identified the core failure mode in Chemistry of Materials (2017): methods sections describe the physics (“we used PBE with PAW potentials”) but omit the actual input file parameters. A VASP INCAR file contains dozens of settings that affect results. Your methods section reports maybe five of them. Coudert called this the “input file gap,” and it explains why a seemingly complete methods section still leaves reproduction to guesswork.
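
To see how wide that gap is, compare the one-sentence prose description with what an actual input contains. A minimal sketch, written with pymatgen's Incar class (the tool choice and every value below are illustrative, not recommended settings):

```python
# A minimal sketch of the "input file gap", assuming VASP-style inputs and pymatgen.
# Tag names are standard INCAR tags; the values are illustrative only.
from pymatgen.io.vasp.inputs import Incar

# Prose version: "PBE functional, PAW pseudopotentials, 520 eV cutoff."
# Input-file version: each of these settings can move the result.
incar = Incar({
    "ENCUT": 520,        # plane-wave cutoff (eV)
    "PREC": "Accurate",  # precision mode (sets FFT grid density)
    "EDIFF": 1e-6,       # SCF energy convergence (eV)
    "EDIFFG": -0.01,     # stop ionic relaxation when forces < 0.01 eV/Angstrom
    "ISMEAR": 0,         # Gaussian smearing
    "SIGMA": 0.05,       # smearing width (eV)
    "ISPIN": 2,          # spin-polarized
    "IBRION": 2,         # conjugate-gradient ionic relaxation
    "ISIF": 3,           # relax ions, cell shape, and cell volume
    "ALGO": "Normal",    # electronic minimization algorithm
    "LREAL": "Auto",     # real-space projection (affects forces and energies)
    "NELM": 100,         # maximum SCF iterations
})
incar.write_file("INCAR")
```

Every one of those tags is a choice, and "maybe five of them" is the typical fraction that makes it into the paper.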

Which Parameters Actually Change Your Results?

Not everything in your input file matters equally. Here is what does, ranked by demonstrated impact.

Pseudopotentials and DFT+U. The largest discrepancy source in the Hegde cross-database comparison. Different databases apply different Hubbard U values to the same elements, and even the formulation matters (Dudarev vs. Liechtenstein). If you used DFT+U and your methods section does not specify U, J, the target orbitals, and the formulation, your formation energies are unreproducible. Full stop.
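
In VASP, a complete DFT+U specification is a handful of tags. A hedged sketch via pymatgen (the NiO example, the species ordering, and the U value are illustrative, not recommended settings):

```python
# A hedged sketch of a fully specified DFT+U setup for VASP, written with pymatgen.
# The tags are standard VASP LDAU tags; values and the NiO example are illustrative.
from pymatgen.io.vasp.inputs import Incar

# Hypothetical NiO cell with species ordered Ni, O in the POSCAR:
incar = Incar({
    "LDAU": True,
    "LDAUTYPE": 2,        # Dudarev formulation (1 = Liechtenstein)
    "LDAUL": [2, -1],     # U on Ni d orbitals (l = 2); no U on O
    "LDAUU": [6.2, 0.0],  # U in eV, one value per species
    "LDAUJ": [0.0, 0.0],  # J in eV (absorbed into U_eff in the Dudarev scheme)
    "LMAXMIX": 4,         # mix the on-site density matrix for d electrons
})
# The methods section should state all of the above, plus where the U value came from.
```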

K-point density and plane-wave cutoff. Choudhary and Tavazza studied convergence behavior across 30,000+ materials in JARVIS-DFT (Computational Materials Science, 2019). Major high-throughput databases use a fixed 520 eV cutoff, primarily for screening speed. But some materials need up to 1,400 eV for converged elastic properties. K-point requirements are equally material-specific. Materials Project itself recommends 7,000 per-atom k-points for elastic constant calculations versus their standard screening settings. Report your mesh dimensions, centering scheme, and whether you tested convergence.
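
If you report a per-atom k-point density, also report the mesh it maps to for your cell. A minimal sketch, assuming pymatgen (the POSCAR file name is illustrative; 7,000 is the elastic-workflow density mentioned above):

```python
# A minimal sketch of converting a per-atom k-point density into an explicit mesh,
# assuming pymatgen. File name and densities are illustrative.
from pymatgen.core import Structure
from pymatgen.io.vasp.inputs import Kpoints

structure = Structure.from_file("POSCAR")

# Typical screening density vs. the denser sampling recommended for elastic constants.
for kppa in (1000, 7000):
    kpts = Kpoints.automatic_density(structure, kppa, force_gamma=True)
    print(f"{kppa} k-points per atom -> Gamma-centered mesh {kpts.kpts[0]}")
```

Reporting the resulting mesh dimensions, not just the density, is what lets someone reproduce the sampling exactly.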

Exchange-correlation functional. The errors here are systematic and predictable. A 2024 study in Scientific Reports tested LDA, PBE, PBEsol, and vdW-DF-C09 across 141 binary and ternary oxides. LDA overbinds. PBE underbinds. PBEsol and vdW-DF-C09 showed the lowest overall errors. The more consequential finding: GGA can predict semiconductors as metals. That is not a quantitative error. That is a qualitative failure that changes your entire screening result.

Integration grids (molecular and cluster DFT). If you are running meta-GGA or other modern functionals (M06-2X, r2SCAN, ωB97X-V), grid choice matters more than you think. Rowan and Wagen (2024) documented free energy variations of up to 5 kcal/mol depending on molecular orientation alone, caused entirely by grid underconvergence. Many codes still default to grids far below the current community standard of (99, 590), that is, 99 radial shells with 590 angular points per shell. Check yours.
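
Pinning the grid takes one explicit setting in most codes. A hedged sketch using Psi4 as an example (the option names are Psi4's; the molecule, functional, and basis set are illustrative):

```python
# A hedged sketch of requesting the (99, 590) integration grid explicitly in Psi4,
# rather than relying on the code's default. Molecule and method are illustrative.
import psi4

psi4.geometry("""
0 1
O  0.000  0.000  0.117
H  0.000  0.757 -0.467
H  0.000 -0.757 -0.467
""")

psi4.set_options({
    "dft_radial_points": 99,      # radial shells
    "dft_spherical_points": 590,  # Lebedev angular points per shell
})
energy = psi4.energy("wb97x-v/def2-tzvp")
print(energy)
```

Whatever code you use, the grid (or the named grid level) belongs in the methods section alongside the functional.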

Elemental reference states. The third major discrepancy source from the Hegde comparison, and the one most often invisible in methods sections. Different choices for elemental reference energies propagate directly into every formation energy you report. If you used a non-standard reference, state it explicitly.

Your Query Results Have an Expiration Date

There is a reproducibility problem that has nothing to do with your DFT settings: the databases you query are not static.

Materials Project has published 14+ database releases since 2021. The May 2021 release introduced the MP2020 correction scheme, which changed every formation energy in the database without adding a single new material. The stated improvement: 7% better agreement with experiment. In v2024.11.14, 21,144 tasks were discovered to have incorrect electronic structure type labels. All ytterbium compounds were temporarily deprecated in v2023.11.1 due to pseudopotential issues. The ongoing r2SCAN migration (starting v2022.10.28) has changed the default thermodynamic hierarchy, meaning the same mp-id can return a different energy value today than it did two years ago.

OPTIMADE standardizes how you query across 22 providers and 25 databases containing 22+ million structures. But the specification is completely silent on data versioning. No snapshot queries. No time-travel. Each provider manages versioning independently, if at all. A query you ran in January may return different results in June, and there is no built-in mechanism to detect the difference.

If your paper cites a formation energy from any of these databases, your methods section needs to record the database name, the version or release tag, the query date, and the specific entry IDs. Alloybase addresses this directly: every query result is stored as an immutable versioned dataset snapshot with full source attribution (provider, OPTIMADE ID, query parameters, fetch timestamp, source URL). That snapshot persists across sessions and can be cited. But whatever tool you use, the principle is the same: mutable data needs a timestamp and a version.
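
If your tooling does not snapshot queries for you, a few lines of scripting will. A minimal sketch against a generic OPTIMADE v1 endpoint (the provider URL and the filter are illustrative):

```python
# A minimal sketch of snapshotting a database query: record provider, filter,
# timestamp, entry IDs, and the raw response. Provider URL and filter are illustrative.
import datetime
import json

import requests

base_url = "https://optimade.materialsproject.org/v1"  # example OPTIMADE provider
flt = 'elements HAS ALL "Al","Ni" AND nelements=2'

resp = requests.get(f"{base_url}/structures", params={"filter": flt})
resp.raise_for_status()
payload = resp.json()

snapshot = {
    "provider": base_url,
    "query_filter": flt,
    "fetched_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "api_version": payload.get("meta", {}).get("api_version"),
    "entry_ids": [entry["id"] for entry in payload.get("data", [])],
    "raw_response": payload,
}
with open("query_snapshot.json", "w") as fh:
    json.dump(snapshot, fh, indent=2)
```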

What Should Your Computational Details Section Actually Include?

Use this as a pre-submission checklist. If your methods section is missing items from any category, add them before you hit submit.

Software and Code

  • DFT code name and exact version (e.g., VASP 6.4.1, not just “VASP”)

  • Post-processing tools and versions (Phonopy 2.20, VASPKIT 1.4.1, pymatgen 2024.2.8)

  • Custom scripts: GitHub link or statement of availability

Theory Level

  • Exchange-correlation functional with citation

  • DFT+U: U and J values, target orbitals, formulation (Dudarev/Liechtenstein)

  • Van der Waals correction method (DFT-D3, DFT-D4, optB88-vdW, or none)

  • Spin polarization (on/off, initial magnetic moments if relevant)

  • Relativistic treatment (scalar relativistic, SOC, or none)

Pseudopotentials and Basis Sets

  • Type: PAW, ultrasoft, or norm-conserving

  • Library and version (e.g., “PBE PAW potentials distributed with VASP 5.4”)

  • Valence electron configurations treated

  • Plane-wave cutoff energy, with convergence evidence or justification

Structural Models

  • Source of initial structure (ICSD entry number, Materials Project ID, experimental reference)

  • Supercell dimensions and construction method

  • Surface models: slab thickness, vacuum layer, termination

  • Defect models: type, concentration, charge states considered

Sampling and Convergence

  • K-point mesh: grid dimensions, centering (Gamma vs. Monkhorst-Pack), per-atom density

  • Smearing method and width (Gaussian, Methfessel-Paxton order, tetrahedron with Blöchl corrections)

  • SCF energy convergence threshold

  • Ionic relaxation: force and energy convergence criteria

  • Evidence that cutoff and k-mesh are converged for your target property
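
That last item is the one most often skipped. A minimal sketch of generating a cutoff-convergence series with pymatgen (directory layout, ENCUT values, and the fixed k-point density are illustrative; POTCAR handling is omitted):

```python
# A minimal sketch of a plane-wave cutoff convergence series, assuming VASP-style
# inputs written with pymatgen. Values and directory names are illustrative.
import os

from pymatgen.core import Structure
from pymatgen.io.vasp.inputs import Incar, Kpoints, Poscar

structure = Structure.from_file("POSCAR")
kpoints = Kpoints.automatic_density(structure, 3000)  # hold the mesh fixed while varying ENCUT

for encut in (400, 480, 560, 640, 720):
    workdir = f"encut_{encut}"
    os.makedirs(workdir, exist_ok=True)
    Incar({"ENCUT": encut, "EDIFF": 1e-6, "ISMEAR": 0, "SIGMA": 0.05}).write_file(
        os.path.join(workdir, "INCAR")
    )
    kpoints.write_file(os.path.join(workdir, "KPOINTS"))
    Poscar(structure).write_file(os.path.join(workdir, "POSCAR"))

# After running, plot total energy (or better, the target property) against ENCUT
# and keep the plot for the SI.
```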

Calculation Workflow

  • Geometry optimization algorithm (conjugate gradient, RMM-DIIS, damped MD)

  • What was relaxed: ionic positions, cell shape, cell volume, or a subset

  • Band structure: high-symmetry path convention (Setyawan-Curtarolo, Hinuma); see the sketch after this list

  • Phonon calculations: method (DFPT or finite displacement), supercell size, displacement magnitude

  • How calculations chain together (relaxation fed into single-point, which fed into DOS)
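
The band-path convention referenced above can be made explicit in the workflow itself rather than left to a library default. A hedged sketch using pymatgen's HighSymmKpath (the path_type argument is available in recent pymatgen versions; the structure file name and division count are illustrative):

```python
# A hedged sketch of stating the band-structure path convention explicitly,
# assuming pymatgen's HighSymmKpath. File name and divisions are illustrative.
from pymatgen.core import Structure
from pymatgen.io.vasp.inputs import Kpoints
from pymatgen.symmetry.bandstructure import HighSymmKpath

structure = Structure.from_file("relaxed_POSCAR")

# Name the convention rather than leaving it implicit.
kpath = HighSymmKpath(structure, path_type="setyawan_curtarolo")
print("High-symmetry path:", kpath.kpath["path"])

kpts = Kpoints.automatic_linemode(divisions=20, ibz=kpath)
kpts.write_file("KPOINTS_band")
```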

Data Availability

  • Input files deposited in a persistent repository (Zenodo, Materials Cloud, or NOMAD)

  • Raw data: optimized structures as CIF files, total energies

  • Database queries: which database, which version/release, query date, entry IDs

Methods Section, SI, or Repository?

Not every detail belongs in the main text. Use a three-tier system.

Methods section: A human-readable summary of every choice listed above, with enough specificity that a peer can assess whether your approach is valid. This is where reviewers look first.

Supplementary Information: Complete input files (INCAR, KPOINTS, POSCAR, or the equivalent for your code). Convergence test plots. Full parameter tables for high-throughput studies. This is the layer that enables exact reproduction.

Repository: Raw output data, optimized structures in standard formats (CIF), provenance metadata. NOMAD is the gold standard here. Its Metainfo schema (Draxl and Scheffler, Scientific Data, 2023) is code-agnostic and uses a recursive provenance model where inputs are metadata to outputs. If you deposit in NOMAD, most of the metadata capture is automatic. But the repository does not replace the methods section. NOMAD is archival infrastructure. Your paper is a communication to human readers.

Nature Portfolio journals now require a data availability statement as a condition of publication, and mandate that code be shared “without undue qualifications.” The npj Computational Materials editorial team published an ML reproducibility checklist in 2024 (hosted on GitHub as a living document), and explicitly stated that reproducibility standards “should not only apply to ML-based studies, but striving for openness and reproducibility should be central to all computational materials science.” Write your methods sections as if full transparency is already mandatory, because it soon will be.

FAQ

Do I need to report convergence tests, or just the converged values?

Both. Converged values go in the methods section. Convergence test plots go in the SI. Reviewers need to see that you checked, not just the final number. A stated cutoff of 520 eV means nothing without evidence that 520 eV is sufficient for your specific system and target property.

Should I include my VASP INCAR and KPOINTS files, or describe them in prose?

Both, and they serve different purposes. Prose in the methods section lets reviewers evaluate your choices quickly. Actual input files in the SI or a repository let someone reproduce your calculation without transcription errors. Prose descriptions are lossy. The INCAR file is the ground truth.

How do I cite a database query that might return different results next year?

Record four things: database name, version or release tag (e.g., Materials Project v2024.11.14), query date, and specific entry IDs. If the database does not offer versioned releases, archive the query results yourself. Alloybase creates immutable versioned snapshots of every query result with full source attribution. Whatever approach you take, the principle is the same: pin your data to a point in time.

What if I used default parameters and did not run convergence tests?

Be honest. State that you used the code’s default settings, cite the documentation version, and acknowledge this as a limitation. Then run the convergence tests before your next submission. Reviewers increasingly flag this, and they are right to.

Does any of this matter if I am depositing my data in NOMAD?

Yes. NOMAD captures computational metadata automatically from your output files, which is excellent for archival and machine-readability. But your methods section is what reviewers and readers actually read when evaluating your paper. NOMAD is the archive. The methods section is the argument. You need both.