YAML configuration

AmorphGen accepts a YAML config file via --config <file> (CLI) or cfg_override=load_yaml_config(...) (Python). YAML is recommended for any non-trivial workflow because it:

  • Keeps simulation parameters in version control alongside the code that produced them

  • Makes runs reproducible — one file describes the entire protocol

  • Reads cleanly compared to a long CLI flag chain

  • Supports comments to document why each parameter is set

Configuration precedence

CLI flags  >  YAML config  >  DEFAULT_CONFIG

Anything passed on the CLI overrides the YAML; YAML overrides defaults. So you can keep a baseline YAML and tweak one parameter at the command line:

amorphgen POSCAR --config full_pipeline.yaml --device cuda --eq-high-T 4000
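
The precedence rule amounts to a layered merge of nested dictionaries. The following is a minimal sketch of that idea, not AmorphGen's actual implementation: the deep_merge helper, the sample values, and the assumption that --eq-high-T maps onto the T key of eq_high are all hypothetical.

```python
from copy import deepcopy

def deep_merge(base, override):
    """Return a copy of `base` with `override` applied on top, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged

# Hypothetical stand-ins for the three layers (values are illustrative only).
defaults = {"device": "cpu", "eq_high": {"T": 3000, "steps": 50000}}
yaml_cfg = {"device": "cuda", "eq_high": {"steps": 20000}}
cli_args = {"eq_high": {"T": 4000}}  # assumed mapping of --eq-high-T 4000

# Lowest priority first: defaults, then YAML, then CLI.
effective = deep_merge(deep_merge(defaults, yaml_cfg), cli_args)
print(effective)
# {'device': 'cuda', 'eq_high': {'T': 4000, 'steps': 20000}}
```

Merging recursively, rather than replacing whole stage blocks, is what lets a YAML file or CLI flag change one key in a stage without restating the rest.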

Structure

A YAML config is a nested dictionary. Top-level keys map to calculator settings and stage names; stage values are themselves dictionaries:

model: mace-mpa-0
device: cuda
default_dtype: float64

opt:
  fmax: 0.05
  optimizer: LBFGS
  cell_filter: FrechetCellFilter

eq_premelt:
  ensemble: NVT
  T: 300
  steps: 5000
  timestep: 0.5

melt:
  ensemble: NPT
  T_start: 300
  T_end: 3000
  rate: 100        # K/ps

Only the keys you want to override need to be present — anything you omit falls back to the default.

Example: full melt-quench pipeline

# examples/full_pipeline.yaml
model: mace-mpa-0
device: cuda
default_dtype: float64

opt:
  fmax: 0.05
  max_steps: 500
  optimizer: LBFGS
  cell_filter: none

eq_premelt:
  ensemble: NVT
  T: 300
  steps: 10000        # 5 ps at 0.5 fs
  timestep: 0.5

melt:
  ensemble: NPT
  T_start: 300
  T_end: 3000
  T_step: 100
  rate: 100           # K/ps
  timestep: 0.5

eq_high:
  ensemble: NVT
  T: 3000
  steps: 50000        # 25 ps at 0.5 fs
  timestep: 0.5

quench:
  ensemble: NVT
  T_start: 3000
  T_end: 300
  rate: 100
  timestep: 0.5

eq_low:
  ensemble: NVT
  T: 300
  steps: 10000        # 5 ps at 0.5 fs
  timestep: 0.5

Run with:

amorphgen POSCAR --config full_pipeline.yaml -o my_run/
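
When choosing steps and rate, it helps to convert them to simulated time. Here is a small sketch of that arithmetic, assuming timestep is in fs and rate is in K/ps. The helper names are hypothetical and not part of AmorphGen, and how AmorphGen itself discretises a ramp (for example via T_step) is not shown here.

```python
def duration_ps(steps, timestep_fs):
    """Simulated time in ps for a fixed-length stage."""
    return steps * timestep_fs / 1000.0

def ramp_steps(T_start, T_end, rate_K_per_ps, timestep_fs):
    """Number of MD steps needed to ramp between two temperatures at a given rate."""
    ramp_ps = abs(T_end - T_start) / rate_K_per_ps
    return round(ramp_ps * 1000.0 / timestep_fs)

print(duration_ps(10000, 0.5))          # 5.0
print(ramp_steps(300, 3000, 100, 0.5))  # 54000
```

Under these assumptions, a 300 K to 3000 K ramp at 100 K/ps takes 27 ps, which is 54000 steps at a 0.5 fs timestep.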

Example: hybrid (random + quench)

The starting structures are already disordered, so stages 1–3 are skipped: anneal at high T, quench, equilibrate at low T, then run the final optimisation:

# examples/hybrid_airss_mq.yaml
model: chgnet
device: cuda
default_dtype: float64

eq_high:
  ensemble: NVT
  T: 3000
  steps: 20000        # 10 ps anneal at 0.5 fs
  timestep: 0.5
  friction: 0.01

quench:
  ensemble: NVT
  T_start: 3000
  T_end: 300
  rate: 100
  timestep: 0.5
  friction: 0.01

eq_low:
  ensemble: NVT
  T: 300
  steps: 5000
  timestep: 0.5
  friction: 0.01

opt:
  fmax: 0.05
  optimizer: LBFGS
  cell_filter: cubic

Run via batch-quench:

amorphgen --batch-quench --snapshot-dir random_inputs/ \
    --config hybrid_airss_mq.yaml --batch-stages 4 5 6 7 \
    -o hybrid_runs/

Example: validation reference YAML

For --analyse --reference, write a structured reference of expected literature ranges. This adds a match/concern/fail validation table to the analysis output.

# examples/reference_a_Ga2O3.yaml
system: a-Ga2O3

references:
  - "Kaewmeechai, Strand & Shluger, Phys. Rev. B 111, 035203 (2025)"
  - "Stehlik et al., J. Non-Cryst. Solids 458 (2017) 14"

density:
  expected: [4.70, 5.10]      # g/cm^3
  units: "g/cm^3"

bond_distances:
  Ga-O:
    expected: [1.85, 1.95]    # Angstrom
    units: "A"

coordination:
  Ga-O:
    mean_expected: [4.0, 4.8]
  O-Ga:
    mean_expected: [2.7, 3.0]

bond_angles:
  Ga-O-Ga:
    expected: [110.0, 130.0]
    units: "deg"
  O-Ga-O:
    expected: [100.0, 115.0]
    units: "deg"

Run with:

amorphgen --analyse --input-dir my_structures/ \
    --cutoff auto-rdf \
    --reference reference_a_Ga2O3.yaml

Each metric is reported as match (within range), concern (within ~5% of either bound), or fail (outside range). This gives a fast, defensible answer to “do my structures agree with the literature?”
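
The three labels can be pictured with a short sketch. It rests on one assumption about the rule: "concern" is taken to mean a value outside the expected range but within 5% (relative) of the nearer bound. The classify function is hypothetical and may differ from what AmorphGen does internally.

```python
def classify(value, lower, upper, tol=0.05):
    """Label a measured value against an expected [lower, upper] range."""
    if lower <= value <= upper:
        return "match"
    # Assumption: "concern" = outside the range, but within `tol` (relative)
    # of whichever bound is nearer.
    nearest = lower if value < lower else upper
    if abs(value - nearest) <= tol * abs(nearest):
        return "concern"
    return "fail"

# Density range from the reference file above: [4.70, 5.10] g/cm^3
print(classify(4.90, 4.70, 5.10))  # match
print(classify(5.20, 4.70, 5.10))  # concern
print(classify(6.00, 4.70, 5.10))  # fail
```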

Example: classical potential

Buckingham + Coulomb for SiO₂:

model: buckingham
device: cpu

classical_params:
  params:
    Si-O: {A: 18003.76, rho: 0.2052, C: 133.54}
    O-O:  {A: 1388.77,  rho: 0.3623, C: 175.0}
  charges: {Si: 2.4, O: -1.2}
  cutoff: 10.0
  alpha: 0.2          # Wolf-summation damping (1/A)
  coulomb: true

opt:
  fmax: 0.05
  optimizer: FIRE
  cell_filter: none
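
For orientation, the short-range part of a Buckingham potential has the standard form V(r) = A exp(-r/rho) - C/r^6. The sketch below evaluates it for the Si-O parameters above, assuming A is in eV, rho in Angstrom, and C in eV·Angstrom^6 (the units are not stated on this page). The function is illustrative only and omits the Coulomb term.

```python
import math

def buckingham(r, A, rho, C):
    """Short-range Buckingham pair energy: A * exp(-r / rho) - C / r**6."""
    return A * math.exp(-r / rho) - C / r**6

# Si-O parameters copied from the classical_params block above (units assumed).
si_o = {"A": 18003.76, "rho": 0.2052, "C": 133.54}

for r in (1.4, 1.6, 2.0):
    print(f"r = {r:.1f} A   V = {buckingham(r, **si_o):8.3f} eV")
```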

Loading YAML in Python

from amorphgen.configs import load_yaml_config
from amorphgen import MeltQuenchPipeline

cfg = load_yaml_config("full_pipeline.yaml")
pipe = MeltQuenchPipeline("POSCAR", cfg_override=cfg)
pipe.run()

load_yaml_config() validates the YAML and emits warnings for unknown keys.
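
The kind of check described here can be pictured as follows. This is a hypothetical sketch, not AmorphGen's code: the KNOWN_KEYS table and the warn_unknown_keys helper are invented for illustration and list only a few of the keys shown on this page.

```python
import warnings

# Hypothetical schema: None marks a scalar setting, a set lists a stage's allowed keys.
KNOWN_KEYS = {
    "model": None,
    "device": None,
    "default_dtype": None,
    "opt": {"fmax", "max_steps", "optimizer", "cell_filter"},
    "eq_high": {"ensemble", "T", "steps", "timestep", "friction"},
}

def warn_unknown_keys(cfg):
    """Warn about keys missing from KNOWN_KEYS and return their dotted names."""
    unknown = []
    for key, value in cfg.items():
        if key not in KNOWN_KEYS:
            unknown.append(key)
        elif isinstance(value, dict) and KNOWN_KEYS[key] is not None:
            unknown.extend(f"{key}.{sub}" for sub in value if sub not in KNOWN_KEYS[key])
    for name in unknown:
        warnings.warn(f"unknown config key: {name}")
    return unknown

# A typo such as "fmaxx" is reported instead of being silently ignored.
print(warn_unknown_keys({"device": "cuda", "opt": {"fmax": 0.05, "fmaxx": 0.01}}))
# ['opt.fmaxx']
```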

Stage-1 vs stage-7 optimisation: the final_opt fallback

Both Stage 1 (initial crystal opt) and Stage 7 (final amorphous opt) use the same structure-optimiser code, and by default both read from the same opt: block. If you want them to differ (e.g. a tighter fmax for the final amorphous structure, or FrechetCellFilter for full cell relaxation while Stage 1 keeps the cell fixed), add a separate final_opt: block:

# Stage 1 — initial crystal opt
opt:
  fmax: 0.05
  max_steps: 200
  optimizer: LBFGS
  cell_filter: none           # fixed cell for the crystal

# Stage 7 — final amorphous opt (overrides only the keys you specify)
final_opt:
  fmax: 0.01                  # tighter convergence
  max_steps: 500
  optimizer: LBFGS
  cell_filter: FrechetCellFilter   # full cell relax for accurate density

When final_opt: is absent, Stage 7 silently falls back to opt:. This is fine for many workflows but worth knowing if you’re producing publication-quality structures.
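
The fallback behaves like a dictionary merge in which opt: supplies the base and final_opt: overrides only the keys it sets. Here is a minimal sketch under that reading; resolve_final_opt is a hypothetical helper, not an AmorphGen function.

```python
def resolve_final_opt(cfg):
    """Settings for the Stage 7 optimisation: opt as the base, final_opt keys on top."""
    return {**cfg.get("opt", {}), **cfg.get("final_opt", {})}

cfg = {
    "opt": {"fmax": 0.05, "max_steps": 200, "optimizer": "LBFGS", "cell_filter": "none"},
    "final_opt": {"fmax": 0.01, "cell_filter": "FrechetCellFilter"},
}

print(resolve_final_opt(cfg))
# {'fmax': 0.01, 'max_steps': 200, 'optimizer': 'LBFGS', 'cell_filter': 'FrechetCellFilter'}

# Without a final_opt block, Stage 7 ends up with exactly the opt settings.
print(resolve_final_opt({"opt": cfg["opt"]}) == cfg["opt"])  # True
```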

Tips

  • Keep YAMLs in version control. They’re tiny and document your protocol.

  • Mix YAML + CLI for parameter sweeps: a baseline YAML, with the swept variable on the CLI:

    for rate in 50 100 200; do
        amorphgen POSCAR --config baseline.yaml --quench-rate $rate -o run_${rate}Kps/
    done
    
  • Comment liberally: a # ... after any value explains why you chose it. Reviewers and future you will thank you.

  • Pre-built examples ship in examples/: full_pipeline.yaml, hybrid_airss_mq.yaml, fast_test.yaml, reference_a_Ga2O3.yaml.