# YAML configuration

AmorphGen accepts a YAML config file via `--config` (CLI) or `cfg_override=load_yaml_config(...)` (Python). YAML is recommended for any non-trivial workflow because it:

- Keeps simulation parameters in version control alongside the code that produced them
- Makes runs reproducible: one file describes the entire protocol
- Is easier to read than a long chain of CLI flags
- Supports comments that document why each parameter is set

## Configuration precedence

```
CLI flags > YAML config > DEFAULT_CONFIG
```

Anything passed on the CLI overrides the YAML, and the YAML overrides the defaults. This lets you keep a baseline YAML and tweak one parameter at the command line:

```bash
amorphgen POSCAR --config full_pipeline.yaml --device cuda --eq-high-T 4000
```

## Structure

A YAML config is a nested dictionary. Top-level keys are either calculator settings (`model`, `device`, `default_dtype`) or stage names, and each stage's value is itself a dictionary of that stage's parameters:

```yaml
model: mace-mpa-0
device: cuda
default_dtype: float64

opt:
  fmax: 0.05
  optimizer: LBFGS
  cell_filter: FrechetCellFilter

eq_premelt:
  ensemble: NVT
  T: 300
  steps: 5000
  timestep: 0.5

melt:
  ensemble: NPT
  T_start: 300
  T_end: 3000
  rate: 100          # K/ps
```

Only the keys you want to override need to be present. Anything you omit falls back to the default.
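Taken together, the precedence rule and the "omitted keys fall back to the default" behaviour amount to a recursive dictionary merge. The sketch below assumes that stage blocks are merged key by key, which is what the partial stage blocks above imply. It also assumes that a flag such as `--eq-high-T` maps onto `eq_high.T`. `deep_merge`, `resolve_config` and all values are hypothetical illustrations, not AmorphGen's own functions or real defaults.

```python
import copy
from collections.abc import Mapping
from typing import Any


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict: `base` with `override` layered on top.

    Nested dicts (stage blocks such as `eq_high`) are merged key by key,
    so an override only replaces the keys it actually sets.
    Neither input is modified.
    """
    merged = copy.deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_config(defaults, yaml_cfg, cli_cfg):
    """Apply the precedence CLI flags > YAML config > defaults (sketch)."""
    return deep_merge(deep_merge(defaults, yaml_cfg), cli_cfg)


# Illustrative values only; these are NOT AmorphGen's real defaults.
defaults = {"device": "cpu", "eq_high": {"ensemble": "NVT", "T": 3000, "steps": 50000}}
yaml_cfg = {"device": "cuda", "eq_high": {"steps": 20000}}
cli_cfg = {"eq_high": {"T": 4000}}  # assumed effect of --eq-high-T 4000

final = resolve_config(defaults, yaml_cfg, cli_cfg)
print(final)
# → {'device': 'cuda', 'eq_high': {'ensemble': 'NVT', 'T': 4000, 'steps': 20000}}
```

Merging key by key, rather than replacing whole stage blocks, is what allows a one-line YAML or CLI override to leave the rest of a stage's settings intact.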
## Example: full melt-quench pipeline

```yaml
# examples/full_pipeline.yaml
model: mace-mpa-0
device: cuda
default_dtype: float64

opt:
  fmax: 0.05
  max_steps: 500
  optimizer: LBFGS
  cell_filter: none

eq_premelt:
  ensemble: NVT
  T: 300
  steps: 10000       # 5 ps at 0.5 fs
  timestep: 0.5

melt:
  ensemble: NPT
  T_start: 300
  T_end: 3000
  T_step: 100
  rate: 100          # K/ps
  timestep: 0.5

eq_high:
  ensemble: NVT
  T: 3000
  steps: 50000       # 25 ps at 0.5 fs
  timestep: 0.5

quench:
  ensemble: NVT
  T_start: 3000
  T_end: 300
  rate: 100          # K/ps
  timestep: 0.5

eq_low:
  ensemble: NVT
  T: 300
  steps: 10000       # 5 ps at 0.5 fs
  timestep: 0.5
```

Run with:

```bash
amorphgen POSCAR --config full_pipeline.yaml -o my_run/
```

## Example: hybrid (random + quench)

Skip Stages 1–3 (the starting structures are already disordered), then anneal at high T, quench, and relax:

```yaml
# examples/hybrid_airss_mq.yaml
model: chgnet
device: cuda
default_dtype: float64

eq_high:
  ensemble: NVT
  T: 3000
  steps: 20000       # 10 ps anneal at 0.5 fs
  timestep: 0.5
  friction: 0.01

quench:
  ensemble: NVT
  T_start: 3000
  T_end: 300
  rate: 100          # K/ps
  timestep: 0.5
  friction: 0.01

eq_low:
  ensemble: NVT
  T: 300
  steps: 5000
  timestep: 0.5
  friction: 0.01

opt:
  fmax: 0.05
  optimizer: LBFGS
  cell_filter: cubic
```

Run it in batch-quench mode:

```bash
amorphgen --batch-quench --snapshot-dir random_inputs/ \
    --config hybrid_airss_mq.yaml --batch-stages 4 5 6 7 \
    -o hybrid_runs/
```

## Example: validation reference YAML

For `--analyse --reference`, write a structured reference file listing expected literature ranges. This adds a match/concern/fail validation table to the analysis output.

```yaml
# examples/reference_a_Ga2O3.yaml
system: a-Ga2O3
references:
  - "Kaewmeechai, Strand & Shluger, Phys. Rev. B 111, 035203 (2025)"
  - "Stehlik et al., J. Non-Cryst. Solids 458 (2017) 14"

density:
  expected: [4.70, 5.10]     # g/cm^3
  units: "g/cm^3"

bond_distances:
  Ga-O:
    expected: [1.85, 1.95]   # Angstrom
    units: "A"

coordination:
  Ga-O:
    mean_expected: [4.0, 4.8]
  O-Ga:
    mean_expected: [2.7, 3.0]

bond_angles:
  Ga-O-Ga:
    expected: [110.0, 130.0]
    units: "deg"
  O-Ga-O:
    expected: [100.0, 115.0]
    units: "deg"
```

Run with:

```bash
amorphgen --analyse --input-dir my_structures/ \
    --cutoff auto-rdf \
    --reference reference_a_Ga2O3.yaml
```

Each metric is reported as **match** (within range), **concern** (within ~5% of either bound), or **fail** (outside range). This gives a fast, defensible answer to the question "do my structures agree with the literature?"

## Example: classical potential

Buckingham + Coulomb for SiO₂:

```yaml
model: buckingham
device: cpu

classical_params:
  params:
    Si-O: {A: 18003.76, rho: 0.2052, C: 133.54}
    O-O:  {A: 1388.77, rho: 0.3623, C: 175.0}
  charges: {Si: 2.4, O: -1.2}
  cutoff: 10.0
  alpha: 0.2         # Wolf-summation damping (1/A)
  coulomb: true

opt:
  fmax: 0.05
  optimizer: FIRE
  cell_filter: none
```

## Loading YAML in Python

```python
from amorphgen.configs import load_yaml_config
from amorphgen import MeltQuenchPipeline

cfg = load_yaml_config("full_pipeline.yaml")
pipe = MeltQuenchPipeline("POSCAR", cfg_override=cfg)
pipe.run()
```

`load_yaml_config()` validates the YAML and emits warnings for unknown keys.

## Stage 1 vs Stage 7 optimisation: the `final_opt` fallback

Stage 1 (initial crystal optimisation) and Stage 7 (final amorphous optimisation) both use the same structure optimiser, and by default both read their settings from the same `opt:` block. If you want them to differ (e.g. a tighter `fmax` for the final amorphous structure, or `FrechetCellFilter` for full cell relaxation while Stage 1 keeps the cell fixed), add a separate `final_opt:` block:

```yaml
# Stage 1: initial crystal opt
opt:
  fmax: 0.05
  max_steps: 200
  optimizer: LBFGS
  cell_filter: none                # fixed cell for the crystal

# Stage 7: final amorphous opt (overrides only the keys you specify)
final_opt:
  fmax: 0.01                       # tighter convergence
  max_steps: 500
  optimizer: LBFGS
  cell_filter: FrechetCellFilter   # full cell relax for accurate density
```

When `final_opt:` is absent, Stage 7 silently falls back to `opt:`. This is fine for many workflows, but it matters for publication-quality structures: with `cell_filter: none` in `opt:`, the final amorphous structure is also relaxed at fixed cell, so its density is never optimised.

## Tips

- **Keep YAMLs in version control.** They're tiny and they document your protocol.
- **Mix YAML + CLI** for parameter sweeps: keep a baseline YAML and put the swept variable on the CLI:

  ```bash
  for rate in 50 100 200; do
      amorphgen POSCAR --config baseline.yaml --quench-rate "$rate" -o "run_${rate}Kps/"
  done
  ```

- **Comment liberally.** A `# ...` after any value explains *why* you chose it. Reviewers and future you will thank you.
- **Pre-built examples** ship in `examples/`: `full_pipeline.yaml`, `hybrid_airss_mq.yaml`, `fast_test.yaml`, `reference_a_Ga2O3.yaml`.
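Before a publication run, it can help to check which optimiser settings Stage 7 will actually use. The following is a minimal sketch of the `final_opt` fallback, operating on a plain dict with the same nested shape as the YAML file. It assumes that keys set in `final_opt` are layered over `opt`, so any key `final_opt` leaves out is taken from `opt`; the `final_opt` comment above ("overrides only the keys you specify") suggests this but does not spell it out. `stage7_opt_settings` is a hypothetical helper, not part of AmorphGen.

```python
from typing import Any


def stage7_opt_settings(cfg: dict[str, Any]) -> dict[str, Any]:
    """Return the optimiser settings Stage 7 would use (sketch).

    Assumption: keys in `final_opt` replace the matching keys in `opt`,
    and keys missing from `final_opt` come from `opt`.
    If `final_opt` is absent, Stage 7 uses `opt` unchanged.
    """
    opt = dict(cfg.get("opt", {}))
    final_opt = cfg.get("final_opt")
    if final_opt is None:
        return opt  # silent fallback to the Stage 1 settings
    return {**opt, **final_opt}  # final_opt wins on shared keys


cfg_no_final = {"opt": {"fmax": 0.05, "optimizer": "LBFGS", "cell_filter": "none"}}
cfg_with_final = {
    "opt": {"fmax": 0.05, "max_steps": 200, "optimizer": "LBFGS", "cell_filter": "none"},
    "final_opt": {"fmax": 0.01, "cell_filter": "FrechetCellFilter"},
}

# Without final_opt the cell stays fixed in Stage 7 as well.
print(stage7_opt_settings(cfg_no_final)["cell_filter"])  # → none
print(stage7_opt_settings(cfg_with_final))
# → {'fmax': 0.01, 'max_steps': 200, 'optimizer': 'LBFGS', 'cell_filter': 'FrechetCellFilter'}
```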