Utilities

User-facing helper functions used across the pipeline.

Common helpers

MD-dynamics builder, trajectory I/O, config merging, density helper (amorphgen.utils.common).

amorphgen.utils.common

Shared helpers used across all pipeline stages: cell manipulation, MD dynamics builder, temperature ramps, logging, trajectory I/O, config merging, and snapshot extraction.

Calculator-related functions are in amorphgen.utils.calculators.

amorphgen.utils.common.compute_density_gcm3(atoms)[source]

Compute density of an Atoms object in g/cm3.

Return type:

float

amorphgen.utils.common.make_cubic(atoms)[source]

Reshape the cell to a cube of equal volume, rescaling atom positions.

amorphgen.utils.common.build_md_dynamics(atoms, ensemble='NVT', T=300.0, timestep=1.0, friction=0.01, ttime=25.0, pfactor=None, external_stress=0.0, **kwargs)[source]

Create an NVT or NPT ASE dynamics object.

Parameters:
  • atoms (ase.Atoms) – Must already have a calculator attached.

  • ensemble (str) – "NVT" or "NPT".

  • T (float) – Temperature in Kelvin.

  • timestep (float) – Time step in fs.

  • friction (float) – Langevin friction coefficient (for NVT).

  • ttime (float) – Thermostat time constant in fs (for NPT Nose-Hoover).

  • pfactor (float, optional) – Barostat coupling (for NPT). If None, auto-calculated.

  • external_stress (float) – External pressure in GPa (for NPT).

  • **kwargs – Extra arguments forwarded to the ASE dynamics class.

Return type:

ASE dynamics object

amorphgen.utils.common.resolve_ramp(T_start, T_end, T_step)[source]

Generate a list of temperatures for a ramp.

Works for both heating (T_step > 0) and cooling (T_step < 0). Always includes T_end.

Parameters:
Return type:

list[float]

class amorphgen.utils.common.MDLogger(logfile, mode='w')[source]

Bases: object

Per-step MD logger that writes to both a file and stdout.

Logs step number, time (ps), temperature (K), potential energy (eV), kinetic energy (eV), total energy (eV), and volume (ų).

Parameters:
log(dyn, atoms)[source]
close()[source]
class amorphgen.utils.common.TrajectoryWriter(filename, fmt='extxyz')[source]

Bases: object

Unified trajectory output supporting multiple formats.

Wraps ASE’s write() for extxyz/xyz/lammps-dump and ASE’s Trajectory for .traj binary format.

Parameters:
write(atoms=None)[source]
close()[source]
amorphgen.utils.common.attach_outputs(dyn, atoms, logfile, trajfile, fmt='extxyz', interval=100)[source]

Attach an MDLogger and TrajectoryWriter to dyn.

Returns (logger, traj_writer) so they can be closed later.

Parameters:
amorphgen.utils.common.merge_config(defaults, overrides)[source]

Deep-merge overrides into a copy of defaults.

Parameters:
  • defaults (dict)

  • overrides (dict | None)

Return type:

dict

amorphgen.utils.common.extract_snapshots(traj_file, n_snapshots=20, select='uniform', output_dir='snapshots', burn_in_frames=0, output_format='xyz')[source]

Extract snapshot frames from a trajectory file.

Parameters:
  • traj_file (str) – Path to the trajectory file.

  • n_snapshots (int) – Number of snapshots to extract.

  • select (str) – Selection strategy: "uniform" (evenly spaced) or "last" (final n frames).

  • output_dir (str) – Directory for output files.

  • burn_in_frames (int, default 0) – Number of leading frames to discard before sampling. Useful for skipping the non-equilibrated portion of an MD trajectory (e.g. the first ~50 ps of a 100 ps high-T equilibration). The reported frame indices in the output filenames remain absolute (relative to the original trajectory) so the provenance of each snapshot is preserved.

  • output_format (str, default "xyz") – File format for written snapshots. One of xyz / extxyz (extxyz with full metadata), vasp (POSCAR, sorted by species), or cif. All formats are ASE-readable round-trip.

Returns:

Paths to extracted snapshot files.

Return type:

list of str

Radii, classification and auto-derivation

Shannon ionic / Cordero covalent / Goldschmidt metallic radii, bonding-type classification, automatic minsep + density + target-CN derivation, and charge-balance oxidation-state inference (amorphgen.utils.radii). These helpers are exercised end-to-end in Tutorial 2 (“Zero-config random structure generation”).

amorphgen.utils.radii

Atomic radii data, bonding classification, minsep calculation, and density estimation for random structure generation.

This module contains: - Shannon ionic radii (CN=4 and CN=6) - Metallic radii (Goldschmidt CN=12) - Pauling electronegativities - Element classification (nonmetal, metalloid) - Elemental solid densities - Bond type classification - Minsep calculation from radii - Cell volume / density estimation

amorphgen.utils.radii.infer_oxidation_state(sym, composition)[source]

Infer the oxidation state of sym in composition by charge balance.

Sums anion charge from ANION_CHARGES and assigns the remaining positive charge equally across cations of the queried element when only one cation type is present, or for any element that appears as the only non-anion type. Returns None when the inference is ambiguous (multiple cation species, non-integer oxidation state, or no anions at all), letting the caller fall back to a default Shannon entry.

Parameters:
Return type:

int | None

amorphgen.utils.radii.auto_target_cn(composition)[source]

Auto-detect target coordination numbers and tolerance from composition.

Returns (target_cn, cn_tolerance) tuple.

Rules: - Metalloids (Si, Ge, B): CN=4, tolerance=0 (strict tetrahedral) - Pure Si/Ge: CN=4, tolerance=0 - Metals in nitrides: CN=4, tolerance=0 (wurtzite tetrahedral) - Metals in halides/sulfides: CN=6, tolerance=0 (strict octahedral) - Metals in oxides: CN=5, tolerance=1 (flexible 4-6 range)

Parameters:

composition (dict) – e.g. {“In”: 32, “O”: 48}

Returns:

(target_cn, cn_tolerance) – target_cn: dict or None cn_tolerance: int (0=strict, 1=flexible)

Return type:

tuple

amorphgen.utils.radii.classify_bond(sym_a, sym_b)[source]

Classify a pair of elements by bonding type.

Uses an element-type classification rule (nonmetal / metalloid / metal membership) with a Pauling-electronegativity refinement: pairs that the type rules would call “ionic” but whose Pauling electronegativity difference is small (Δχ < 1.0) are reclassified as “covalent”. This catches III–V semiconductors (GaAs Δχ=0.37, AlAs ~0.6) and related compounds where the chemistry is predominantly covalent and Shannon ionic radii give unrealistically small inter-atomic distances.

Returns one of: “ionic”, “covalent”, “metallic”.

Parameters:
Return type:

str

amorphgen.utils.radii.get_ionic_radius(sym, cn=None, oxidation_state=None)[source]

Get Shannon ionic radius for an element.

Parameters:
  • sym (str) – Element symbol.

  • cn (int, optional) – Coordination number. If provided, use CN-specific radius. Defaults to CN=6.

  • oxidation_state (int, optional) – Specific oxidation state to look up (e.g.5 for Sb in Sb_2O_5). If supplied and present in the Shannon table the matching radius is returned. If supplied but absent, falls through to the default selection. When None (default), anions return their most-negative state and cations return their highest positive state - usually the right choice for binary oxides / halides / nitrides where the higher state dominates (TiO_2, V_2O_5, WO_3 etc.).

Return type:

float or None

amorphgen.utils.radii.get_metallic_radius(sym)[source]

Get metallic radius for an element. Returns None if unavailable.

Parameters:

sym (str)

Return type:

float | None

amorphgen.utils.radii.get_effective_radius(sym)[source]

Get the most appropriate radius for volume estimation.

  • Metals -> metallic radius

  • Anion-formers (O, S, Cl, …) -> Shannon ionic radius

  • Metalloids (Si, Ge, …) -> metallic radius if available

  • Fallback -> covalent radius

Parameters:

sym (str)

Return type:

float

amorphgen.utils.radii.default_minsep(symbols, scale=0.85, target_cn=None)[source]

Build a minsep dict for all element pairs.

Uses bonding-type-aware radii. When target_cn is provided, uses CN-specific Shannon radii (e.g. CN=4 for Si in SiO2).

Parameters:
  • symbols (list of str)

  • scale (float) – Fallback scale factor (default 0.85).

  • target_cn (dict, optional) – e.g. {“Si”: 4, “O”: 2}. Uses CN-specific radii when available.

Returns:

Minimum separations keyed as “A-B” with A <= B alphabetically.

Return type:

dict

amorphgen.utils.radii.estimate_density(composition, amorphous_factor=0.8)[source]

Estimate density from elemental solid densities.

Returns density in g/cm3, or None if any element lacks density data.

Parameters:
  • composition (dict)

  • amorphous_factor (float)

Return type:

float | None

amorphgen.utils.radii.estimate_cell_length(composition, target_density=None, packing_factor=None, density_scale=1.0)[source]

Estimate cubic cell length from composition via sphere packing.

A single class-aware sphere-packing model handles all amorphous compositions:

\[V_{\text{cell}} = \frac{\sum_i N_i \cdot \tfrac{4}{3}\pi r_i^3} {f_{\text{pack}} \cdot s},\]

where \(r_i\) is the radius for element \(i\) chosen according to the compound’s bonding regime (Shannon ionic for ionic compounds, Cordero covalent for covalent compounds, Goldschmidt metallic for alloys; see _radius_for_density()). \(f_{\text{pack}}\) is the class-aware packing fraction stored in PACKING_FACTORS, and \(s\) is the user-supplied density_scale (default 1.0).

If target_density is supplied, the cell is sized directly from it and density_scale / packing_factor are ignored.

Parameters:
  • composition (dict)

  • target_density (float | None)

  • packing_factor (float | None)

  • density_scale (float)

Return type:

float

amorphgen.utils.radii.format_auto_derive_summary(composition, target_cn, minsep, est_density, cell_length)[source]

Single-line summary of the auto-derivation chain for the random_gen log.

Captures the chemistry-informed decisions AmorphGen makes from a bare composition: material class, oxidation states (where inferable), per-cation target coordination, per-pair bond classification + Pauling Δχ + minsep, and the final estimated density / cell length. Designed to fit on one log line for typical 2–4 element systems (e.g. Ga₂O₃ ≈ 200 chars).

Format:
[auto-derive] <formula> → <class>, OS{…}, CN{…},

minsep{pair:val class [Δχ=val] | …}, ρ=<val> g/cm³, L=<val> Å

For ionic pairs (where the Pauling Δχ < 1.0 refinement fires) the Δχ value is shown so the user can see exactly why the pair was classified as ionic vs covalent. For metallic and anion-packing pairs the Δχ is omitted (the type rule alone decides those).

Parameters:
Return type:

str

Format conversion

Format-conversion helper used by amorphgen --convert and the convert-mode YAML config (amorphgen.utils.convert).

amorphgen.utils.convert

Public file-format conversion utility.

Convert one structure file or every ASE-readable file in a directory to a target format (xyz / vasp / cif). VASP outputs are sorted by species so the resulting POSCAR is clean.

Usable from CLI (amorphgen --convert PATH --format vasp), Python API (from amorphgen import convert), or via a convert: block in a YAML config file (amorphgen --config convert.yaml).

amorphgen.utils.convert.convert(input_path, output_format='vasp', output_dir=None, sort=True, verbose=True)[source]

Convert a structure file or directory of files to output_format.

Parameters:
  • input_path (str) – Path to a single ASE-readable structure file, or to a directory containing one or more such files. Directory globs match *.xyz, *.extxyz, *.vasp, *.cif and POSCAR*.

  • output_format (str) – Target format key. Allowed values: "xyz" / "extxyz" (both write ASE extended XYZ to .xyz), "vasp" (POSCAR to .vasp), "cif".

  • output_dir (str, optional) – Directory to write converted files into. If None, defaults to "<input>_<format>" for directory inputs, or the parent directory of input_path for single-file inputs.

  • sort (bool, default True) – For VASP outputs, sort atoms by species so the POSCAR is clean. Ignored for other formats.

  • verbose (bool, default True) – Print one progress line per file plus a summary footer.

Returns:

Paths to the converted output files, in input order.

Return type:

list of str

Examples

>>> from amorphgen import convert
>>> convert("snapshots/", output_format="vasp",
...         output_dir="snapshots_vasp/")
['snapshots_vasp/snapshot_0000_frame00000.vasp', ...]

MD equilibration convergence analysis

Functions for assessing whether an MD equilibration stage has converged (amorphgen.utils.equilibration). Useful as a post-pipeline quality check: did the high-T equilibration in your melt-quench actually reach steady-state, or did you quench from a still-thermalising liquid?

The high-level entry point is convergence_report, which generates energy-vs-time, block-averaging, MSD, temperature, and time-window-RDF plots in one call. Lower-level functions are exposed for users who want to assemble custom diagnostics.

amorphgen.utils.equilibration

Equilibration convergence analysis for AmorphGen MD trajectories.

Provides functions to assess whether an MD equilibration run has converged:
  1. Energy vs time (running average + drift detection)

  2. Block averaging (quantitative equilibration test)

  3. RDF in time windows (structural convergence)

  4. Mean Square Displacement (diffusion / liquid vs glass)

  5. Coordination number vs time

  6. Temperature vs time

Usage

from amorphgen.utils.equilibration import convergence_report

# Quick all-in-one convergence report report = convergence_report(

“stage4_eq.xyz”, timestep_fs=1.0, T_target=3000,

)

# Or individual analyses fig_energy, drift = plot_energy_convergence(traj, timestep_fs=1.0) is_eq, block_data = block_average_test(traj, n_blocks=4) fig_msd, D_dict = plot_msd(traj, timestep_fs=1.0)

Notes

  • Two input modes: trajectory file (.xyz/.traj) or log file (.log). Log files are faster (no atomic positions to load) but only support energy, temperature, and block-average analyses. Trajectory files are needed for MSD, RDF, and CN analyses.

  • Default timestep is 1.0 fs, matching AmorphGen’s default_config.py.

amorphgen.utils.equilibration.parse_md_log(logfile)[source]

Parse an AmorphGen MD stage log file.

Expects whitespace-separated columns:

Step Time(ps) T(K) Epot(eV) Ekin(eV) Etot(eV) Vol(A^3)

Lines starting with ‘Step’, ‘-’, or ‘->’ are skipped as headers/markers.

Returns:

  • dict with keys (‘step’, ‘time_ps’, ‘T_K’, ‘Epot_eV’, ‘Ekin_eV’,)

  • ’Etot_eV’, ‘Vol_A3’ — each a numpy array.

Parameters:

logfile (str)

Return type:

dict

amorphgen.utils.equilibration.running_average(data, window)[source]

Compute running average with given window size.

Parameters:
Return type:

numpy.ndarray

amorphgen.utils.equilibration.extract_energies(source, n_atoms=None)[source]

Extract potential energies (eV) and per-atom energies.

Parameters:
  • source (str or list) – Trajectory file, log file (.log), or list of Atoms.

  • n_atoms (int, optional) – Required if source is a .log file.

Returns:

  • energies (np.ndarray — total potential energy (eV))

  • energies_per_atom (np.ndarray — energy per atom (eV/atom))

Return type:

tuple[numpy.ndarray, numpy.ndarray]

amorphgen.utils.equilibration.plot_energy_convergence(source, timestep_fs=1.0, window_ps=0.5, per_atom=True, n_atoms=None, ax=None)[source]

Plot potential energy vs time with running average and linear drift.

Parameters:
  • source (str or list) – Trajectory file, log file (.log), or list of Atoms.

  • timestep_fs (float) – MD timestep in femtoseconds (default 1.0, matching AmorphGen).

  • window_ps (float) – Running average window in picoseconds.

  • n_atoms (int, optional) – Required if source is a .log file.

  • per_atom (bool)

Returns:

  • fig (matplotlib Figure)

  • drift_eV_per_ps (float) – Linear drift in energy. Should be ~0 if equilibrated.

amorphgen.utils.equilibration.block_average_test(source, n_blocks=4, discard_fraction=0.1, n_atoms=None)[source]

Split trajectory into blocks and compare mean energies.

A system is considered equilibrated if the block means agree within 2x the standard error of the mean (SEM).

Parameters:
  • source (str or list) – Trajectory file, log file (.log), or list of Atoms.

  • n_blocks (int) – Number of equal blocks to split the production phase into.

  • discard_fraction (float) – Fraction of initial trajectory to discard (thermalisation).

  • n_atoms (int, optional) – Required if source is a .log file.

Returns:

  • is_equilibrated (bool)

  • block_data (dict) – Keys: block_means, overall_mean, overall_std, sem, max_deviation, threshold, is_equilibrated.

Return type:

tuple[bool, dict]

amorphgen.utils.equilibration.plot_block_averages(source, n_blocks=4, discard_fraction=0.1, timestep_fs=1.0, n_atoms=None, ax=None)[source]

Visualise block averaging: block means vs overall mean +/- 2*SEM.

Parameters:
  • n_blocks (int)

  • discard_fraction (float)

  • timestep_fs (float)

  • n_atoms (int | None)

amorphgen.utils.equilibration.compute_msd(traj, timestep_fs=1.0, by_element=True)[source]

Compute MSD from trajectory using unwrapped positions.

Handles both orthorhombic and non-orthorhombic cells via fractional coordinate unwrapping.

Parameters:
  • traj (str, list of Atoms, or Trajectory)

  • timestep_fs (float)

  • by_element (bool) – If True, return MSD per element type in addition to total.

Returns:

  • time_ps (np.ndarray)

  • msd_dict (dict — keys are element symbols (+ “all”),) – values are MSD arrays (A^2).

Return type:

tuple[numpy.ndarray, dict]

amorphgen.utils.equilibration.plot_msd(traj, timestep_fs=1.0, ax=None)[source]

Plot MSD vs time per element. Fits diffusion coefficient D.

D is fitted from the linear regime (last 50% of trajectory) using MSD = 6*D*t (3D diffusion).

Returns:

  • fig (matplotlib Figure)

  • D_dict (dict — diffusion coefficients (cm^2/s) per element.) – D > 1e-6 cm^2/s indicates liquid/diffusive behaviour. D < 1e-6 cm^2/s indicates frozen/glass behaviour.

Parameters:

timestep_fs (float)

amorphgen.utils.equilibration.plot_temperature(source, timestep_fs=1.0, T_target=None, ax=None)[source]

Plot instantaneous temperature vs time.

Parameters:
  • source (str or list) – Log file (.log) or trajectory / list of Atoms.

  • T_target (float, optional) – Target temperature (K). If given, shows expected fluctuation band.

  • timestep_fs (float)

amorphgen.utils.equilibration.plot_rdf_time_windows(traj, pairs=None, n_windows=4, rmax=4.0, nbins=100, timestep_fs=1.0)[source]

Overlay partial RDFs from different time windows.

If the RDFs from all windows overlap, the structure is equilibrated.

Parameters:
amorphgen.utils.equilibration.compute_cn_vs_time(traj, centre, neighbour, cutoff=None, window_size=50, timestep_fs=1.0)[source]

Compute average coordination number vs time using a sliding window.

Parameters:
Return type:

tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

amorphgen.utils.equilibration.plot_cn_vs_time(traj, pairs, cutoffs=None, window_size=50, timestep_fs=1.0, ax=None)[source]

Plot CN vs time for multiple centre-neighbour pairs.

Parameters:
  • pairs (list of (centre, neighbour, expected_cn)) – e.g. [(“Si”, “O”, 4.0), (“O”, “Si”, 2.0)]

  • cutoffs (list of float, optional) – Bond cutoffs per pair. None = auto from covalent radii.

  • window_size (int)

  • timestep_fs (float)

amorphgen.utils.equilibration.convergence_report(source, timestep_fs=1.0, T_target=None, n_atoms=None, pairs_rdf=None, pairs_cn=None, cn_cutoffs=None, rmax=4.0, n_blocks=4, output_dir=None, prefix='convergence')[source]

Generate a comprehensive convergence report.

Parameters:
  • source (str, list of Atoms, or Trajectory) – Trajectory file (.xyz/.traj), log file (.log), or list of Atoms. Log files are faster but only provide energy/temperature (no MSD, RDF, or CN analysis).

  • timestep_fs (float) – MD timestep in femtoseconds (default 1.0).

  • T_target (float, optional) – Target temperature (K) for temperature check.

  • n_atoms (int, optional) – Required if source is a .log file.

  • pairs_rdf (list of (str, str), optional) – Element pairs for RDF windows. Auto-detected if None.

  • pairs_cn (list of (str, str, float), optional) – Element pairs + expected CN for CN tracking. e.g. [(“Si”, “O”, 4.0), (“O”, “Si”, 2.0)]

  • cn_cutoffs (list of float, optional) – Bond cutoffs for CN pairs. None = auto from covalent radii.

  • rmax (float) – Max distance for RDF (must not exceed half cell length).

  • n_blocks (int) – Number of blocks for block average test.

  • output_dir (str, optional) – If provided, save all plots as PNG and a text report here.

  • prefix (str) – Filename prefix for saved plots.

Returns:

report – Keys include: n_frames, n_atoms, total_time_ps, elements, energy_drift_eV_per_atom_per_ps, block_test_passed, block_data, diffusion_coefficients_cm2_s, summary_text.

Return type:

dict

Other utility modules

  • Calculators — multi-backend calculator factory (get_calculator).

  • amorphgen.utils.classical — Lennard-Jones and Buckingham+Coulomb calculators; loaded via get_calculator("lennard-jones" | "buckingham", ...).