# GAIA

**Repository Path**: pfsuo/GAIA

## Basic Information

- **Project Name**: GAIA
- **Description**: GAIA from github
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-08
- **Last Updated**: 2026-04-08

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README
# GAIA

GAIA is a framework for generating datasets for machine learning interatomic potentials (MLIPs) with an automated pipeline.

- [GAIA](#gaia)
  * [Prerequisites](#prerequisites)
  * [Usage](#usage)
    + [Config file](#config-file)
    + [Data-generator](#data-generator)
    + [Data-improver](#data-improver)
    + [GAIA-Bench](#gaia-bench)
  * [Dataset and model checkpoint](#dataset-and-model-checkpoint)
  * [Citation](#citation)

## Prerequisites

### Quantum mechanics (QM) package
A QM package is required to run GAIA.
It is currently designed to use `VASP`; support for more packages is planned.

### Distributed environment with shared storage
GAIA has been implemented under the assumption of a distributed environment.
Also, `shared storage` is required so that each node can access the same directory under an identical path.

### Job scheduler
GAIA is currently designed to use `SLURM` as the job scheduler,
but with minor code modifications, one can adapt it to other schedulers or execute it on a single node.

### Dependencies
We provide `requirements.txt`, which allows users to fully reproduce the environment used for the GAIA implementation.
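
Besides the Python packages, this section also lists external binaries. As a minimal sketch (not part of GAIA), the following Python snippet checks whether they are visible on `PATH` before a run is started. The executable names (`crest`, `nebmake.pl`, `obabel`, `xtb`, `xtbiff`) are assumptions and may differ on your system.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_BINARIES = ["crest", "nebmake.pl", "obabel", "xtb", "xtbiff"]


def find_missing_binaries(names):
    """Return the names that cannot be located on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_binaries(REQUIRED_BINARIES)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required binaries were found.")
```

On a cluster, the check is most useful when run on a compute node, since login and compute nodes may expose different `PATH` settings.
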
GAIA also requires the following binaries: [CREST](https://github.com/crest-lab/crest), [nebmake.pl](https://theory.cm.utexas.edu/henkelman/code/), [Open Babel](https://github.com/openbabel/openbabel), [xTB](https://github.com/grimme-lab/xtb), [xTB-IFF](https://github.com/grimme-lab/xtbiff).

## Usage

### Config file
- [user_config](https://github.samsungds.net/SAIT/GAIA/tree/main/examples/config.yaml) provides an example YAML file with user-defined settings for the data-generator, data-improver, and GAIA-Bench.
- [base_config](https://github.samsungds.net/SAIT/GAIA/blob/main/src/commons/config_base.yaml) serves as a skeleton configuration. It includes default values for advanced parameters; user-defined parameters override those in the base config.

### Data-generator
#### Input preparation
- Chemical components
GAIA supports both periodic (e.g., metals) and non-periodic (e.g., molecules with organic species) components.
They should follow the .POSCAR and .xyz formats, respectively.

#### Run
```
$ cd GAIA
$ python main.py -a data_generator -c {user_config (.yaml)} -o {out_dir} -p {prefix}
```
- If out_dir is `/home/GAIA_out` and prefix is `first`, the artifacts and the log are saved in `/home/GAIA_out/first/`.

### Data-improver
#### Input preparation
- Trainset, validset and model checkpoint
The data-improver provides recommendations based on error metrics on the validation set as well as on the training set itself,
so it requires a validation dataset (.extxyz) and a trained model checkpoint (e.g., .pt or .pth) in addition to the training dataset.
The MLIP framework that provides the `calculator` for the checkpoint should also be set up.

#### Run
```
$ cd GAIA
$ python main.py -a data_improver -c {user_config (.yaml)} -o {out_dir} -p {prefix}
```

### GAIA-Bench
#### Input preparation
- GAIA-Bench datasets and model checkpoint
GAIA-Bench includes four benchmark tasks; their datasets are available at [GAIA-Bench](https://huggingface.co/datasets/aixsim/GAIA-Bench).
A model checkpoint to test is required; the MLIP framework that provides the `calculator` for the checkpoint should also be set up.

#### Run
```
$ cd GAIA
$ python main.py -a benchmark -c {user_config (.yaml)} -o {out_dir} -p {prefix}
```

## Dataset and model checkpoint
[Titan25](https://huggingface.co/datasets/aixsim/Titan25) is an MLIP dataset constructed with GAIA, comprising 1.8M data points across 11 elements.
[SNet-T25](https://huggingface.co/aixsim/SNet-T25) is an MLIP trained on this dataset.
See the [GAIA paper](https://arxiv.org/abs/2509.25798) for details.

## Citation
If you use this code, please cite our work as follows:
```
@article{gaia2025,
  title={Scalable Reactive Atomistic Dynamics with GAIA},
  author={Song, Suhwan and Kim, Heejae and Jang, Jaehee and Cho, Hyuntae and Kim, Gunhee and Kim, Geonu},
  journal={arXiv preprint arXiv:2509.25798},
  year={2025}
}
```