# GAIA
**Repository Path**: pfsuo/GAIA
## Basic Information
- **Project Name**: GAIA
- **Description**: GAIA from github
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-08
- **Last Updated**: 2026-04-08
## README
# GAIA
GAIA is a framework that generates datasets for machine learning interatomic potentials (MLIPs) through an automated pipeline.
- [GAIA](#gaia)
  * [Prerequisites](#prerequisites)
  * [Usage](#usage)
    + [Config file](#config-file)
    + [Data-generator](#data-generator)
    + [Data-improver](#data-improver)
    + [GAIA-Bench](#gaia-bench)
  * [Dataset and model checkpoint](#dataset-and-model-checkpoint)
  * [Citation](#citation)
## Prerequisites
### Quantum mechanics (QM) package
A QM package is required to run GAIA.
GAIA currently targets `VASP`; support for additional packages is planned.
### Distributed environment with shared storage
GAIA assumes a distributed (multi-node) environment.
It also requires `shared storage`, so that every node can access the same directory under an identical path.
### Job scheduler
GAIA is currently designed to use `SLURM` as the job scheduler,
but with minor code modifications, one can easily adapt it to other schedulers or execute it on a single node.
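
As a rough illustration of what switching between a scheduler and a single node can involve, the sketch below renders a small SLURM batch script and chooses between `sbatch` and plain `bash` to launch it. This is not GAIA's internal code: the function names `render_sbatch_script` and `launch_command` and their parameters are hypothetical, and only the standard `#SBATCH` directives shown are assumed.

```python
def render_sbatch_script(job_name, command, nodes=1, log_file="slurm-%j.out"):
    """Return the text of a minimal SLURM batch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --output={log_file}",  # %j is replaced by the job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


def launch_command(script_path, use_slurm=True):
    """Return the argv that would submit the script or run it locally."""
    return ["sbatch", script_path] if use_slurm else ["bash", script_path]


script = render_sbatch_script("gaia-job", "echo hello", nodes=2)
print(script)
print(launch_command("job.sh", use_slurm=False))
```

Because the `#SBATCH` lines are ordinary comments to `bash`, the same script can be run on a single node without changes.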
### Dependencies
We provide a `requirements.txt` file that allows users to fully reproduce the environment used for the GAIA implementation.
GAIA also requires the following binaries:
[CREST](https://github.com/crest-lab/crest), [nebmake.pl](https://theory.cm.utexas.edu/henkelman/code/), [Open Babel](https://github.com/openbabel/openbabel), [xTB](https://github.com/grimme-lab/xtb), [xTB-IFF](https://github.com/grimme-lab/xtbiff)
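
Before a first run, it can help to confirm that these binaries are reachable on `PATH`. The following is a minimal sketch using only the Python standard library. The executable names in `REQUIRED_BINARIES` are assumptions based on the usual command names of each tool and may differ in your installation.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_BINARIES = ["crest", "nebmake.pl", "obabel", "xtb", "xtbiff"]


def find_missing_binaries(names, which=shutil.which):
    """Return the names that cannot be located on PATH."""
    return [name for name in names if which(name) is None]


missing = find_missing_binaries(REQUIRED_BINARIES)
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All required binaries were found on PATH.")
```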
## Usage
### Config file
- [user_config](https://github.samsungds.net/SAIT/GAIA/tree/main/examples/config.yaml)
provides an example YAML file with user-defined settings
for the data-generator, data-improver, and GAIA-Bench.
- [base_config](https://github.samsungds.net/SAIT/GAIA/blob/main/src/commons/config_base.yaml)
serves as a skeleton configuration.
It includes default values for advanced parameters,
while user-defined parameters override those in the base config.
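
The override behavior described above can be sketched as a recursive dictionary merge. This is an illustration of the idea only, not GAIA's actual merge logic, and the keys in `base_config` and `user_config` are hypothetical rather than taken from the real config files.

```python
import copy


def merge_config(base, user):
    """Return a new dict in which values from `user` override those in `base`.

    Nested dicts are merged key by key; any other value is replaced outright.
    """
    merged = copy.deepcopy(base)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys, for illustration only.
base_config = {"scheduler": {"name": "slurm", "nodes": 1}, "seed": 0}
user_config = {"scheduler": {"nodes": 4}}

effective = merge_config(base_config, user_config)
print(effective)
```

Here the user only sets `nodes`, so `name` and `seed` keep their defaults from the base config.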
### Data-generator
#### Input preparation
- Chemical components
GAIA supports both periodic (e.g., metals) and non-periodic (e.g., molecules with organic species) components.
Periodic components should be provided as `.POSCAR` files and non-periodic components as `.xyz` files.
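
For reference, the general XYZ layout is an atom count, a comment line, and then one `symbol x y z` line per atom. The sketch below serializes a water molecule in that layout. The helper `to_xyz` is hypothetical, the coordinates are approximate and purely illustrative, and any extra conventions GAIA may expect are not covered here.

```python
def to_xyz(atoms, comment=""):
    """Serialize (symbol, x, y, z) tuples into XYZ-format text."""
    lines = [str(len(atoms)), comment]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


# Approximate water geometry in angstroms, for illustration only.
water = [
    ("O", 0.0, 0.0, 0.117),
    ("H", 0.0, 0.757, -0.469),
    ("H", 0.0, -0.757, -0.469),
]

xyz_text = to_xyz(water, comment="water")
print(xyz_text)
```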
#### Run
```
$ cd GAIA
$ python main.py -a data_generator -c {user_config (.yaml)} -o {out_dir} -p {prefix}
```
- If out_dir is `/home/GAIA_out` and prefix is `first`, the artifacts and the log are saved in `/home/GAIA_out/first/`.
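
The invocation and the resulting output location can be sketched as follows. The flags are the ones shown in the command above; the helper names `build_command` and `run_directory` and the file name `config.yaml` are hypothetical and exist only for this example.

```python
import shlex
from pathlib import PurePosixPath


def build_command(action, config, out_dir, prefix):
    """Assemble the argv for a GAIA run using the documented flags."""
    return ["python", "main.py", "-a", action, "-c", str(config),
            "-o", str(out_dir), "-p", prefix]


def run_directory(out_dir, prefix):
    """Return the directory where artifacts and the log are expected."""
    return PurePosixPath(out_dir) / prefix


cmd = build_command("data_generator", "config.yaml", "/home/GAIA_out", "first")
print(shlex.join(cmd))
print(run_directory("/home/GAIA_out", "first"))
```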
### Data-improver
#### Input preparation
- Trainset, validset and model checkpoint
The data-improver makes recommendations based on error metrics on the validation set, as well as on the training set itself.
It therefore requires a validation dataset (.extxyz) and a trained model checkpoint (e.g., .pt or .pth), in addition to the training dataset.
The MLIP framework that provides the `calculator` for the checkpoint must also be set up.
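
The README does not specify which error metrics the data-improver computes. As a generic illustration of what an error metric on a validation set looks like, the sketch below computes the mean absolute error and root-mean-square error between predicted and reference values. The function names and the numbers are hypothetical and are not taken from GAIA.

```python
import math


def mae(predicted, reference):
    """Mean absolute error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Made-up energies, for illustration only.
model_energies = [1.0, 2.0, 3.0]
reference_energies = [1.0, 2.0, 5.0]
print(mae(model_energies, reference_energies))
print(rmse(model_energies, reference_energies))
```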
#### Run
```
$ cd GAIA
$ python main.py -a data_improver -c {user_config (.yaml)} -o {out_dir} -p {prefix}
```
### GAIA-Bench
#### Input preparation
- GAIA-Bench datasets and model checkpoint
GAIA-Bench includes four benchmark tasks; their datasets are available at [GAIA-Bench](https://huggingface.co/datasets/aixsim/GAIA-Bench).
A model checkpoint to test is required; the MLIP framework that provides the `calculator` for the checkpoint must also be set up.
#### Run
```
$ cd GAIA
$ python main.py -a benchmark -c {user_config (.yaml)} -o {out_dir} -p {prefix}
```
## Dataset and model checkpoint
[Titan25](https://huggingface.co/datasets/aixsim/Titan25) is an MLIP dataset constructed with GAIA,
comprising 1.8M data points across 11 elements.
[SNet-T25](https://huggingface.co/aixsim/SNet-T25) is an MLIP trained on this dataset.
See [GAIA paper](https://arxiv.org/abs/2509.25798) for details.
## Citation
If using this code, please cite our work as follows:
```
@article{gaia2025,
title={Scalable Reactive Atomistic Dynamics with GAIA},
author={Song, Suhwan and Kim, Heejae and Jang, Jaehee and Cho, Hyuntae and Kim, Gunhee and Kim, Geonu},
journal={arXiv preprint arXiv:2509.25798},
year={2025}
}
```