# TGPO

**Repository Path**: helldog2022/TGPO

## Basic Information

- **Project Name**: TGPO
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-03-26
- **Last Updated**: 2026-03-31

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ✨Getting Started

## Installation

You can install the dependencies by running the following commands:

```bash
conda create -n tgpo python=3.10
conda activate tgpo
pip install airports-py
git clone https://github.com/helldog-star/TGPO
git clone https://github.com/dottxt-ai/outlines
cd outlines
git checkout 0.0.46
pip install .
cd ../TGPO/luffy
pip install -r requirements.v2.txt
pip install -e .
cd verl
pip install -e .
cd ../..
pip install transformers==4.55.4
```

If you run into issues installing flash-attn, we recommend installing a prebuilt wheel from the [flash-attn](https://github.com/Dao-AILab/flash-attention/releases/tag/v2.7.3) release page. For example, we use this version (built for CUDA 12, PyTorch 2.4, and Python 3.10 on Linux x86_64):

```bash
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

## Repo Structure

This repository includes:

- `luffy`: Code for training on-policy, mixed-policy (using off-policy reasoning traces), or on-policy distillation models. Our main code changes are in `luffy/verl/verl/mix_src`.
- `data`: Data and code for training and evaluating LUFFY.
- `exp_scripts`: Example scripts for training models.
- `eval_scripts`: Evaluation scripts for math and out-of-distribution benchmarks.
- `ExGRPO`: Implementation and notes for ExGRPO, which leverages off-policy experience replay to further boost performance without external guidance.

Our project is built on top of LUFFY. We thank the LUFFY authors for their open-source work!
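
To sanity-check the environment after following the Installation section, a small script along these lines may help. It is only a sketch: the version pins come from the install commands, but the distribution names (`transformers`, `outlines`, `flash_attn`) are assumptions about how each package registers itself with pip, and the helper `check_versions` is hypothetical, not part of this repository.

```python
from importlib import metadata

# Pins taken from the install commands in this README. The distribution
# names are assumptions about how each package is registered with pip.
EXPECTED = {"transformers": "4.55.4", "outlines": "0.0.46", "flash_attn": "2.7.3"}


def check_versions(expected, get_version=metadata.version):
    """Return {name: (wanted, found)} for each missing or mismatched package.

    `found` is None when the package is not installed.
    """
    problems = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = (wanted, None)
            continue
        # Ignore a local build tag such as "+cu12torch2.4cxx11abiFALSE".
        if found.split("+", 1)[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Run it inside the activated `tgpo` environment. No output means every pinned package was found at the expected version.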
---

# 🔧Usage

## Model and Dataset Preparation

Before running the scripts, make sure `CONDA_SH_PATH`, `CONDA_ENV_NAME`, and `BASE_DIR` are set correctly in `data/download.sh` and `data/my_prepare_train.sh`. No other changes are needed.

```bash
cd data
bash download.sh
bash my_prepare_train.sh
```
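
Before launching the two scripts, it can be handy to confirm that the three variables are actually assigned. The sketch below assumes they appear as plain shell assignments (for example `BASE_DIR=/some/path` or `export BASE_DIR="/some/path"`), which is a guess about how the scripts are written. The helper `missing_settings` is hypothetical, and the check is rough: it does not evaluate shell expansions or strip trailing comments.

```python
import re
from pathlib import Path

REQUIRED = ("CONDA_SH_PATH", "CONDA_ENV_NAME", "BASE_DIR")


def missing_settings(script_text, required=REQUIRED):
    """Return the required names that never receive a non-empty value."""
    missing = []
    for name in required:
        # Assumed format: NAME=value or export NAME="value", one per line.
        pattern = rf"^\s*(?:export\s+)?{re.escape(name)}=(.*)$"
        values = [
            m.group(1).strip().strip("\"'")
            for m in re.finditer(pattern, script_text, re.MULTILINE)
        ]
        if not any(values):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumes the current directory is the one that contains `data/`.
    for path in (Path("data/download.sh"), Path("data/my_prepare_train.sh")):
        if not path.exists():
            print(f"{path}: not found")
            continue
        print(f"{path}: missing {missing_settings(path.read_text()) or 'nothing'}")
```

Any name reported as missing should be filled in by hand in the corresponding script before running it.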