# TokenHSI
**Repository Path**: monkeycc/TokenHSI
## Basic Information
- **Project Name**: TokenHSI
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-03
- **Last Updated**: 2025-12-03
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization
Liang Pan1,2
·
Zeshi Yang 3
·
Zhiyang Dou2
·
Wenjia Wang2
·
Buzhen Huang4
·
Bo Dai2,5
·
Taku Komura2
·
Jingbo Wang1
1Shanghai AI Lab 2The University of Hong Kong 3Independent Researcher 4Southeast University 5Feeling AI
CVPR 2025
🏆️ Oral Presentation (Top 3.3%)
Also Spotlight in the 1st Workshop on Humanoid Agents at CVPR 2025
## 🏠 About
Introducing TokenHSI, a unified model that enables physics-based characters to perform diverse human-scene interaction tasks. It excels at seamlessly unifying multiple foundational HSI skills within a single transformer network and flexibly adapting learned skills to challenging new tasks, including skill composition, object/terrain shape variation, and long-horizon task completion.
## 📹 Demo
Long-horizon Task Completion in a Complex Dynamic Environment
## 🔥 News
- **[2025-04-07]** Released full code. Please note to download the latest datasets and models from Hugging Face.
- **[2025-04-06]** Released three skill composition tasks with pre-trained models.
- **[2025-04-05]** TokenHSI has been selected as an oral paper at CVPR 2025! 🎉
- **[2025-04-03]** Released long-horizon task completion with a pre-trained model.
- **[2025-04-01]** We just updated the Getting Started section. You can play TokenHSI now!
- **[2025-03-31]** We've released the codebase and checkpoint for the foundational skill learning part.
## 📝 TODO List
- [x] Release foundational skill learning
- [x] Release policy adaptation - skill composition
- [x] Release policy adaptation - object shape variation
- [x] Release policy adaptation - terrain shape variation
- [x] Release policy adaptation - long-horizon task completion
## 📖 Getting Started
### Dependencies
Follow the following instructions:
1. Create new conda environment and install pytroch
```
conda create -n tokenhsi python=3.8
conda activate tokenhsi
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
2. Install [IsaacGym Preview 4](https://developer.nvidia.com/isaac-gym)
```
cd IsaacGym_Preview_4_Package/isaacgym/python
pip install -e .
# add your conda env path to ~/.bashrc
export LD_LIBRARY_PATH="your_conda_env_path/lib:$LD_LIBRARY_PATH"
```
3. Install pytorch3d (optional, if you want to run the long-horizon task completion demo)
**We use pytorch3d to rapidly render height maps of dynamic objects for thousands of simulation environments.**
```
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install git+https://github.com/facebookresearch/pytorch3d.git@v0.7.7
```
4. Download [SMPL body models](https://smpl.is.tue.mpg.de/) and organize them as follows:
```
|-- assets
|-- body_models
|-- smpl
|-- SMPL_FEMALE.pkl
|-- SMPL_MALE.pkl
|-- SMPL_NEUTRAL.pkl
|-- ...
|-- lpanlib
|-- tokenhsi
```
### Motion & Object Data
We provide two methods to generate the motion and object data.
* Download pre-processed data from [Hugging Face](https://huggingface.co/datasets/lianganimation/TokenHSI). Please follow the instruction in the dataset page.
* Generate data from source:
1. Download [AMASS (SMPL-X Neutral)](https://amass.is.tue.mpg.de/), [SAMP](https://samp.is.tue.mpg.de/), and [OMOMO](https://github.com/lijiaman/omomo_release).
2. Modify dataset paths in ```tokenhsi/data/dataset_cfg.yaml``` file.
```
# Motion datasets, please use your own paths
amass_dir: "/YOUR_PATH/datasets/AMASS"
samp_pkl_dir: "/YOUR_PATH/datasets/samp"
omomo_dir: "/YOUR_PATH/datasets/OMOMO/data"
```
3. We still need to download the pre-processed data from [Hugging Face](https://huggingface.co/datasets/lianganimation/TokenHSI). But now we only require the object data.
4. Run the following script:
```
bash tokenhsi/scripts/gen_data.sh
```
### Checkpoints
Download checkpoints from [Hugging Face](https://huggingface.co/lianganimation/TokenHSI). Please follow the instruction in the model page.
## 🕹️ Play TokenHSI!
* Single task policy trained with AMP
* Path-following
```
# test
sh tokenhsi/scripts/single_task/traj_test.sh
# train
sh tokenhsi/scripts/single_task/traj_train.sh
```
* Sitting
```
# test
sh tokenhsi/scripts/single_task/sit_test.sh
# train
sh tokenhsi/scripts/single_task/sit_train.sh
```
* Climbing
```
# test
sh tokenhsi/scripts/single_task/climb_test.sh
# train
sh tokenhsi/scripts/single_task/climb_train.sh
```
* Carrying
```
# test
sh tokenhsi/scripts/single_task/carry_test.sh
# train
sh tokenhsi/scripts/single_task/carry_train.sh
```
* TokenHSI's unified transformer policy
* Foundational Skill Learning
```
# test
sh tokenhsi/scripts/tokenhsi/stage1_test.sh
# eval
sh tokenhsi/scripts/tokenhsi/stage1_eval.sh carry # we need to specify a task to eval, e.g., traj, sit, climb, or carry.
# train
sh tokenhsi/scripts/tokenhsi/stage1_train.sh
```
If you successfully run the test command, you will see:
* Policy Adaptation - Skill Composition
* Traj + Carry
```
# test
sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_test.sh
# eval
sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_eval.sh
# train
sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_train.sh
```
If you successfully run the test command, you will see:
* Sit + Carry
```
# test
sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_test.sh
# eval
sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_eval.sh
# train
sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_train.sh
```
If you successfully run the test command, you will see:
* Climb + Carry
```
# test
sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_test.sh
# eval
sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_eval.sh
# train
sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_train.sh
```
If you successfully run the test command, you will see:
* Policy Adaptation - Object Shape Variation
* Carrying: Box-2-Chair
```
# test
sh tokenhsi/scripts/tokenhsi/stage2_object_chair_test.sh
# eval
sh tokenhsi/scripts/tokenhsi/stage2_object_chair_eval.sh
# train
sh tokenhsi/scripts/tokenhsi/stage2_object_chair_train.sh
```
If you successfully run the test command, you will see:
* Carrying: Box-2-Table
```
# test
sh tokenhsi/scripts/tokenhsi/stage2_object_table_test.sh
# eval
sh tokenhsi/scripts/tokenhsi/stage2_object_table_eval.sh
# train
sh tokenhsi/scripts/tokenhsi/stage2_object_table_train.sh
```
If you successfully run the test command, you will see:
* Policy Adaptation - Terrain Shape Variation
* Path-following
```
# test
sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_test.sh
# eval
sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_eval.sh
# train
sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_train.sh
```
If you successfully run the test command, you will see:
* Carrying
```
# test
sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_test.sh
# eval
sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_eval.sh
# train
sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_train.sh
```
If you successfully run the test command, you will see:
* Policy Adaptation - Long-horizon Task Completion
```
# test
sh tokenhsi/scripts/tokenhsi/stage2_longterm_test.sh
# eval
sh tokenhsi/scripts/tokenhsi/stage2_longterm_eval.sh
# train
sh tokenhsi/scripts/tokenhsi/stage2_longterm_train.sh
```
### Viewer Shortcuts
| Keyboard | Function |
| ---- | --- |
| F | focus on humanoid |
| Right Click + WASD | change view port |
| Shift + Right Click + WASD | change view port fast |
| K | visualize lines |
| L | record screenshot, press again to stop recording|
The recorded screenshots are saved in ``` output/imgs/ ```. You can use ``` lpanlib/others/video.py ``` to generate mp4 video from the recorded images.
```
python lpanlib/others/video.py --imgs_dir output/imgs/example_path --delete_imgs
```
## 🔗 Citation
If you find our work helpful, please cite:
```bibtex
@inproceedings{pan2025tokenhsi,
title={TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization},
author={Pan, Liang and Yang, Zeshi and Dou, Zhiyang and Wang, Wenjia and Huang, Buzhen and Dai, Bo and Komura, Taku and Wang, Jingbo},
booktitle={CVPR},
year={2025},
}
@inproceedings{pan2024synthesizing,
title={Synthesizing physically plausible human motions in 3d scenes},
author={Pan, Liang and Wang, Jingbo and Huang, Buzhen and Zhang, Junyu and Wang, Haofan and Tang, Xu and Wang, Yangang},
booktitle={2024 International Conference on 3D Vision (3DV)},
pages={1498--1507},
year={2024},
organization={IEEE}
}
```
Please also consider citing the following papers that inspired TokenHSI.
```bibtex
@article{tessler2024maskedmimic,
title={Maskedmimic: Unified physics-based character control through masked motion inpainting},
author={Tessler, Chen and Guo, Yunrong and Nabati, Ofir and Chechik, Gal and Peng, Xue Bin},
journal={ACM Transactions on Graphics (TOG)},
volume={43},
number={6},
pages={1--21},
year={2024},
publisher={ACM New York, NY, USA}
}
@article{he2024hover,
title={Hover: Versatile neural whole-body controller for humanoid robots},
author={He, Tairan and Xiao, Wenli and Lin, Toru and Luo, Zhengyi and Xu, Zhenjia and Jiang, Zhenyu and Kautz, Jan and Liu, Changliu and Shi, Guanya and Wang, Xiaolong and others},
journal={arXiv preprint arXiv:2410.21229},
year={2024}
}
```
## 👏 Acknowledgements and 📚 License
This repository builds upon the following awesome open-source projects:
- [ASE](https://github.com/nv-tlabs/ASE): Contributes to the physics-based character control codebase
- [Pacer](https://github.com/nv-tlabs/pacer): Contributes to the procedural terrain generation and trajectory following task
- [rl_games](https://github.com/Denys88/rl_games): Contributes to the reinforcement learning code
- [OMOMO](https://github.com/lijiaman/omomo_release)/[SAMP](https://samp.is.tue.mpg.de/)/[AMASS](https://amass.is.tue.mpg.de/)/[3D-Front](https://arxiv.org/abs/2011.09127): Used for the reference dataset construction
- [InterMimic](https://github.com/Sirui-Xu/InterMimic): Used for the github repo readme design
This codebase is released under the [MIT License](LICENSE).
Please note that it also relies on external libraries and datasets, each of which may be subject to their own licenses and terms of use.
## 🌟 Star History