# TokenHSI

TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

Liang Pan¹,² · Zeshi Yang³ · Zhiyang Dou² · Wenjia Wang² · Buzhen Huang⁴ · Bo Dai²,⁵ · Taku Komura² · Jingbo Wang¹
¹Shanghai AI Lab ²The University of Hong Kong ³Independent Researcher ⁴Southeast University ⁵Feeling AI
CVPR 2025
🏆️ Oral Presentation (Top 3.3%)
Also a Spotlight paper in the 1st Workshop on Humanoid Agents, CVPR 2025

## 🏠 About

Introducing TokenHSI, a unified model that enables physics-based characters to perform diverse human-scene interaction tasks. It excels at seamlessly unifying multiple foundational HSI skills within a single transformer network and flexibly adapting learned skills to challenging new tasks, including skill composition, object/terrain shape variation, and long-horizon task completion.
## 📹 Demo

Long-horizon Task Completion in a Complex Dynamic Environment

## 🔥 News

- **[2025-04-07]** Released the full code. Please make sure to download the latest datasets and models from Hugging Face.
- **[2025-04-06]** Released three skill composition tasks with pre-trained models.
- **[2025-04-05]** TokenHSI has been selected as an oral paper at CVPR 2025! 🎉
- **[2025-04-03]** Released long-horizon task completion with a pre-trained model.
- **[2025-04-01]** Updated the Getting Started section. You can play TokenHSI now!
- **[2025-03-31]** Released the codebase and checkpoint for the foundational skill learning part.

## 📝 TODO List

- [x] Release foundational skill learning
- [x] Release policy adaptation - skill composition
- [x] Release policy adaptation - object shape variation
- [x] Release policy adaptation - terrain shape variation
- [x] Release policy adaptation - long-horizon task completion

## 📖 Getting Started

### Dependencies

Follow these steps:

1. Create a new conda environment and install PyTorch
   ```
   conda create -n tokenhsi python=3.8
   conda activate tokenhsi
   conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
   pip install -r requirements.txt
   ```
2. Install [IsaacGym Preview 4](https://developer.nvidia.com/isaac-gym)
   ```
   cd IsaacGym_Preview_4_Package/isaacgym/python
   pip install -e .
   # add your conda env path to ~/.bashrc
   export LD_LIBRARY_PATH="your_conda_env_path/lib:$LD_LIBRARY_PATH"
   ```
3. Install PyTorch3D (optional; only needed to run the long-horizon task completion demo)

   **We use PyTorch3D to rapidly render height maps of dynamic objects for thousands of simulation environments.**
   ```
   conda install -c fvcore -c iopath -c conda-forge fvcore iopath
   pip install git+https://github.com/facebookresearch/pytorch3d.git@v0.7.7
   ```
4. Download [SMPL body models](https://smpl.is.tue.mpg.de/) and organize them as follows:
   ```
   |-- assets
   |-- body_models
       |-- smpl
           |-- SMPL_FEMALE.pkl
           |-- SMPL_MALE.pkl
           |-- SMPL_NEUTRAL.pkl
           |-- ...
   |-- lpanlib
   |-- tokenhsi
   ```

### Motion & Object Data

We provide two ways to obtain the motion and object data:

* Download the pre-processed data from [Hugging Face](https://huggingface.co/datasets/lianganimation/TokenHSI). Please follow the instructions on the dataset page.
* Generate the data from source:
  1. Download [AMASS (SMPL-X Neutral)](https://amass.is.tue.mpg.de/), [SAMP](https://samp.is.tue.mpg.de/), and [OMOMO](https://github.com/lijiaman/omomo_release).
  2. Modify the dataset paths in the `tokenhsi/data/dataset_cfg.yaml` file.
     ```
     # Motion datasets, please use your own paths
     amass_dir: "/YOUR_PATH/datasets/AMASS"
     samp_pkl_dir: "/YOUR_PATH/datasets/samp"
     omomo_dir: "/YOUR_PATH/datasets/OMOMO/data"
     ```
  3. You still need to download the pre-processed data from [Hugging Face](https://huggingface.co/datasets/lianganimation/TokenHSI), but in this case only the object data is required.
  4. Run the following script:
     ```
     bash tokenhsi/scripts/gen_data.sh
     ```

### Checkpoints

Download the checkpoints from [Hugging Face](https://huggingface.co/lianganimation/TokenHSI). Please follow the instructions on the model page.

## 🕹️ Play TokenHSI!

* Single task policy trained with AMP
  * Path-following
    ```
    # test
    sh tokenhsi/scripts/single_task/traj_test.sh
    # train
    sh tokenhsi/scripts/single_task/traj_train.sh
    ```
  * Sitting
    ```
    # test
    sh tokenhsi/scripts/single_task/sit_test.sh
    # train
    sh tokenhsi/scripts/single_task/sit_train.sh
    ```
  * Climbing
    ```
    # test
    sh tokenhsi/scripts/single_task/climb_test.sh
    # train
    sh tokenhsi/scripts/single_task/climb_train.sh
    ```
  * Carrying
    ```
    # test
    sh tokenhsi/scripts/single_task/carry_test.sh
    # train
    sh tokenhsi/scripts/single_task/carry_train.sh
    ```
* TokenHSI's unified transformer policy
  * Foundational Skill Learning
    ```
    # test
    sh tokenhsi/scripts/tokenhsi/stage1_test.sh
    # eval
    sh tokenhsi/scripts/tokenhsi/stage1_eval.sh carry # a task must be specified for eval: traj, sit, climb, or carry
    # train
    sh tokenhsi/scripts/tokenhsi/stage1_train.sh
    ```
    If the test command runs successfully, you will see the trained policy running in the viewer.
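
    The eval command above evaluates one task per call. As a minimal sketch, assuming `stage1_eval.sh` accepts exactly one task name (as its inline comment suggests), the loop below evaluates all four foundational skills in sequence. `STAGE1_EVAL_SCRIPT` and `eval_all_stage1_tasks` are hypothetical names introduced for illustration; they are not part of the repository.

    ```bash
    # Sketch only: evaluate the stage-1 policy on every foundational task.
    # Assumption: stage1_eval.sh takes a single task name as its argument.
    # STAGE1_EVAL_SCRIPT and eval_all_stage1_tasks are hypothetical names.
    STAGE1_EVAL_SCRIPT="${STAGE1_EVAL_SCRIPT:-tokenhsi/scripts/tokenhsi/stage1_eval.sh}"

    eval_all_stage1_tasks() {
        local task
        for task in traj sit climb carry; do
            echo "== Evaluating: ${task} =="
            # Stop at the first failing evaluation instead of burying the error.
            sh "$STAGE1_EVAL_SCRIPT" "$task" || return 1
        done
    }

    # Usage, from the repository root:
    # eval_all_stage1_tasks
    ```

    Stopping at the first failure is a deliberate choice: a broken task is reported immediately rather than hidden by the output of later evaluations.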

  * Policy Adaptation - Skill Composition
    * Traj + Carry
      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_train.sh
      ```
      If the test command runs successfully, you will see the trained policy running in the viewer.

    * Sit + Carry
      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_train.sh
      ```
      If the test command runs successfully, you will see the trained policy running in the viewer.

    * Climb + Carry
      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_train.sh
      ```
      If the test command runs successfully, you will see the trained policy running in the viewer.

  * Policy Adaptation - Object Shape Variation
    * Carrying: Box-2-Chair
      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_object_chair_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_object_chair_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_object_chair_train.sh
      ```
      If the test command runs successfully, you will see the trained policy running in the viewer.

    * Carrying: Box-2-Table
      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_object_table_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_object_table_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_object_table_train.sh
      ```
      If the test command runs successfully, you will see the trained policy running in the viewer.

  * Policy Adaptation - Terrain Shape Variation
    * Path-following
      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_train.sh
      ```
      If the test command runs successfully, you will see the trained policy running in the viewer.

    * Carrying
      ```
      # test
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_eval.sh
      # train
      sh tokenhsi/scripts/tokenhsi/stage2_terrain_carry_train.sh
      ```
      If the test command runs successfully, you will see the trained policy running in the viewer.

  * Policy Adaptation - Long-horizon Task Completion
    ```
    # test
    sh tokenhsi/scripts/tokenhsi/stage2_longterm_test.sh
    # eval
    sh tokenhsi/scripts/tokenhsi/stage2_longterm_eval.sh
    # train
    sh tokenhsi/scripts/tokenhsi/stage2_longterm_train.sh
    ```

### Viewer Shortcuts

| Shortcut | Function |
| ---- | --- |
| F | focus on humanoid |
| Right Click + WASD | change viewport |
| Shift + Right Click + WASD | change viewport fast |
| K | visualize lines |
| L | record screenshots; press again to stop recording |

The recorded screenshots are saved in `output/imgs/`. You can use `lpanlib/others/video.py` to generate an mp4 video from the recorded images:

```
python lpanlib/others/video.py --imgs_dir output/imgs/example_path --delete_imgs
```

## 🔗 Citation

If you find our work helpful, please cite:

```bibtex
@inproceedings{pan2025tokenhsi,
  title={TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization},
  author={Pan, Liang and Yang, Zeshi and Dou, Zhiyang and Wang, Wenjia and Huang, Buzhen and Dai, Bo and Komura, Taku and Wang, Jingbo},
  booktitle={CVPR},
  year={2025},
}

@inproceedings{pan2024synthesizing,
  title={Synthesizing physically plausible human motions in 3d scenes},
  author={Pan, Liang and Wang, Jingbo and Huang, Buzhen and Zhang, Junyu and Wang, Haofan and Tang, Xu and Wang, Yangang},
  booktitle={2024 International Conference on 3D Vision (3DV)},
  pages={1498--1507},
  year={2024},
  organization={IEEE}
}
```

Please also consider citing the following papers that inspired TokenHSI.
```bibtex
@article{tessler2024maskedmimic,
  title={Maskedmimic: Unified physics-based character control through masked motion inpainting},
  author={Tessler, Chen and Guo, Yunrong and Nabati, Ofir and Chechik, Gal and Peng, Xue Bin},
  journal={ACM Transactions on Graphics (TOG)},
  volume={43},
  number={6},
  pages={1--21},
  year={2024},
  publisher={ACM New York, NY, USA}
}

@article{he2024hover,
  title={Hover: Versatile neural whole-body controller for humanoid robots},
  author={He, Tairan and Xiao, Wenli and Lin, Toru and Luo, Zhengyi and Xu, Zhenjia and Jiang, Zhenyu and Kautz, Jan and Liu, Changliu and Shi, Guanya and Wang, Xiaolong and others},
  journal={arXiv preprint arXiv:2410.21229},
  year={2024}
}
```

## 👏 Acknowledgements and 📚 License

This repository builds upon the following awesome open-source projects:

- [ASE](https://github.com/nv-tlabs/ASE): contributes to the physics-based character control codebase
- [Pacer](https://github.com/nv-tlabs/pacer): contributes to the procedural terrain generation and the trajectory-following task
- [rl_games](https://github.com/Denys88/rl_games): contributes to the reinforcement learning code
- [OMOMO](https://github.com/lijiaman/omomo_release)/[SAMP](https://samp.is.tue.mpg.de/)/[AMASS](https://amass.is.tue.mpg.de/)/[3D-Front](https://arxiv.org/abs/2011.09127): used for the reference dataset construction
- [InterMimic](https://github.com/Sirui-Xu/InterMimic): used for the GitHub README design

This codebase is released under the [MIT License](LICENSE). Please note that it also relies on external libraries and datasets, each of which may be subject to its own license and terms of use.