# WorldDrive
**Repository Path**: tj1652045/WorldDrive
## Basic Information
- **Project Name**: WorldDrive
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-30
- **Last Updated**: 2026-04-30
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Bridging Scene Generation and Planning: Driving with World Model via Unifying Vision and Motion Representation

Xingtai Gui<sup>1</sup>, Meijie Zhang<sup>2</sup>, Tianyi Yan<sup>1</sup>, Wencheng Han<sup>1</sup>, Jiahao Gong<sup>2</sup>, Feiyang Tan<sup>2</sup>, Cheng-zhong Xu<sup>1</sup>, Jianbing Shen<sup>1</sup>

> <sup>1</sup>SKL-IOTSC, CIS, University of Macau, <sup>2</sup>Afari Intelligent Drive
---
## News
**[2026.4.23]** Released the planner training script\
**[2026.3.17]** Released the arXiv paper\
**[2026.3.15]** Released the WorldDrive evaluation and visualization scripts\
**[2026.3.14]** Released the WorldDrive project!
## Table of Contents
- [News](#news)
- [Table of Contents](#table-of-contents)
- [Abstract](#abstract)
- [Overview](#overview)
- [Getting Started](#getting-started)
- [Checkpoint](#checkpoint)
- [Quick Evaluation](#quick-evaluation)
- [Visualize WorldDrive](#Visulize-WorldDrive)
- [Quick Training](#quick-training)
- [Contact](#contact)
- [Acknowledgement](#acknowledgement)
- [Citation](#citation)
---
## Abstract
End-to-end autonomous driving aims to generate safe and plausible planning policies from raw sensor input, and constructing an effective scene representation is a critical challenge. Driving world models have shown great potential in learning rich representations by predicting the future evolution of a driving scene. However, existing driving world models primarily focus on visual scene representation, and motion representation is not explicitly designed to be planner-shared and inheritable, leaving a schism between the optimization of visual scene generation and the requirements of precise motion planning. We present WorldDrive, a holistic framework that couples scene generation and real-time planning via unifying vision and motion representation. We first introduce a Trajectory-aware Driving World Model, which conditions on a trajectory vocabulary to enforce consistency between visual dynamics and motion intentions, enabling the generation of diverse and plausible future scenes conditioned on a specific trajectory. We transfer the vision and motion encoders to a downstream Multi-modal Planner, ensuring the driving policy operates on mature representations pre-optimized by scene generation. A simple interaction between motion representation, visual representation, and ego status can generate high-quality, multi-modal trajectories. Furthermore, to exploit the world model's foresight, we propose a Future-aware Rewarder, which distills future latent representation from the frozen world model to evaluate and select optimal trajectories in real time. Extensive experiments on the NAVSIM, NAVSIM-v2, and nuScenes benchmarks demonstrate that WorldDrive achieves state-of-the-art planning performance among vision-only methods while maintaining high-fidelity action-controlled video generation capabilities, providing strong evidence for the effectiveness of unifying vision and motion representation for robust autonomous driving.
---
## Overview
---
## Getting Started
We provide detailed guides to help you quickly set up and evaluate WorldDrive:
- [Getting started from NAVSIM environment preparation](https://github.com/autonomousvision/navsim?tab=readme-ov-file#getting-started-)
- [Preparation of WorldDrive environment](docs/Installation.md)
- [WorldDrive Training and Evaluation](docs/Train_Eval.md)
## Checkpoint
[Checkpoint](https://huggingface.co/tabguigui/WorldDrive/tree/main)
```bash
# worlddrive_stage1_train.ckpt planner checkpoint
# worlddrive_stage2_train.ckpt planner with future-aware rewarder checkpoint
# worldtraj_stage1_1024_tadwm.pkl TA-DWM pretrain checkpoint
```
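Before running the evaluation or training scripts, it can help to confirm that the downloaded files are where you expect them. The following is a minimal sketch, not part of the repository: the file names come from the list above, while the `ckpts` directory and the `check_ckpts` helper are hypothetical, and the location the WorldDrive scripts actually read checkpoints from may differ (check the script configuration).

```bash
# Checkpoint file names as listed above.
WORLDDRIVE_CKPTS="worlddrive_stage1_train.ckpt worlddrive_stage2_train.ckpt worldtraj_stage1_1024_tadwm.pkl"

# Hypothetical helper: report which checkpoints exist in a directory.
# Returns 0 only if every file is present.
check_ckpts() {
  ckpt_dir=${1:-ckpts}   # "ckpts" is an assumed default, not a repo convention
  missing=0
  for name in $WORLDDRIVE_CKPTS; do
    if [ -f "$ckpt_dir/$name" ]; then
      echo "ok       $name"
    else
      echo "missing  $name"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example download, assuming the Hugging Face Hub CLI is installed
# (newer releases may expose it as `hf download` instead):
#   huggingface-cli download tabguigui/WorldDrive worlddrive_stage1_train.ckpt --local-dir ckpts
#   check_ckpts ckpts
```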
## Quick Evaluation
### Multi-modal Planner
#### Step1: cache dataset (3D causal VAE latents)
Download the pretrained 3D causal VAE from the official CogVideoX-2B Hugging Face repository:\
[CogVideoX-2B VAE](https://huggingface.co/zai-org/CogVideoX-2b/tree/main)
```bash
sh scripts/cache/run_caching_trajworld_eval.sh # navtest for eval
```
#### Step2: evaluate planner
```bash
# download worlddrive_stage1_train.ckpt
sh scripts/evaluation/run_worlddrive_planner_pdm_score_evaluation_stage1.sh
```
#### Step3: evaluate planner with future-aware rewarder
```bash
# download worlddrive_stage2_train.ckpt
sh scripts/evaluation/run_worlddrive_planner_pdm_score_evaluation_stage2.sh
```
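The three steps above are meant to run in order, and a later step is of little use if an earlier one failed. As a rough sketch, a small wrapper can run them in sequence and stop at the first failure. The `run_steps` helper is hypothetical and not part of the repository; the script paths in the usage comment are the ones listed above and are assumed to be invoked from the repository root.

```bash
# Hypothetical helper: run each script in order with `sh`,
# stopping at the first missing or failing script.
run_steps() {
  for step in "$@"; do
    if [ ! -f "$step" ]; then
      echo "missing script: $step" >&2
      return 1
    fi
    echo ">>> running $step"
    if ! sh "$step"; then
      echo "failed: $step" >&2
      return 1
    fi
  done
}

# Usage (from the repository root, after downloading the checkpoints):
#   run_steps \
#     scripts/cache/run_caching_trajworld_eval.sh \
#     scripts/evaluation/run_worlddrive_planner_pdm_score_evaluation_stage1.sh \
#     scripts/evaluation/run_worlddrive_planner_pdm_score_evaluation_stage2.sh
```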
<a id="Visulize-WorldDrive"></a>
## Visualize WorldDrive
Generate the planning results and the corresponding future scenes:
```bash
sh scripts/visualization/worlddrive_visual.sh
```
---
## Quick Training
### Multi-modal Planner Training
#### Step1: cache dataset (3D causal VAE latents)
Download the anchors and the corresponding formatted PDMS files:\
[Anchors](https://huggingface.co/tabguigui/WorldDrive/tree/main)
```bash
sh scripts/cache/run_caching_trajworld.sh # navtrain
```
#### Step2: download ta-dwm checkpoint
Download the TA-DWM checkpoint trained on NAVSIM (*worldtraj_stage1_1024_tadwm*), or use a checkpoint you trained yourself by following [TA-DWM training](docs/Train_Eval.md).\
[TA-DWM Model](https://huggingface.co/tabguigui/WorldDrive/tree/main)
#### Step3: train planner
```bash
sh scripts/training/run_worlddrive_planner.sh
```
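Training runs are typically long, so it can be useful to keep the full output of each run in a timestamped log file. The sketch below shows one way to do that; the `run_logged` helper and the `logs` directory are hypothetical and not part of the repository, and the training script may already write its own logs.

```bash
# Hypothetical helper: run a command, capture stdout and stderr in a
# timestamped log file, and return the command's exit status.
run_logged() {
  log_dir=$1
  shift
  mkdir -p "$log_dir" || return 1
  log_file="$log_dir/run_$(date +%Y%m%d_%H%M%S).log"
  "$@" >"$log_file" 2>&1
  status=$?
  echo "exit status: $status (log: $log_file)"
  return "$status"
}

# Usage (from the repository root; "logs" is an assumed directory name):
#   run_logged logs sh scripts/training/run_worlddrive_planner.sh
```

Output is redirected to the file instead of being shown live; follow it with `tail -f` on the log file in a second terminal if you want to watch progress.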
## Contact
If you have any questions, please contact Xingtai via email (tabgui324@gmail.com).
## Acknowledgement
We thank the research community for their valuable support. WorldDrive is built upon the following outstanding open-source projects:\
[diffusers](https://github.com/huggingface/diffusers)\
[WoTE](https://github.com/liyingyanUCAS/WoTE) (End-to-End Driving with Online Trajectory Evaluation via BEV World Model, ICCV 2025)\
[Epona](https://github.com/Kevin-thu/Epona) (Epona: Autoregressive Diffusion World Model for Autonomous Driving)\
[Recogdrive](https://github.com/xiaomi-research/recogdrive) (A Reinforced Cognitive Framework for End-to-End Autonomous Driving)
## Citation
If you find WorldDrive useful in your research or applications, please consider giving us a star.