# Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

**Official Repository of Panacea.**

> [Paper] [**Panacea: Panoramic and Controllable Video Generation for Autonomous Driving**](https://arxiv.org/abs/2311.16813), Yuqing Wen1*†, Yucheng Zhao2*, Yingfei Liu2*, Fan Jia2, Yanhui Wang1, Chong Luo1, Chi Zhang3, Tiancai Wang2‡, Xiaoyan Sun1‡, Xiangyu Zhang2
1University of Science and Technology of China, 2MEGVII Technology, 3Mach Drive
*Equal Contribution, †This work was done during the internship at MEGVII, ‡Corresponding Author.

> [Paper] [**Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving**](https://arxiv.org/abs/2408.07605), Yuqing Wen1*†, Yucheng Zhao2*, Yingfei Liu2*, Binyuan Huang4*, Fan Jia2, Yanhui Wang1, Chi Zhang3, Tiancai Wang2‡, Xiaoyan Sun1‡, Xiangyu Zhang2
1University of Science and Technology of China, 2MEGVII Technology, 3Mach Drive, 4Wuhan University
*Equal Contribution, †This work was done during the internship at MEGVII, ‡Corresponding Author.

> [WebPage] https://panacea-ad.github.io/

### News

* **`Aug. 15th, 2024`:** We release an enhanced version of Panacea, named Panacea+, which has improved performance and comprehensive validation on multiple datasets and tasks. For more details, please refer to the paper Panacea+ [![arXiv](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/abs/2408.07605).
* **`Aug. 15th, 2024`:** We release the checkpoint and inference scripts for stage 2 of Panacea+. You can use them to generate multi-view video samples based on BEV layout sequences.
* **`Apr. 18th, 2024`:** We release our Gen-nuScenes dataset generated by Panacea. Please check the `metrics/` folder to use it.
* **`Apr. 18th, 2024`:** We release the BEV-perception evaluation code based on StreamPETR [![arXiv](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/abs/2303.11926). Please check the `metrics/` folder and follow `metrics/README.md` for detailed evaluation.

# Getting Started

Please follow our documentation step by step.

## Environment Setup

Follow the instructions in [**Environment Setup.**](./docs/generation_environment.md)

## Prepare dataset

Prepare the real dataset following the instructions in [**Data Preparation.**](./metrics/StreamPETR/docs/data_preparation.md)

Remember to put the dataset under the path *data/nuscenes*.

## Download pretrained checkpoint

Download the weights of the second stage from [panaceaplus_40k_deepspeed.ckpt](https://huggingface.co/orangewen/panacea_ckpts/resolve/main/panaceaplus_40k_deepspeed.ckpt?download=true) and put the file in the folder *checkpoints/*.

## Inference

* *--split*: specifies whether to use the train or val set.
* *--use_last_frame=true*: uses the last frame as the conditional image.

Run the following command to run stage 2 inference on the whole training or validation set of nuScenes.
```bash
python -m torch.distributed.launch --nproc_per_node=8 --master_port=1238 inference.py --base configs/inference_nuscenes.yaml --ckptpath --ckpt checkpoints/panaceaplus_40k_deepspeed.ckpt --split train --use_last_frame true --name EXP_NAME --bs 1
```
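
Before launching a multi-GPU job, it can help to confirm that the inputs named in the steps above are in place. The following is a minimal sketch, not part of the repository: the helper `missing_inputs` and the constant `REQUIRED_PATHS` are hypothetical names, and the paths are simply the ones this README mentions, assumed to be relative to the repository root.

```python
from pathlib import Path

# Paths quoted from this README. The helper below is a hypothetical
# convenience and does not exist in the Panacea repository.
REQUIRED_PATHS = (
    "data/nuscenes",
    "checkpoints/panaceaplus_40k_deepspeed.ckpt",
    "configs/inference_nuscenes.yaml",
)


def missing_inputs(repo_root="."):
    """Return the README-listed paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]
```

Calling `missing_inputs()` from the repository root returns an empty list when everything is in place; any returned entries point to a step above that still needs to be completed.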

Generating Multi-View and Controllable Videos for Autonomous Driving

Overview of Panacea. (a). The diffusion training process of Panacea, enabled by a diffusion encoder and decoder with the decomposed 4D attention module. (b). The decomposed 4D attention module comprises three components: intra-view attention for spatial processing within individual views, cross-view attention to engage with adjacent views, and cross-frame attention for temporal processing. (c). Controllable module for the integration of diverse signals. The image conditions are derived from a frozen VAE encoder and combined with diffused noises. The text prompts are processed through a frozen CLIP encoder, while BEV sequences are handled via ControlNet. (d). The details of BEV layout sequences, including projected bounding boxes, object depths, road maps and camera pose.
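
To make the decomposition in (b) concrete, here is a rough NumPy sketch of how one feature tensor could be routed through the three attention steps. It is an illustration under simplifying assumptions, not the authors' implementation: attention is single-head with no learned projections, each step is added back as a residual, "adjacent views" is taken to mean the left and right neighbour in a ring of cameras, and cross-frame attention is taken to link the same token position across time. The names `attention` and `decomposed_4d_attention` are hypothetical.

```python
import numpy as np


def attention(q, kv):
    """Single-head scaled dot-product attention without learned projections.

    q: (..., Lq, C), kv: (..., Lk, C) -> (..., Lq, C)
    """
    scores = q @ np.swapaxes(kv, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ kv


def decomposed_4d_attention(x):
    """x: (T, V, N, C) = frames, camera views, tokens per view, channels."""
    # 1) Intra-view: tokens attend to tokens of the same view and frame.
    x = x + attention(x, x)

    # 2) Cross-view: each view queries the tokens of its two neighbours.
    #    Assumption: views are ordered around the vehicle, so indices wrap.
    left = np.roll(x, 1, axis=1)    # left[:, v] is x[:, v - 1]
    right = np.roll(x, -1, axis=1)  # right[:, v] is x[:, v + 1]
    neighbours = np.concatenate([left, right], axis=2)  # (T, V, 2N, C)
    x = x + attention(x, neighbours)

    # 3) Cross-frame: each token position attends over the time axis.
    xt = np.transpose(x, (1, 2, 0, 3))     # (V, N, T, C)
    xt = xt + attention(xt, xt)
    return np.transpose(xt, (2, 0, 1, 3))  # back to (T, V, N, C)
```

The point of the decomposition is cost: each step attends over one axis (N tokens, 2N neighbour tokens, or T frames) rather than over all T x V x N tokens jointly.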

The two-stage inference pipeline of Panacea. Its two-stage process begins by creating multi-view images with BEV layouts, followed by using these images, along with subsequent BEV layouts, to facilitate the generation of following frames.
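
The control flow of this two-stage process can be sketched as follows. This is only an outline under stated assumptions: `stage1` and `stage2` are hypothetical callables standing in for the two models, `clip_len` is an illustrative value, and chaining clips by conditioning each one on the most recently generated frame is one reading of the `--use_last_frame` option rather than a confirmed description of `inference.py`.

```python
def two_stage_rollout(bev_layouts, stage1, stage2, clip_len=8):
    """Sketch of the two-stage pipeline over a sequence of BEV layouts.

    stage1(bev) -> multi-view image for a single frame (hypothetical).
    stage2(cond_image, bevs) -> list of frames, one per layout in bevs
    (hypothetical). clip_len is an assumed clip length.
    """
    if not bev_layouts:
        return []

    # Stage 1: create the first multi-view frame from its BEV layout.
    frames = [stage1(bev_layouts[0])]

    # Stage 2: generate the following frames clip by clip, conditioning on
    # the most recent frame together with the subsequent BEV layouts.
    start = 1
    while start < len(bev_layouts):
        chunk = bev_layouts[start:start + clip_len]
        frames.extend(stage2(frames[-1], chunk))
        start += len(chunk)
    return frames
```

When a real image is already available, as in the image-to-video setting, the stage 1 call would be replaced by that image and only the stage 2 loop would run.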

🎬   BEV-guided Video Generation   🎬

Controllable multi-view video generation. Panacea is able to generate realistic, controllable videos with good temporal and view consistency.

🎞   Attribute Controllable Video Generation   🎞

Video generation with variable attribute controls, such as weather, time, and scene, which allows Panacea to simulate a variety of rare driving scenarios, including extreme weather conditions such as rain and snow, thereby greatly enhancing the diversity of the data.

🔥   Benefiting Autonomous Driving   🔥

(a). Panoramic video generation based on BEV (Bird’s-Eye-View) layout sequence facilitates the establishment of a synthetic video dataset, which enhances perceptual tasks. (b). Producing panoramic videos with conditional images and BEV layouts can effectively elevate image-only datasets to video datasets, thus enabling the advancement of video-based perception techniques.

BibTex

                
@inproceedings{wen2024panacea,
  title={Panacea: Panoramic and controllable video generation for autonomous driving},
  author={Wen, Yuqing and Zhao, Yucheng and Liu, Yingfei and Jia, Fan and Wang, Yanhui and Luo, Chong and Zhang, Chi and Wang, Tiancai and Sun, Xiaoyan and Zhang, Xiangyu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6902--6912},
  year={2024}
}
@misc{wen2024panaceapanoramiccontrollablevideo,
      title={Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving}, 
      author={Yuqing Wen and Yucheng Zhao and Yingfei Liu and Binyuan Huang and Fan Jia and Yanhui Wang and Chi Zhang and Tiancai Wang and Xiaoyan Sun and Xiangyu Zhang},
      year={2024},
      eprint={2408.07605},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.07605}, 
}

Contact

Feel free to contact us at wenyuqing AT mail.ustc.edu.cn or wangtiancai AT megvii.com

# Acknowledgement

This code builds on [Stability-AI](https://github.com/Stability-AI/generative-models), [ControlNet](https://github.com/lllyasviel/ControlNet) and [StreamPETR](https://github.com/exiawsh/StreamPETR). Thanks for open-sourcing!