# EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh
[📄 Paper](https://arxiv.org/abs/2506.05554) | [🎥 Homepage](https://tau-yihouxiang.github.io/projects/EX-4D/EX-4D.html) | [💻 Code](https://github.com/tau-yihouxiang/EX-4D)
## 🌟 Highlights

- **🎯 Extreme Viewpoint Synthesis**: Generates high-quality 4D videos with camera movements ranging from -90° to 90°
- **🔧 Depth Watertight Mesh**: A novel geometric representation that models both visible and occluded regions
- **⚡ Lightweight Architecture**: Only 1% of the 14B video diffusion backbone's parameters (140M) are trainable
- **🎭 No Multi-view Training**: An innovative masking strategy eliminates the need for expensive multi-view datasets
- **🏆 State-of-the-art Performance**: Outperforms existing methods, especially at extreme camera angles

## 🎬 Demo Results
*EX-4D transforms monocular videos into camera-controllable 4D experiences with physically consistent results under extreme viewpoints.*

## 🏗️ Framework Overview
Our framework consists of three key components:

1. **🔺 Depth Watertight Mesh Construction**: Creates a robust geometric prior that explicitly models both visible and occluded regions
2. **🎭 Simulated Masking Strategy**: Generates effective training data from monocular videos without multi-view datasets
3. **⚙️ Lightweight LoRA Adapter**: Efficiently integrates geometric information with pre-trained video diffusion models

## 🚀 Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/tau-yihouxiang/EX-4D.git
cd EX-4D

# Create conda environment
conda create -n ex4d python=3.10
conda activate ex4d

# Install PyTorch (2.x recommended)
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124

# Install nvdiffrast
pip install git+https://github.com/NVlabs/nvdiffrast.git

# Install dependencies and diffsynth
pip install -e .

# Install DepthCrafter for depth estimation
# (follow DepthCrafter's installation instructions to prepare its checkpoints)
git clone https://github.com/Tencent/DepthCrafter.git
```

### Download Pretrained Models

```bash
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./models/Wan-AI
huggingface-cli download yihouxiang/EX-4D --local-dir ./models/EX-4D
```

### Example Usage

#### 1. DW-Mesh Reconstruction

```bash
# --cam sets the camera angle; supported values: 30, 60, 90, 180
python recon.py --input_video examples/flower/input.mp4 --cam 30 --output_dir examples/flower
```

#### 2. EX-4D Generation (48GB VRAM required)

```bash
python generate.py --color_video examples/flower/render_180.mp4 --mask_video examples/flower/mask_180.mp4 --output_video examples/output.mp4
```
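The Depth Watertight Mesh of step 1 can be pictured as lifting every pixel of an estimated depth map to a 3D vertex and triangulating the pixel grid, so that depth discontinuities are spanned by stretched faces rather than left as holes. Below is a minimal NumPy sketch of that idea — not the repository's actual `recon.py` code; the `depth_to_mesh` helper and the pinhole intrinsics are illustrative:

```python
import numpy as np

def depth_to_mesh(depth, fx, fy, cx, cy):
    """Back-project a depth map into a camera-space triangle mesh.

    Each pixel becomes a vertex; each 2x2 pixel block is connected into
    two triangles, so the surface is watertight over the image grid
    (occluded regions are covered by stretched triangles).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float32)
    x = (u - cx) / fx * depth          # pinhole back-projection
    y = (v - cy) / fy * depth
    verts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    idx = np.arange(h * w).reshape(h, w)
    tl, tr = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    bl, br = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    faces = np.concatenate([
        np.stack([tl, bl, tr], axis=-1),  # upper-left triangle
        np.stack([tr, bl, br], axis=-1),  # lower-right triangle
    ])
    return verts, faces

depth = np.full((4, 4), 2.0, dtype=np.float32)  # toy 4x4 flat depth map
verts, faces = depth_to_mesh(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(verts.shape, faces.shape)  # (16, 3) (18, 3)
```

A mesh like this is exactly the kind of input a rasterizer such as nvdiffrast consumes when rendering the `render_*.mp4` color videos from a novel camera.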
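The "1% trainable parameters" figure in the highlights follows from the standard LoRA formulation used by the adapter in step 3: the pretrained weight stays frozen and only a low-rank update is learned. A minimal NumPy sketch of the mechanism — the real adapter wraps the Wan2.1 diffusion backbone via DiffSynth, and the dimensions here are illustrative, not EX-4D's:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 1024, 16, 16.0        # hidden size, LoRA rank, scale (toy values)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init

def lora_forward(x):
    # Frozen path plus low-rank update: y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# With B zero-initialized, the adapter is a no-op before training:
assert np.allclose(lora_forward(x), W @ x)

trainable, frozen = A.size + B.size, W.size
print(f"trainable fraction: {trainable / frozen:.3%}")  # 3.125% at d=1024, r=16
```

At the backbone's true hidden sizes the trainable fraction shrinks to roughly the 1% (140M of 14B) reported above.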

*Input video and the corresponding EX-4D output video.*
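The `mask_video` consumed by `generate.py` marks the regions the DW-Mesh cannot explain from the new viewpoint, which the diffusion model must then fill in. One common way to derive such a mask is a z-test between the rendered depth and the expected surface depth; the sketch below illustrates that idea only — the function name, threshold, and inputs are hypothetical, not the repository's code:

```python
import numpy as np

def occlusion_mask(rendered_depth, expected_depth, eps=1e-2):
    """Mark pixels where the rendered mesh depth disagrees with the
    expected surface depth (disoccluded regions to be inpainted)."""
    invalid = ~np.isfinite(rendered_depth)                    # nothing rendered
    mismatch = np.abs(rendered_depth - expected_depth) > eps  # z-test failure
    return invalid | mismatch

rendered = np.array([[1.0, np.inf], [1.0, 2.0]])  # inf = empty pixel
expected = np.array([[1.0, 1.5], [1.0, 1.0]])
print(occlusion_mask(rendered, expected))
# [[False  True]
#  [False  True]]
```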
### User Study Results

- **70.7%** of participants preferred EX-4D over baseline methods
- Superior performance in physical consistency and extreme-viewpoint quality
- The advantage grows as camera angles become more extreme

## 🎯 Applications

- **🎮 Gaming**: Create immersive 3D game cinematics from 2D footage
- **🎬 Film Production**: Generate novel camera angles in post-production
- **🥽 VR/AR**: Create free-viewpoint video experiences
- **📱 Social Media**: Generate dynamic camera movements for content creation
- **🏢 Architecture**: Visualize spaces from multiple viewpoints

## ⚠️ Limitations

- **Depth Dependency**: Performance relies on the quality of monocular depth estimation
- **Computational Cost**: High-resolution videos require significant computation
- **Reflective Surfaces**: Reflective or transparent materials remain challenging

## 🔮 Future Work

- [ ] Real-time inference optimization (3DGS / 4DGS)
- [ ] Support for higher resolutions (1K, 2K)
- [ ] Neural mesh refinement techniques

## 🙏 Acknowledgments

We would like to thank the [DiffSynth-Studio v1.1.1](https://github.com/modelscope/DiffSynth-Studio/tree/v1.1.1) project for providing the foundational diffusion framework.

## 📚 Citation

If you find our work useful, please consider citing:

```bibtex
@misc{hu2025ex4dextremeviewpoint4d,
  title={EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh},
  author={Tao Hu and Haoyang Peng and Xiao Liu and Yuewen Ma},
  year={2025},
  eprint={2506.05554},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.05554},
}
```