# Awesome-World-Models

**Repository Path**: AI4EarthLab/Awesome-World-Models

## Basic Information

- **Project Name**: Awesome-World-Models
- **License**: BSD-3-Clause
- **Default Branch**: main
- **Created**: 2025-09-25
- **Last Updated**: 2025-09-25

## README

# Awesome World Models for Robotics [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

This repository provides a curated list of **papers on World Models for General Video Generation, Embodied AI, and Autonomous Driving**. The list template is adapted from [Awesome-LLM-Robotics](https://github.com/GT-RIPL/Awesome-LLM-Robotics) and [Awesome-World-Model](https://github.com/LMD0311/Awesome-World-Model).

#### Contributions are welcome! Please feel free to submit [pull requests](https://github.com/leofan90/Awesome-World-Models/blob/main/how-to-PR.md) or reach out via [email](mailto:chunkaifan-changetoat-stu-changetodot-pku--changetodot-changetoedu-changetocn) to add papers!

If you find this repository useful, please consider [citing](#citation) and giving this list a star ⭐. Feel free to share it with others!

---

## Overview

- [Foundation paper of World Model](#foundation-paper-of-world-model)
- [Blog or Technical Report](#blog-or-technical-report)
- [Surveys](#surveys)
- [Benchmarks & Evaluation](#benchmarks--evaluation)
- [General World Models](#general-world-models)
- [World Models for Embodied AI](#world-models-for-embodied-ai)
- [World Models for VLA](#world-models-for-vla)
- [World Models for Autonomous Driving](#world-models-for-autonomous-driving)
- [Citation](#citation)

---

## Foundation paper of World Model

* World Models, **`NIPS 2018 Oral`**. [[Paper](https://arxiv.org/abs/1803.10122)] [[Website](https://worldmodels.github.io/)]

## Blog or Technical Report

* **`Matrix-Game 2.0`**, Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model. [[Paper](https://arxiv.org/abs/2508.13009)] [[Website](https://matrix-game-v2.github.io/)]
* **`Matrix-3D`**, Matrix-3D: Omnidirectional Explorable 3D World Generation. [[Paper](https://arxiv.org/abs/2508.08086)] [[Website](https://matrix-3d.github.io)]
* **`HunyuanWorld 1.0`**, HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels. [[Paper](https://arxiv.org/abs/2507.21809)] [[Website](https://3d-models.hunyuan.tencent.com/world/)] [[Code](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0)]
* What Does it Mean for a Neural Network to Learn a "World Model"? [[Paper](https://arxiv.org/abs/2507.21513)]
* **`Matrix-Game`**, Matrix-Game: Interactive World Foundation Model. [[Paper](https://arxiv.org/abs/2506.18701)] [[Code](https://github.com/SkyworkAI/Matrix-Game)]
* **`Cosmos-Drive-Dreams`**, Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models.
  [[Paper](https://arxiv.org/abs/2506.09042)] [[Website](https://research.nvidia.com/labs/toronto-ai/cosmos_drive_dreams)]
* **`GAIA-2`**, GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving. [[Paper](https://arxiv.org/abs/2503.20523)] [[Website](https://wayve.ai/thinking/gaia-2)]
* **`Cosmos`**, Cosmos World Foundation Model Platform for Physical AI. [[Paper](https://arxiv.org/abs/2501.03575)] [[Website](https://www.nvidia.com/en-us/ai/cosmos/)] [[Code](https://github.com/NVIDIA/Cosmos)]
* **`1X Technologies`**, 1X World Model. [[Blog](https://www.1x.tech/discover/1x-world-model)]
* **`Runway`**, Introducing General World Models. [[Blog](https://runwayml.com/research/introducing-general-world-models)]
* **`Wayve`**, Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy. [[Paper](https://arxiv.org/pdf/2309.17080)] [[Blog](https://wayve.ai/thinking/introducing-gaia1/)]
* **`Yann LeCun`**, A Path Towards Autonomous Machine Intelligence. [[Paper](https://openreview.net/pdf?id=BZ5a1r-kVsf)]

## Surveys

* "3D and 4D World Modeling: A Survey", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.07996)]
* "Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.09561)]
* "A Survey: Learning Embodied Intelligence from Physical Simulators and World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.00917)] [[Code](https://github.com/NJU3DV-LoongGroup/Embodied-World-Models-Survey)]
* "Embodied AI Agents: Modeling the World", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.22355)]
* "From 2D to 3D Cognition: A Brief Survey of General World Models", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.20134)]
* "A Survey on World Models Grounded in Acoustic Physical Information", **`arXiv 2025.06`**.
  [[Paper](https://arxiv.org/abs/2506.13833)]
* "Exploring the Evolution of Physics Cognition in Video Generation: A Survey", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.21765)] [[Code](https://github.com/minnie-lin/Awesome-Physics-Cognition-based-Video-Generation)]
* "World Models in Artificial Intelligence: Sensing, Learning, and Reasoning Like a Child", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.15168)]
* "Simulating the Real World: A Unified Survey of Multimodal Generative Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.04641)] [[Code](https://github.com/ALEEEHU/World-Simulator)]
* "Four Principles for Physically Interpretable World Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02143)]
* "The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.10498)] [[Code](https://github.com/LMD0311/Awesome-World-Model)]
* "A Survey of World Models for Autonomous Driving", **`TPAMI`**. [[Paper](https://arxiv.org/abs/2501.11260)]
* "Understanding World or Predicting Future? A Comprehensive Survey of World Models", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.14499)]
* "World Models: The Safety Perspective", **`ISSRE WDMD`**. [[Paper](https://arxiv.org/abs/2411.07690)]
* "Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.02914)]
* "From Efficient Multimodal Models to World Models: A Survey", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.00118)]
* "Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.06886)] [[Code](https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List)]
* "Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond", **`arXiv 2024.05`**.
  [[Paper](https://arxiv.org/abs/2405.03520)] [[Code](https://github.com/GigaAI-research/General-World-Models-Survey)]
* "World Models for Autonomous Driving: An Initial Survey", **`TIV`**. [[Paper](https://arxiv.org/abs/2403.02622)]
* "A survey on multimodal large language models for autonomous driving", **`WACVW 2024`**. [[Paper](https://arxiv.org/abs/2311.12320)] [[Code](https://github.com/IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving)]

---

## Benchmarks & Evaluation

* **OmniWorld**: "OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.12201)] [[Website](https://yangzhou24.github.io/OmniWorld/)]
* "Beyond Simulation: Benchmarking World Models for Planning and Causality in Autonomous Driving", **`ICRA 2025`**. [[Paper](https://arxiv.org/abs/2508.01922)]
* **WM-ABench**: "Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation", **`ACL 2025 (Findings)`**. [[Paper](https://arxiv.org/abs/2506.21876)] [[Website](https://wm-abench.maitrix.org/)]
* **UNIVERSE**: "Adapting Vision-Language Models for Evaluating World Models", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.17967)]
* **WorldPrediction**: "WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.04363)]
* "Toward Memory-Aided World Models: Benchmarking via Spatial Consistency", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.22976)] [[Datasets](https://huggingface.co/datasets/kevinLian/LoopNav)] [[Code](https://github.com/Kevin-lkw/LoopNav)]
* **SimWorld**: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2503.13952)] [[Code](https://github.com/Li-Zn-H/SimWorld)]
* **EWMBench**: "EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models", **`arXiv 2025.05`**.
  [[Paper](https://arxiv.org/abs/2505.09694)] [[Code](https://github.com/AgibotTech/EWMBench)]
* "Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.08122)]
* **WorldModelBench**: "WorldModelBench: Judging Video Generation Models As World Models", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2502.20694)] [[Website](https://worldmodelbench-team.github.io/)]
* **Text2World**: "Text2World: Benchmarking Large Language Models for Symbolic World Model Generation", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.13092)] [[Website](https://text-to-world.github.io/)]
* **ACT-Bench**: "ACT-Bench: Towards Action Controllable World Models for Autonomous Driving", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.05337)]
* **WorldSimBench**: "WorldSimBench: Towards Video Generation Models as World Simulators", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.18072)] [[Website](https://iranqin.github.io/WorldSimBench.github.io/)]
* **EVA**: "EVA: An Embodied World Model for Future Video Anticipation", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2410.15461)] [[Website](https://sites.google.com/view/eva-publi)]
* **AeroVerse**: "AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/pdf/2408.15511)]
* **CityBench**: "CityBench: Evaluating the Capabilities of Large Language Model as World Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.13945)] [[Code](https://github.com/tsinghua-fib-lab/CityBench)]
* "Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models", **`NIPS 2023`**. [[Paper](https://arxiv.org/abs/2311.09064)]

---

## General World Models

* "World Modeling with Probabilistic Structure Integration", **`arXiv 2025.09`**.
  [[Paper](https://arxiv.org/abs/2509.09737)]
* "One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.07945)] [[Code](https://github.com/opendilab/LightZero)]
* **LatticeWorld**: "LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.05263)]
* "Planning with Reasoning using Vision Language World Model", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.02722)]
* "Social World Models", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.00559)]
* "Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.20294)]
* **HERO**: "HERO: Hierarchical Extrapolation and Refresh for Efficient World Models", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.17588)]
* "Scalable RF Simulation in Generative 4D Worlds", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.12176)]
* "Finite Automata Extraction: Low-data World Model Learning as Programs from Gameplay Video", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.11836)]
* "Visuomotor Grasping with World Models for Surgical Robots", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.11200)]
* "In-Context Reinforcement Learning via Communicative World Models", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.06659)] [[Code](https://github.com/fernando-ml/CORAL)]
* **PIGDreamer**: "PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2508.02159)]
* **SimuRA**: "SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model", **`arXiv 2025.07`**.
  [[Paper](https://arxiv.org/abs/2507.23773)]
* "Back to the Features: DINO as a Foundation for Video World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.19468)]
* **Yume**: "Yume: An Interactive World Generation Model", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.17744)] [[Website](https://stdstu12.github.io/YUME-Project/)] [[Code](https://github.com/stdstu12/YUME)]
* "LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.15521)]
* "Safety Certification in the Latent space using Control Barrier Functions and World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.13871)]
* "Assessing adaptive world models in machines with novel games", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.12821)]
* "Graph World Model", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2507.10539)] [[Code](https://github.com/ulab-uiuc/GWM)]
* **MobiWorld**: "MobiWorld: World Models for Mobile Wireless Network", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.09462)]
* "Continual Reinforcement Learning by Planning with Online World Models", **`ICML 2025 Spotlight`**. [[Paper](https://arxiv.org/abs/2507.09177)]
* **AirScape**: "AirScape: An Aerial Generative World Model with Motion Controllability", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.08885)] [[Website](https://embodiedcity.github.io/AirScape/)]
* **Geometry Forcing**: "Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.07982)] [[Website](https://GeometryForcing.github.io)]
* **Martian World Models**: "Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.07978)] [[Website](https://marsgenai.github.io)]
* "What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2507.06952)]
* "Critiques of World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.05169)]
* "When do World Models Successfully Learn Dynamical Systems?", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.04898)]
* **WebSynthesis**: "WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.04370)]
* "Accurate and Efficient World Modeling with Masked Latent Transformers", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.04075)]
* **Dyn-O**: "Dyn-O: Building Structured World Models with Object-Centric Representations", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.03298)]
* **NavMorph**: "NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments", **`ICCV 2025`**. [[Paper](https://arxiv.org/abs/2506.19055)] [[Code](https://github.com/Feliciaxyao/NavMorph)]
* "A “Good” Regulator May Provide a World Model for Intelligent Systems", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.23032)]
* **Xray2Xray**: "Xray2Xray: World Model from Chest X-rays with Volumetric Context", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.19055)]
* **MATWM**: "Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.18537)]
* "Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.16584)]
* "Efficient Generation of Diverse Cooperative Agents with World Models", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.07450)]
* **WorldLLM**: "WorldLLM: Improving LLMs' world modeling using curiosity-driven theory-making", **`arXiv 2025.06`**.
  [[Paper](https://arxiv.org/abs/2506.06725)]
* "LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.06355)]
* "Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.06006)]
* "Video World Models with Long-term Spatial Memory", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.05284)] [[Website](https://spmem.github.io/)]
* **DSG-World**: "DSG-World: Learning a 3D Gaussian World Model from Dual State Videos", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.05217)]
* "Safe Planning and Policy Optimization via World Model Learning", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.04828)]
* **FOLIAGE**: "FOLIAGE: Towards Physical Intelligence World Models Via Unbounded Surface Evolution", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.03173)]
* "Linear Spatial World Models Emerge in Large Language Models", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.02996)] [[Code](https://github.com/matthieu-perso/spatial_world_models)]
* **Simple, Good, Fast**: "Simple, Good, Fast: Self-Supervised World Models Free of Baggage", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2506.02612)] [[Code](https://github.com/jrobine/sgf)]
* **Medical World Model**: "Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.02327)]
* "General agents need world models", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2506.01622)]
* "Learning Abstract World Models with a Group-Structured Latent Space", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01529)]
* **DeepVerse**: "DeepVerse: 4D Autoregressive Video Generation as a World Model", **`arXiv 2025.06`**.
  [[Paper](https://arxiv.org/abs/2506.01103)]
* "World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.00417)]
* **Dyna-Think**: "Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.00320)]
* **StateSpaceDiffuser**: "StateSpaceDiffuser: Bringing Long Context to Diffusion World Models", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.22246)]
* "Learning World Models for Interactive Video Generation", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.21996)]
* "Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.21906)]
* "Long-Context State-Space Video World Models", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.20171)] [[Website](https://ryanpo.com/ssm_wm)]
* "Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.16422)]
* "World Models as Reference Trajectories for Rapid Motor Adaptation", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.15589)]
* "Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.13709)]
* "Building spatial world models from sparse transitional episodic memories", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.13696)]
* **PoE-World**: "PoE-World: Compositional World Modeling with Products of Programmatic Experts", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.10819)] [[Website](https://topwasu.github.io/poe-world)]
* "Explainable Reinforcement Learning Agents Using World Models", **`arXiv 2025.05`**.
  [[Paper](https://arxiv.org/abs/2505.08073)]
* **seq-JEPA**: "seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.03176)]
* "Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.02228)]
* "Learning Local Causal World Models with State Space Models and Attention", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.02074)]
* **WebEvolver**: "WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.21024)]
* **WALL-E 2.0**: "WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.15785)] [[Code](https://github.com/elated-sawyer/WALL-E)]
* **ViMo**: "ViMo: A Generative Visual GUI World Model for App Agent", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.13936)]
* "Simulating Before Planning: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning", **`SIGIR 2025`**. [[Paper](https://arxiv.org/abs/2504.13643)]
* **CheXWorld**: "CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2504.13820)] [[Code](https://github.com/LeapLabTHU/CheXWorld)]
* **EchoWorld**: "EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2504.13065)] [[Code](https://github.com/LeapLabTHU/EchoWorld)]
* "Adapting a World Model for Trajectory Following in a 3D Game", **`ICLR 2025 Workshop on World Models`**. [[Paper](https://arxiv.org/abs/2504.12299)]
* **MineWorld**: "MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft", **`arXiv 2025.04`**.
  [[Paper](https://arxiv.org/abs/2504.07257)] [[Website](https://aka.ms/mineworld)]
* **MoSim**: "Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2504.07095)]
* "Improving World Models using Deep Supervision with Linear Probes", **`ICLR 2025 Workshop on World Models`**. [[Paper](https://arxiv.org/abs/2504.03861)]
* "Decentralized Collective World Model for Emergent Communication and Coordination", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.03353)]
* "Adapting World Models with Latent-State Dynamics Residuals", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.02252)]
* "Can Test-Time Scaling Improve World Foundation Model?", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.24320)] [[Code](https://github.com/Mia-Cong/SWIFT.git)]
* "Synthesizing world models for bilevel planning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.20124)]
* "Long-context autoregressive video modeling with next-frame prediction", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.19325)] [[Code](https://github.com/showlab/FAR)] [[Website](https://farlongctx.github.io/)]
* **Aether**: "Aether: Geometric-Aware Unified World Modeling", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.18945)] [[Website](https://aether-world.github.io/)]
* **FUSDREAMER**: "FUSDREAMER: Label-efficient Remote Sensing World Model for Multimodal Data Classification", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.13814)] [[Code](https://github.com/Cimy-wang/FusDreamer)]
* "Inter-environmental world modeling for continuous and compositional dynamics", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.09911)]
* **Disentangled World Models**: "Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning", **`arXiv 2025.03`**.
  [[Paper](https://arxiv.org/abs/2503.08751)]
* "Revisiting the Othello World Model Hypothesis", **`ICLR World Models Workshop`**. [[Paper](https://arxiv.org/abs/2503.04421)]
* "Learning Transformer-based World Models with Contrastive Predictive Coding", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.04416)]
* "Surgical Vision World Model", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02904)]
* "World Models for Anomaly Detection during Model-Based Reinforcement Learning Inference", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02552)]
* **WMNav**: "WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02247)] [[Website](https://b0b8k1ng.github.io/WMNav/)]
* **SENSEI**: "SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.01584)] [[Website](https://sites.google.com/view/sensei-paper)]
* "Learning Actionable World Models for Industrial Process Control", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.00713)]
* "Implementing Spiking World Model with Multi-Compartment Neurons for Model-based Reinforcement Learning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.00713)]
* "Discrete Codebook World Models for Continuous Control", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2503.00653)]
* **Multimodal Dreaming**: "Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.21142)]
* "Generalist World Model Pre-Training for Efficient Reinforcement Learning", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.19544)]
* "Learning To Explore With Predictive World Model Via Self-Supervised Learning", **`arXiv 2025.02`**.
  [[Paper](https://arxiv.org/abs/2502.13200)]
* **M^3**: "M^3: A Modular World Model over Streams of Tokens", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.11537)]
* "When do neural networks learn world models?", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.09297)]
* "Pre-Trained Video Generative Models as World Simulators", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.07825)]
* **DMWM**: "DMWM: Dual-Mind World Model with Long-Term Imagination", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.07591)]
* **EvoAgent**: "EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.05907)]
* "Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.05857)]
* "Generating Symbolic World Models via Test-time Scaling of Large Language Models", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.04728)] [[Website](https://vmlpddl.github.io/)]
* "Improving Transformer World Models for Data-Efficient RL", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.01591)]
* "Trajectory World Models for Heterogeneous Environments", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.01366)]
* "Enhancing Memory and Imagination Consistency in Diffusion-based World Models via Linear-Time Sequence Modeling", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.00466)]
* "Objects matter: object-centric world models improve reinforcement learning in visually complex environments", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.16443)]
* **GLAM**: "GLAM: Global-Local Variation Awareness in Mamba-based World Model", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.11949)]
* **GAWM**: "GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning", **`arXiv 2025.01`**.
  [[Paper](https://arxiv.org/abs/2501.10116)]
* "Generative Emergent Communication: Large Language Model is a Collective World Model", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.00226)]
* "Towards Unraveling and Improving Generalization in World Models", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.00195)]
* "Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.12870)]
* "Transformers Use Causal World Models in Maze-Solving Tasks", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.11867)]
* "Causal World Representation in the GPT Model", **`NIPS 2024 Workshop`**. [[Paper](https://arxiv.org/abs/2412.07446)]
* **Owl-1**: "Owl-1: Omni World Model for Consistent Long Video Generation", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.09600)]
* "Navigation World Models", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.03572)] [[Website](https://www.amirbar.net/nwm/)]
* "Evaluating World Models with LLM for Decision Making", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.08794)]
* **LLMPhy**: "LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.08027)]
* **WebDreamer**: "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.06559)] [[Code](https://github.com/OSU-NLP-Group/WebDreamer)]
* "Scaling Laws for Pre-training Agents and World Models", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.04434)]
* **DINO-WM**: "DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.04983)] [[Website](https://dino-wm.github.io/)]
* "Learning World Models for Unconstrained Goal Navigation", **`NIPS 2024`**.
  [[Paper](https://arxiv.org/abs/2411.02446)]
* "How Far is Video Generation from World Model: A Physical Law Perspective", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.02385)] [[Website](https://phyworld.github.io/)] [[Code](https://github.com/phyworld/phyworld)]
* **Adaptive World Models**: "Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity", **`NIPS 2024 Workshop Adaptive Foundation Models`**. [[Paper](https://arxiv.org/abs/2411.01342)]
* **LLMCWM**: "Language Agents Meet Causality -- Bridging LLMs and Causal World Models", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.19923)] [[Code](https://github.com/j0hngou/LLMCWM/)]
* "Reward-free World Models for Online Imitation Learning", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.14081)]
* "Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.13232)]
* **AVID**: "AVID: Adapting Video Diffusion Models to World Models", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.12822)] [[Code](https://github.com/microsoft/causica/tree/main/research_experiments/avid)]
* **SMAC**: "Grounded Answers for Multi-agent Decision-making Problem through Generative World Model", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/abs/2410.02664)]
* **OSWM**: "One-shot World Models Using a Transformer Trained on a Synthetic Prior", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.14084)]
* "Making Large Language Models into World Models with Precondition and Effect Knowledge", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.12278)]
* "Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/abs/2408.11816)]
* **MoReFree**: "World Models Increase Autonomy in Reinforcement Learning", **`arXiv 2024.08`**.
[[Paper](https://arxiv.org/abs/2408.09807)] [[Project](https://sites.google.com/view/morefree)]
* **UrbanWorld**: "UrbanWorld: An Urban World Model for 3D City Generation", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.11965)]
* **PWM**: "PWM: Policy Learning with Large World Models", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.02466)] [[Code](https://www.imgeorgiev.com/pwm/)]
* "Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.02446)]
* **GenRL**: "GenRL: Multimodal foundation world models for generalist embodied agents", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.18043)] [[Code](https://github.com/mazpie/genrl)]
* **DLLM**: "World Models with Hints of Large Language Models for Goal Achieving", **`arXiv 2024.06`**. [[Paper](http://arxiv.org/pdf/2406.07381)]
* "Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.15275)]
* **CoDreamer**: "CoDreamer: Communication-Based Decentralised World Models", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.13600)]
* **Pandora**: "Pandora: Towards General World Model with Natural Language Actions and Video States", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.09455)] [[Code](https://github.com/maitrix-org/Pandora)]
* **EBWM**: "Cognitively Inspired Energy-Based World Models", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.08862)]
* "Evaluating the World Model Implicit in a Generative Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.03689)] [[Code](https://github.com/keyonvafa/world-model-evaluation)]
* "Transformers and Slot Encoding for Sample Efficient Physical World Modelling", **`arXiv 2024.05`**.
[[Paper](https://arxiv.org/abs/2405.20180)] [[Code](https://github.com/torchipeppo/transformers-and-slot-encoding-for-wm)]
* **Puppeteer**: "Hierarchical World Models as Visual Whole-Body Humanoid Controllers", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.18418)] [[Code](https://nicklashansen.com/rlpuppeteer)]
* **BWArea Model**: "BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.17039)]
* **WKM**: "Agent Planning with World Knowledge Model", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.14205)] [[Code](https://github.com/zjunlp/WKM)]
* **Diamond**: "Diffusion for World Modeling: Visual Details Matter in Atari", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.12399)] [[Code](https://github.com/eloialonso/diamond)]
* "Compete and Compose: Learning Independent Mechanisms for Modular World Models", **`arXiv 2024.04`**. [[Paper](https://arxiv.org/abs/2404.15109)]
* "Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization", **`arXiv 2024.03`**. [[Paper](https://arxiv.org/abs/2403.10967)] [[Code](https://github.com/sai-prasanna/dreaming_of_many_worlds)]
* **V-JEPA**: "V-JEPA: Video Joint Embedding Predictive Architecture", **`Meta AI`**. [[Blog](https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/)] [[Paper](https://ai.meta.com/research/publications/revisiting-feature-prediction-for-learning-visual-representations-from-video/)] [[Code](https://github.com/facebookresearch/jepa)]
* **IWM**: "Learning and Leveraging World Models in Visual Representation Learning", **`Meta AI`**. [[Paper](https://arxiv.org/abs/2403.00504)]
* **Genie**: "Genie: Generative Interactive Environments", **`DeepMind`**.
[[Paper](https://arxiv.org/abs/2402.15391)] [[Blog](https://sites.google.com/view/genie-2024/home)]
* **Sora**: "Video generation models as world simulators", **`OpenAI`**. [[Technical report](https://openai.com/research/video-generation-models-as-world-simulators)]
* **LWM**: "World Model on Million-Length Video And Language With RingAttention", **`arXiv 2024.02`**. [[Paper](https://arxiv.org/abs/2402.08268)] [[Code](https://github.com/LargeWorldModel/LWM)]
* "Planning with an Ensemble of World Models", **`OpenReview`**. [[Paper](https://openreview.net/forum?id=cvGdPXaydP)]
* **WorldDreamer**: "WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens", **`arXiv 2024.01`**. [[Paper](https://arxiv.org/abs/2401.09985)] [[Code](https://github.com/JeffWang987/WorldDreamer)]
* **CWM**: "Understanding Physical Dynamics with Counterfactual World Modeling", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03523.pdf)] [[Code](https://neuroailab.github.io/cwm-physics/)]
* **Δ-IRIS**: "Efficient World Models with Context-Aware Tokenization", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2406.19320)] [[Code](https://github.com/vmicheli/delta-iris)]
* **LLM-Sim**: "Can Language Models Serve as Text-Based World Simulators?", **`ACL`**. [[Paper](https://arxiv.org/abs/2406.06485)] [[Code](https://github.com/cognitiveailab/GPT-simulator)]
* **AD3**: "AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2403.09976)]
* **MAMBA**: "MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning", **`ICLR 2024`**. [[Paper](https://arxiv.org/abs/2403.09859)] [[Code](https://github.com/zoharri/mamba)]
* **R2I**: "Mastering Memory Tasks with World Models", **`ICLR 2024`**.
[[Paper](http://arxiv.org/pdf/2403.04253)] [[Website](https://recall2imagine.github.io/)] [[Code](https://github.com/chandar-lab/Recall2Imagine)]
* **HarmonyDream**: "HarmonyDream: Task Harmonization Inside World Models", **`ICML 2024`**. [[Paper](https://openreview.net/forum?id=x0yIaw2fgk)] [[Code](https://github.com/thuml/HarmonyDream)]
* **REM**: "Improving Token-Based World Models with Parallel Observation Prediction", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2402.05643)] [[Code](https://github.com/leor-c/REM)]
* "Do Transformer World Models Give Better Policy Gradients?", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2402.05290)]
* **DreamSmooth**: "DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing", **`ICLR 2024`**. [[Paper](https://arxiv.org/pdf/2311.01450)]
* **TD-MPC2**: "TD-MPC2: Scalable, Robust World Models for Continuous Control", **`ICLR 2024`**. [[Paper](https://arxiv.org/pdf/2310.16828)] [[Torch Code](https://github.com/nicklashansen/tdmpc2)]
* **Hieros**: "Hieros: Hierarchical Imagination on Structured State Space Sequence World Models", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2310.05167)]
* **CoWorld**: "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/abs/2305.15260)]

---

## World Models for Embodied AI

* **PhysicalAgent**: "PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.13903)]
* "Empowering Multi-Robot Cooperation via Sequential World Models", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.13095)]
* "World Model Implanting for Test-time Adaptation of Embodied Agents", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2509.03956)]
* "Learning Primitive Embodied World Models: Towards Scalable Robotic Learning", **`arXiv 2025.08`**.
[[Paper](https://arxiv.org/pdf/2508.20840)] [[Website](https://qiaosun22.github.io/PrimitiveWorld/)]
* **GWM**: "GWM: Towards Scalable Gaussian World Models for Robotic Manipulation", **`ICCV 2025`**. [[Paper](https://arxiv.org/abs/2508.17600)] [[Website](https://gaussian-world-model.github.io/)]
* "Imaginative World Modeling with Scene Graphs for Embodied Agent Navigation", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.06990)]
* "Bounding Distributional Shifts in World Modeling through Novelty Detection", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.06096)]
* **Genie Envisioner**: "Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.05635)] [[Website](https://genie-envisioner.github.io/)]
* **DiWA**: "DiWA: Diffusion Policy Adaptation with World Models", **`CoRL 2025`**. [[Paper](https://arxiv.org/abs/2508.03645)] [[Code](https://diwa.cs.uni-freiburg.de)]
* **CoEx**: "CoEx -- Co-evolving World-model and Exploration", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.22281)]
* "Latent Policy Steering with Embodiment-Agnostic Pretrained World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.13340)]
* **MindJourney**: "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.12508)] [[Website](https://umass-embodied-agi.github.io/MindJourney)]
* **FOUNDER**: "FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2507.12496)] [[Website](https://sites.google.com/view/founder-rl)]
* **EmbodieDreamer**: "EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling", **`arXiv 2025.07`**.
[[Paper](https://arxiv.org/pdf/2507.05198)] [[Website](https://embodiedreamer.github.io/)]
* **World4Omni**: "World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.23919)] [[Website](https://world4omni.github.io/)]
* **RoboScape**: "RoboScape: Physics-informed Embodied World Model", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.23135)] [[Code](https://github.com/tsinghua-fib-lab/RoboScape)]
* **ParticleFormer**: "ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.23126)] [[Website](https://particleformer.github.io/)]
* **ManiGaussian++**: "ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.19842)] [[Code](https://github.com/April-Yz/ManiGaussian_Bimanual)]
* **ReOI**: "Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.16565)]
* **GAF**: "GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.14135)] [[Website](http://chaiying1.github.io/GAF.github.io/project_page/)]
* "Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins", **`RSS 2025`**. [[Paper](https://arxiv.org/abs/2506.13761)] [[Website](https://prompting-with-the-future.github.io/)]
* **V-JEPA 2 and V-JEPA 2-AC**: "V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning", **`arXiv 2025.06`**.
[[Paper](https://arxiv.org/abs/2506.09985)] [[Website](https://ai.meta.com/blog/v-jepa-2-world-model-benchmarks/)] [[Code](https://github.com/facebookresearch/vjepa2)]
* "Time-Aware World Model for Adaptive Prediction and Control", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2506.08441)]
* **3DFlowAction**: "3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.06199)]
* **ORV**: "ORV: 4D Occupancy-centric Robot Video Generation", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.03079)] [[Code](https://github.com/OrangeSodahub/ORV)] [[Website](https://orangesodahub.github.io/ORV/)]
* **WoMAP**: "WoMAP: World Models For Embodied Open-Vocabulary Object Localization", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01600)]
* "Sparse Imagination for Efficient Visual World Model Planning", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01392)]
* **Humanoid World Models**: "Humanoid World Models: Open World Foundation Models for Humanoid Robotics", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01182)]
* "Evaluating Robot Policies in a World Model", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.00613)] [[Website](https://world-model-eval.github.io)]
* **OSVI-WM**: "OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.20425)]
* **WorldEval**: "WorldEval: World Model as Real-World Robot Policies Evaluator", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.19017)] [[Website](https://worldeval.github.io)]
* "Consistent World Models via Foresight Diffusion", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.16474)]
* **Vid2World**: "Vid2World: Crafting Video Diffusion Models to Interactive World Models", **`arXiv 2025.05`**.
[[Paper](https://arxiv.org/abs/2505.14357)] [[Website](http://knightnemo.github.io/vid2world/)]
* **RLVR-World**: "RLVR-World: Training World Models with Reinforcement Learning", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.13934)] [[Website](https://thuml.github.io/RLVR-World/)] [[Code](https://github.com/thuml/RLVR-World)]
* **LaDi-WM**: "LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.11528)]
* **FlowDreamer**: "FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.10075)] [[Website](https://sharinka0715.github.io/FlowDreamer/)]
* "Occupancy World Model for Robots", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.05512)]
* "Learning 3D Persistent Embodied World Models", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.05495)]
* **TesserAct**: "TesserAct: Learning 4D Embodied World Models", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.20995)] [[Website](https://tesseractworld.github.io/)]
* **PIN-WM**: "PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.16693)]
* "Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.16680)]
* **ManipDreamer**: "ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.16464)]
* **UWM**: "Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.02792)] [[Website](https://weirdlabuw.github.io/uwm/)]
* "Perspective-Shifted Neuro-Symbolic World Models: A Framework for Socially-Aware Robot Navigation", **`arXiv 2025.03`**.
[[Paper](https://arxiv.org/abs/2503.20425)]
* **AdaWorld**: "AdaWorld: Learning Adaptable World Models with Latent Actions", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.18938)] [[Website](https://adaptable-world-model.github.io/)]
* **DyWA**: "DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.16806)] [[Website](https://pku-epic.github.io/DyWA/)]
* "Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.12531)] [[Website](https://mkturkcan.github.io/suturingmodels/)]
* "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.10480)]
* **LUMOS**: "LUMOS: Language-Conditioned Imitation Learning with World Models", **`ICRA 2025`**. [[Paper](https://arxiv.org/abs/2503.10370)] [[Website](http://lumos.cs.uni-freiburg.de/)]
* "Object-Centric World Model for Language-Guided Manipulation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.06170)]
* **DEMO^3**: "Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.01837)] [[Website](https://adrialopezescoriza.github.io/demo3/)]
* "Accelerating Model-Based Reinforcement Learning with State-Space World Models", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.20168)]
* "Learning Humanoid Locomotion with World Model Reconstruction", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.16230)]
* "Strengthening Generative Robot Policies through Predictive World Modeling", **`arXiv 2025.02`**.
[[Paper](https://arxiv.org/abs/2502.00622)] [[Website](https://computationalrobotics.seas.harvard.edu/GPC)]
* **Robotic World Model**: "Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.10100)]
* **RoboHorizon**: "RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.06605)]
* **Dream to Manipulate**: "Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.14957)] [[Website](https://leobarcellona.github.io/DreamToManipulate/)]
* **WHALE**: "WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.05619)]
* **VisualPredicator**: "VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.23156)]
* "Multi-Task Interactive Robot Fleet Learning with Visual World Models", **`CoRL 2024`**. [[Paper](https://arxiv.org/abs/2410.22689)] [[Code](https://ut-austin-rpl.github.io/sirius-fleet/)]
* **X-MOBILITY**: "X-MOBILITY: End-To-End Generalizable Navigation via World Modeling", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.17491)]
* **PIVOT-R**: "PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/pdf/2410.10394)]
* **GLIMO**: "Grounding Large Language Models In Embodied Environment With Imperfect World Models", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.02664)]
* **EVA**: "EVA: An Embodied World Model for Future Video Anticipation", **`arXiv 2024.10`**.
[[Paper](https://arxiv.org/abs/2410.15461)] [[Website](https://sites.google.com/view/eva-publi)]
* **PreLAR**: "PreLAR: World Model Pre-training with Learnable Action Representation", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03363.pdf)] [[Code](https://github.com/zhanglixuan0720/PreLAR)]
* **WMP**: "World Model-based Perception for Visual Legged Locomotion", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.16784)] [[Project](https://wmp-loco.github.io/)]
* **R-AIF**: "R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.14216)]
* "Representing Positional Information in Generative World Models for Object Manipulation", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.12005)]
* **DexSim2Real$^2$**: "DexSim2Real$^2$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.08750)]
* **DWL**: "Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning", **`RSS 2024 (Best Paper Award Finalist)`**. [[Paper](https://arxiv.org/abs/2408.14472)]
* "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.10788)] [[Website](https://embodied-gaussians.github.io/)]
* **HRSSM**: "Learning Latent Dynamic Robust Representations for World Models", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2405.06263)] [[Code](https://github.com/bit1029public/HRSSM)]
* **RoboDreamer**: "RoboDreamer: Learning Compositional World Models for Robot Imagination", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2404.12377)] [[Code](https://robovideo.github.io/)]
* **COMBO**: "COMBO: Compositional World Models for Embodied Multi-Agent Cooperation", **`ECCV 2024`**.
[[Paper](https://arxiv.org/abs/2404.10775)] [[Website](https://vis-www.cs.umass.edu/combo/)] [[Code](https://github.com/UMass-Foundation-Model/COMBO)]
* **ManiGaussian**: "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation", **`arXiv 2024.03`**. [[Paper](https://arxiv.org/abs/2403.08321)] [[Code](https://guanxinglu.github.io/ManiGaussian/)]

---

## World Models for VLA

* **PAR**: "Physical Autoregressive Model for Robotic Manipulation without Action Pretraining", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.09822)] [[Website](https://songzijian1999.github.io/PAR_ProjectPage/)]
* **DreamVLA**: "DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.04447)] [[Code](https://github.com/Zhangwenyao1/DreamVLA)] [[Website](https://zhangwenyao1.github.io/DreamVLA/)]
* **WorldVLA**: "WorldVLA: Towards Autoregressive Action World Model", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.21539)] [[Code](https://github.com/alibaba-damo-academy/WorldVLA)]
* **UniVLA**: "UniVLA: Unified Vision-Language-Action Model", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.19850)] [[Code](https://robertwyq.github.io/univla.github.)]
* **MinD**: "MinD: Unified Visual Imagination and Control via Hierarchical World Models", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.18897)] [[Website](https://manipulate-in-dream.github.io/)]
* **FLARE**: "FLARE: Robot Learning with Implicit World Modeling", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.15659)] [[Code](https://github.com/NVIDIA/Isaac-GR00T)] [[Website](https://research.nvidia.com/labs/gear/flare)]
* **DreamGen**: "DreamGen: Unlocking Generalization in Robot Learning through Video World Models", **`arXiv 2025.05`**.
[[Paper](https://arxiv.org/abs/2505.12705)] [[Code](https://github.com/nvidia/GR00T-dreams)]
* **CoT-VLA**: "CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2503.22020)]
* **UP-VLA**: "UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2501.18867)] [[Code](https://github.com/CladernyJorn/UP-VLA)]
* **3D-VLA**: "3D-VLA: A 3D Vision-Language-Action Generative World Model", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2403.09631)]

---

## World Models for Autonomous Driving

### Refer to https://github.com/LMD0311/Awesome-World-Model

* **TeraSim-World**: "TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.13164)] [[Website](https://wjiawei.com/terasim-world-web/)]
* "Enhancing Physical Consistency in Lightweight World Models", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.12437)]
* **OccTENS**: "OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.03887)]
* **IRL-VLA**: "IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.06571)] [[Website](https://lidarcrafter.github.io)] [[Code](https://github.com/lidarcrafter/toolkit)]
* **LiDARCrafter**: "LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.03692)] [[Website](https://lidarcrafter.github.io)] [[Code](https://github.com/lidarcrafter/toolkit)]
* **FASTopoWM**: "FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models", **`arXiv 2025.07`**.
[[Paper](https://arxiv.org/abs/2507.23325)] [[Code](https://github.com/YimingYang23/FASTopoWM)]
* **Orbis**: "Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.13162)] [[Code](https://lmb-freiburg.github.io/orbis.github.io/)]
* "World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.12762)]
* **NRSeg**: "NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.04002)] [[Code](https://github.com/lynn-yu/NRSeg)]
* **World4Drive**: "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model", **`ICCV 2025`**. [[Paper](https://arxiv.org/abs/2507.00603)] [[Code](https://github.com/ucaszyp/World4Drive)]
* **Epona**: "Epona: Autoregressive Diffusion World Model for Autonomous Driving", **`ICCV 2025`**. [[Paper](https://arxiv.org/abs/2506.24113)] [[Code](https://kevin-thu.github.io/Epona/)]
* "Towards foundational LiDAR world models with efficient latent flow matching", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.23434)]
* **SceneDiffuser++**: "SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2506.21976)]
* **COME**: "COME: Adding Scene-Centric Forecasting Control to Occupancy World Model", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.13260)] [[Code](https://github.com/synsin0/COME)]
* **STAGE**: "STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.13138)]
* **ReSim**: "ReSim: Reliable World Simulation for Autonomous Driving", **`arXiv 2025.06`**.
[[Paper](https://arxiv.org/abs/2506.09981)] [[Code](https://github.com/OpenDriveLab/ReSim)] [[Project Page](https://opendrivelab.com/ReSim)]
* "Ego-centric Learning of Communicative World Models for Autonomous Driving", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.08149)]
* **Dreamland**: "Dreamland: Controllable World Creation with Simulator and Generative Models", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.08006)] [[Project Page](https://metadriverse.github.io/dreamland/)]
* **LongDWM**: "LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01546)] [[Project Page](https://wang-xiaodong1899.github.io/longdwm/)]
* **GeoDrive**: "GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.22421)] [[Code](https://github.com/antonioo-c/GeoDrive)]
* **FutureSightDrive**: "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving", **`NeurIPS 2025`**. [[Paper](https://arxiv.org/abs/2505.17685)] [[Code](https://github.com/MIV-XJTU/FSDrive)]
* **Raw2Drive**: "Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.16394)]
* **VL-SAFE**: "VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.16377)] [[Project Page](https://ys-qu.github.io/vlsafe-website/)]
* **PosePilot**: "PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.01729)]
* "World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks", **`arXiv 2025.05`**.
[[Paper](https://arxiv.org/abs/2505.01712)]
* "Learning to Drive from a World Model", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.19077)]
* **DriVerse**: "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.18576)]
* "End-to-End Driving with Online Trajectory Evaluation via BEV World Model", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.01941)] [[Code](https://github.com/liyingyanUCAS/WoTE)]
* "Knowledge Graphs as World Models for Semantic Material-Aware Obstacle Handling in Autonomous Vehicles", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.21232)]
* **MiLA**: "MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.15875)] [[Project Page](https://github.com/xiaomi-mlab/mila.github.io)]
* **SimWorld**: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.13952)] [[Project Page](https://github.com/Li-Zn-H/SimWorld)]
* **UniFuture**: "Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.13587)] [[Project Page](https://github.com/dk-liang/UniFuture)]
* **EOT-WM**: "Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.09215)]
* "Temporal Triplane Transformers as Occupancy World Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.07338)]
* **InDRiVE**: "InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model", **`arXiv 2025.03`**.
[[Paper](https://arxiv.org/abs/2503.05573)]
* **MaskGWM**: "MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.11663)]
* **Dream to Drive**: "Dream to Drive: Model-Based Vehicle Control Using Analytic World Models", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.10012)]
* "Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2502.07309)]
* "Dream to Drive with Predictive Individual World Model", **`IEEE TIV`**. [[Paper](https://arxiv.org/abs/2501.16733)] [[Code](https://github.com/gaoyinfeng/PIWM)]
* **HERMES**: "HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.14729)]
* **AdaWM**: "AdaWM: Adaptive World Model based Planning for Autonomous Driving", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2501.13072)]
* **AD-L-JEPA**: "AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.04969)]
* **DrivingWorld**: "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.19505)] [[Code](https://github.com/YvanYin/DrivingWorld)] [[Project Page](https://huxiaotaostasy.github.io/DrivingWorld/index.html)]
* **DrivingGPT**: "DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.18607)] [[Project Page](https://rogerchern.github.io/DrivingGPT/)]
* "An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training", **`arXiv 2024.12`**.
[[Paper](https://arxiv.org/abs/2412.13772)]
* **GEM**: "GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.11198)] [[Project Page](https://vita-epfl.github.io/GEM.github.io/)]
* **GaussianWorld**: "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.04380)] [[Code](https://github.com/zuosc19/GaussianWorld)]
* **Doe-1**: "Doe-1: Closed-Loop Autonomous Driving with Large World Model", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.09627)] [[Project Page](https://wzzheng.net/Doe/)] [[Code](https://github.com/wzzheng/Doe)]
* "Physical Informed Driving World Model", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.08410)] [[Project Page](https://metadrivescape.github.io/papers_project/DrivePhysica/page.html)]
* **InfiniCube**: "InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.03934)] [[Project Page](https://research.nvidia.com/labs/toronto-ai/infinicube/)]
* **InfinityDrive**: "InfinityDrive: Breaking Time Limits in Driving World Models", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.01522)] [[Project Page](https://metadrivescape.github.io/papers_project/InfinityDrive/page.html)]
* **ReconDreamer**: "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.19548)] [[Project Page](https://recondreamer.github.io/)]
* **Imagine-2-Drive**: "Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles", **`ICRA 2025`**.
[[Paper](https://arxiv.org/abs/2411.10171)] [[Project Page](https://anantagrg.github.io/Imagine-2-Drive.github.io/)]
* **DynamicCity**: "DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes", **`ICLR 2025 Spotlight`**. [[Paper](https://arxiv.org/abs/2410.18084)] [[Project Page](https://dynamic-city.github.io)] [[Code](https://github.com/3DTopia/DynamicCity)]
* **DriveDreamer4D**: "World Models Are Effective Data Machines for 4D Driving Scene Representation", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.13571)] [[Project Page](https://drivedreamer4d.github.io/)]
* **DOME**: "Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.10429)] [[Project Page](https://gusongen.github.io/DOME)]
* **SSR**: "Does End-to-End Autonomous Driving Really Need Perception Tasks?", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.18341)] [[Code](https://github.com/PeidongLi/SSR)]
* "Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.16663)]
* **LatentDriver**: "Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.15730)] [[Code](https://github.com/Sephirex-X/LatentDriver)]
* **RenderWorld**: "World Model with Self-Supervised 3D Label", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.11356)]
* **OccLLaMA**: "An Occupancy-Language-Action Generative World Model for Autonomous Driving", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.03272)]
* **DriveGenVLM**: "Real-world Video Generation for Vision Language Model based Autonomous Driving", **`arXiv 2024.08`**.
[[Paper](https://arxiv.org/abs/2408.16647)]
* **Drive-OccWorld**: "Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/abs/2408.14197)]
* **CarFormer**: "Self-Driving with Learned Object-Centric Representations", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2407.15843)] [[Project Page](https://kuis-ai.github.io/CarFormer/)]
* **BEVWorld**: "A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.05679)] [[Code](https://github.com/zympsyche/BevWorld)]
* **TOKEN**: "Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.00959)]
* **UMAD**: "Unsupervised Mask-Level Anomaly Detection for Autonomous Driving", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.06370)]
* **SimGen**: "Simulator-conditioned Driving Scene Generation", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.09386)] [[Project Page](https://metadriverse.github.io/simgen/)]
* **AdaptiveDriver**: "Planning with Adaptive World Models for Autonomous Driving", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.10714)] [[Project Page](https://arunbalajeev.github.io/world_models_planning/world_model_paper.html)]
* **UnO**: "Unsupervised Occupancy Fields for Perception and Forecasting", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2406.08691)] [[Project Page](https://waabi.ai/research/uno)]
* **LAW**: "Enhancing End-to-End Autonomous Driving with Latent World Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.08481)] [[Code](https://github.com/BraveGroup/LAW)]
* **Delphi**: "Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation", **`arXiv 2024.06`**.
[[Paper](https://arxiv.org/abs/2406.01349)] [[Code](https://github.com/westlake-autolab/Delphi)]
* **OccSora**: "4D Occupancy Generation Models as World Simulators for Autonomous Driving", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.20337)] [[Code](https://github.com/wzzheng/OccSora)]
* **MagicDrive3D**: "Controllable 3D Generation for Any-View Rendering in Street Scenes", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.14475)] [[Project Page](https://gaoruiyuan.com/magicdrive3d/)]
* **Vista**: "A Generalizable Driving World Model with High Fidelity and Versatile Controllability", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/abs/2405.17398)] [[Code](https://github.com/OpenDriveLab/Vista)]
* **CarDreamer**: "Open-Source Learning Platform for World Model based Autonomous Driving", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.09111)] [[Code](https://github.com/ucd-dare/CarDreamer)]
* **DriveSim**: "Probing Multimodal LLMs as World Models for Driving", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.05956)] [[Code](https://github.com/sreeramsa/DriveSim)]
* **DriveWorld**: "4D Pre-trained Scene Understanding via World Models for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2405.04390)]
* **LidarDM**: "Generative LiDAR Simulation in a Generated World", **`arXiv 2024.04`**. [[Paper](https://arxiv.org/abs/2404.02903)] [[Code](https://github.com/vzyrianov/lidardm)]
* **SubjectDrive**: "Scaling Generative Data in Autonomous Driving via Subject Control", **`arXiv 2024.03`**. [[Paper](https://arxiv.org/abs/2403.19438)] [[Project Page](https://subjectdrive.github.io/)]
* **DriveDreamer-2**: "LLM-Enhanced World Models for Diverse Driving Video Generation", **`arXiv 2024.03`**. [[Paper](https://arxiv.org/abs/2403.06845)] [[Project Page](https://drivedreamer2.github.io/)]
* **Think2Drive**: "Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving", **`ECCV 2024`**.
[[Paper](https://arxiv.org/abs/2402.16720)]
* **MARL-CCE**: "Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05085.pdf)] [[Code](https://github.com/qiaoguanren/MARL-CCE)]
* **GenAD**: "Generalized Predictive Model for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2403.09630)] [[Data](https://github.com/OpenDriveLab/DriveAGI?tab=readme-ov-file#genad-dataset-opendv-youtube)]
* **GenAD**: "Generative End-to-End Autonomous Driving", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2402.11502)] [[Code](https://github.com/wzzheng/GenAD)]
* **NeMo**: "Neural Volumetric World Models for Autonomous Driving", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02571.pdf)]
* **ViDAR**: "Visual Point Cloud Forecasting enables Scalable Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2312.17655)] [[Code](https://github.com/OpenDriveLab/ViDAR)]
* **Drive-WM**: "Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2311.17918)] [[Code](https://github.com/BraveGroup/Drive-WM)]
* **Cam4DOCC**: "Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2311.17663)] [[Code](https://github.com/haomo-ai/Cam4DOcc)]
* **Panacea**: "Panoramic and Controllable Video Generation for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2311.16813)] [[Project Page](https://panacea-ad.github.io/)]
* **OccWorld**: "Learning a 3D Occupancy World Model for Autonomous Driving", **`ECCV 2024`**.
[[Paper](https://arxiv.org/abs/2311.16038)] [[Code](https://github.com/wzzheng/OccWorld)]
* **Copilot4D**: "Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion", **`ICLR 2024`**. [[Paper](https://arxiv.org/abs/2311.01017)]
* **DrivingDiffusion**: "Layout-Guided multi-view driving scene video generation with latent diffusion model", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2310.07771)] [[Code](https://github.com/shalfun/DrivingDiffusion)]
* **SafeDreamer**: "Safe Reinforcement Learning with World Models", **`ICLR 2024`**. [[Paper](https://openreview.net/forum?id=tsE5HLYtYg)] [[Code](https://github.com/PKU-Alignment/SafeDreamer)]
* **MagicDrive**: "Street View Generation with Diverse 3D Geometry Control", **`ICLR 2024`**. [[Paper](https://arxiv.org/abs/2310.02601)] [[Code](https://github.com/cure-lab/MagicDrive)]
* **DriveDreamer**: "Towards Real-world-driven World Models for Autonomous Driving", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2309.09777)] [[Code](https://github.com/JeffWang987/DriveDreamer)]
* **SEM2**: "Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model", **`TITS`**. [[Paper](https://ieeexplore.ieee.org/abstract/document/10538211/)]

---

## Citation

If you find this repository useful, please consider citing this list:

```
@misc{leo2024worldmodelspaperslist,
  title   = {Awesome-World-Models},
  author  = {Leo Fan},
  journal = {GitHub repository},
  url     = {https://github.com/leofan90/Awesome-World-Models},
  year    = {2024},
}
```