# Awesome-World-Models

**Repository Path**: AI4EarthLab/Awesome-World-Models

## Basic Information

- **Project Name**: Awesome-World-Models
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: BSD-3-Clause
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-25
- **Last Updated**: 2025-09-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Awesome World Models for Robotics [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

This repository provides a curated list of **papers for World Models for General Video Generation, Embodied AI, and Autonomous Driving**. Template from [Awesome-LLM-Robotics](https://github.com/GT-RIPL/Awesome-LLM-Robotics) and [Awesome-World-Model](https://github.com/LMD0311/Awesome-World-Model)<br>

#### Contributions are welcome! Please feel free to submit [pull requests](https://github.com/leofan90/Awesome-World-Models/blob/main/how-to-PR.md) or reach out via [email](mailto:chunkaifan-changetoat-stu-changetodot-pku--changetodot-changetoedu-changetocn) to add papers! <br>

If you find this repository useful, please consider [citing](#citation) and giving this list a star ⭐. Feel free to share it with others!

---
## Overview

  - [Foundation paper of World Model](#foundation-paper-of-world-model)
  - [Blog or Technical Report](#blog-or-technical-report)
  - [Surveys](#surveys)
  - [Benchmarks & Evaluation](#benchmarks-&-evaluation)
  - [General World Models](#general-world-models)
  - [World Models for Embodied AI](#world-models-for-embodied-ai)
  - [World Models for VLA](#world-models-for-VLA)  
  - [World Models for Autonomous Driving](#world-models-for-autonomous-driving)
  - [Citation](#citation)

---
## Foundation paper of World Model
* World Models, **`NIPS 2018 Oral`**. [[Paper](https://arxiv.org/abs/1803.10122)] [[Website](https://worldmodels.github.io/)] 

## Blog or Technical Report
* **`Matrix-Game 2.0`**, Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model. [[Paper](https://arxiv.org/abs/2508.13009)] [[Website](https://matrix-game-v2.github.io/)]
* **`Matrix-3D`**, Matrix-3D: Omnidirectional Explorable 3D World Generation. [[Paper](https://arxiv.org/abs/2508.08086)] [[Website](https://matrix-3d.github.io)]
* **`HunyuanWorld 1.0`**, HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels. [[Paper](https://arxiv.org/abs/2507.21809)] [[Website](https://3d-models.hunyuan.tencent.com/world/)] [[Code](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0)] 
* What Does it Mean for a Neural Network to Learn a "World Model"?. [[Paper](https://arxiv.org/abs/2507.21513)]
* **`Matrix-Game`**, Matrix-Game: Interactive World Foundation Model. [[Paper](https://arxiv.org/abs/2506.18701)] [[Code](https://github.com/SkyworkAI/Matrix-Game)] 
* **`Cosmos-Drive-Dreams`**, Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models. [[Paper](https://arxiv.org/abs/2506.09042)] [[Website](https://research.nvidia.com/labs/toronto-ai/cosmos_drive_dreams)] 
* **`GAIA-2`**, GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving. [[Paper](https://arxiv.org/abs/2503.20523)] [[Website](https://wayve.ai/thinking/gaia-2)]
* **`Cosmos`**, Cosmos World Foundation Model Platform for Physical AI. [[Paper](https://arxiv.org/abs/2501.03575)] [[Website](https://www.nvidia.com/en-us/ai/cosmos/)] [[Code](https://github.com/NVIDIA/Cosmos)]
* **`1X Technologies`**, 1X World Model. [[Blog](https://www.1x.tech/discover/1x-world-model)]
* **`Runway`**, Introducing General World Models. [[Blog](https://runwayml.com/research/introducing-general-world-models)]
* **`Wayve`**, Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy. [[Paper](https://arxiv.org/pdf/2309.17080)] [[Blog](https://wayve.ai/thinking/introducing-gaia1/)] 
* **`Yann LeCun`**, A Path Towards Autonomous Machine Intelligence. [[Paper](https://openreview.net/pdf?id=BZ5a1r-kVsf)]

## Surveys
* "3D and 4D World Modeling: A Survey", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.07996)]
* "Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.09561)]
* "A Survey: Learning Embodied Intelligence from Physical Simulators and World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.00917)] [[Code](https://github.com/NJU3DV-LoongGroup/Embodied-World-Models-Survey)]
* "Embodied AI Agents: Modeling the World", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.22355)] 
* "From 2D to 3D Cognition: A Brief Survey of General World Models", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.20134)] 
* "A Survey on World Models Grounded in Acoustic Physical Information", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.13833)] 
* "Exploring the Evolution of Physics Cognition in Video Generation: A Survey", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.21765)] [[Code](https://github.com/minnie-lin/Awesome-Physics-Cognition-based-Video-Generation)]
* "World Models in Artificial Intelligence: Sensing, Learning, and Reasoning Like a Child", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.15168)] 
* "Simulating the Real World: A Unified Survey of Multimodal Generative Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.04641)] [[Code](https://github.com/ALEEEHU/World-Simulator)]
* "Four Principles for Physically Interpretable World Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02143)]
* "The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.10498)] [[Code](https://github.com/LMD0311/Awesome-World-Model)]
* "A Survey of World Models for Autonomous Driving", **`TPAMI`**. [[Paper](https://arxiv.org/abs/2501.11260)]
* "Understanding World or Predicting Future? A Comprehensive Survey of World Models", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.14499)]
* "World Models: The Safety Perspective", **`ISSRE WDMD`**. [[Paper](https://arxiv.org/abs/2411.07690)]
* "Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.02914)]
* "From Efficient Multimodal Models to World Models: A Survey", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.00118)]
* "Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.06886)] [[Code](https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List)]
* "Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.03520)] [[Code](https://github.com/GigaAI-research/General-World-Models-Survey)]
* "World Models for Autonomous Driving: An Initial Survey", **`TIV`**. [[Paper](https://arxiv.org/abs/2403.02622)]
* "A survey on multimodal large language models for autonomous driving", **`WACVW 2024`**. [[Paper](https://arxiv.org/abs/2311.12320)] [[Code](https://github.com/IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving)]

---
## Benchmarks & Evaluation
* **OmniWorld**: "OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.12201)] [[Website](https://yangzhou24.github.io/OmniWorld/)]
* "Beyond Simulation: Benchmarking World Models for Planning and Causality in Autonomous Driving", **`ICRA 2025`**. [[Paper](https://arxiv.org/abs/2508.01922)] 
* **WM-ABench**: "Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation", **`ACL 2025(Findings)`**. [[Paper](https://arxiv.org/abs/2506.21876)] [[Website](https://wm-abench.maitrix.org/)]
* **UNIVERSE**: "Adapting Vision-Language Models for Evaluating World Models", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.17967)]
* **WorldPrediction**: "WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.04363)]
* "Toward Memory-Aided World Models: Benchmarking via Spatial Consistency", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.22976)] [[Datasets](https://huggingface.co/datasets/kevinLian/LoopNav)] [[Code](https://github.com/Kevin-lkw/LoopNav)]
* **SimWorld**: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene
Generation via World Model", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2503.13952)] [[Code](https://github.com/Li-Zn-H/SimWorld)]
* **EWMBench**: "EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.09694)] [[Code](https://github.com/AgibotTech/EWMBench)]
* "Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments", **`arxiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.08122)] 
* **WorldModelBench**: "WorldModelBench: Judging Video Generation Models As World Models", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2502.20694)] [[Website](https://worldmodelbench-team.github.io/)]
* **Text2World**: "Text2World: Benchmarking Large Language Models for Symbolic World Model Generation", **`arxiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.13092)] [[Website](https://text-to-world.github.io/)] 
* **ACT-Bench**: "ACT-Bench: Towards Action Controllable World Models for Autonomous Driving", **`arxiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.05337)]
* **WorldSimBench**: "WorldSimBench: Towards Video Generation Models as World Simulators", **`arxiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.18072)] [[Website](https://iranqin.github.io/WorldSimBench.github.io/)] 
* **EVA**: "EVA: An Embodied World Model for Future Video Anticipation", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2410.15461)] [[Website](https://sites.google.com/view/eva-publi)] 
* **AeroVerse**: "AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models", **`arxiv 2024.08`**. [[Paper](https://arxiv.org/pdf/2408.15511)]
* **CityBench**: "CityBench: Evaluating the Capabilities of Large Language Model as World Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.13945)] [[Code](https://github.com/tsinghua-fib-lab/CityBench)]
* "Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models", **`NIPS 2023`**. [[Paper](https://arxiv.org/abs/2311.09064)]

---
## General World Models
* "World Modeling with Probabilistic Structure Integration", **`arxiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.09737)]
* "One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning", **`arxiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.07945)] [[Code](https://github.com/opendilab/LightZero)]
* **LatticeWorld**: "LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation", **`arxiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.05263)]
* "Planning with Reasoning using Vision Language World Model", **`arxiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.02722)]
* "Social World Models", **`arxiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.00559)]
* "Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization", **`arxiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.20294)]
* **HERO**: "HERO: Hierarchical Extrapolation and Refresh for Efficient World Models", **`arxiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.17588)]
* "Scalable RF Simulation in Generative 4D Worlds", **`arxiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.12176)]
* "Finite Automata Extraction: Low-data World Model Learning as Programs from Gameplay Video", **`arxiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.11836)]
* "Visuomotor Grasping with World Models for Surgical Robots", **`arxiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.11200)]
* "In-Context Reinforcement Learning via Communicative World Models", **`arxiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.06659)] [[Code](https://github.com/fernando-ml/CORAL )]
* **PIGDreamer**: "PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2508.02159)]
* **SimuRA**: "SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.23773)]
* "Back to the Features: DINO as a Foundation for Video World Models", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.19468)] 
* **Yume**: "Yume: An Interactive World Generation Model", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.17744)] [[Website](https://stdstu12.github.io/YUME-Project/)] [[Code](https://github.com/stdstu12/YUME)]
* "LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.15521)]
* "Safety Certification in the Latent space using Control Barrier Functions and World Models", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.13871)]
* "Assessing adaptive world models in machines with novel games", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.12821)]
* "Graph World Model", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2507.10539)] [[Website](https://github.com/ulab-uiuc/GWM)]
* **MobiWorld**: "MobiWorld: World Models for Mobile Wireless Network", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.09462)]
* "Continual Reinforcement Learning by Planning with Online World Models", **`ICML 2025 Spotlight`**. [[Paper](https://arxiv.org/abs/2507.09177)]
* **AirScape**: "AirScape: An Aerial Generative World Model with Motion Controllability", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.08885)] [[Website]( https://embodiedcity.github.io/AirScape/)]
* **Geometry Forcing**: "Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.07982)] [[Website](https://GeometryForcing.github.io)]
* **Martian World Models**: "Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.07978)] [[Website](https://marsgenai.github.io)]
* "What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2507.06952)]
* "Critiques of World Models", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.05169)]
* "When do World Models Successfully Learn Dynamical Systems?", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.04898)]
* **WebSynthesis**: "WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.04370)]
* "Accurate and Efficient World Modeling with Masked Latent Transformers", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.04075)]
* **Dyn-O**: "Dyn-O: Building Structured World Models with Object-Centric Representations", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.03298)]
* **NavMorph**: "NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments", **`ICCV 2025`**. [[Paper](https://arxiv.org/abs/2506.19055)] [[Code](https://github.com/Feliciaxyao/NavMorph)]
* "A “Good” Regulator May Provide a World Model for Intelligent Systems", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.23032)]
* **Xray2Xray**: "Xray2Xray: World Model from Chest X-rays with Volumetric Context", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.19055)]
* **MATWM**: "Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.18537)]
* "Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.16584)]
* "Efficient Generation of Diverse Cooperative Agents with World Models", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.07450)]
* **WorldLLM**: "WorldLLM: Improving LLMs' world modeling using curiosity-driven theory-making", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.06725)]
* "LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.06355)]
* "Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.06006)]
* "Video World Models with Long-term Spatial Memory", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.05284)] [[Website](https://spmem.github.io/)]
* **DSG-World**: "DSG-World: Learning a 3D Gaussian World Model from Dual State Videos", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.05217)]
* "Safe Planning and Policy Optimization via World Model Learning", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.04828)]
* **FOLIAGE**: "FOLIAGE: Towards Physical Intelligence World Models Via Unbounded Surface Evolution", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.03173)]
* "Linear Spatial World Models Emerge in Large Language Models", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.02996)] [[Code](https://github.com/matthieu-perso/spatial_world_models)]
* **Simple, Good, Fast**: "Simple, Good, Fast: Self-Supervised World Models Free of Baggage", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2506.02612)] [[Code](https://github.com/jrobine/sgf)]
* **Medical World Model**: "Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.02327)]
* "General agents need world models", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2506.01622)]
* "Learning Abstract World Models with a Group-Structured Latent Space", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01529)]
* **DeepVerse**: "DeepVerse: 4D Autoregressive Video Generation as a World Model", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01103)]
* "World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.00417)]
* **Dyna-Think**: "Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.00320)]
* **StateSpaceDiffuser**: "StateSpaceDiffuser: Bringing Long Context to Diffusion World Models", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.22246)]
* "Learning World Models for Interactive Video Generation", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.21996)] 
* "Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.21906)] 
* "Long-Context State-Space Video World Models", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.20171)] [[Website](https://ryanpo.com/ssm_wm)]
* "Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.16422)] 
* "World Models as Reference Trajectories for Rapid Motor Adaptation", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.15589)] 
* "Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.13709)] 
* "Building spatial world models from sparse transitional episodic memories", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.13696)] 
* **PoE-World**: "PoE-World: Compositional World Modeling with Products of Programmatic Experts", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.10819)] [[Website](https://topwasu.github.io/poe-world)]
* "Explainable Reinforcement Learning Agents Using World Models", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.08073)] 
* **seq-JEPA**: "seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.03176)] 
* "Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.02228)] 
* "Learning Local Causal World Models with State Space Models and Attention", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.02074)] 
* **WebEvolver**: "WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model", **`arxiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.21024)] 
* **WALL-E 2.0**: "WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents", **`arxiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.15785)] [[Code](https://github.com/elated-sawyer/WALL-E)]
* **ViMo**: "ViMo: A Generative Visual GUI World Model for App Agent", **`arxiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.13936)] 
* "Simulating Before Planning: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning", **`SIGIR 2025`**. [[Paper](https://arxiv.org/abs/2504.13643)] 
* **CheXWorld**: "CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2504.13820)] [[Code](https://github.com/LeapLabTHU/CheXWorld)]
* **EchoWorld**: "EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2504.13065)] [[Code](https://github.com/LeapLabTHU/EchoWorld)]
* "Adapting a World Model for Trajectory Following in a 3D Game", **`ICLR 2025 Workshop on World Models`**. [[Paper](https://arxiv.org/abs/2504.12299)] 
* **MineWorld**: "MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.07257)] [[Website](https://aka.ms/mineworld)]
* **MoSim**: "Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2504.07095)]
* "Improving World Models using Deep Supervision with Linear Probes", **`ICLR 2025 Workshop on World Models`**. [[Paper](https://arxiv.org/abs/2504.03861)]
* "Decentralized Collective World Model for Emergent Communication and Coordination", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.03353)]
* "Adapting World Models with Latent-State Dynamics Residuals", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.02252)]
* "Can Test-Time Scaling Improve World Foundation Model?", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.24320)] [[Code](https://github.com/Mia-Cong/SWIFT.git)]
* "Synthesizing world models for bilevel planning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.20124)] 
* "Long-context autoregressive video modeling with next-frame prediction", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.19325)] [[Code](https://github.com/showlab/FAR)] [[Website](https://farlongctx.github.io/)]
* **Aether**: "Aether: Geometric-Aware Unified World Modeling", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.18945)] [[Website](https://aether-world.github.io/)]
* **FUSDREAMER**: "FUSDREAMER: Label-efficient Remote Sensing World Model for Multimodal Data Classification", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.13814)] [[Website](https://github.com/Cimy-wang/FusDreamer)]
* "Inter-environmental world modeling for continuous and compositional dynamics", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.09911)] 
* **Disentangled World Models**: "Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.08751)] 
* "Revisiting the Othello World Model Hypothesis", **`ICLR World Models Workshop`**. [[Paper](https://arxiv.org/abs/2503.04421)] 
* "Learning Transformer-based World Models with Contrastive Predictive Coding", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.04416)] 
* "Surgical Vision World Model", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02904)] 
* "World Models for Anomaly Detection during Model-Based Reinforcement Learning Inference", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02552)] 
* **WMNav**: "WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02247)] [[Website](https://b0b8k1ng.github.io/WMNav/)]
* **SENSEI**: "SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.01584)] [[Website](https://sites.google.com/view/sensei-paper)]
* "Learning Actionable World Models for Industrial Process Control", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.00713)]
* "Implementing Spiking World Model with Multi-Compartment Neurons for Model-based Reinforcement Learning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.00713)]
* "Discrete Codebook World Models for Continuous Control", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2503.00653)]
* **Multimodal Dreaming**: "Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.21142)]
* "Generalist World Model Pre-Training for Efficient Reinforcement Learning", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.19544)]
* "Learning To Explore With Predictive World Model Via Self-Supervised Learning", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.13200)]
* **M^3**: "M^3: A Modular World Model over Streams of Tokens", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.11537)]
* "When do neural networks learn world models?", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.09297)]
* "Pre-Trained Video Generative Models as World Simulators", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.07825)]
* **DMWM**: "DMWM: Dual-Mind World Model with Long-Term Imagination", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.07591)]
* **EvoAgent**: "EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.05907)]
* "Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.05857)]
* "Generating Symbolic World Models via Test-time Scaling of Large Language Models", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.04728)] [[Website](https://vmlpddl.github.io/)]
* "Improving Transformer World Models for Data-Efficient RL", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.01591)]
* "Trajectory World Models for Heterogeneous Environments", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.01366)]
* "Enhancing Memory and Imagination Consistency in Diffusion-based World Models via Linear-Time Sequence Modeling", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.00466)]
* "Objects matter: object-centric world models improve reinforcement learning in visually complex environments", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.16443)]
* **GLAM**: "GLAM: Global-Local Variation Awareness in Mamba-based World Model", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.11949)]
* **GAWM**: "GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.10116)]
* "Generative Emergent Communication: Large Language Model is a Collective World Model", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.00226)]
* "Towards Unraveling and Improving Generalization in World Models", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.00195)]
* "Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.12870)]
* "Transformers Use Causal World Models in Maze-Solving Tasks", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.11867)]
* "Causal World Representation in the GPT Model", **`NIPS 2024 Workshop`**. [[Paper](https://arxiv.org/abs/2412.07446)]
* **Owl-1**: "Owl-1: Omni World Model for Consistent Long Video Generation", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.09600)]
* "Navigation World Models", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.03572)] [[Website](https://www.amirbar.net/nwm/)]
* "Evaluating World Models with LLM for Decision Making", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.08794)] 
* **LLMPhy**: "LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.08027)] 
* **WebDreamer**: "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.06559)] [[Code](https://github.com/OSU-NLP-Group/WebDreamer)]
* "Scaling Laws for Pre-training Agents and World Models", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.04434)]
* **DINO-WM**: "DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.04983)] [[Website](https://dino-wm.github.io/)]
* "Learning World Models for Unconstrained Goal Navigation", **`NIPS 2024`**. [[Paper](https://arxiv.org/abs/2411.02446)]
* "How Far is Video Generation from World Model: A Physical Law Perspective", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.02385)] [[Website](https://phyworld.github.io/)] [[Code](https://github.com/phyworld/phyworld)]
* **Adaptive World Models**: "Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity", **`NIPS 2024 Workshop Adaptive Foundation Models`**. [[Paper](https://arxiv.org/abs/2411.01342)]
* **LLMCWM**: "Language Agents Meet Causality -- Bridging LLMs and Causal World Models", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.19923)] [[Code](https://github.com/j0hngou/LLMCWM/)]
* "Reward-free World Models for Online Imitation Learning", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.14081)]
* "Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.13232)]
* **AVID**: "AVID: Adapting Video Diffusion Models to World Models", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.12822)] [[Code](https://github.com/microsoft/causica/tree/main/research_experiments/avid)]
* **SMAC**: "Grounded Answers for Multi-agent Decision-making Problem through Generative World Model", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/abs/2410.02664)]
* **OSWM**: "One-shot World Models Using a Transformer Trained on a Synthetic Prior", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.14084)]
* "Making Large Language Models into World Models with Precondition and Effect Knowledge", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.12278)]
* "Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/abs/2408.11816)]
* **MoReFree**: "World Models Increase Autonomy in Reinforcement Learning", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/abs/2408.09807)] [[Project](https://sites.google.com/view/morefree)]
* **UrbanWorld**: "UrbanWorld: An Urban World Model for 3D City Generation", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.11965)]
* **PWM**: "PWM: Policy Learning with Large World Models", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.02466)] [[Code](https://www.imgeorgiev.com/pwm/)]
* "Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.02446)]
* **GenRL**: "GenRL: Multimodal foundation world models for generalist embodied agents", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.18043)] [[Code](https://github.com/mazpie/genrl)]
* **DLLM**: "World Models with Hints of Large Language Models for Goal Achieving", **`arXiv 2024.06`**. [[Paper](http://arxiv.org/pdf/2406.07381)]
* "Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.15275)]
* **CoDreamer**: "CoDreamer: Communication-Based Decentralised World Models", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.13600)]
* **Pandora**: "Pandora: Towards General World Model with Natural Language Actions and Video States", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.09455)] [[Code](https://github.com/maitrix-org/Pandora)]
* **EBWM**: "Cognitively Inspired Energy-Based World Models", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.08862)]
* "Evaluating the World Model Implicit in a Generative Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.03689)] [[Code](https://github.com/keyonvafa/world-model-evaluation)]
* "Transformers and Slot Encoding for Sample Efficient Physical World Modelling", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.20180)] [[Code](https://github.com/torchipeppo/transformers-and-slot-encoding-for-wm)]
* **Puppeteer**: "Hierarchical World Models as Visual Whole-Body Humanoid Controllers", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.18418)] [[Code](https://nicklashansen.com/rlpuppeteer)]
* **BWArea Model**: "BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.17039)]
* **WKM**: "Agent Planning with World Knowledge Model", **`arXiv 2024.05`**.  [[Paper](https://arxiv.org/abs/2405.14205)] [[Code](https://github.com/zjunlp/WKM)]
* **Diamond**: "Diffusion for World Modeling: Visual Details Matter in Atari", **`arXiv 2024.05`**.  [[Paper](https://arxiv.org/abs/2405.12399)] [[Code](https://github.com/eloialonso/diamond)]
* "Compete and Compose: Learning Independent Mechanisms for Modular World Models", **`arXiv 2024.04`**.  [[Paper](https://arxiv.org/abs/2404.15109)]
* "Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization", **`arXiv 2024.03`**.  [[Paper](https://arxiv.org/abs/2403.10967)] [[Code](https://github.com/sai-prasanna/dreaming_of_many_worlds)]
* **V-JEPA**: "V-JEPA: Video Joint Embedding Predictive Architecture", **`Meta AI`**. [[Blog](https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/)] [[Paper](https://ai.meta.com/research/publications/revisiting-feature-prediction-for-learning-visual-representations-from-video/)] [[Code](https://github.com/facebookresearch/jepa)]
* **IWM**: "Learning and Leveraging World Models in Visual Representation Learning", **`Meta AI`**. [[Paper](https://arxiv.org/abs/2403.00504)] 
* **Genie**: "Genie: Generative Interactive Environments", **`DeepMind`**. [[Paper](https://arxiv.org/abs/2402.15391)] [[Blog](https://sites.google.com/view/genie-2024/home)]
* **Sora**: "Video generation models as world simulators", **`OpenAI`**. [[Technical report](https://openai.com/research/video-generation-models-as-world-simulators)]
* **LWM**: "World Model on Million-Length Video And Language With RingAttention", **`arXiv 2024.02`**.  [[Paper](https://arxiv.org/abs/2402.08268)] [[Code](https://github.com/LargeWorldModel/LWM)]
* "Planning with an Ensemble of World Models", **`OpenReview`**. [[Paper](https://openreview.net/forum?id=cvGdPXaydP)]
* **WorldDreamer**: "WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens", **`arXiv 2024.01`**. [[Paper](https://arxiv.org/abs/2401.09985)] [[Code](https://github.com/JeffWang987/WorldDreamer)]
* **CWM**: "Understanding Physical Dynamics with Counterfactual World Modeling", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03523.pdf)] [[Code](https://neuroailab.github.io/cwm-physics/)]
* **Δ-IRIS**: "Efficient World Models with Context-Aware Tokenization", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2406.19320)] [[Code](https://github.com/vmicheli/delta-iris)]
* **LLM-Sim**: "Can Language Models Serve as Text-Based World Simulators?", **`ACL`**. [[Paper](https://arxiv.org/abs/2406.06485)] [[Code](https://github.com/cognitiveailab/GPT-simulator)]
* **AD3**: "AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2403.09976)]
* **MAMBA**: "MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning", **`ICLR 2024`**.  [[Paper](https://arxiv.org/abs/2403.09859)] [[Code](https://github.com/zoharri/mamba)]
* **R2I**: "Mastering Memory Tasks with World Models", **`ICLR 2024`**. [[Paper](http://arxiv.org/pdf/2403.04253)] [[Website](https://recall2imagine.github.io/)] [[Code](https://github.com/chandar-lab/Recall2Imagine)]
* **HarmonyDream**: "HarmonyDream: Task Harmonization Inside World Models", **`ICML 2024`**. [[Paper](https://openreview.net/forum?id=x0yIaw2fgk)] [[Code](https://github.com/thuml/HarmonyDream)]
* **REM**: "Improving Token-Based World Models with Parallel Observation Prediction", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2402.05643)] [[Code](https://github.com/leor-c/REM)]
* "Do Transformer World Models Give Better Policy Gradients?"", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2402.05290)]
* **DreamSmooth**: "DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing", **`ICLR 2024`**. [[Paper](https://arxiv.org/pdf/2311.01450)]
* **TD-MPC2**: "TD-MPC2: Scalable, Robust World Models for Continuous Control", **`ICLR 2024`**. [[Paper](https://arxiv.org/pdf/2310.16828)] [[Torch Code](https://github.com/nicklashansen/tdmpc2)]
* **Hieros**: "Hieros: Hierarchical Imagination on Structured State Space Sequence World Models", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2310.05167)]
* **CoWorld**: "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/abs/2305.15260)]

---
## World Models for Embodied AI
* **PhysicalAgent**: "PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.13903)]
* "Empowering Multi-Robot Cooperation via Sequential World Models", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.13095)]
* "World Model Implanting for Test-time Adaptation of Embodied Agents", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2509.03956)]
* "Learning Primitive Embodied World Models: Towards Scalable Robotic Learning", **`arxiv 2025.08`**. [[Paper](https://arxiv.org/pdf/2508.20840)] [[Website](https://qiaosun22.github.io/PrimitiveWorld/)]
* **GWM**: "GWM: Towards Scalable Gaussian World Models for Robotic Manipulation", **`ICCV 2025`**. [[Paper](https://arxiv.org/abs/2508.17600)] [[Website](https://gaussian-world-model.github.io/)]
* "Imaginative World Modeling with Scene Graphs for Embodied Agent Navigation", **`arxiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.06990)]
* "Bounding Distributional Shifts in World Modeling through Novelty Detection", **`arxiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.06096)]
* **Genie Envisioner**: "Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation", **`arxiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.05635)] [[Website](https://genie-envisioner.github.io/)]
* **DiWA**: "DiWA: Diffusion Policy Adaptation with World Models", **`CoRL 2025`**. [[Paper](https://arxiv.org/abs/2508.03645)] [[Code](https://diwa.cs.uni-freiburg.de)]
* **CoEx**: "CoEx -- Co-evolving World-model and Exploration", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.22281)]
* "Latent Policy Steering with Embodiment-Agnostic Pretrained World Models", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.13340)]
* **MindJourney**: "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.12508)] [[Website](https://umass-embodied-agi.github.io/MindJourney)]
* **FOUNDER**: "FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2507.12496)] [[Website](https://sites.google.com/view/founder-rl)]
* **EmbodieDreamer**: "EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/pdf/2507.05198)] [[Website](https://embodiedreamer.github.io/)]
* **World4Omni**: "World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.23919)] [[Website](https://world4omni.github.io/)]
* **RoboScape**: "RoboScape: Physics-informed Embodied World Model", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.23135)] [[Code](https://github.com/tsinghua-fib-lab/RoboScape)]
* **ParticleFormer**: "ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.23126)] [[Website](https://particleformer.github.io/)]
* **ManiGaussian++**: "ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.19842)] [[Code](https://github.com/April-Yz/ManiGaussian_Bimanual)]
* **ReOI**: "Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.16565)] 
* **GAF**: "GAF: Gaussian Action Field as a Dynamic World Model for Robotic Mlanipulation", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.14135)] [[Website](http://chaiying1.github.io/GAF.github.io/project_page/)]
* "Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins", **`RSS 2025`**. [[Paper](https://arxiv.org/abs/2506.13761)] [[Website](https://prompting-with-the-future.github.io/)]
* **V-JEPA 2 and V-JEPA 2-AC**: "V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.09985)] [[Website](https://ai.meta.com/blog/v-jepa-2-world-model-benchmarks/)] [[Code](https://github.com/facebookresearch/vjepa2)]
* "Time-Aware World Model for Adaptive Prediction and Control", **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2506.08441)] 
* **3DFlowAction**: "3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.06199)] 
* **ORV**: "ORV: 4D Occupancy-centric Robot Video Generation", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.03079)] [[Code](https://github.com/OrangeSodahub/ORV)] [[Website](https://orangesodahub.github.io/ORV/)]
* **WoMAP**: "WoMAP: World Models For Embodied Open-Vocabulary Object Localization", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01600)] 
* "Sparse Imagination for Efficient Visual World Model Planning", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01392)]
* **Humanoid World Models**: "Humanoid World Models: Open World Foundation Models for Humanoid Robotics", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01182)] 
* "Evaluating Robot Policies in a World Model", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.00613)] [[Website](https://world-model-eval.github.io)]
* **OSVI-WM**: "OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.20425)] 
* **WorldEval**: "WorldEval: World Model as Real-World Robot Policies Evaluator", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.19017)] [[Website](https://worldeval.github.io)]
* "Consistent World Models via Foresight Diffusion", **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.16474)] 
* **Vid2World**: "Vid2World: Crafting Video Diffusion Models to Interactive World Models", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.14357)] [[Website](http://knightnemo.github.io/vid2world/)]
* **RLVR-World**: "RLVR-World: Training World Models with Reinforcement Learning", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.13934)] [[Website]( https://thuml.github.io/RLVR-World/)] [[Code](https://github.com/thuml/RLVR-World)]
* **LaDi-WM**: "LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.11528)]
* **FlowDreamer**: "FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.10075)] [[Website](https://sharinka0715.github.io/FlowDreamer/)]
* "Occupancy World Model for Robots", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.05512)]
* "Learning 3D Persistent Embodied World Models", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.05495)]
* **TesserAct**: "TesserAct: Learning 4D Embodied World Models", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.20995)] [[Website](https://tesseractworld.github.io/)]
* **PIN-WM**: "PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.16693)] 
* "Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.16680)] 
* **ManipDreamer**: "ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.16464)] 
* **UWM**: "Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.02792)] [[Website](https://weirdlabuw.github.io/uwm/)]
* "Perspective-Shifted Neuro-Symbolic World Models: A Framework for Socially-Aware Robot Navigation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.20425)] 
* **AdaWorld**: "AdaWorld: Learning Adaptable World Models with Latent Actions", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.18938)] [[Website](https://adaptable-world-model.github.io/)] 
* **DyWA**: "DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.16806)] [[Website](https://pku-epic.github.io/DyWA/)] 
* "Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.12531)] [[Website](https://mkturkcan.github.io/suturingmodels/)] 
* "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.10480)] 
* **LUMOS**: "LUMOS: Language-Conditioned Imitation Learning with World Models", **`ICRA 2025`**. [[Paper](https://arxiv.org/abs/2503.10370)] [[Website](http://lumos.cs.uni-freiburg.de/)] 
* "Object-Centric World Model for Language-Guided Manipulation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.06170)] 
* **DEMO^3**: "Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.01837)] [[Website](https://adrialopezescoriza.github.io/demo3/)] 
* "Accelerating Model-Based Reinforcement Learning with State-Space World Models", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.20168)] 
* "Learning Humanoid Locomotion with World Model Reconstruction", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.16230)] 
* "Strengthening Generative Robot Policies through Predictive World Modeling", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.00622)] [[Website](https://computationalrobotics.seas.harvard.edu/GPC)] 
* **Robotic World Model**: "Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.10100)]
* **RoboHorizon**: "RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.06605)] 
* **Dream to Manipulate**: "Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.14957)] [[Website](https://leobarcellona.github.io/DreamToManipulate/)] 
* **WHALE**: "WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.05619)]
* **VisualPredicator**: "VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.23156)] 
* "Multi-Task Interactive Robot Fleet Learning with Visual World Models", **`CoRL 2024`**. [[Paper](https://arxiv.org/abs/2410.22689)] [[Code](https://ut-austin-rpl.github.io/sirius-fleet/)]
* **X-MOBILITY**: "X-MOBILITY: End-To-End Generalizable Navigation via World Modeling", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.17491)]
* **PIVOT-R**: "PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/pdf/2410.10394)]
* **GLIMO**: "Grounding Large Language Models In Embodied Environment With Imperfect World Models", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.02664)]
* **EVA**: "EVA: An Embodied World Model for Future Video Anticipation", **`arxiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.15461)] [[Website](https://sites.google.com/view/eva-publi)] 
* **PreLAR**: "PreLAR: World Model Pre-training with Learnable Action Representation", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03363.pdf)] [[Code](https://github.com/zhanglixuan0720/PreLAR)]
* **WMP**: "World Model-based Perception for Visual Legged Locomotion", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.16784)] [[Project](https://wmp-loco.github.io/)]
* **R-AIF**: "R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.14216)]
* "Representing Positional Information in Generative World Models for Object Manipulation" **`arXiv 2024.09`** [[Paper](https://arxiv.org/abs/2409.12005)]
* **DexSim2Real$^2$**: "DexSim2Real$^2: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.08750)]
* **DWL**: "Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning", **`RSS 2024 (Best Paper Award Finalist)`**. [[Paper](https://arxiv.org/abs/2408.14472)]
* "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.10788)] [[Website](https://embodied-gaussians.github.io/)]
* **HRSSM**: "Learning Latent Dynamic Robust Representations for World Models", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2405.06263)] [[Code](https://github.com/bit1029public/HRSSM)]
* **RoboDreamer**: "RoboDreamer: Learning Compositional World Models for Robot Imagination", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2404.12377)] [[Code](https://robovideo.github.io/)]
* **COMBO**: "COMBO: Compositional World Models for Embodied Multi-Agent Cooperation", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2404.10775)] [[Website](https://vis-www.cs.umass.edu/combo/)] [[Code](https://github.com/UMass-Foundation-Model/COMBO)]
* **ManiGaussian**: "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation", **`arXiv 2024.03`**.  [[Paper](https://arxiv.org/abs/2403.08321)] [[Code](https://guanxinglu.github.io/ManiGaussian/)]

---
## World Models for VLA
* **PAR**: "Physical Autoregressive Model for Robotic Manipulation without Action Pretraining",  **`arxiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.09822)] [[Website](https://songzijian1999.github.io/PAR_ProjectPage/)]
* **DreamVLA**: "DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge", **`arxiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.04447)] [[Code](https://github.com/Zhangwenyao1/DreamVLA)] [[Website](https://zhangwenyao1.github.io/DreamVLA/)]
* **WorldVLA**: "WorldVLA: Towards Autoregressive Action World Model", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.21539)] [[Code](https://github.com/alibaba-damo-academy/WorldVLA)]
* **UniVLA**: "UniVLA: Unified Vision-Language-Action Model",  **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.19850)] [[Code](https://robertwyq.github.io/univla.github.)]
* **MinD**: "MinD: Unified Visual Imagination and Control via Hierarchical World Models", **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.18897)] [[Website](https://manipulate-in-dream.github.io/)]
* **FLARE**: "FLARE: Robot Learning with Implicit World Modeling",  **`arxiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.15659)] [[Code](https://github.com/NVIDIA/Isaac-GR00T)] [[Website](https://research.nvidia.com/labs/gear/flare)]
* **DreamGen**: "DreamGen: Unlocking Generalization in Robot Learning through Video World Models",  **`arxiv 2025.06`**. [[Paper](https://arxiv.org/abs/2505.12705)] [[Code](https://github.com/nvidia/GR00T-dreams)]
* **CoT-VLA**: "CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models",  **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2501.18867)]
* **UP-VLA**: "UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent",  **`ICML 2025`**. [[Paper](https://arxiv.org/abs/2503.22020)] [[Code](https://github.com/CladernyJorn/UP-VLA)]
* **3D-VLA**: "3D-VLA: A 3D Vision-Language-Action Generative World Model",  **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2403.09631)]

---
## World Models for Autonomous Driving
### Refer to https://github.com/LMD0311/Awesome-World-Model
* **TeraSim-World**: "TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.13164)] [[Website](https://wjiawei.com/terasim-world-web/)] 
* "Enhancing Physical Consistency in Lightweight World Models", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.12437)]
* **OccTENS**: "OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction", **`arXiv 2025.09`**. [[Paper](https://arxiv.org/abs/2509.03887)]
* **IRL-VLA**: "IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.06571)] [[Website](https://lidarcrafter.github.io)] [[Code](https://github.com/lidarcrafter/toolkit)]
* **LiDARCrafter**: "LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences", **`arXiv 2025.08`**. [[Paper](https://arxiv.org/abs/2508.03692)] [[Website](https://lidarcrafter.github.io)] [[Code](https://github.com/lidarcrafter/toolkit)]
* **FASTopoWM**: "FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.23325)] [[Code](https://github.com/YimingYang23/FASTopoWM)]
* **Orbis**: "Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.13162)] [[Code](https://lmb-freiburg.github.io/orbis.github.io/)]
* "World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.12762)]
* **NRSeg**: "NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models", **`arXiv 2025.07`**. [[Paper](https://arxiv.org/abs/2507.04002)] [[Code](https://github.com/lynn-yu/NRSeg)]
* **World4Drive**: "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model", **`ICCV2025`**. [[Paper](https://arxiv.org/abs/2507.00603)] [[Code](https://github.com/ucaszyp/World4Drive)]
* **Epona**: "Epona: Autoregressive Diffusion World Model for Autonomous Driving", **`ICCV2025`**. [[Paper](https://arxiv.org/abs/2506.24113)] [[Code](https://kevin-thu.github.io/Epona/)]
* "Towards foundational LiDAR world models with efficient latent flow matching", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.23434)]
* **SceneDiffuser++**: "SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2506.21976)]
* **COME**: "COME: Adding Scene-Centric Forecasting Control to Occupancy World Model", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.13260)] [[Code](https://github.com/synsin0/COME)]
* **STAGE**: "STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.13138)] 
* **ReSim**: "ReSim: Reliable World Simulation for Autonomous Driving", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.09981)] [[Code](https://github.com/OpenDriveLab/ReSim)] [[Project Page](https://opendrivelab.com/ReSim)]
* "Ego-centric Learning of Communicative World Models for Autonomous Driving", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.08149)] 
* **Dreamland**: "Dreamland: Controllable World Creation with Simulator and Generative Models", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.08006)] [[Project Page](https://metadriverse.github.io/dreamland/)] 
* **LongDWM**: "LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model", **`arXiv 2025.06`**. [[Paper](https://arxiv.org/abs/2506.01546)] [[Project Page](https://wang-xiaodong1899.github.io/longdwm/)] 
* **GeoDrive**: "GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.22421)] [[Code](https://github.com/antonioo-c/GeoDrive)] 
* **FutureSightDrive**: "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving", **`NeurIPS 2025`**. [[Paper](https://arxiv.org/abs/2505.17685)] [[Code](https://github.com/MIV-XJTU/FSDrive)] 
* **Raw2Drive**: "Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.16394)]
* **VL-SAFE**: "VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.16377)] [[Project Page](https://ys-qu.github.io/vlsafe-website/)] 
* **PosePilot**: "PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.01729)]
* "World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks", **`arXiv 2025.05`**. [[Paper](https://arxiv.org/abs/2505.01712)]
* "Learning to Drive from a World Model", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.19077)]
* **DriVerse**: "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.18576)] 
* "End-to-End Driving with Online Trajectory Evaluation via BEV World Model", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.01941)] [[Code](https://github.com/liyingyanUCAS/WoTE)] 
* "Knowledge Graphs as World Models for Semantic Material-Aware Obstacle Handling in Autonomous Vehicles", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.21232)]
* **MiLA**: "MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.15875)] [[Project Page](https://github.com/xiaomi-mlab/mila.github.io)] 
* **SimWorld**: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.13952)] [[Project Page](https://github.com/Li-Zn-H/SimWorld)] 
* **UniFuture**: "Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.13587)] [[Project Page](https://github.com/dk-liang/UniFuture)] 
* **EOT-WM**: "Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.09215)]
* "Temporal Triplane Transformers as Occupancy World Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.07338)]
* **InDRiVE**: "InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2503.05573)]
* **MaskGWM**: "MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.11663)]
* **Dream to Drive**: "Dream to Drive: Model-Based Vehicle Control Using Analytic World Models", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.10012)]
* "Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2502.07309)]
* "Dream to Drive with Predictive Individual World Model", **`IEEE TIV`**. [[Paper](https://arxiv.org/abs/2501.16733)] [[Code](https://github.com/gaoyinfeng/PIWM)]
* **HERMES**: "HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.14729)] 
* **AdaWM**: "AdaWM: Adaptive World Model based Planning for Autonomous Driving", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2501.13072)] 
* **AD-L-JEPA**: "AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.04969)]  
* **DrivingWorld**: "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.19505)] [[Code](https://github.com/YvanYin/DrivingWorld)] [[Project Page](https://huxiaotaostasy.github.io/DrivingWorld/index.html)] 
* **DrivingGPT**: "DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.18607)] [[Project Page](https://rogerchern.github.io/DrivingGPT/)]
* "An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.13772)]
* **GEM**: "GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.11198)] [[Project Page](https://vita-epfl.github.io/GEM.github.io/)]
* **GaussianWorld**: "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.04380)] [[Code](https://github.com/zuosc19/GaussianWorld)]
* **Doe-1**: "Doe-1: Closed-Loop Autonomous Driving with Large World Model", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.09627)] [[Project Page](https://wzzheng.net/Doe/)] [[Code](https://github.com/wzzheng/Doe)]
* "Pysical Informed Driving World Model", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.08410)] [[Project Page](https://metadrivescape.github.io/papers_project/DrivePhysica/page.html)]
* **InfiniCube**: "InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.03934)] [[Project Page](https://research.nvidia.com/labs/toronto-ai/infinicube/)]
* **InfinityDrive**: "InfinityDrive: Breaking Time Limits in Driving World Models", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.01522)] [[Project Page](https://metadrivescape.github.io/papers_project/InfinityDrive/page.html)]
* **ReconDreamer**: "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.19548)] [[Project Page](https://recondreamer.github.io/)]
* **Imagine-2-Drive**: "Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles", **`ICRA 2025`**. [[Paper](https://arxiv.org/abs/2411.10171)] [[Project Page](https://anantagrg.github.io/Imagine-2-Drive.github.io/)]
* **DynamicCity**: "DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes", **`ICLR 2025 Spotlight`**. [[Paper](https://arxiv.org/abs/2410.18084)] [[Project Page](https://dynamic-city.github.io)] [[Code](https://github.com/3DTopia/DynamicCity)]
* **DriveDreamer4D**: "World Models Are Effective Data Machines for 4D Driving Scene Representation", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.13571)] [[Project Page](https://drivedreamer4d.github.io/)]
* **DOME**: "Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.10429)] [[Project Page](https://gusongen.github.io/DOME)]
* **SSR**: "Does End-to-End Autonomous Driving Really Need Perception Tasks?", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.18341)] [[Code](https://github.com/PeidongLi/SSR)]
* "Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.16663)]
* **LatentDriver**: "Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.15730)] [[Code](https://github.com/Sephirex-X/LatentDriver)]
* **RenderWorld**: "World Model with Self-Supervised 3D Label", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.11356)]
* **OccLLaMA**: "An Occupancy-Language-Action Generative World Model for Autonomous Driving", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.03272)]
* **DriveGenVLM**: "Real-world Video Generation for Vision Language Model based Autonomous Driving", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/abs/2408.16647)]
* **Drive-OccWorld**: "Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/abs/2408.14197)]
* **CarFormer**: "Self-Driving with Learned Object-Centric Representations", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2407.15843)] [[Code](https://kuis-ai.github.io/CarFormer/)]
* **BEVWorld**: "A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.05679)] [[Code](https://github.com/zympsyche/BevWorld)]
* **TOKEN**: "Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.00959)]
* **UMAD**: "Unsupervised Mask-Level Anomaly Detection for Autonomous Driving", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.06370)]
* **SimGen**: "Simulator-conditioned Driving Scene Generation", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.09386)] [[Code](https://metadriverse.github.io/simgen/)]
* **AdaptiveDriver**: "Planning with Adaptive World Models for Autonomous Driving", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.10714)] [[Code](https://arunbalajeev.github.io/world_models_planning/world_model_paper.html)]
* **UnO**: "Unsupervised Occupancy Fields for Perception and Forecasting", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2406.08691)] [[Code](https://waabi.ai/research/uno)]
* **LAW**: "Enhancing End-to-End Autonomous Driving with Latent World Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.08481)] [[Code](https://github.com/BraveGroup/LAW)]
* **Delphi**: "Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.01349)] [[Code](https://github.com/westlake-autolab/Delphi)]
* **OccSora**: "4D Occupancy Generation Models as World Simulators for Autonomous Driving", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.20337)] [[Code](https://github.com/wzzheng/OccSora)]
* **MagicDrive3D**: "Controllable 3D Generation for Any-View Rendering in Street Scenes", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.14475)] [[Code](https://gaoruiyuan.com/magicdrive3d/)]
* **Vista**: "A Generalizable Driving World Model with High Fidelity and Versatile Controllability", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/abs/2405.17398)] [[Code](https://github.com/OpenDriveLab/Vista)]
* **CarDreamer**: "Open-Source Learning Platform for World Model based Autonomous Driving", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.09111)] [[Code](https://github.com/ucd-dare/CarDreamer)]
* **DriveSim**: "Probing Multimodal LLMs as World Models for Driving", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.05956)] [[Code](https://github.com/sreeramsa/DriveSim)]
* **DriveWorld**: "4D Pre-trained Scene Understanding via World Models for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2405.04390)]
* **LidarDM**: "Generative LiDAR Simulation in a Generated World", **`arXiv 2024.04`**. [[Paper](https://arxiv.org/abs/2404.02903)] [[Code](https://github.com/vzyrianov/lidardm)]
* **SubjectDrive**: "Scaling Generative Data in Autonomous Driving via Subject Control", **`arXiv 2024.03`**. [[Paper](https://arxiv.org/abs/2403.19438)] [[Project](https://subjectdrive.github.io/)]
* **DriveDreamer-2**: "LLM-Enhanced World Models for Diverse Driving Video Generation", **`arXiv 2024.03`**. [[Paper](https://arxiv.org/abs/2403.06845)] [[Code](https://drivedreamer2.github.io/)]
* **Think2Drive**: "Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2402.16720)]
* **MARL-CCE**: "Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05085.pdf)] [[Code](https://github.com/qiaoguanren/MARL-CCE)]
* **GenAD**: "Generalized Predictive Model for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2403.09630)] [[Data](https://github.com/OpenDriveLab/DriveAGI?tab=readme-ov-file#genad-dataset-opendv-youtube)]
* **GenAD**: "Generative End-to-End Autonomous Driving", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2402.11502)] [[Code](https://github.com/wzzheng/GenAD)]
* **NeMo**: "Neural Volumetric World Models for Autonomous Driving", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02571.pdf)]
* **MARL-CCE**: "Modelling-Competitive-Behaviors-in-Autonomous-Driving-Under-Generative-World-Model", **`ECCV 2024`**. [[Code](https://github.com/qiaoguanren/MARL-CCE)]
* **ViDAR**: "Visual Point Cloud Forecasting enables Scalable Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2312.17655)] [[Code](https://github.com/OpenDriveLab/ViDAR)]
* **Drive-WM**: "Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2311.17918)] [[Code](https://github.com/BraveGroup/Drive-WM)]
* **Cam4DOCC**: "Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2311.17663)] [[Code](https://github.com/haomo-ai/Cam4DOcc)]
* **Panacea**: "Panoramic and Controllable Video Generation for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2311.16813)] [[Code](https://panacea-ad.github.io/)]
* **OccWorld**: "Learning a 3D Occupancy World Model for Autonomous Driving", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2311.16038)] [[Code](https://github.com/wzzheng/OccWorld)]
* **Copilot4D**: "Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion", **`ICLR 2024`**. [[Paper](https://arxiv.org/abs/2311.01017)]
* **DrivingDiffusion**: "Layout-Guided multi-view driving scene video generation with latent diffusion model", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2310.07771)] [[Code](https://github.com/shalfun/DrivingDiffusion)]
* **SafeDreamer**: "Safe Reinforcement Learning with World Models", **`ICLR 2024`**. [[Paper](https://openreview.net/forum?id=tsE5HLYtYg)] [[Code](https://github.com/PKU-Alignment/SafeDreamer)]
* **MagicDrive**: "Street View Generation with Diverse 3D Geometry Control", **`ICLR 2024`**. [[Paper](https://arxiv.org/abs/2310.02601)] [[Code](https://github.com/cure-lab/MagicDrive)]
* **DriveDreamer**: "Towards Real-world-driven World Models for Autonomous Driving", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2309.09777)] [[Code](https://github.com/JeffWang987/DriveDreamer)]
* **SEM2**: "Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model", **`TITS`**. [[Paper](https://ieeexplore.ieee.org/abstract/document/10538211/)]


----
## Citation
If you find this repository useful, please consider citing this list:
```
@misc{leo2024worldmodelspaperslist,
    title = {Awesome-World-Models},
    author = {Leo Fan},
    journal = {GitHub repository},
    url = {https://github.com/leofan90/Awesome-World-Models},
    year = {2024},
}
```