Towards an Embodied, Physically-Consistent Generative World Model
WoW (World-Omniscient World Model) is a 14B-parameter generative world model trained on 2 million real-world robot interaction trajectories. It is designed for physically consistent imagination, reasoning, and action in robotics.
- We've updated the WoW-WAN2.1 Gradio demo in the `demo` folder; a more user-friendly inference interface is now available. Just download the checkpoint and run the code to try it out!
- We release the DiT post-training checkpoints of WoW, which include DiT-2B (based on Cosmos-Predict2), DiT-7B (based on Cosmos-Predict1), and DiT-14B (based on Wan2.1).
- For Wan-based models, follow `demo/README.md` (recommended).
- For Cosmos-based DiT models, install the requirements first: `pip install -r dit_models/wow-dit-2b/requires.txt`
- Run the Wan demo: `python demo/wan_infer_demo.py`
- Example: inference with the 2B DiT model: `python scripts/infer_wow_dit_2b.py --help`
- For the 7B model: `python scripts/infer_wow_dit_7b.py --help`
- For custom inputs or parameters, please refer to the comments in the corresponding demo scripts.
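If you just want to see the shape of the computation, the toy sketch below mimics diffusion-style world-model sampling: start from noise and iteratively denoise a latent that a real pipeline would decode into video frames. Everything here (`ToyDiT`, the latent size, the 50-step schedule) is an illustrative assumption, not the WoW API, and conditioning on the current observation and language instruction is omitted for brevity; use the demo scripts above for real inference.

```python
# Conceptual sketch only -- NOT the WoW API. A toy DDPM-style sampling loop
# showing how a diffusion transformer turns noise into a latent that a real
# pipeline would decode into future video frames.
import torch
import torch.nn as nn

class ToyDiT(nn.Module):
    """Stand-in for a diffusion transformer; predicts the noise in x_t."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x_t, t):
        # Condition on the (normalized) timestep by simple concatenation.
        t_feat = t.expand(x_t.shape[0], 1)
        return self.net(torch.cat([x_t, t_feat], dim=-1))

@torch.no_grad()
def sample(model, latent_dim=64, steps=50):
    """Ancestral DDPM sampling over a flat latent (stands in for video latents)."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, latent_dim)  # start from pure noise
    for i in reversed(range(steps)):
        t = torch.tensor([[i / steps]])
        eps = model(x, t)  # predicted noise at this step
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        x = (x - betas[i] / torch.sqrt(1 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
        if i > 0:
            x = x + torch.sqrt(betas[i]) * torch.randn_like(x)
    return x  # a real pipeline would VAE-decode this into frames

print(sample(ToyDiT()).shape)  # torch.Size([1, 64])
```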
We have released the following models and datasets on Hugging Face:
| Model Name | Parameters | Training Steps | Link |
|---|---|---|---|
| WoW-1-DiT-2B-600k | 2B | 600k | 🔗 Link |
| WoW-1-DiT-7B-600k | 7B | 600k | 🔗 Link |
| WoW-1-Wan-14B-600k | 14B | 600k | 🔗 Link |
| (🔥 Recommended!) WoW-1-Wan-14B-2M | 14B | 2M | 🔗 Link |
| WoW-1-Wan-1.3B-2M | 1.3B | 2M | 🔗 Link |

| Dataset Name | Description | Link |
|---|---|---|
| WoW-1-Benchmark-Samples | Evaluation set for physical consistency and causal reasoning (WoWBench). | 📄 Link |
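To fetch a checkpoint locally before running the demos, the standard `huggingface_hub` client should suffice. Note that the `repo_id` below is a placeholder; substitute the actual repository path from the links in the table above.

```python
# Download a WoW checkpoint from Hugging Face before running the demos.
# NOTE: the repo_id is a placeholder -- copy the real one from the model table.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="<org>/WoW-1-Wan-14B-2M",        # placeholder; see the model table
    local_dir="checkpoints/wow-1-wan-14b-2m",  # where to store the files
)
print(f"Checkpoint downloaded to: {local_path}")
```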
WoW is being released in phases to promote transparency and collaboration. Below is the current open-source progress:
- Paper released on arXiv:2509.22642
- Project website launched: wow-world-model.github.io
- Model Weights (2B, 7B, 14B WoW-DiT)
- Inference Scripts & Colab Demo
- Baseline Inverse Dynamics Model (a conceptual sketch of an IDM follows this list)
- Baseline Model Weights (SVD, CogVideoX, Cosmos1&2)
- 3D-Flow-Mask Inverse Dynamics Model
- Training Pipeline
- SOPHIA Framework Code
- WoWBench benchmark design & evaluation metrics
- Continuous release of real/simulated trajectory data
- Expansion to multimodal inputs (audio, tactile, etc.)
- Universal fine-tuning API for downstream tasks
- Community challenges and leaderboard
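For readers unfamiliar with the term, an inverse dynamics model (mentioned in the roadmap above) recovers the action that connects two consecutive observations. Below is a minimal sketch; the shapes, the MLP, and the 7-DoF action space are purely illustrative assumptions, not the released baseline.

```python
# Conceptual inverse dynamics model: infer the action a_t that explains the
# transition from observation o_t to o_{t+1}. Shapes and architecture are
# illustrative assumptions, not the released baseline.
import torch
import torch.nn as nn

class ToyInverseDynamics(nn.Module):
    def __init__(self, obs_dim=512, action_dim=7):  # e.g. a 7-DoF arm command
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, obs_t, obs_next):
        # Concatenate the two observation embeddings and regress the action.
        return self.net(torch.cat([obs_t, obs_next], dim=-1))

idm = ToyInverseDynamics()
o_t, o_next = torch.randn(1, 512), torch.randn(1, 512)
print(idm(o_t, o_next).shape)  # torch.Size([1, 7])
```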
Community contributions are welcome. You can:
- Submit issues or feature requests
- Improve code or documentation
- Run experiments and submit results
- Contribute real-world robot data
- Email: [email protected]
- Project website: wow-world-model.github.io
If you use WoW in your research, please cite:
```bibtex
@article{chi2025wow,
  title={WoW: Towards a World omniscient World model Through Embodied Interaction},
  author={Chi, Xiaowei and Jia, Peidong and Fan, Chun-Kai and Ju, Xiaozhu and Mi, Weishi and Qin, Zhiyuan and Zhang, Kevin and Tian, Wanxin and Ge, Kuangzhi and Li, Hao and others},
  journal={arXiv preprint arXiv:2509.22642},
  year={2025}
}
```