Shuai Yuan¹, Yantai Yang¹,², Xiaotian Yang¹, Xupeng Zhang¹, Zhonghao Zhao¹, Lingming Zhang, Zhipeng Zhang¹ ✉
¹AutoLab, School of Artificial Intelligence, Shanghai Jiao Tong University
²Anyverse Dynamics
✉ Corresponding Author
Achieving higher reconstruction quality and more accurate camera pose estimation from inputs of thousands of frames.
- [Jan 6, 2026] Paper release.
- [Jan 6, 2026] Code release.
- Feel free to check out our previous collaborative work, FastVGGT.
We propose InfiniteVGGT, a causal visual geometry transformer that uses a training-free rolling memory mechanism to enable stable, infinite-horizon streaming. We also introduce the Long3D benchmark to rigorously evaluate long-term continuous 3D geometry performance. Our main contributions are summarized as follows:
- InfiniteVGGT, an unbounded-memory architecture for continuous 3D geometry understanding, built on a novel, dynamic, and interpretable explicit memory system.
- State-of-the-art performance on long-sequence benchmarks and a unique capability for robust, infinite-horizon reconstruction without memory overflow.
- The Long3D benchmark, a new dataset for the rigorous evaluation of long-term performance, addressing a critical gap in the field.
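To make the idea of a memory that stays bounded on an endless stream concrete, here is a generic sketch of a fixed-capacity rolling buffer. It is not InfiniteVGGT's mechanism, which this README does not detail: the `RollingMemory` class, its oldest-first eviction policy, and the toy per-frame features are all assumptions made purely for illustration.

```python
from collections import deque


class RollingMemory:
    """Fixed-capacity buffer that evicts the oldest entry when full.

    Illustrative only: oldest-first eviction is an assumption made for this
    sketch, not the policy InfiniteVGGT uses.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) drops items from the left once the limit is hit.
        self._entries = deque(maxlen=capacity)

    def add(self, frame_index: int, features: list[float]) -> None:
        """Store the features for one frame, evicting the oldest if needed."""
        self._entries.append((frame_index, features))

    def frame_indices(self) -> list[int]:
        """Return the indices of the frames currently held in memory."""
        return [index for index, _ in self._entries]

    def __len__(self) -> int:
        return len(self._entries)


# Stream 1000 toy frames through a memory that holds only 4 of them.
memory = RollingMemory(capacity=4)
for i in range(1000):
    memory.add(i, [float(i)])
```

However long the stream runs, the buffer never grows past its capacity, which is the property that rules out memory overflow in this toy setting.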
- Clone InfiniteVGGT
git clone https://github.com/AutoLab-SAI-SJTU/InfiniteVGGT.git
cd InfiniteVGGT
- Create conda environment
conda create -n infinitevggt python=3.11 cmake=3.14.0
conda activate infinitevggt
- Install requirements
pip install -r requirements.txt
conda install 'llvm-openmp<16'
- Download the StreamVGGT pretrained checkpoint and place it in the ./ckpt directory.
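Before running inference, it can help to confirm that the checkpoint actually landed in ./ckpt. The snippet below is a minimal sketch: `find_checkpoints` is a hypothetical helper, not part of the repository, and the list of file extensions is an assumption, since the checkpoint's file name is not stated here.

```python
from pathlib import Path

# Assumed extensions; the README does not state the checkpoint's file name.
CHECKPOINT_SUFFIXES = {".pth", ".pt", ".ckpt", ".safetensors"}


def find_checkpoints(ckpt_dir: str = "./ckpt") -> list[Path]:
    """Return checkpoint-like files found directly inside ckpt_dir, sorted by name."""
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return []
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in CHECKPOINT_SUFFIXES
    )


if __name__ == "__main__":
    found = find_checkpoints()
    if found:
        print("Found checkpoint(s):", ", ".join(p.name for p in found))
    else:
        print("No checkpoint found in ./ckpt; download it before running inference.")
```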
# Run on your own data
python run_inference.py --input_dir path/to/your/images_dir
# Run a long sequence and store the result for each frame in a directory
python run_inference.py \
  --input_dir path/to/your/images_dir \
  --frame_cache_dir path/to/your/results_perframe_dir \
  --no_cache_results
We provide demo code based on the NRGBD dataset. You can run it using the following command:
python demo_viser.py \
  --seq_path path/to/nrgbd/image_sequence \
  --frame_interval 10 \
  --gt_path path/to/nrgbd/gt_camera  # optional
- Release the Dataset.
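For batch runs, the two run_inference.py invocations shown earlier can be assembled from a script. The following is a minimal sketch: `build_inference_command` is a hypothetical helper, not part of the repository, and it uses only the flags that appear in those commands (`--input_dir`, `--frame_cache_dir`, `--no_cache_results`).

```python
import shlex
from typing import Optional


def build_inference_command(
    input_dir: str, frame_cache_dir: Optional[str] = None
) -> list[str]:
    """Build the argument list for run_inference.py from the README's flags."""
    command = ["python", "run_inference.py", "--input_dir", input_dir]
    if frame_cache_dir is not None:
        # Mirrors the long-sequence command: per-frame results go to
        # frame_cache_dir, paired with --no_cache_results as in the README.
        command += ["--frame_cache_dir", frame_cache_dir, "--no_cache_results"]
    return command


command = build_inference_command(
    "path/to/your/images_dir",
    frame_cache_dir="path/to/your/results_perframe_dir",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

To launch the command from the repository root, the list can be passed to `subprocess.run(command, check=True)`.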
We would like to acknowledge the following open-source projects that served as a foundation for our implementation:
DUSt3R, CUT3R, VGGT, Point3R, StreamVGGT, FastVGGT, TTT3R
Many thanks to these authors!
If you incorporate our work into your research, please cite:
@misc{yuan2026infinitevggt,
title={InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams},
author={Shuai Yuan and Yantai Yang and Xiaotian Yang and Xupeng Zhang and Zhonghao Zhao and Lingming Zhang and Zhipeng Zhang},
journal={arXiv preprint arXiv:2601.02281},
year={2026}
}


