AutoLab-SAI-SJTU/InfiniteVGGT

InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams

Shuai Yuan¹, Yantai Yang¹,², Xiaotian Yang¹, Xupeng Zhang¹,
Zhonghao Zhao¹, Lingming Zhang, Zhipeng Zhang¹ ✉

¹ AutoLab, School of Artificial Intelligence, Shanghai Jiao Tong University
² Anyverse Dynamics

✉ Corresponding Author

Paper PDF Hugging Face

InfiniteVGGT achieves higher reconstruction quality and more accurate camera pose estimation on inputs of thousands of frames.

📰 News

  • [Jan 6, 2026] Paper release.
  • [Jan 6, 2026] Code release.

🔍 Recommendation

  • Check out our previous collaborative work, FastVGGT.

📖 Overview

We propose InfiniteVGGT, a causal visual geometry transformer that uses a training-free rolling memory mechanism to enable stable, infinite-horizon streaming. We also introduce the Long3D benchmark to rigorously evaluate long-term continuous 3D geometry performance. Our main contributions are:

  1. An unbounded memory architecture, InfiniteVGGT, for continuous 3D geometry understanding, built on a novel, dynamic, and interpretable explicit memory system.
  2. State-of-the-art performance on long-sequence benchmarks and a unique capability for robust, infinite-horizon reconstruction without memory overflow.
  3. The Long3D benchmark, a new dataset for the rigorous evaluation of long-term performance, addressing a critical gap in the field.
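
To make the idea of a rolling memory with a fixed budget concrete, here is a toy sketch in plain Python. It is not the InfiniteVGGT mechanism: the class name `RollingMemory`, the per-entry importance score, and the eviction rule (drop the lowest-scoring entries once the budget is exceeded) are all assumptions chosen for illustration. The only point it demonstrates is that memory size stays bounded no matter how many frames are streamed.

```python
from dataclasses import dataclass, field


@dataclass
class RollingMemory:
    """Toy fixed-budget memory (hypothetical; not the InfiniteVGGT implementation)."""

    budget: int  # maximum number of entries kept at any time
    entries: list = field(default_factory=list)  # (frame_id, score) pairs

    def add_frame(self, frame_id: int, scores: list) -> None:
        # Append one entry per token of the incoming frame.
        self.entries.extend((frame_id, s) for s in scores)
        if len(self.entries) > self.budget:
            # Assumed eviction rule: keep the highest-scoring entries,
            # then restore their original (chronological) order.
            keep = sorted(
                range(len(self.entries)),
                key=lambda i: self.entries[i][1],
                reverse=True,
            )[: self.budget]
            self.entries = [self.entries[i] for i in sorted(keep)]


memory = RollingMemory(budget=4)
for frame_id in range(1000):
    # Fake importance scores; a real system would derive them from the model.
    memory.add_frame(frame_id, [0.1 * (frame_id % 7), 0.5])
    assert len(memory.entries) <= memory.budget
```

Because eviction runs on every insertion, the memory footprint depends on `budget` rather than on the length of the stream.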

🌍 Installation

  1. Clone InfiniteVGGT
git clone https://github.com/AutoLab-SAI-SJTU/InfiniteVGGT.git
cd InfiniteVGGT
  2. Create conda environment
conda create -n infinitevggt python=3.11 cmake=3.14.0
conda activate infinitevggt
  3. Install requirements
pip install -r requirements.txt
conda install 'llvm-openmp<16'
  4. Download the StreamVGGT pretrained checkpoint and place it in the ./ckpt directory.
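
Before running inference, it can help to confirm that the checkpoint actually landed in ./ckpt. The helper below is a minimal sketch, not part of the repository: the function name `find_checkpoints` is hypothetical, and because the checkpoint's file name is not stated here, it simply lists whatever files are present.

```python
from pathlib import Path


def find_checkpoints(ckpt_dir="./ckpt"):
    """Return the files found in ckpt_dir, sorted by name (empty list if missing)."""
    root = Path(ckpt_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    found = find_checkpoints()
    if found:
        for path in found:
            print(f"found checkpoint candidate: {path}")
    else:
        print("no files in ./ckpt; download the StreamVGGT checkpoint first")
```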

▶️ Run Inference

# Run on your own data
python run_inference.py --input_dir path/to/your/images_dir

# Run a long sequence and store each frame's result in a directory
python run_inference.py \
    --input_dir path/to/your/images_dir \
    --frame_cache_dir path/to/your/results_perframe_dir \
    --no_cache_results
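
If you prefer to launch inference from Python (for example, to loop over several image directories), the two commands above can be wrapped as follows. This is a sketch that uses only the flags shown above. The helper names `build_inference_cmd` and `run_inference` are hypothetical, and pairing `--frame_cache_dir` with `--no_cache_results` simply mirrors the long-sequence example; whether the flags can be used independently is not covered here.

```python
import subprocess
import sys


def build_inference_cmd(input_dir, frame_cache_dir=None):
    """Build the argv list for run_inference.py using only the flags shown above."""
    cmd = [sys.executable, "run_inference.py", "--input_dir", str(input_dir)]
    if frame_cache_dir is not None:
        # Long-sequence mode as in the example: per-frame results on disk,
        # combined with --no_cache_results.
        cmd += ["--frame_cache_dir", str(frame_cache_dir), "--no_cache_results"]
    return cmd


def run_inference(input_dir, frame_cache_dir=None):
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(build_inference_cmd(input_dir, frame_cache_dir), check=True)
```

Call `run_inference("path/to/your/images_dir")` from the repository root so that `run_inference.py` resolves correctly.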

🚀 Run Demo

We provide demo code based on the NRGBD dataset. You can run it using the following command:

python demo_viser.py \
    --seq_path path/to/nrgbd/image_sequence \
    --frame_interval 10 \
    --gt_path path/to/nrgbd/gt_camera  # optional
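
As a rough illustration of what a frame interval of 10 means, the sketch below keeps every 10th image of a sequence directory. This is an assumed interpretation of `--frame_interval`; the actual behavior of `demo_viser.py` may differ, and the function name `subsample_frames` and the accepted file suffixes are hypothetical.

```python
from pathlib import Path


def subsample_frames(seq_path, frame_interval=10, suffixes=(".png", ".jpg", ".jpeg")):
    """Return every frame_interval-th image in seq_path, sorted by file name."""
    if frame_interval < 1:
        raise ValueError("frame_interval must be >= 1")
    frames = sorted(
        p for p in Path(seq_path).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
    # Slicing with a step keeps frames 0, frame_interval, 2 * frame_interval, ...
    return frames[::frame_interval]
```

Sorting by file name assumes zero-padded frame numbers; unpadded names such as `1.png` and `10.png` would need a numeric sort key instead.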

📋 Checklist

  • Release the Dataset.

🙏 Acknowledgement

We would like to acknowledge the following open-source projects that served as a foundation for our implementation:

DUSt3R, CUT3R, VGGT, Point3R, StreamVGGT, FastVGGT, TTT3R

Many thanks to these authors!

📜 Citation

If you incorporate our work into your research, please cite:

@misc{yuan2026infinitevggt,
        title={InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams}, 
        author={Shuai Yuan and Yantai Yang and Xiaotian Yang and Xupeng Zhang and Zhonghao Zhao and Lingming Zhang and Zhipeng Zhang},
        journal={arXiv preprint arXiv:2601.02281},
        year={2026}
}
