# FlashVGGT
**Repository Path**: gotoeasy/FlashVGGT
## Basic Information
- **Project Name**: FlashVGGT
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-06-07
- **Last Updated**: 2026-06-08
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor Attention
CVPR 2026
Zipeng Wang
ยท
Dan Xu
https://github.com/user-attachments/assets/3347dbe0-f3c0-48d3-9611-1516b59fbf94
TL;DR: Accelerate VGGT by spatially resampling keys and values for global attention.
## Updates
- [05/02/2026] Evaluation code is released.
- [05/02/2026] Training code (both single-forward and streaming settings) is released.
- [05/02/2026] Code and checkpoints for FlashVGGT are released.
## Overview
Instead of applying dense global attention across all tokens, FlashVGGT compresses spatial information from each frame into a compact set of descriptor tokens. Global attention is then computed as cross-attention between the full set of image tokens and this smaller descriptor set, significantly reducing computational overhead. Moreover, the compactness of the descriptors enables online inference over long sequences via a chunk-recursive mechanism that reuses cached descriptors from previous chunks.
## Installation
### Environment Setup
First, you should clone the repository and create an anaconda environment.
```bash
git clone https://github.com/wzpscott/FlashVGGT.git
cd FlashVGGT
conda create -n flashvggt python=3.10 -y
conda activate flashvggt
```
Then, You can use the following command to install the dependencies.
```bash
pip install -r requirements.txt
```
You can also install FlashVGGT as a package.
```bash
pip install -e . --no-deps
```
### Checkpoints
You can download the checkpoints for single-forward and streaming variants of FlashVGGT from the [HuggingFace](https://huggingface.co/ZipW/FlashVGGT). You should download the checkpoints to the `ckpts` folder.
```bash
# Create the checkpoints directory
mkdir -p ckpts
# Download the standard model
huggingface-cli download ZipW/FlashVGGT flashvggt.pt --local-dir ckpts
# Download the streaming model
huggingface-cli download ZipW/FlashVGGT flashvggt_stream.pt --local-dir ckpts
```
## Quick Start
We provide a demo script `demo_o3d.py` to visualize the 3D reconstruction results as point clouds using Open3D. The output is a `.ply` file that can be easily visualized with most 3D viewers.
### Usage Examples
#### Standard FlashVGGT Inference:
To run the standard FlashVGGT model on a folder of images:
```bash
python demo_o3d.py \
--model FlashVGGT \
--image_folder ./examples/garden/ \
--output_dir outputs/
```
#### Streaming FlashVGGT Inference:
To run the streaming variant (FlashVGGTStream) which is optimized for longer sequences:
```bash
python demo_o3d.py \
--model FlashVGGTStream \
--image_folder ./examples/garden/ \
--chunksize 10 \
--output_dir outputs/
```
Key Arguments
- `--model`: Choose between `FlashVGGT` (single-forward) and `FlashVGGTStream` (streaming inference). Default is `FlashVGGT`.
- `--image_folder`: Path to the directory containing input images. Default is `./examples/garden/`.
- `--output_dir`: Directory where the generated `.ply` point cloud will be saved. Default is `outputs/`.
- `--chunksize`: Frame chunk size for `FlashVGGTStream` streaming inference. Default is `10`.
- `--max_points`: Maximum number of points to include in the output point cloud. Default is `1000000`.
- `--conf_threshold`: Percentage of low-confidence points to filter out (0-100). Default is `40.0`.
- `--kv_downfactor`: KV downfactor for attention compression. Default is `4`.
- `--keyframe_every`: Keyframe interval for the standard FlashVGGT model. Default is `200`.
## Training
The training code for FlashVGGT (both single-forward and streaming settings) is available in the `training` branch. Please refer to the [Training README](https://github.com/wzpscott/FlashVGGT/blob/training/training/README.md) for detailed instructions on installation, dataset preparation, and training commands.
## Evaluation
The evaluation code is based on [MonST3R](https://github.com/Junyi42/monst3r/blob/main/data/evaluation_script.md) and [CUT3R](https://github.com/CUT3R/CUT3R).
You can use the following command to evaluate the model.
```bash
python eval.py --config-name dense_recon num_frames=100 save_name=dense_recon_100
python eval.py --config-name dense_recon num_frames=500 save_name=dense_recon_500
python eval.py --config-name dense_recon num_frames=1000 save_name=dense_recon_1000
```
The evaluation results are saved in the `eval/logs/dense_recon` folder.
## Acknowledgements
Our code is based on the following awesome repositories:
- [VGGT](https://github.com/facebookresearch/vggt)
- [FastVGGT](https://github.com/mystorm16/FastVGGT)
- [StreamVGGT](https://github.com/wzzheng/streamvggt)
- [CUT3R](https://github.com/CUT3R/CUT3R)
- [TTT3R](https://github.com/Inception3D/TTT3R)
We thank the authors for releasing their code!
## Citation
If you find our work useful, please cite:
```bibtex
@inproceedings{wang2025flashvggt,
title={FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor Attention},
author={Wang, Zipeng and Xu, Dan},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026}
}
```