VGGT: Visual Geometry Grounded Transformer
Published on: 2025-05-31 19:59:26
@inproceedings{wang2025vggt,
  title={VGGT: Visual Geometry Grounded Transformer},
  author={Wang, Jianyuan and Chen, Minghao and Karaev, Nikita and Vedaldi, Andrea and Rupprecht, Christian and Novotny, David},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
Overview
Visual Geometry Grounded Transformer (VGGT, CVPR 2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene, including extrinsic and intrinsic camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views, within seconds.
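Several of these outputs are linked by standard pinhole geometry. A depth map, together with a camera's intrinsics and extrinsics, determines a per-pixel 3D point map. The NumPy sketch below illustrates only that general relation. The function name depth_to_world_points, the OpenCV-style camera frame (x right, y down, z forward), and the 3x4 world-to-camera extrinsic [R | t] are assumptions chosen for illustration. This is not VGGT's own code or API.

```python
import numpy as np


def depth_to_world_points(depth, K, extrinsic):
    """Unproject an (H, W) depth map into an (H, W, 3) world-frame point map.

    Assumed conventions (illustrative, not taken from VGGT):
      * K is a 3x3 pinhole intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
      * extrinsic is a 3x4 world-to-camera matrix [R | t], i.e.
        X_cam = R @ X_world + t, with an OpenCV-style camera frame.
      * depth stores the z coordinate of each pixel in the camera frame.
    """
    H, W = depth.shape
    # u runs over columns (x), v runs over rows (y).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]

    # Back-project each pixel into the camera frame using its depth.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    cam_points = np.stack([x, y, depth], axis=-1)  # (H, W, 3)

    # Invert X_cam = R X_world + t, giving X_world = R^T (X_cam - t).
    # With points stored as row vectors, R^T (p - t) becomes (p - t) @ R.
    R, t = extrinsic[:, :3], extrinsic[:, 3]
    return (cam_points - t) @ R


if __name__ == "__main__":
    depth = np.full((2, 3), 2.0)
    K = np.eye(3)  # toy intrinsics: fx = fy = 1, principal point at (0, 0)
    extrinsic = np.hstack([np.eye(3), np.zeros((3, 1))])  # camera at the world origin
    points = depth_to_world_points(depth, K, extrinsic)
    print(points.shape)  # (2, 3, 3)
    print(points[1, 2])  # [4. 2. 2.]
```

Running the file prints the shape (2, 3, 3) and the point [4. 2. 2.] for the pixel in row 1, column 2. Projecting the result back with K and [R | t] recovers the original pixel coordinates, which is a quick way to check that a chosen set of conventions is self-consistent. If poses are given as camera-to-world instead, the inversion step changes accordingly.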
Quick Start
First, clone this repository to your local machine, and install the dependencies (torch, torchvision, numpy, Pillow, and huggingface_hub).
git clone git@github.com:facebookresearch/vggt.git
cd vggt
pip install -r requirements.txt
Alternatively, you can install VGGT as a package (see the project repository for details).
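After installation, you can confirm that the dependencies listed above are importable with a short standard-library check such as the one below. This is a hypothetical helper, not part of the VGGT repository. The package-to-module mapping is written out by hand (Pillow, for example, is imported as PIL) and would need updating if requirements.txt changes.

```python
import importlib.util

# Hand-written mapping from the dependency names listed above to their
# import names (an illustrative assumption, not read from requirements.txt).
REQUIRED_MODULES = {
    "torch": "torch",
    "torchvision": "torchvision",
    "numpy": "numpy",
    "Pillow": "PIL",
    "huggingface_hub": "huggingface_hub",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {package_name: bool} saying whether each module can be found."""
    # find_spec returns None for a top-level module that is not installed.
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    status = check_dependencies()
    for package, ok in status.items():
        print(f"{package:16s} {'ok' if ok else 'MISSING'}")
    missing = [package for package, ok in status.items() if not ok]
    if missing:
        print("Missing:", ", ".join(missing))
```

The helper only checks that each module can be located. It does not import the modules, so it stays fast and does not verify versions or CUDA support.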
Now, try the model with just