What if I told you that you can create stunning 3D Gaussian splatting videos from just 3 images in under a minute? It seems too good to be true, but thanks to NVIDIA Research’s InstantSplat, now you can! Their newly open-sourced code can easily be run on a Windows or Linux machine. In my hands-on video below, I walk you through the ins and outs of making your first InstantSplat. As a bonus, I also created a handy browser interface via Gradio so you don’t have to deal with the command line once it’s set up!
What You Need to Run InstantSplat
Before you get started, ensure you have the right setup:
- Windows or Linux PC (my instructions are for Windows, but the original project instructions are for Linux)
- A decent NVIDIA GPU (RTX 30xx or better)
- My setup in this video: Dell Precision 3680 Workstation with an Intel Core i9-14900K, NVIDIA RTX 6000 Ada, and 64GB RAM (over-spec’d for this project)
Getting Started
Setting up the project is easy. Ensure you have Visual Studio installed along with CUDA Toolkit 11.8 or newer (if you don’t have these already, install Visual Studio before the CUDA Toolkit). Next, follow my step-by-step instructions to set up InstantSplat. Once everything is set up, all you do is run a simple command that launches the GUI in your browser.
You can find the complete installation and usage directions on my GitHub page: https://github.com/jonstephens85/InstantSplat_Windows
How InstantSplat Works
When creating a 3D Gaussian splatting scene, the first step is determining the camera position in 3D space for each image and generating a sparse point cloud. Traditional SfM workflows such as COLMAP or RealityCapture require many images to accomplish this. InstantSplat instead recovers 3D geometry directly from the 2D images using a machine learning model called MASt3R, which predicts depth (how far objects are from the camera) and aligns the images correctly.
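To make the depth step concrete, here is a minimal sketch of how a predicted depth value turns a pixel into a 3D point. This is the standard pinhole back-projection idea, not InstantSplat’s or MASt3R’s actual code, and the focal length and image size are made-up numbers:

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift a pixel (u, v) with a predicted depth into a 3D point
    in camera coordinates, using a pinhole camera model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical 640x480 camera: 500px focal length,
# principal point at the image center.
fx = fy = 500.0
cx, cy = 320.0, 240.0

# A pixel at the image center with depth 2.0 lands on the optical axis.
print(backproject(320, 240, 2.0, fx, fy, cx, cy))  # → (0.0, 0.0, 2.0)
```

Do this for every pixel of every image and you get exactly the sparse-to-dense point cloud that classic SfM tools need far more photos to build.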
Next, InstantSplat represents the scene as a set of 3D Gaussians. The Gaussians are placed in space to match the reference images and are adjusted so that, when viewed from any angle, they recreate the original scene. Since the camera positions estimated by MASt3R aren’t perfectly precise, InstantSplat further refines them by comparing the predicted images to the real ones and adjusting until they match.
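To give a feel for what one of these splats looks like as data, here is an illustrative sketch. The field names are my own, not the project’s actual data structures, but they cover the properties a 3D Gaussian typically carries:

```python
from dataclasses import dataclass

@dataclass
class Gaussian3D:
    mean: tuple      # (x, y, z) center of the splat in world space
    scale: tuple     # per-axis extent of the ellipsoid
    rotation: tuple  # orientation as a quaternion (w, x, y, z)
    color: tuple     # RGB color
    opacity: float   # how opaque this splat is when rendered

# One splat: a small flat green ellipsoid (think: a leaf) 2 m out.
leaf = Gaussian3D(
    mean=(0.0, 0.0, 2.0),
    scale=(0.05, 0.02, 0.01),
    rotation=(1.0, 0.0, 0.0, 0.0),
    color=(0.1, 0.6, 0.2),
    opacity=0.9,
)
print(leaf.mean)  # → (0.0, 0.0, 2.0)
```

A real scene is just a large list of these, and training means nudging every field of every splat until the renders match the photos.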
If parts of the scene are repeated across images (like the same leaves appearing in different photos), InstantSplat removes duplicates and keeps only the necessary details to speed up the process. InstantSplat then continuously corrects itself with a feedback loop: it compares the rendered 3D model to the original images and tweaks the scene to make it look more realistic.
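That feedback loop boils down to gradient descent on the difference between rendered and real pixels. Here is a deliberately tiny toy version with a single brightness parameter, nothing like the real optimizer, just the shape of the idea:

```python
# Toy feedback loop: nudge one scene parameter (a splat's brightness)
# until the rendered value matches the reference photo's pixel.
target = 0.8        # pixel value from the real photo
brightness = 0.2    # current rendered value
lr = 0.1            # learning rate

for step in range(20):
    error = brightness - target       # compare render vs. photo
    brightness -= lr * 2 * error      # step down the squared-error gradient

print(round(brightness, 2))  # converges toward 0.8
```

InstantSplat does this simultaneously for every splat parameter and every camera pose, which is why the renders snap into agreement with the photos so quickly.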
Since the scene uses only 3–12 input images, the process is lightning fast, and at the end it renders stunningly realistic videos!
Why Use InstantSplat
Aside from the “cool factor” the first time you run the project, it leaves you wondering what it could be used for. Here are a few ideas I came up with:
- You want to create a nice smooth product shot but you don’t have a camera with a gimbal
- You took a few photos of a scene on a trip and want to turn it into 3D
- Mars rover shots in 3D! (Yes, this was actually done already with InstantSplat!)