We introduce the Splatter Image, an ultra-fast approach for monocular 3D object reconstruction which operates at 38 FPS.
The Splatter Image is based on Gaussian Splatting, which has recently brought real-time rendering, fast training, and excellent scaling to multi-view reconstruction. For the first time, we apply Gaussian Splatting in a monocular reconstruction setting. Our approach is learning-based, and, at test time, reconstruction only requires the feed-forward evaluation of a neural network. The main innovation of the Splatter Image is its surprisingly straightforward design: it uses a 2D image-to-image network to map the input image to one 3D Gaussian per pixel. The resulting Gaussians thus have the form of an image, the Splatter Image.
We further extend the method to incorporate more than one image as input, which we do by adding cross-view attention. Owing to the speed of the renderer (588 FPS), we can generate entire images during training and optimize perceptual metrics like LPIPS, while using a single GPU for training. On standard benchmarks, we demonstrate not only fast reconstruction but also better results than recent and much more expensive baselines in terms of PSNR, LPIPS, and other metrics.
The Splatter Image method uses Gaussian Splatting as the underlying 3D representation, taking advantage of its rendering quality and speed. It works by applying an image-to-image neural network to the input view and obtaining, as output, another image that holds the parameters of one coloured 3D Gaussian per pixel. The resulting Gaussian mixture can be rendered very quickly into an arbitrary view of the object by using Gaussian Splatting. Remarkably, the 3D Gaussians in the resulting 'Splatter Image' provide high-quality 360-degree reconstructions even when compared to much slower methods.
The key challenge in using 3D Gaussians for monocular reconstruction is to design a network that takes an image of an object as input and produces as output a corresponding Gaussian mixture that represents all sides of it. Our key insight is that, while a Gaussian mixture is a set, i.e., an unordered collection, it can still be stored in an ordered data structure. Splatter Image takes advantage of this fact by using a 2D image as a container for the 3D Gaussians, so that each pixel contains in turn the parameters of one Gaussian, including its opacity, shape, and colour.
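The "image as container" idea can be illustrated with a minimal NumPy sketch. The channel layout below (1 opacity, 3 position, 3 scale, 4 rotation, 3 colour) and the function name `splatter_image_to_gaussians` are assumptions made for illustration, not the layout used by the authors; the point is only that a (C, H, W) array can be read as an unordered set of H*W Gaussians.

```python
import numpy as np

# Hypothetical channel layout (an assumption, not taken from the paper):
# opacity, 3D position, per-axis scale, rotation quaternion, RGB colour.
CHANNELS = {"opacity": 1, "position": 3, "scale": 3, "rotation": 4, "colour": 3}


def splatter_image_to_gaussians(splatter_image):
    """Read a (C, H, W) array as H*W Gaussians, one per pixel.

    Returns a dict mapping each parameter name to an (H*W, k) array.
    """
    c, h, w = splatter_image.shape
    if c != sum(CHANNELS.values()):
        raise ValueError("unexpected number of channels")
    # Row i of `flat` holds all parameters of the Gaussian stored at
    # pixel (i // w, i % w).
    flat = splatter_image.reshape(c, h * w).T
    gaussians = {}
    start = 0
    for name, k in CHANNELS.items():
        gaussians[name] = flat[:, start:start + k]
        start += k
    return gaussians


# Stand-in for a network output: a random 14-channel 4x4 image.
rng = np.random.default_rng(0)
image = rng.standard_normal((14, 4, 4))
gaussians = splatter_image_to_gaussians(image)
```

Because the mixture is a set, shuffling the rows of every array in `gaussians` with the same permutation would describe the same 3D object; the pixel grid only fixes a convenient storage order that a 2D network can produce.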
The Splatter Image represents the full 360 degrees of an object despite using a 2D data structure. It represents unobserved object parts by allocating background pixels to appropriate 3D locations (third row) to predict occluded elements like wheels (left) or chair legs (middle). Alternatively, it predicts offsets in the foreground pixels to represent occluded chair parts (right).
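One way a per-pixel Gaussian can leave the visible surface is sketched below. The text above does not spell out how positions are parameterised, so this is an assumption: each pixel predicts a depth along its camera ray plus a free 3D offset, with a pinhole camera at the origin. The names `pixel_rays` and `gaussian_centres` and the `focal` parameter are hypothetical.

```python
import numpy as np


def pixel_rays(h, w, focal):
    """Ray directions through pixel centres of a pinhole camera at the origin.

    Directions are scaled so that the z component equals 1.
    """
    ys, xs = np.meshgrid(np.arange(h) + 0.5, np.arange(w) + 0.5, indexing="ij")
    return np.stack(
        [(xs - w / 2) / focal, (ys - h / 2) / focal, np.ones((h, w))], axis=-1
    )


def gaussian_centres(depth, offset, focal):
    """Assumed parameterisation: centre = depth * ray + offset, per pixel.

    depth:  (H, W) predicted depth along each ray
    offset: (H, W, 3) predicted 3D displacement away from the ray
    """
    h, w = depth.shape
    return pixel_rays(h, w, focal) * depth[..., None] + offset


depth = np.full((2, 2), 2.0)
offset = np.zeros((2, 2, 3))
offset[0, 0] = [0.0, 0.0, 1.0]  # push one Gaussian further back, off its surface point
centres = gaussian_centres(depth, offset, focal=1.0)
```

With a zero offset a Gaussian sits on its pixel's ray at the predicted depth; a non-zero offset lets a foreground or background pixel place its Gaussian anywhere in 3D, for example on an occluded part of the object.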
@article{szymanowicz2024splatter_image,
author = {Szymanowicz, Stanislaw and Rupprecht, Christian and Vedaldi, Andrea},
title = {Splatter Image: Ultra-Fast Single-View 3D Reconstruction},
journal = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024},
}