We propose an efficient radiance field rendering algorithm that incorporates a rasterization process on adaptive sparse voxels, without neural networks or 3D Gaussians. The proposed system rests on two key contributions. First, we adaptively and explicitly allocate sparse voxels to different levels of detail within a scene, faithfully reproducing scene details at an effective 65536³ grid resolution while achieving high rendering frame rates. Second, we customize a rasterizer for efficient rendering of the adaptive sparse voxels. We render voxels in the correct depth order by using a ray-direction-dependent Morton ordering, which avoids the well-known popping artifact of Gaussian splatting. Our method improves upon the previous neural-free voxel model by over 4 dB PSNR with a more than 10x FPS speedup, achieving novel-view synthesis quality comparable to the state of the art. Additionally, our voxel representation is seamlessly compatible with grid-based 3D processing techniques such as Volume Fusion, Voxel Pooling, and Marching Cubes, enabling a wide range of future extensions and applications.
Our scene representation is a hybrid of a primitive model and a volumetric model. (a) Primitive component. We explicitly allocate voxel primitives to cover different scene levels of detail under an Octree layout. Note that we do not replicate a traditional Octree data structure with parent-child pointers, nor a linear Octree; we only keep voxels at the Octree leaf nodes, without any ancestor nodes. (b) Volumetric component. Inside each voxel is a volumetric (trilinear) density field and a (constant) spherical harmonic field. We sample K points on the ray-voxel intersection segment and compute the voxel's intensity contribution to the pixel by numerical integration.
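To make (b) concrete, here is a minimal NumPy sketch of one voxel's contribution, our own illustration rather than the released CUDA rasterizer: sample K midpoints on the ray-voxel segment, trilinearly interpolate density, and convert the accumulated optical depth to an alpha for front-to-back compositing. The function names and argument layout are ours.

```python
import numpy as np

def trilerp(corners, p):
    """Trilinearly interpolate (2, 2, 2) corner values at p in [0, 1]^3."""
    x, y, z = p
    cx = corners[0] * (1 - x) + corners[1] * x   # (2, 2) after x-interp
    cxy = cx[0] * (1 - y) + cx[1] * y            # (2,)  after y-interp
    return cxy[0] * (1 - z) + cxy[1] * z         # scalar after z-interp

def voxel_alpha(ray_o, ray_d, vox_min, vox_size, density, t_in, t_out, K=3):
    """Alpha of one voxel along the ray segment [t_in, t_out], using a
    K-sample midpoint rule over the voxel's trilinear density field."""
    dt = (t_out - t_in) / K
    tau = 0.0                                         # accumulated optical depth
    for i in range(K):
        t = t_in + (i + 0.5) * dt                     # midpoint sample
        p = (ray_o + t * ray_d - vox_min) / vox_size  # local coords in [0, 1]^3
        tau += max(trilerp(density, p), 0.0) * dt
    return 1.0 - np.exp(-tau)

# Compositing voxels already sorted front-to-back along the ray:
#   C += T * alpha_i * color_i;  T *= (1 - alpha_i)
```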
Adaptive level of detail is crucial for scalability and quality: sparse voxels with a uniform voxel size cannot scale up, since the finest resolution would be spent everywhere instead of only where the scene needs it.
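For intuition, a uniform grid at the 65536³ resolution quoted above would contain roughly 2.8×10¹⁴ cells, whereas an Octree refines only where needed. Below is a hypothetical NumPy sketch of priority-driven subdivision; the priority signal and the top-k selection are our assumptions for illustration, not the paper's exact heuristic.

```python
import numpy as np

def subdivide_top_voxels(centers, sizes, priority, frac=0.05):
    """Split the highest-priority voxels into their 8 Octree children.

    centers:  (N, 3) voxel centers; sizes: (N,) voxel edge lengths.
    priority: (N,) subdivision signal per voxel (assumed here: e.g.
              accumulated training gradient or blending weight).
    """
    k = max(1, int(len(priority) * frac))
    pick = np.argsort(priority)[-k:]          # top-k voxels to refine
    keep = np.setdiff1d(np.arange(len(priority)), pick)
    # The 8 child centers sit a quarter edge away from the parent center.
    offsets = 0.25 * np.array([[dx, dy, dz] for dx in (-1, 1)
                               for dy in (-1, 1) for dz in (-1, 1)])
    child_centers = (centers[pick, None, :]
                     + offsets[None, :, :] * sizes[pick, None, None]).reshape(-1, 3)
    child_sizes = np.repeat(sizes[pick] * 0.5, 8)
    return (np.concatenate([centers[keep], child_centers]),
            np.concatenate([sizes[keep], child_sizes]))
```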
We sort voxels by a ray-direction-dependent Morton order, which guarantees a correct primitive blending order (with mathematical proof). Sorting by primitive centers, as 3DGS does, can produce an inaccurate order. This inaccurate sorting causes the 3DGS popping artifact (see left video below), while our method does not suffer from this issue (right video).
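A minimal pure-Python sketch of the ordering idea follows; it is our illustration, not the GPU rasterizer, and for brevity it assumes all voxels are indexed at a single grid level (mixed Octree levels follow from the voxel's Octree path). Flipping the coordinate bits of every axis along which the ray direction is negative makes one Morton sort per direction octant yield a correct front-to-back order for every ray in that octant.

```python
def morton3d(x, y, z, bits=16):
    """Interleave the bits of integer grid coordinates into a Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def blending_order(voxel_coords, ray_sign, bits=16):
    """Front-to-back voxel order shared by all rays whose direction signs
    match `ray_sign` (one of the 8 octants). Flipping an axis's coordinate
    bits reverses the Morton traversal along that axis."""
    top = (1 << bits) - 1
    def key(c):
        x, y, z = (top - v if s < 0 else v for v, s in zip(c, ray_sign))
        return morton3d(x, y, z, bits)
    return sorted(range(len(voxel_coords)), key=lambda i: key(voxel_coords[i]))

# Example: two voxels, rays heading into the (-x, +y, +z) octant.
# order = blending_order([(3, 1, 0), (0, 2, 5)], ray_sign=(-1, 1, 1))
```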
Fusing 2D modalities into the trained sparse voxels is efficient: each grid point simply takes a weighted sum of the values projected from the 2D views, following the classical volume fusion method. We show several examples in the following; a minimal code sketch of the fusion step appears after the list.
Rendered depths → Sparse-voxel SDF → Mesh
Image segmentation by Segformer → Sparse-voxel semantic field
Check it in the Jupyter notebook here.
Vision foundation model features by RADIOv2.5 → Voxel pooling → Sparse-voxel foundation feature field
Check it in the Jupyter notebook here.
Dense CLIP feature by LangSplat → Voxel pooling → Sparse-voxel language field
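As referenced above, all three examples reduce to the same weighted-sum fusion, sketched here as our own minimal NumPy illustration. Occlusion handling is omitted, and the per-pixel weight map (e.g., rendering alpha) is our assumption.

```python
import numpy as np

def fuse_views(points, views, eps=1e-8):
    """Classical weighted-sum fusion of 2D view maps into 3D grid points.

    points: (N, 3) sparse-voxel grid point positions.
    views:  list of (P, value_map, weight_map), where P is a (3, 4) camera
            projection matrix, value_map is (H, W, C) holding the 2D
            modality (depth, semantics, features, ...), and weight_map is
            (H, W) per-pixel fusion weights. Returns the (N, C) fused field.
    """
    n = len(points)
    homo = np.concatenate([points, np.ones((n, 1))], axis=1)  # (N, 4)
    acc, wsum = None, np.zeros(n)
    for P, value_map, weight_map in views:
        h, w_img = weight_map.shape
        uvz = homo @ P.T                                      # (N, 3) image coords
        z = uvz[:, 2]
        u = (uvz[:, 0] / np.maximum(z, eps)).astype(int)      # pixel column
        v = (uvz[:, 1] / np.maximum(z, eps)).astype(int)      # pixel row
        inside = (z > eps) & (u >= 0) & (u < w_img) & (v >= 0) & (v < h)
        ui, vi = u.clip(0, w_img - 1), v.clip(0, h - 1)
        if acc is None:
            acc = np.zeros((n, value_map.shape[-1]))
        w = np.where(inside, weight_map[vi, ui], 0.0)         # zero out misses
        acc += w[:, None] * value_map[vi, ui]
        wsum += w
    return acc / np.maximum(wsum, eps)[:, None]
```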
Our work relates to prior research on data structures, sparse fields, scalability, and novel scene representations. Among novel scene representations, LinPrim, RadiantFoam, and MeshSplats are recent methods that take very different and cool approaches to reconstruction and novel-view synthesis.
@article{Sun2024SVR,
  title   = {Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering},
  author  = {Cheng Sun and Jaesung Choe and Charles Loop and Wei-Chiu Ma and Yu-Chiang Frank Wang},
  journal = {arXiv preprint arXiv:2412.04459},
  year    = {2024},
}