Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering

1 Nvidia   2 Cornell University   3 National Taiwan University
CVPR 2025
Updates
  • Mar 18, 2025. Revised the literature review. Added support for a DepthAnythingV2 relative-depth loss and a MASt3R metric-depth loss for better geometry.
  • Mar 8, 2025. Added support for the ScanNet++ dataset. Check the official benchmark for our results on the third-party hidden-set evaluation over 50 indoor scenes. Our short report may be helpful if you want to work on ScanNet or indoor environments.

Overview

We propose an efficient radiance field rendering algorithm that incorporates a rasterization process on adaptive sparse voxels, without neural networks or 3D Gaussians. The proposed system couples two key contributions. First, we adaptively and explicitly allocate sparse voxels to different levels of detail within scenes, faithfully reproducing scene details at a 65536³ grid resolution while achieving high rendering frame rates. Second, we customize a rasterizer for efficient rendering of adaptive sparse voxels. We render voxels in the correct depth order by using a ray-direction-dependent Morton ordering, which avoids the well-known popping artifact of Gaussian splatting. Our method improves over the previous neural-free voxel model by more than 4 dB PSNR with a more than 10x FPS speedup, achieving novel-view synthesis results comparable to the state of the art. Additionally, our voxel representation is seamlessly compatible with grid-based 3D processing techniques such as volume fusion, voxel pooling, and marching cubes, enabling a wide range of future extensions and applications.

Adaptive Sparse Voxel Representation and Rendering

Our scene representation is a hybrid of primitive and volumetric models. (a) Primitive component. We explicitly allocate voxel primitives to cover different scene levels of detail under an Octree layout. Note that we do not replicate a traditional Octree data structure with parent-child pointers, nor a linear Octree; we only keep voxels at the Octree leaf nodes, without any ancestor nodes. (b) Volumetric component. Inside each voxel is a volumetric (trilinear) density field and a (constant) spherical-harmonic field. We sample K points on the ray-voxel intersection segment to compute the voxel's intensity contribution to the pixel via numerical integration.
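The per-voxel integration step above can be sketched as follows. This is a minimal NumPy toy under our own naming (the actual renderer is a fused CUDA rasterizer); `densities` stands for the K trilinear density samples and `sh_color` for the RGB evaluated from the voxel's spherical harmonics.

```python
import numpy as np

def voxel_contribution(densities, seg_length, sh_color, transmittance):
    """Blend one voxel's color into a pixel via numerical integration.

    densities:     (K,) trilinearly-interpolated density samples on the
                   ray-voxel intersection segment
    seg_length:    length of that intersection segment
    sh_color:      (3,) RGB evaluated from the voxel's spherical harmonics
    transmittance: accumulated transmittance T before this voxel
    """
    dt = seg_length / len(densities)
    # Alpha of the whole segment from the quadrature of the density field.
    alpha = 1.0 - np.exp(-np.sum(densities) * dt)
    contribution = transmittance * alpha * sh_color
    new_T = transmittance * (1.0 - alpha)
    return contribution, new_T

# Usage: accumulate voxels front-to-back along one ray.
T = 1.0
pixel = np.zeros(3)
for dens, seg, rgb in [(np.array([2.0, 3.0, 2.5]), 0.1, np.array([1.0, 0.2, 0.2]))]:
    c, T = voxel_contribution(dens, seg, rgb, T)
    pixel += c
```

Front-to-back accumulation lets the rasterizer terminate a ray early once the transmittance becomes negligible.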



Adaptive level of detail is crucial for scalability and quality; sparse voxels with a uniform voxel size cannot scale up.



We sort voxels by a ray-direction-dependent Morton order, which guarantees the correct primitive blending order with a mathematical proof. Sorting by primitive centers, as 3DGS does, can produce an inaccurate order. The inaccurate sorting causes the popping artifact in 3DGS (left video below), while our method does not have this issue (right video).
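The core idea can be sketched as follows, under simplifying assumptions of our own (a toy grid, per-octant sign handling only): mirroring each axis whose ray-direction component is negative before interleaving the coordinate bits makes a plain Morton sort traverse voxels front-to-back for every ray in that direction octant.

```python
def interleave_bits(x, y, z, bits=16):
    """Interleave coordinate bits into a Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code

def directional_morton(coord, ray_sign, bits=16):
    """Morton code with each axis mirrored where the ray points negatively."""
    mask = (1 << bits) - 1
    x, y, z = (c if s > 0 else mask - c for c, s in zip(coord, ray_sign))
    return interleave_bits(x, y, z, bits)

# Sorting by this key yields one blending order valid for all rays in the
# (+x, +y, +z) octant; flipping the signs reverses the traversal.
voxels = [(1, 1, 1), (0, 0, 0), (1, 0, 1), (0, 1, 0)]
order = sorted(voxels, key=lambda v: directional_morton(v, (1, 1, 1), bits=1))
```

Only 8 such orderings exist (one per sign octant), so the sort can be precomputed once per octant instead of per ray.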

Novel-view Synthesis Results

Ours
3DGS

Adaptive Sparse Voxel Fusion

Fusing 2D modalities into the trained sparse voxels is efficient: each grid point simply takes a weighted sum over the 2D views, following the classical volume fusion method. We show several examples below.
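A minimal sketch of this weighted-sum fusion, with a hypothetical `project_fn` interface standing in for the actual camera projection:

```python
import numpy as np

def fuse_views(grid_points, views):
    """Weighted average of 2D view values at each sparse-voxel grid point.

    views: iterable of (project_fn, value_map, weight); project_fn maps
    (N, 3) points to integer pixel coords (N, 2) and a validity mask (N,).
    """
    num = np.zeros((len(grid_points), views[0][1].shape[-1]))
    den = np.zeros(len(grid_points))
    for project, value_map, w in views:
        uv, valid = project(grid_points)
        num[valid] += w * value_map[uv[valid, 1], uv[valid, 0]]
        den[valid] += w
    return num / np.maximum(den, 1e-8)[:, None]

# Toy usage: one grid point seen by two 1x1 "views" with weights 1 and 3.
project_all = lambda pts: (np.zeros((len(pts), 2), dtype=int),
                           np.ones(len(pts), dtype=bool))
views = [
    (project_all, np.array([[[1.0, 0.0, 0.0]]]), 1.0),
    (project_all, np.array([[[0.0, 1.0, 0.0]]]), 3.0),
]
fused = fuse_views(np.zeros((1, 3)), views)
```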

Rendered depths → Sparse-voxel SDF → Mesh
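The depth-to-SDF step can be sketched with the classical truncated signed distance (TSDF) update; the names and truncation constants here are illustrative, and a mesh can then be extracted from the fused field with marching cubes (e.g. `skimage.measure.marching_cubes`).

```python
import numpy as np

def tsdf_update(tsdf, weight, point_depths, rendered_depth, trunc=0.05):
    """Fold one rendered depth map into a running truncated SDF.

    point_depths:   depth of each grid point along the camera ray
    rendered_depth: depth the renderer reports for that ray
    """
    sdf = rendered_depth - point_depths        # positive in front of the surface
    d = np.clip(sdf / trunc, -1.0, 1.0)        # truncate to [-1, 1]
    update = np.abs(sdf) < 5 * trunc           # skip points far behind the surface
    new_w = weight + update
    fused = np.where(update, (tsdf * weight + d) / np.maximum(new_w, 1), tsdf)
    return fused, new_w

# One camera ray crossing the surface at depth 1.0; the three grid points
# sit just in front of, on, and just behind the surface.
tsdf, w = tsdf_update(np.zeros(3), np.zeros(3),
                      np.array([0.9, 1.0, 1.1]), 1.0)
```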



Image segmentation by Segformer → Sparse-voxel semantic field
Check the Jupyter notebook here.

Render view
3D fused semantic field
2D Segformer prediction


Vision foundation model feature by RADIOv2.5 → Voxel pooling → Sparse-voxel foundation feature field
Check the Jupyter notebook here.

Render view
3D fused RADIO feature field
2D RADIO prediction


Dense CLIP feature by LangSplat → Voxel pooling → Sparse-voxel language field
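The voxel-pooling step used in the feature-field examples above amounts to a scatter-mean of the back-projected 2D features into their covering voxels. A minimal sketch with flattened voxel indices (the real pipeline operates on the sparse voxel layout):

```python
import numpy as np

def voxel_pool(voxel_ids, feats, num_voxels):
    """Average the 2D features that land inside each voxel (scatter-mean).

    voxel_ids: (N,) voxel index for each back-projected pixel feature
    feats:     (N, D) feature vectors
    """
    pooled = np.zeros((num_voxels, feats.shape[1]))
    counts = np.zeros(num_voxels)
    np.add.at(pooled, voxel_ids, feats)   # unbuffered scatter-add per voxel
    np.add.at(counts, voxel_ids, 1)
    return pooled / np.maximum(counts, 1)[:, None]

# Three pixel features, two of them landing in voxel 0.
pooled = voxel_pool(np.array([0, 0, 1]),
                    np.array([[1.0, 0.0], [3.0, 0.0], [2.0, 2.0]]), 3)
```

`np.add.at` is used instead of fancy-indexed `+=` so that repeated voxel indices accumulate rather than overwrite.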

Related work

Data structure

  • Sparse Voxel Octrees use an Octree to manage the sparse voxels.
  • VDB uses a shallow tree with a wide branching factor to manage sparse volumes. Extensions include OpenVDB, NanoVDB, NeuralVDB, and the recent fVDB.
  • Our method does not use any advanced data structure.
We simply store all the non-empty voxels of different sizes in a 1D array; our rasterizer ensures everything is correctly ordered at render time. One limitation is that we do not support efficient search (i.e., given a point, find its covering voxel). Future work could sort the voxels and implement a binary search for this purpose.
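The suggested lookup could be sketched as follows for same-size voxels: sort the voxels once by Morton code, then binary-search the query point's code. This illustrates the future-work idea only (mixed voxel sizes would additionally need a level-aware prefix check), and is not part of the released code.

```python
import bisect

def interleave(x, y, z, bits=4):
    """Interleave coordinate bits into a Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code

# Sort once at load time; each lookup is then O(log N).
codes = sorted(interleave(*v) for v in [(1, 2, 3), (5, 5, 5), (0, 0, 7)])

def find_voxel(point):
    """Return the index of the voxel covering `point`, or None."""
    c = interleave(*point)
    i = bisect.bisect_left(codes, c)
    return i if i < len(codes) and codes[i] == c else None
```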

Sparse field

  • The global-sparse-local-dense strategy allocates a tiny dense 3D grid to each sparse volume. Classical methods like Voxel Hashing, BundleFusion, and Open3D use this strategy for sensor fusion.
  • Implicit sparse neural fields like NSVF and NGLOD employ an MLP to map the latent features stored in the Sparse Voxel Octree to the target attributes.
  • Our method allocates just a single voxel at each leaf node, with explicit density and color parameters, like Plenoxels.

Scalability

  • Uniform-level sparse voxels allocate voxels at a single target level. NSVF, NGLOD, SPC, and Plenoxels follow this design, which encounters scalability issues when using higher grid resolutions for better quality.
  • Our method allocates voxels at different Octree levels. We only keep the leaf nodes in a 1D array, without any advanced data structure.
For now, our maximum Octree depth is 16, which implies a 65536³ maximum grid resolution. Extension to more Octree levels should be simple, and the increase in rendering time should be minor: 3 extra bits to sort per additional level.

Novel scene representations

LinPrim, RadiantFoam, and MeshSplats are recent methods that take very different and interesting approaches to reconstruction and novel-view synthesis.

BibTeX

@article{Sun2024SVR,
  title={Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering},
  author={Cheng Sun and Jaesung Choe and Charles Loop and Wei-Chiu Ma and Yu-Chiang Frank Wang},
  journal={ArXiv},
  year={2024},
  volume={abs/2412.04459},
}