HuaiyuanXu/3D-Occupancy-Perception

Huaiyuan Xu · Junliang Chen · Shiyu Meng · Yi Wang · Lap-Pui Chau*

arXiv PDF

We research 3D Occupancy Perception for Autonomous Driving

This work focuses on 3D dense perception in autonomous driving, encompassing LiDAR-Centric Occupancy Perception, Vision-Centric Occupancy Perception, and Multi-Modal Occupancy Perception. Information fusion techniques for this field are discussed. We believe this will be the most comprehensive survey to date on 3D Occupancy Perception. Please stay tuned!😉😉😉

This is an active repository; watch it to follow the latest advances. If you find it useful, please star this repo.

✨You are welcome to send us your work on any topic related to 3D occupancy for autonomous driving (not only perception, but also applications)!

If you discover any missing work or have any suggestions, please feel free to submit a pull request or contact us. We will promptly add the missing papers to this repository.

✨Highlight

[1] A systematic survey of the latest research on 3D occupancy perception in the field of autonomous driving.

[2] The survey provides a taxonomy of 3D occupancy perception and elaborates on core methodological issues, including network pipelines, multi-source information fusion, and effective network training.

[3] The survey presents evaluations for 3D occupancy perception, and offers detailed performance comparisons. Furthermore, current limitations and future research directions are discussed.

🔥 News

  • [2024-09-03] This survey was accepted by Information Fusion (Impact factor: 14.7).
  • [2024-07-21] More representative works and benchmarking comparisons have been incorporated, bringing the total to 192 literature references.
  • [2024-05-18] More figures have been added to the survey. We reorganized the occupancy-based applications.
  • [2024-05-08] The first version of the survey is available on arXiv. We curate this repository.

Introduction

3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems and is attracting significant attention from both industry and academia. Like traditional bird's-eye view (BEV) perception, 3D occupancy perception takes multi-source input and therefore requires information fusion. The difference is that it captures vertical structures that 2D BEV ignores. In this survey, we review the most recent works on 3D occupancy perception and provide in-depth analyses of methodologies with various input modalities. Specifically, we summarize general network pipelines, highlight information fusion techniques, and discuss effective network training. We evaluate and analyze the occupancy perception performance of state-of-the-art methods on the most popular datasets. Furthermore, we discuss challenges and future research directions. We hope this paper will inspire the community and encourage more research on 3D occupancy perception.
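
To illustrate the contrast with BEV drawn above, here is a minimal sketch (not taken from the survey) that voxelizes a few 3D points into an occupancy grid and then collapses the height axis to obtain a BEV-style map. The function names, the voxel size, and the sample points are all hypothetical values chosen for illustration.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: List[Point], voxel_size: float) -> Set[Voxel]:
    """Map each (x, y, z) point to the index of the voxel that contains it."""
    return {
        (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        for x, y, z in points
    }


def to_bev(occupied: Set[Voxel]) -> Set[Tuple[int, int]]:
    """Collapse the height axis: a BEV cell is occupied if any voxel in its column is."""
    return {(i, j) for i, j, _ in occupied}


# Hypothetical scene (assumed values): a low obstacle and an overhanging
# structure that share the same ground footprint but differ in height.
points = [
    (2.1, 0.3, 0.2),  # low obstacle near the ground
    (2.3, 0.4, 2.6),  # overhang above the same ground cell
    (5.7, 1.2, 0.1),  # separate low obstacle
]

occupancy = voxelize(points, voxel_size=0.5)
bev = to_bev(occupancy)

print(sorted(occupancy))  # three distinct voxels
print(sorted(bev))        # only two ground cells remain
```

The 3D grid keeps the two voxels at different heights in column (4, 0), while the BEV map merges them into a single cell. That lost distinction is the vertical structure the paragraph above refers to.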

Summary of Contents

Methods: A Survey

LiDAR-Centric Occupancy Perception

Year Venue Paper Title Link
2026 arXiv TFusionOcc: Student's t-Distribution Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction Code
2026 arXiv LiFlow: Flow Matching for 3D LiDAR Scene Completion Code
2025 arXiv Octree Latent Diffusion for Semantic 3D Scene Generation and Completion -
2025 arXiv Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion Code
2024 NeurIPS TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight Code
2024 CVPR PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (Best paper award candidate) Project Page
2024 IROS LiDAR-based 4D Occupancy Completion and Forecasting Project Page
2024 arXiv Towards 3D Semantic Scene Completion for Autonomous Driving: A Meta-Learning Framework Empowered by Deformable Large-Kernel Attention and Mamba Model -
2024 arXiv OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity Project Page
2024 arXiv DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models -
2024 arXiv MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction -
2023 T-IV Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders Code
2023 arXiv PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction Code
2021 T-PAMI Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data -
2021 AAAI Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion Code
2020 CoRL S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds -
2020 3DV LMSCNet: Lightweight Multiscale 3D Semantic Completion Code

Vision-Centric Occupancy Perception

Year Venue Paper Title Link
2026 CVPR Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving -
2026 T-IP Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion Code
2026 AAAI Towards 3D Object-Centric Feature Learning for Semantic Scene Completion -
2026 AAAI Towards Temporal Fusion Beyond the Field of View for Camera-based Semantic Scene Completion -
2026 arXiv M2-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs Code
2026 arXiv VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction -
2026 arXiv Rebenchmarking Unsupervised Monocular 3D Occupancy Prediction -
2026 arXiv SPOT-Occ: Sparse Prototype-guided Transformer for Camera-based 3D Occupancy Prediction Code
2025 T-PAMI SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations Code
2025 ICCV ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction Code
2025 ICCV MAESTRO: Task-Relevant Optimization via Adaptive Feature Enhancement and Suppression for Multi-task 3D Perception Code
2025 ICCV GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting Project Page
2025 ICCV Semantic Causality-Aware Vision-Based 3D Occupancy Prediction Code
2025 ICCV Occupancy Learning with Spatiotemporal Memory Project Page
2025 ICCV GS-Occ3D: Scaling Vision-only Occupancy Reconstruction for Autonomous Driving with Gaussian Splatting Project Page
2025 ICCV Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion Code
2025 ICCV Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion Project Page
2025 CVPR VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction Project Page
2025 CVPR Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction Project Page
2025 CVPR GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding Code
2025 CVPR 3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation Project Page
2025 T-RO Particle-based Instance-aware Semantic Occupancy Mapping in Dynamic Environments -
2025 T-ITS GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision -
2025 AAAI VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion (Oral) Code
2025 AAAI Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion Code
2025 AAAI ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction Project Page
2025 AAAI ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder Code
2025 AAAI LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba -
2025 AAAI Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance -
2025 ICRA OCCUQ: Exploring Efficient Uncertainty Quantification for 3D Occupancy Prediction Code
2025 ICRA Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving -
2025 AAAIW A Spatiotemporal Approach to Tri-Perspective Representation for 3D Semantic Occupancy Prediction Project Page
2025 arXiv HyGE-Occ: Hybrid View-Transformation with 3D Gaussian and Edge Priors for 3D Panoptic Occupancy Prediction -
2025 arXiv VOIC: Visible-Occluded Decoupling for Monocular 3D Semantic Scene Completion Code
2025 arXiv Enhancing 3D Semantic Scene Completion with a Refinement Module Project Page
2025 arXiv VG3T: Visual Geometry Grounded Gaussian Transformer -
2025 arXiv SuperQuadricOcc: Multi-Layer Gaussian Approximation of Superquadrics for Real-Time Self-Supervised Occupancy Estimation -
2025 arXiv QueryOcc: Query-based Self-Supervision for 3D Semantic Occupancy -
2025 arXiv ShelfOcc: Native 3D Supervision beyond LiDAR for Vision-Based Occupancy Estimation -
2025 arXiv EasyOcc: 3D Pseudo-Label Supervision for Fully Self-Supervised Semantic Occupancy Prediction Models -
2025 arXiv ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting -
2025 arXiv SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion Code
2025 arXiv DA-Occ: Efficient 3D Voxel Occupancy Prediction via Directional 2D for Geometric Structure Preservation -
2025 arXiv Unleashing Semantic and Geometric Priors for 3D Scene Completion -
2025 arXiv GTAD: Global Temporal Aggregation Denoising Learning for 3D Semantic Occupancy Prediction -
2025 arXiv VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions Code
2025 arXiv FMOcc: TPV-Driven Flow Matching for 3D Occupancy Prediction with Selective State Space Model -
2025 arXiv Out-of-Distribution Semantic Occupancy Prediction Code
2025 arXiv GraphGSOcc: Semantic and Geometric Graph Transformer for 3D Gaussian Splatting-based Occupancy Prediction -
2025 arXiv QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction Project Page
2025 arXiv ODG: Occupancy Prediction Using Dual Gaussians -
2025 arXiv S2GO: Streaming Sparse Gaussian Occupancy Prediction -
2025 arXiv VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection Project Page
2025 arXiv SHTOcc: Effective 3D Occupancy Prediction with Sparse Head and Tail Voxels Code
2025 arXiv See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction Code
2025 arXiv STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction Code
2025 arXiv LMPOcc: 3D Semantic Occupancy Prediction Utilizing Long-Term Memory Prior from Historical Traversals -
2025 arXiv Inverse++: Vision-Centric 3D Semantic Occupancy Prediction Assisted with 3D Object Detection Code
2025 arXiv Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction Code
2025 arXiv SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion Code
2025 arXiv L2COcc: Lightweight Camera-Centric Semantic Scene Completion via Distillation of LiDAR Model Project Page
2025 arXiv SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World Code
2025 arXiv OccLinker: Deflickering Occupancy Networks through Lightweight Spatio-Temporal Correlation -
2025 arXiv Learning A Zero-shot Occupancy Network from Vision Foundation Models via Self-supervised Adaptation -
2025 arXiv Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations -
2025 arXiv TT-Occ: Test-Time Compute for Self-Supervised Occupancy via Spatio-Temporal Gaussian Splatting Code
2025 arXiv AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting -
2025 arXiv GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow -
2025 arXiv Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance -
2025 arXiv GaussRender: Learning 3D Occupancy with Gaussian Rendering Code
2025 arXiv Event-aided Semantic Scene Completion Code
2024 NeurIPS OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries Code
2024 NeurIPS Context and Geometry Aware Voxel Transformer for Semantic Scene Completion (Spotlight paper) Code
2024 NeurIPS OPUS: Occupancy Prediction Using a Sparse Set Code
2024 ECCV ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers Code
2024 ECCV CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction Code
2024 ECCV VEON: Vocabulary-Enhanced Occupancy Prediction Code
2024 ECCV Fully Sparse 3D Occupancy Prediction Code
2024 ECCV GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction Project Page
2024 ECCV Occupancy as Set of Points Code
2024 ECCV Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion Code
2024 CVPR LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction -
2024 CVPR Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion -
2024 CVPR Symphonize 3D Semantic Scene Completion with Contextual Instance Queries Code
2024 CVPR SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction Project Page
2024 CVPR SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction Project Page
2024 CVPR PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation Code
2024 CVPR Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation Code
2024 CVPR COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction Code
2024 CVPR Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles Project Page
2024 CVPR Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications Code
2024 CVPR Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation Project Page
2024 CVPR DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving -
2024 T-IP Camera-based 3D Semantic Scene Completion with Sparse Guidance Network Code
2024 CoRL Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction Project Page
2024 IJCAI Label-efficient Semantic Scene Completion with Scribble Annotations Code
2024 IJCAI Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion Code
2024 ICRA The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition Project Page
2024 ICRA RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision Code
2024 ICRA MonoOcc: Digging into Monocular Semantic Occupancy Prediction Code
2024 ICRA FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird’s-Eye View and Perspective View -
2024 AAAI Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving Code
2024 AAAI One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception -
2024 RA-L HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction -
2024 RA-L UniScene: Multi-Camera Unified Pre-Training via 3D Scene Reconstruction Code
2024 AAIML SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints Project Page
2024 3DV PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving -
2024 IROS SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views Code
2024 arXiv GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting -
2024 arXiv GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction Code
2024 arXiv GaussianAD: Gaussian-Centric End-to-End Autonomous Driving Project Page
2024 arXiv Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction -
2024 arXiv Fast Occupancy Network -
2024 arXiv Lightweight Spatial Embedding for Vision-based 3D Occupancy Prediction -
2024 arXiv GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction Code
2024 arXiv Language Driven Occupancy Prediction Code
2024 arXiv GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving Code
2024 arXiv ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera -
2024 arXiv ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning -
2024 arXiv Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction Code
2024 arXiv AdaOcc: Adaptive-Resolution Occupancy Prediction -
2024 arXiv MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering Code
2024 arXiv VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction -
2024 arXiv UniVision: A Unified Framework for Vision-Centric 3D Perception Code
2024 arXiv LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering -
2024 arXiv Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement -
2024 arXiv α-SSC: Uncertainty-Aware Camera-based 3D Semantic Scene Completion -
2024 arXiv Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center Code
2024 arXiv BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network Code
2024 arXiv OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow -
2024 arXiv OccFiner: Offboard Occupancy Refinement with Hybrid Propagation -
2024 arXiv InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction Code
2023 CVPR VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion Code
2023 CVPR Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction Project Page
2023 NeurIPS POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images Project Page
2023 NeurIPS Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving Project Page
2023 ICCV SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving Project Page
2023 ICCV Scene as Occupancy Code
2023 ICCV OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction Code
2023 ICCV NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space Code
2023 T-IV 3DOPFormer: 3D Occupancy Perception from Multi-Camera Images with Directional and Distance Enhancement Code
2023 arXiv OccupancyDETR: Using DETR for Mixed Dense-sparse 3D Occupancy Prediction -
2023 arXiv OVO: Open-Vocabulary Occupancy Code
2023 arXiv OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments Project Page
2023 arXiv OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion Code
2023 arXiv FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin Code
2023 arXiv FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation Code
2023 arXiv DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion -
2023 arXiv A Simple Framework for 3D Occupancy Estimation in Autonomous Driving Code
2023 arXiv UniWorld: Autonomous Driving Pre-training via World Models Code
2022 CVPR MonoScene: Monocular 3D Semantic Scene Completion Project Page

Radar-Centric Occupancy Perception

Year Venue Paper Title Link
2025 arXiv 4D-ROLLS: 4D Radar Occupancy Learning via LiDAR Supervision Code
2024 NeurIPS RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar -

Multi-Modal Occupancy Perception

Year Venue Paper Title Link
2025 CVPR OccMamba: Semantic Occupancy Prediction with State Space Models Code
2025 IROS A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding Code
2025 IROS REOcc: Camera-Radar Fusion with Radar Feature Enrichment for 3D Occupancy Prediction -
2025 arXiv DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning Code
2025 arXiv OccLE: Label-Efficient 3D Semantic Occupancy Prediction -
2025 arXiv GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention Project Page
2025 arXiv OccCylindrical: Multi-Modal Fusion with Cylindrical Representation for 3D Semantic Occupancy Prediction Code
2025 arXiv MinkOcc: Towards real-time label-efficient semantic occupancy prediction -
2025 arXiv OccGS: Zero-shot 3D Occupancy Reconstruction with Semantic and Geometric-Aware Gaussian Splatting -
2025 arXiv MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies Code
2025 arXiv DORACAMOM: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional Perception -
2024 ECCV OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving Project Page
2024 RA-L Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction Project Page
2024 arXiv MR-Occ: Efficient Camera-LiDAR 3D Semantic Occupancy Prediction Using Hierarchical Multi-Resolution Voxel Representation -
2024 arXiv PVP: Polar Representation Boost for 3D Semantic Occupancy Prediction -
2024 arXiv Robust 3D Semantic Occupancy Prediction with Calibration-free Spatial Transformation Code
2024 arXiv OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction -
2024 arXiv DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction Code
2024 arXiv LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera Project Page
2024 arXiv OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction -
2024 arXiv EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network Code
2024 arXiv Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution -
2024 arXiv OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction -
2024 arXiv Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception -
2023 ICCV OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception Code

3D Occupancy Datasets

Dataset Year Venue Modality # of Classes Flow Link
UniOcc 2025 ICCV Camera 10, 15, 17 ✔️ Intro.
OpenScene 2024 CVPR 2024 Challenge Camera - ✔️ Intro.
Cam4DOcc 2024 CVPR Camera+LiDAR 2 ✔️ Intro.
Occ3D 2023 NeurIPS Camera 14 (Occ3D-Waymo), 16 (Occ3D-nuScenes) Intro.
OpenOcc 2023 ICCV Camera 16 Intro.
OpenOccupancy 2023 ICCV Camera+LiDAR 16 Intro.
SurroundOcc 2023 ICCV Camera 16 Intro.
OCFBench 2023 arXiv LiDAR -(OCFBench-Lyft), 17(OCFBench-Argoverse), 25(OCFBench-ApolloScape), 16(OCFBench-nuScenes) Intro.
SSCBench 2023 arXiv Camera 19(SSCBench-KITTI-360), 16(SSCBench-nuScenes), 14(SSCBench-Waymo) Intro.
SemanticKITTI 2019 ICCV Camera+LiDAR 19 (Semantic Scene Completion task) Intro.

Occupancy-based Applications

Indoor Ego-Centric

Specific Task Year Venue Paper Title Link
Indoor Occupancy Prediction 2026 CVPR Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes Code
Indoor Occupancy Prediction 2026 CVPR Generalizing Visual Geometry Priors to Sparse Gaussian Occupancy Prediction Code
Indoor Occupancy Prediction 2026 arXiv Parameter-Free Adaptive Multi-Scale Channel-Spatial Attention Aggregation framework for 3D Indoor Semantic Scene Completion Toward Assisting Visually Impaired -
Indoor Occupancy Prediction 2025 RA-L Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation Code
Indoor Semantic Scene Completion 2025 arXiv TGSFormer: Scalable Temporal Gaussian Splatting for Embodied Semantic Scene Completion -
Indoor Occupancy Prediction 2025 arXiv SplatSSC: Decoupled Depth-Guided Gaussian Splatting for Semantic Scene Completion -
Indoor Occupancy Prediction 2025 arXiv YouTube-Occ: Learning Indoor 3D Semantic Occupancy Prediction from YouTube Videos -

Robotics

Specific Task Year Venue Paper Title Link
Occupancy for Mobile Robots 2025 arXiv MobileOcc: A Human-Aware Semantic Occupancy Dataset for Mobile Robots -
Humanoid Occupancy 2025 arXiv Humanoid Occupancy: Enabling A Generalized Multimodal Occupancy Perception System on Humanoid Robots Project Page
Video Generation 2025 arXiv ORV: 4D Occupancy-centric Robot Video Generation Project Page
World Model 2025 arXiv Occupancy World Model for Robots -
Perception 2025 arXiv RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots -

Segmentation

Specific Task Year Venue Paper Title Link
3D Panoptic Segmentation 2024 CVPR PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation Code
BEV Segmentation 2024 CVPRW OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks Code

Detection

Specific Task Year Venue Paper Title Link
3D Object Detection 2025 ICONIP Collaborative Perceiver: Elevating Vision-based 3D Object Detection via Local Density-Aware Spatial Occupancy Code
3D Object Detection 2024 NeurIPS Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection Code
3D Object Detection 2024 CVPR Learning Occupancy for Monocular 3D Object Detection Code
3D Object Detection 2024 AAAI SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection Code
3D Object Detection 2024 arXiv UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height -

Tracking

Specific Task Year Venue Paper Title Link
Object Tracking 2025 ICRA TrackOcc: Camera-based 4D Panoptic Occupancy Tracking Code

Dynamic Perception

Specific Task Year Venue Paper Title Link
3D Flow Prediction 2026 RA-L SelfOccFlow: Towards end-to-end self-supervised 3D Occupancy Flow prediction -
3D Flow Prediction 2024 CVPR Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications Code
3D Flow Prediction 2024 CoRL Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction Project Page

Generation

Specific Task Year Venue Paper Title Link
Scene Generation 2025 T-PAMI OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation -
Multimodal Scene Generation 2025 CVPR UniScene: Unified Occupancy-centric Driving Scene Generation Project Page
Scene Generation 2025 arXiv GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation Project Page
Multimodal Scene Generation 2025 arXiv Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method Code
Scene Generation 2024 ECCV Pyramid Diffusion for Fine 3D Large Scene Generation (Oral paper) Code
Scene Generation 2024 CVPR SemCity: Semantic Scene Generation with Triplane Diffusion Code
Scene Generation 2024 arXiv InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models Project Page
Scene Generation 2024 arXiv SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs Project Page

Navigation

Specific Task Year Venue Paper Title Link
Navigation 2026 arXiv SPAN-Nav: Generalized Spatial Awareness for Versatile Vision-Language Navigation Project Page
Navigation 2025 arXiv OmniNWM: Omniscient Driving Navigation World Models Project Page
Navigation for Air-Ground Robots 2024 RA-L HE-Nav: A High-Performance and Efficient Navigation System for Aerial-Ground Robots in Cluttered Environments Project Page
Navigation for Air-Ground Robots 2024 ICRA AGRNav: Efficient and Energy-Saving Autonomous Navigation for Air-Ground Robots in Occlusion-Prone Environments Code
Navigation for Air-Ground Robots 2024 arXiv OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model Project Page

World Models

Specific Task Year Venue Paper Title Link
4D Occupancy Forecasting 2026 arXiv ForecastOcc: Vision-based Semantic Occupancy Forecasting Project Page
4D Occupancy Forecasting and Generation 2025 ICCV I2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting Code
4D Occupancy Forecasting 2025 CVPR Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting Code
4D Occupancy Forecasting 2025 ICLR OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework Code
4D Occupancy Generation 2025 ICLR DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes (Spotlight) Project Page
4D Occupancy Forecasting and Motion Planning 2025 ICLR Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving Code
4D Occupancy Forecasting and Generation 2025 AAAI Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving Project Page
4D Occupancy Forecasting and Motion Planning 2025 ICRA RenderWorld: World Model with Self-Supervised 3D Label -
4D Occupancy Forecasting, Motion Planning, and Scene Understanding 2025 ICRA Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models -
4D Occupancy Forecasting 2025 arXiv OccSTeP: Benchmarking 4D Occupancy Spatio-Temporal Persistence Code
4D Occupancy Forecasting 2025 arXiv SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model Code
4D Occupancy Forecasting and Motion Planning 2025 arXiv SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries Code
4D Occupancy Forecasting 2025 arXiv COME: Adding Scene-Centric Forecasting Control to Occupancy World Model Code
4D Occupancy Forecasting and Motion Planning 2025 arXiv Temporal Triplane Transformers as Occupancy World Models -
4D Occupancy Forecasting 2025 arXiv LEAP: Enhancing Vision-Based Occupancy Networks with Lightweight Spatio-Temporal Correlation -
4D Occupancy Forecasting and Motion Planning 2024 ECCV OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving Project Page
4D Occupancy Forecasting 2024 CVPR UnO: Unsupervised Occupancy Fields for Perception and Forecasting (Oral paper) Project Page
4D Representation Learning Framework 2024 CVPR DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving -
4D Occupancy Forecasting 2024 CVPR Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications Code
4D Occupancy Forecasting 2024 AAAI Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence Project Page
4D Occupancy Forecasting and Motion Planning 2024 arXiv An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training -
4D Occupancy Forecasting and Generation 2024 arXiv DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model Project Page
4D Occupancy Forecasting 2024 arXiv FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving -
4D Occupancy Forecasting, Motion Planning, and Reasoning 2024 arXiv OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving -
4D Occupancy Generation 2024 arXiv OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving Project Page
4D Occupancy Forecasting 2023 CVPR Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting Project Page

Unified Autonomous Driving Algorithm Framework

Specific Tasks Year Venue Paper Title Link
Occupancy Forecasting, Reasoning 2026 arXiv SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning Project Page
Occupancy Prediction, 3D Object Detection, Segmentation 2025 AAAI M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving Code
Occupancy Prediction, Occupancy Forecasting, Planning, and Understanding 2025 arXiv DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning Code
Occupancy Prediction and Planning 2025 arXiv OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision -
Occupancy Prediction, 3D Object Detection, Online Mapping, Multi-object Tracking, Motion Prediction, Motion Planning 2024 CVPR DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving -
Occupancy Prediction, 3D Object Detection 2024 RA-L UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving Code
Occupancy Prediction, 3D Object Detection, HD map reconstruction 2024 arXiv GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving Code
Occupancy Forecasting, Motion Planning 2024 arXiv Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving -
Occupancy Prediction, 3D Object Detection, BEV segmentation, Motion Planning 2023 ICCV Scene as Occupancy Code

Cite The Survey

If you find our survey and repository useful for your research project, please consider citing our paper:

@misc{xu2024survey,
      title={A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective}, 
      author={Huaiyuan Xu and Junliang Chen and Shiyu Meng and Yi Wang and Lap-Pui Chau},
      year={2024},
      eprint={2405.05173},
      archivePrefix={arXiv}
}

Contact

If you have any questions, please feel free to get in touch:

If you are interested in joining us as a Ph.D. student to research computer vision and machine learning, please feel free to contact Professor Chau:

About

[Information Fusion 2025] A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective
