Exploiting Continuous Motion Clues for Vision-Based Occupancy Prediction

Haoran Xu; Peixi Peng; Xinyi Zhang; Guang Tan; Yaokun Li; Shuaixian Wang; Luntong Li

AAAI 2025

Abstract

Occupancy networks aim to reconstruct the surroundings as occupied semantic voxels. However, object occlusions occur frequently in dynamic real-world scenarios and cannot be captured from independent frames. Most existing occupancy networks generate results without explicitly considering past occupancy states or continuous visual changes over time, which limits their temporal accuracy. We tackle this problem by treating the task from a new continuous-updating perspective that considers historical data and continuous motion clues. We propose a new approach termed Continuous Motion clue exploitation for Occupancy Prediction (CMOP), which incorporates three key designs: (i) a Propagator, which forecasts future occupancy states based on historical data; (ii) a Tracker, which updates the occupancy on a per-frame basis using dynamic visual motion information; and (iii) a Fuser, which aggregates results from the Propagator and Tracker into more robust and accurate occupancy results. Experiments on several benchmarks demonstrate that CMOP outperforms state-of-the-art baselines.
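The three roles named in the abstract can be illustrated with a deliberately tiny sketch. This is not the CMOP implementation: the abstract gives no architectural details, so everything below is an assumption made for illustration, including the 2D grid standing in for a voxel volume, the integer `shift` standing in for a motion model, the `visible` mask, the `UNKNOWN` prior, and the fixed blending weight `w_track`. It only shows why carrying a propagated state forward helps when the current frame is occluded.

```python
from typing import List

# Hypothetical toy types: a 2D grid of occupancy probabilities in [0, 1]
# stands in for a 3D semantic voxel volume.
Grid = List[List[float]]
Mask = List[List[bool]]  # True where the current frame actually sees the cell

UNKNOWN = 0.5  # assumed prior for cells with no information


def propagate(prev: Grid, shift: int) -> Grid:
    """Propagator stand-in: forecast the next state by shifting the
    previous grid along x (an assumed, trivial motion model)."""
    out: Grid = []
    for row in prev:
        new_row = [UNKNOWN] * len(row)
        for x, p in enumerate(row):
            if 0 <= x + shift < len(row):
                new_row[x + shift] = p
        out.append(new_row)
    return out


def track(observed: Grid, visible: Mask) -> Grid:
    """Tracker stand-in: per-frame estimate that keeps observed values
    and marks occluded cells as unknown."""
    return [[p if v else UNKNOWN for p, v in zip(p_row, v_row)]
            for p_row, v_row in zip(observed, visible)]


def fuse(propagated: Grid, tracked: Grid, visible: Mask,
         w_track: float = 0.7) -> Grid:
    """Fuser stand-in: blend both estimates where the cell is visible,
    and fall back on the propagated state where it is occluded."""
    fused: Grid = []
    for prop_row, trk_row, vis_row in zip(propagated, tracked, visible):
        fused.append([
            w_track * t + (1.0 - w_track) * p if v else p
            for p, t, v in zip(prop_row, trk_row, vis_row)
        ])
    return fused


# One-row example: an object at x=0 moves to x=1, where it is occluded.
prev_state = [[0.9, 0.1, 0.1]]
observed = [[0.1, 0.2, 0.1]]
visible = [[True, False, True]]

propagated = propagate(prev_state, shift=1)
tracked = track(observed, visible)
fused = fuse(propagated, tracked, visible)
print(fused)
```

In this toy run the middle cell is occluded in the current frame, so the fused result keeps the propagated value of 0.9 there instead of the uninformative per-frame estimate. A learned system would replace each of these hand-written rules with a network.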

Topics

Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Scene Understanding
Computer Vision > Domain-Specific > Autonomous Driving
Deep Learning > Learning Types > Deep Learning

Keywords

temporal modeling, scene understanding, motion estimation, autonomous driving, 3D perception, occupancy prediction, 3D occupancy, semantic voxel
