StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth

Zheng Zhang; Lihe Yang; Tianyu Yang; Chaohui Yu; Xiaoyang Guo; Yixing Lao; Hengshuang Zhao

ICCV 2025

Abstract

Recent advances in monocular depth estimation have significantly improved robustness and accuracy. However, relative depth models exhibit flickering and 3D inconsistency on video data, which limits their use in 3D reconstruction. We introduce StableDepth, a scene-consistent and scale-invariant depth estimation method that achieves scene-level 3D consistency. Its dual-decoder architecture learns from large-scale unlabeled video data, which enhances generalization and reduces flickering. Unlike previous methods that require full video sequences, StableDepth supports online inference at 13x faster speed and achieves significant improvements across benchmarks, with temporal consistency comparable to video diffusion-based estimators.
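As an illustration of what "scale-invariant" means in practice, the sketch below aligns a predicted depth map to a reference with a single least-squares scale factor before measuring error. This is a common evaluation convention, not a procedure confirmed by the abstract; the function names `least_squares_scale` and `abs_rel_error` are hypothetical, and depth maps are assumed to be flattened into plain sequences of floats.

```python
import math
from typing import Sequence


def _valid(p: float, r: float) -> bool:
    # A pixel counts only if both values are finite and the reference depth is positive.
    return math.isfinite(p) and math.isfinite(r) and r > 0.0


def least_squares_scale(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Return the scale s minimizing sum((s * p - r) ** 2) over valid pixels."""
    num = 0.0
    den = 0.0
    for p, r in zip(pred, ref):
        if _valid(p, r):
            num += p * r
            den += p * p
    if den == 0.0:
        raise ValueError("no valid pixels with a non-zero prediction")
    return num / den


def abs_rel_error(pred: Sequence[float], ref: Sequence[float], scale: float = 1.0) -> float:
    """Mean of |scale * p - r| / r over valid pixels."""
    total = 0.0
    count = 0
    for p, r in zip(pred, ref):
        if _valid(p, r):
            total += abs(scale * p - r) / r
            count += 1
    if count == 0:
        raise ValueError("no valid pixels")
    return total / count


# Hypothetical data: the prediction is correct up to an unknown global scale.
pred = [1.0, 2.0, 4.0]
ref = [2.5, 5.0, 10.0]
s = least_squares_scale(pred, ref)
print(s, abs_rel_error(pred, ref, s))
```

Passing the concatenated pixels of every frame in a scene to `least_squares_scale` yields one shared scale instead of one per frame; under the assumption that scene-level consistency means a single scale per scene, a consistent predictor should lose little accuracy under that stricter alignment.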

Topics

Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Depth Estimation
Deep Learning > Learning Types > Deep Learning
Computer Vision > Core AI > Computer Vision

Keywords

3D reconstruction; monocular depth estimation; scene consistency; video depth; scene-consistent depth; scale-invariant depth; dual-decoder architecture
