FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution

Gene Chou; Wenqi Xian; Guandao Yang; Mohamed Abdelfattah; Bharath Hariharan; Noah Snavely; Ning Yu; Paul Debevec

ICCV 2025

Abstract

A versatile video depth estimation model should be consistent and accurate across frames, produce high-resolution depth maps, and support real-time streaming. We propose a method, FlashDepth, that satisfies all three requirements, performing depth estimation for a 2044x1148 streaming video at 24 FPS. We show that, with careful modifications to pretrained single-image depth models, these capabilities can be enabled with relatively little data and training. We validate our approach across multiple unseen datasets against state-of-the-art depth models, and find that our method outperforms them in terms of boundary sharpness and speed by a significant margin, while maintaining competitive accuracy. We hope our model will enable applications that require high-resolution depth, such as visual effects editing, as well as applications that require online decision-making, such as robotics. We release all code and model weights at https://github.com/Eyeline-Research/FlashDepth.

Topics

Machine Learning > Application Areas > Efficient Computing
Computer Vision > Analysis > Depth Estimation
Computer Vision > Processing > Video Processing

Keywords

depth estimation, video processing, high resolution, real-time streaming, streaming video
