Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

Zhengfei Kuang; Tianyuan Zhang; Kai Zhang; Hao Tan; Sai Bi; Yiwei Hu; Zexiang Xu; Milos Hasan; Gordon Wetzstein; Fujun Luan

CVPR 2025

Abstract

We present Buffer Anytime, a framework for estimating depth and normal maps (which we call geometric buffers) from video that eliminates the need for paired video–depth and video–normal training data. Instead of relying on large-scale annotated video datasets, we achieve high-quality video buffer estimation by leveraging single-image priors together with temporal consistency constraints. Our zero-shot training strategy combines state-of-the-art image estimation models with an optical-flow-based smoothness constraint through a hybrid loss function, implemented via a lightweight temporal attention architecture. Applied to leading image models such as Depth Anything V2 and Marigold-E2E-FT, our approach significantly improves temporal consistency while maintaining accuracy. Experiments show that our method not only outperforms image-based approaches but also achieves results comparable to state-of-the-art video models trained on large-scale paired video datasets, despite using no such paired video data.
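The abstract describes training a video model with a hybrid loss: one term keeps its per-frame outputs close to a single-image model's predictions, and another uses optical flow to penalize flickering between frames. The sketch below is one plausible reading of that idea under our own assumptions, not the authors' implementation: a plain L1 prior term, nearest-neighbor backward warping with forward flow, no occlusion handling beyond out-of-bounds pixels, and a hypothetical weight `lam`. The names `warp_next_to_current` and `hybrid_loss` are illustrative only. For relative-depth models such as Depth Anything V2, a scale-and-shift-invariant prior term would likely be needed instead of raw L1.

```python
import numpy as np


def warp_next_to_current(next_pred, flow):
    """Backward-warp a per-pixel map from frame t+1 onto frame t's pixel grid.

    next_pred: (H, W) prediction for frame t+1 (e.g. depth).
    flow: (H, W, 2) forward optical flow from frame t to t+1, in pixels (dx, dy).
    Returns (warped, valid): warped is (H, W); valid marks pixels whose flow
    target lands inside the image. Nearest-neighbor sampling is an assumption
    made for brevity; bilinear sampling is more common in practice.
    """
    H, W = next_pred.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Target coordinates in frame t+1 for every pixel of frame t.
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    valid = (xt >= 0) & (xt < W) & (yt >= 0) & (yt < H)
    warped = np.zeros_like(next_pred)
    warped[valid] = next_pred[yt[valid], xt[valid]]
    return warped, valid


def hybrid_loss(video_preds, image_preds, flows, lam=1.0):
    """Hypothetical hybrid loss: image-prior term + flow-based smoothness term.

    video_preds: (T, H, W) outputs of the video model being trained.
    image_preds: (T, H, W) per-frame outputs of a frozen single-image model.
    flows: (T-1, H, W, 2) forward flow between consecutive frames.
    lam: weight of the temporal term (hypothetical value).
    """
    # Prior term: stay close to the single-image model, frame by frame (plain L1).
    prior = float(np.mean(np.abs(video_preds - image_preds)))
    # Smoothness term: each pixel should match its flow-corresponded pixel
    # in the next frame (only out-of-bounds targets are masked out).
    terms = []
    for t in range(flows.shape[0]):
        warped, valid = warp_next_to_current(video_preds[t + 1], flows[t])
        if valid.any():
            terms.append(np.mean(np.abs(video_preds[t][valid] - warped[valid])))
    smooth = float(np.mean(terms)) if terms else 0.0
    return prior + lam * smooth, prior, smooth


if __name__ == "__main__":
    # Two static frames (zero flow) whose predictions jump by 1.0 everywhere.
    T, H, W = 2, 4, 4
    video = np.zeros((T, H, W))
    video[1] = 1.0
    flows = np.zeros((T - 1, H, W, 2))
    total, prior, smooth = hybrid_loss(video, video.copy(), flows, lam=0.5)
    print(total, prior, smooth)  # 0.5 0.0 1.0
```

In the usage example, the video predictions match the per-frame image predictions exactly, so the prior term is zero. The jump between frames of a static scene is still penalized by the smoothness term. This is the gap that per-frame image models leave open and that a temporal term is meant to close.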

Topics

Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Depth Estimation
Artificial Intelligence > Learning Paradigms > Zero-Shot Learning
Deep Learning > Learning Types > Zero-Shot Learning
Computer Vision > Processing > Motion Estimation
Computer Vision > Processing > Depth Estimation

Keywords

zero-shot learning, depth estimation, optical flow, temporal consistency, video processing, normal estimation
