MH-LVC: Multi-Hypothesis Temporal Prediction for Learned Conditional Residual Video Coding

Huu-Tai Phung; Zong-Lin Gao; Yi-Chen Yao; Kuan-Wei Ho; Yi-Hsin Chen; Yu-Hsiang Lin; Alessandro Gnutti; Wen-Hsiao Peng

ICCV 2025

Abstract

This work, termed MH-LVC, presents a multi-hypothesis temporal prediction scheme that employs long- and short-term reference frames in a conditional residual video coding framework. Recent temporal context mining approaches to conditional video coding offer superior coding performance. However, decoding a video frame with these approaches requires storing and accessing a large amount of implicit contextual information extracted from past decoded frames, which poses a challenge due to excessive memory access. MH-LVC overcomes this issue by storing multiple long- and short-term reference frames while limiting the number of reference frames used at a time for temporal prediction to two. Our decoded frame buffer management allows the encoder to flexibly use the long-term key frames to mitigate temporal cascading errors and the short-term reference frames to minimize prediction errors. Moreover, our buffering scheme enables the temporal prediction structure to be adapted to individual input videos. While this flexibility is common in traditional video codecs, it has not been fully explored for learned video codecs. Extensive experiments show that the proposed method outperforms VTM-17.0 under the low-delay B configuration in terms of PSNR-RGB across commonly used test datasets, and performs comparably to state-of-the-art learned codecs (e.g., DCVC-FM) while requiring a smaller decoded frame buffer and similar decoding time.
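To make the buffering idea concrete, the following is a minimal sketch of a decoded frame buffer that keeps separate long- and short-term slots and hands out at most two references per frame. It is not the authors' implementation: the class name, the buffer capacities, the fixed key-frame interval, and the selection rule (latest short-term frame plus latest distinct long-term key frame) are all assumptions chosen for illustration, and frames are represented by their indices only.

```python
from collections import deque


class DecodedFrameBuffer:
    """Toy decoded frame buffer (hypothetical, for illustration only)."""

    def __init__(self, max_long_term=2, max_short_term=2):
        # Capacities are assumed values, not taken from the paper.
        self.long_term = deque(maxlen=max_long_term)
        self.short_term = deque(maxlen=max_short_term)

    def store(self, frame_idx, is_key_frame):
        # Key frames are additionally kept as long-term references.
        if is_key_frame:
            self.long_term.append(frame_idx)
        # Every decoded frame enters the short-term slots (oldest is evicted).
        self.short_term.append(frame_idx)

    def select_references(self):
        """Return at most two distinct reference frame indices."""
        refs = []
        # Hypothesis 1: the most recent short-term frame.
        if self.short_term:
            refs.append(self.short_term[-1])
        # Hypothesis 2: the most recent long-term key frame not already chosen.
        for idx in reversed(self.long_term):
            if idx not in refs:
                refs.append(idx)
                break
        # Fallback: another short-term frame if no distinct key frame exists.
        if len(refs) < 2:
            for idx in reversed(self.short_term):
                if idx not in refs:
                    refs.append(idx)
                    break
        return refs[:2]


def simulate(num_frames, key_interval):
    """Map each frame index to the references it would be predicted from."""
    buffer = DecodedFrameBuffer()
    schedule = {}
    for t in range(num_frames):
        schedule[t] = buffer.select_references()
        # Assumed policy: every key_interval-th frame becomes a key frame.
        buffer.store(t, is_key_frame=(t % key_interval == 0))
    return schedule


if __name__ == "__main__":
    for frame, refs in simulate(num_frames=6, key_interval=4).items():
        print(frame, refs)
```

In this sketch the buffer may hold several frames, but each frame is predicted from no more than two of them, which mirrors the constraint stated in the abstract. A content-adaptive encoder would replace the fixed key-frame interval and the selection rule with per-video decisions.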

Topics

Deep Learning > Techniques
Computer Vision > Generation > Video Generation
Computer Vision > Processing > Video Processing
Deep Learning > Optimization & Theory > Optimization
Deep Learning > Learning Types > Deep Learning

Keywords

reference frame, video compression, video coding, temporal prediction, residual coding, multi-hypothesis prediction, learned video codec, learned video coding, decoded frame buffer

Related papers

MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval 2025

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality 2025

MonSTeR: a Unified Model for Motion, Scene, Text Retrieval 2025

ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching 2025

Robust Dataset Condensation using Supervised Contrastive Learning 2025