Combining Frame and GOP Embeddings for Neural Video Representation

Jens Eirik Saethre; Roberto Azevedo; Christopher Schroers

CVPR 2024

Abstract

Implicit neural representations (INRs) were recently proposed as a new video compression paradigm, with existing approaches performing on par with HEVC. However, such methods only perform well in limited settings, e.g., specific model sizes, fixed aspect ratios, and low-motion videos. We address this issue by proposing T-NeRV, a hybrid video INR that combines frame-specific embeddings with GOP-specific features, providing a lever for content-specific fine-tuning. We employ entropy-constrained training to jointly optimize our model for rate and distortion and demonstrate that T-NeRV can thereby automatically adjust this lever during training, effectively fine-tuning itself to the target content. We evaluate T-NeRV on the UVG dataset, where it achieves state-of-the-art results on the video representation task, outperforming previous works by up to 3 dB PSNR on challenging high-motion sequences. Further, our method improves on the compression performance of previous methods and is the first video INR to outperform HEVC on all UVG sequences.
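To make the terms "rate", "distortion", and "PSNR" in the abstract concrete, the following is a minimal sketch of how a rate-distortion objective of the generic Lagrangian form D + λR can be evaluated. It is not the T-NeRV training code: the function names, the use of MSE as the distortion, and the empirical-entropy rate estimate over already-quantized weights are all assumptions made for illustration. Entropy-constrained training would typically use a differentiable rate estimate inside the training loop instead.

```python
import math
from collections import Counter


def mse(reference, reconstruction):
    """Mean squared error between two equal-length pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)


def psnr(reference, reconstruction, peak=1.0):
    """PSNR in dB for pixel values in [0, peak]; infinite for an exact match."""
    err = mse(reference, reconstruction)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def empirical_entropy_bits(symbols):
    """Shannon entropy in bits per symbol of a sequence of quantized values."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rd_objective(reference, reconstruction, quantized_weights, lam):
    """Illustrative Lagrangian D + lam * R (hypothetical helper).

    D is the MSE of the reconstruction; R is the estimated number of bits
    needed to entropy-code the quantized weights.
    """
    rate_bits = empirical_entropy_bits(quantized_weights) * len(quantized_weights)
    return mse(reference, reconstruction) + lam * rate_bits
```

A larger `lam` penalizes rate more heavily and so favors smaller encodings at the cost of reconstruction quality. As a rule of thumb for reading the reported numbers, a 3 dB PSNR gain corresponds to roughly halving the MSE, since 10·log10(2) ≈ 3.01.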

Topics

Machine Learning > Core Methods > Embedding Learning
Machine Learning > Optimization & Theory > Optimization
Deep Learning > Architectures > Neural Networks
Deep Learning > Models > Generative Models
Computer Vision > Processing > Video Processing
Computer Science > Applications > Computer Graphics
Deep Learning > Learning Types > Representation Learning
Deep Learning > Techniques > Representation Learning

Keywords

embedding learning, entropy coding, rate distortion, neural representation, implicit neural representation, video frame, video representation, rate-distortion optimization, video compression, frame embedding, gop embedding, entropy-constrained training
