M2T: Masking Transformers Twice for Faster Decoding

Fabian Mentzer; Eirikur Agustsson; Michael Tschannen

ICCV 2023

Abstract

We show how bidirectional transformers trained for masked token prediction can be applied to neural image compression to achieve state-of-the-art results. Such models were previously used for image generation by progressively sampling groups of masked tokens according to uncertainty-adaptive schedules. Unlike these works, we demonstrate that predefined, deterministic schedules perform as well or better for image compression. This insight allows us to use masked attention during training in addition to masked inputs, and activation caching during inference, to significantly speed up our models (4x higher inference speed) at a small increase in bitrate.
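To make the idea of a predefined, deterministic schedule concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the power-law group sizes, the seeded shuffle, and the choice to let tokens attend within their own group are all assumptions made for illustration. It only shows that when the assignment of tokens to decoding steps is fixed in advance and independent of the data, a group-wise attention mask can be built before any token is decoded.

```python
import random

def deterministic_schedule(num_tokens, num_steps, seed=0, power=2.0):
    """Assign each token position to a decoding step, independent of the data.

    Hypothetical helper: group sizes grow as (step / num_steps) ** power,
    which is an illustrative choice, not the schedule used in the paper.
    """
    # Cumulative number of tokens decoded after each step.
    bounds = [round(num_tokens * (s / num_steps) ** power)
              for s in range(num_steps + 1)]
    order = list(range(num_tokens))
    # A fixed seed means encoder and decoder derive the identical order.
    random.Random(seed).shuffle(order)
    step_of = [0] * num_tokens
    for step in range(num_steps):
        for pos in order[bounds[step]:bounds[step + 1]]:
            step_of[pos] = step
    return step_of

def group_attention_mask(step_of):
    """mask[q][k] is True if query token q may attend to key token k.

    Assumption: a token sees its own group and all earlier groups.
    """
    n = len(step_of)
    return [[step_of[k] <= step_of[q] for k in range(n)] for q in range(n)]

step_of = deterministic_schedule(num_tokens=16, num_steps=4)
mask = group_attention_mask(step_of)
group_sizes = [step_of.count(s) for s in range(4)]
```

Because the schedule depends only on `num_tokens`, `num_steps`, and the seed, it can be computed once and reused for every image, in contrast to an uncertainty-adaptive schedule, which has to be recomputed from model outputs at each step.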

Topics

Deep Learning > Architectures > Transformers
Computer Vision > Processing > Image Restoration

Keywords

masked token prediction; bidirectional transformer; neural image compression; activation caching; bitrate optimization
