MEMORY-VQ: Compression for Tractable Internet-Scale Memory

Yury Zemlyanskiy; Michiel De Jong; Luke Vilnis; Santiago Ontanon; William Cohen; Sumit Sanghai; Joshua Ainslie

NAACL 2024

Abstract

Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN (de Jong et al., 2023a) pre-compute token representations for retrieved passages to drastically speed up inference. However, storing these pre-computed representations greatly increases storage requirements. We propose MEMORY-VQ, a new method to reduce the storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.
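To make the idea of compressing token representations with vector quantization more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the codebooks are random rather than learned by a VQ-VAE, and the function names (`quantize`, `dequantize`, `compression_ratio`) and all sizes (64-dimensional vectors, 8 groups, 256 codes per group, 2 bytes per stored value) are hypothetical values chosen only for illustration. Each vector is split into groups, and each group is replaced by the index of its nearest codebook entry, so only small integer codes need to be stored.

```python
import numpy as np


def quantize(vectors, codebooks):
    """Replace each sub-vector by the index of its nearest codebook entry.

    vectors:   (n, d) float array of token representations.
    codebooks: (g, k, d // g) float array, one codebook of k codes per group.
    Returns an (n, g) uint8 array of codes (so k must be at most 256).
    """
    n, d = vectors.shape
    g, k, sub = codebooks.shape
    assert d == g * sub and k <= 256
    parts = vectors.reshape(n, g, sub)
    # Squared distance from every sub-vector to every code in its group.
    dists = ((parts[:, :, None, :] - codebooks[None, :, :, :]) ** 2).sum(-1)
    return dists.argmin(-1).astype(np.uint8)


def dequantize(codes, codebooks):
    """Rebuild approximate vectors by looking up each stored code."""
    g = codebooks.shape[0]
    parts = codebooks[np.arange(g), codes]  # shape (n, g, sub)
    return parts.reshape(codes.shape[0], -1)


def compression_ratio(dim, bytes_per_value, groups, bytes_per_code):
    """Bytes of the raw vector divided by bytes of its integer codes."""
    return (dim * bytes_per_value) / (groups * bytes_per_code)


# Hypothetical setup: random codebooks stand in for learned ones.
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(8, 256, 8)).astype(np.float32)
tokens = rng.normal(size=(5, 64)).astype(np.float32)

codes = quantize(tokens, codebooks)        # what would be stored
approx = dequantize(codes, codebooks)      # what the model would read back
ratio = compression_ratio(dim=64, bytes_per_value=2, groups=8, bytes_per_code=1)
```

With these illustrative sizes, a 128-byte vector is stored as 8 one-byte codes, so `ratio` is 16.0. The reconstruction in `approx` is lossy; in a VQ-VAE the codebooks are trained so that this loss has little effect on downstream performance.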

Authors

Yury Zemlyanskiy, Michiel De Jong, Luke Vilnis, Santiago Ontanon, William Cohen, Sumit Sanghai, Joshua Ainslie

Topics

Deep Learning > Models > Generative Models
Deep Learning > Models > Variational Inference
Machine Learning > Application Areas > Model Compression

Keywords

vector quantization; variational autoencoder; retrieval augmentation; representation compression; memory-augmented model
