Retrieving Multimodal Information for Augmented Generation: A Survey

Ruochen Zhao; Hailin Chen; Weishi Wang; Fangkai Jiao; Xuan Long Do; Chengwei Qin; Bosheng Ding; Xiaobao Guo; Minzhi Li; Xingxuan Li; Shafiq Joty

EMNLP 2023

Abstract

As Large Language Models (LLMs) have grown popular, an important trend has emerged of using multimodality to augment their generation ability, which enables LLMs to better interact with the world. However, there is no unified understanding of at which stage, and how, different modalities should be incorporated. In this survey, we review methods that assist and augment generative models by retrieving multimodal knowledge, in formats including images, code, tables, graphs, and audio. Such methods offer a promising solution to important concerns such as factuality, reasoning, interpretability, and robustness. By providing an in-depth review, this survey aims to give scholars a deeper understanding of these methods' applications and to encourage them to adapt existing techniques to the fast-growing field of LLMs.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing

📈 Trend Setter — Multi-Modal Learning

🧭 Keyword Pioneer — augmented generation

🐣 Hot Topic Early Bird — multimodal retrieval

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Topics

Artificial Intelligence > Core AI > Foundation Models
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Architectures > Transformers
Deep Learning > Models > Large Language Models
Deep Learning > Models > Multi-Modal Learning
Deep Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Retrieval-Augmented Generation
Natural Language Processing > Generation > Language Modeling
Natural Language Processing > Generation > Retrieval-Augmented Generation

Keywords

multimodal learning, multimodal retrieval, generative model, retrieval-augmented generation, knowledge retrieval, cross-modal reasoning, large language model, augmented generation, multimodal knowledge
