2022 EMNLP EMNLP 2022

Data-Efficient Auto-Regressive Document Retrieval for Fact Verification

Abstract

AbstractDocument retrieval is a core component of many knowledge-intensive natural language processing task formulations such as fact verification. Sources of textual knowledge such as Wikipedia articles condition the generation of answers from the models. Recent advances in retrieval use sequence-to-sequence models to incrementally predict the title of the appropriate Wikipedia page given an input instance. However, this method requires supervision in the form of human annotation to label which Wikipedia pages contain appropriate context. This paper introduces a distant-supervision method that does not require any annotation train auto-regressive retrievers that attain competitive R-Precision and Recall in a zero-shot setting. Furthermore we show that with task-specific supervised fine-tuning, auto-regressive retrieval performance for two Wikipedia-based fact verification tasks can approach or even exceed full supervision using less than 1/4 of the annotated data. We release all code and models

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing
🧭 Keyword Pioneer — auto-regressive retriever
🐣 Hot Topic Early Bird — document retrieval
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors