Simple Entity-Centric Questions Challenge Dense Retrievers

Christopher Sciavolino; Zexuan Zhong; Jinhyuk Lee; Danqi Chen

EMNLP 2021

Abstract

Open-domain question answering has exploded in popularity recently due to the success of dense retrieval models, which have surpassed sparse models using only a few supervised training examples. However, in this paper, we demonstrate current dense models are not yet the holy grail of retrieval. We first construct EntityQuestions, a set of simple, entity-rich questions based on facts from Wikidata (e.g., “Where was Arve Furset born?”), and observe that dense retrievers drastically under-perform sparse methods. We investigate this issue and uncover that dense retrievers can only generalize to common entities unless the question pattern is explicitly observed during training. We discuss two simple solutions towards addressing this critical problem. First, we demonstrate that data augmentation is unable to fix the generalization problem. Second, we argue a more robust passage encoder helps facilitate better question adaptation using specialized question encoders. We hope our work can shed light on the challenges in creating a robust, universal dense retriever that works well across different input distributions.

Topics

Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Applications > Question Answering
Computer Science > Applications > Information Retrieval
Machine Learning > Learning Types > Transfer Learning
Artificial Intelligence > Core AI > Information Retrieval
Deep Learning > Learning Types > Retrieval-Augmented Generation

Keywords

domain adaptation, data augmentation, information retrieval, retrieval generalization, dense retrieval, open-domain question answering, sparse retrieval, entity-rich question, question adaptation, passage encoder, entity-centric question
