Reference-based Weak Supervision for Answer Sentence Selection using Web Data

Vivek Krishnamurthy; Thuy Vu; Alessandro Moschitti

EMNLP 2021

Abstract

Answer Sentence Selection (AS2) models are core components of efficient retrieval-based Question Answering (QA) systems. We present Reference-based Weak Supervision (RWS), a fully automatic, large-scale data pipeline that harvests high-quality weakly supervised answer sentences from Web data and requires only a question-reference pair as input. We evaluated the quality of the RWS-derived data by training TANDA models, which are the state of the art for AS2. Our results show that the data consistently improves TANDA on three different datasets. In particular, we set a new state of the art for AS2 on WikiQA, with P@1 = 90.1% and MAP = 92.9%. We observe similar performance gains from RWS on a much larger dataset, Web-based Question Answering (WQA).
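
The results above are reported as P@1 and MAP. The following is a minimal sketch of how these standard ranking metrics are usually computed for AS2; it is not code from the paper. It assumes that each question's candidate sentences are already sorted by model score in descending order and represented as binary labels (1 = correct answer). It also assumes, as is common in WikiQA evaluation, that questions with no correct candidate are dropped before scoring. The helper `filter_answerable` is hypothetical and exists only for illustration.

```python
from typing import List, Sequence


def filter_answerable(ranked_labels: Sequence[Sequence[int]]) -> List[List[int]]:
    """Hypothetical helper: keep only questions with at least one correct candidate."""
    return [list(labels) for labels in ranked_labels if any(labels)]


def precision_at_1(ranked_labels: Sequence[Sequence[int]]) -> float:
    """Fraction of questions whose top-ranked candidate is a correct answer."""
    if not ranked_labels:
        return 0.0
    hits = sum(1 for labels in ranked_labels if labels and labels[0] == 1)
    return hits / len(ranked_labels)


def average_precision(labels: Sequence[int]) -> float:
    """Mean of precision@k taken at every rank k that holds a correct answer."""
    hits = 0
    precision_sum = 0.0
    for k, label in enumerate(labels, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_labels: Sequence[Sequence[int]]) -> float:
    """Average precision, averaged over all questions."""
    if not ranked_labels:
        return 0.0
    return sum(average_precision(labels) for labels in ranked_labels) / len(ranked_labels)


if __name__ == "__main__":
    # Three questions; each list is one question's candidates in model-score order.
    rankings = filter_answerable([[1, 0, 0], [0, 1, 1], [0, 0, 1], [0, 0]])
    print(precision_at_1(rankings))
    print(mean_average_precision(rankings))
```

In this toy example the all-negative fourth question is discarded, only the first remaining question ranks a correct sentence first (P@1 = 1/3), and the per-question average precisions are 1, 7/12, and 1/3, giving MAP = 23/36 (about 0.639). P@1 looks only at the top candidate, whereas MAP also rewards placing every correct sentence high in the ranking.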

Topics

Machine Learning > Weakly Supervised Learning; Natural Language Processing > Applications > Information Retrieval; Natural Language Processing > Applications > Question Answering

Keywords

transfer learning, question answering, weak supervision, data pipeline, answer sentence selection, web data, retrieval-based question answering
