Let’s Stop Incorrect Comparisons in End-to-end Relation Extraction!

Bruno Taillé; Vincent Guigue; Geoffrey Scoutheeten; Patrick Gallinari

EMNLP 2020

Abstract

Despite efforts to distinguish three different evaluation setups (Bekoulis et al., 2018), numerous end-to-end Relation Extraction (RE) articles present unreliable performance comparisons to previous work. In this paper, we first identify several patterns of invalid comparisons in published papers and describe them to avoid their propagation. We then propose a small empirical study to quantify the impact of the most common mistake and estimate that it leads to overestimating the final RE performance by around 5% on ACE05. We also seize this opportunity to study the unexplored ablations of two recent developments: the use of language model pretraining (specifically BERT) and span-level NER. This meta-analysis emphasizes the need for rigor in reporting both the evaluation setting and the dataset statistics. We finally call for unifying the evaluation setting in end-to-end RE.
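To make the effect of mixing evaluation setups concrete, here is a minimal Python sketch. It assumes the confusion in question is between the "Strict" setting (a relation counts only if its type and both arguments' boundaries and entity types are correct) and the "Boundaries" setting (entity types are ignored). The data structures, function names, and toy sentence are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Callable, Hashable, List, NamedTuple


class Entity(NamedTuple):
    start: int  # index of the first token of the span
    end: int    # index of the last token of the span (inclusive)
    type: str   # entity type, e.g. "PER" or "ORG"


class Relation(NamedTuple):
    head: Entity
    tail: Entity
    type: str   # relation type, e.g. "ORG-AFF"


def strict_key(r: Relation) -> Hashable:
    # Strict: boundaries AND entity types of both arguments must match.
    return (r.head, r.tail, r.type)


def boundaries_key(r: Relation) -> Hashable:
    # Boundaries: only argument boundaries and the relation type must match.
    return ((r.head.start, r.head.end), (r.tail.start, r.tail.end), r.type)


def f1(gold: List[Relation], pred: List[Relation],
       key: Callable[[Relation], Hashable]) -> float:
    # F1 over relations, where `key` decides what counts as a match.
    gold_set = {key(r) for r in gold}
    pred_set = {key(r) for r in pred}
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example: the second prediction has correct spans and relation type,
# but the head entity is typed "ORG" instead of "PER".
gold = [
    Relation(Entity(0, 1, "PER"), Entity(4, 5, "ORG"), "ORG-AFF"),
    Relation(Entity(7, 7, "PER"), Entity(9, 10, "GPE"), "PHYS"),
]
pred = [
    Relation(Entity(0, 1, "PER"), Entity(4, 5, "ORG"), "ORG-AFF"),
    Relation(Entity(7, 7, "ORG"), Entity(9, 10, "GPE"), "PHYS"),
]

strict_f1 = f1(gold, pred, strict_key)
boundaries_f1 = f1(gold, pred, boundaries_key)
print(f"Strict F1: {strict_f1:.2f}, Boundaries F1: {boundaries_f1:.2f}")
```

On this toy input the same predictions score higher under the Boundaries criterion than under the Strict one, which is why a score obtained in one setting cannot be compared directly with a score reported in the other.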

Topics

Natural Language Processing > Applications > Information Extraction
Machine Learning > Optimization & Theory > Evaluation
Machine Learning > Learning Types > Evaluation

Keywords

benchmark evaluation, relation extraction, model evaluation, named entity recognition, evaluation methodology, empirical evaluation, model comparison, benchmark dataset, language model pretraining, BERT pretraining, span-based NER
