Multilingual Entity and Relation Extraction Dataset and Model

Alessandro Seganti; Klaudia Firląg; Helena Skowronska; Michał Satława; Piotr Andruszkiewicz

EACL 2021

Abstract

We present a novel dataset and model for Joint Entity and Relation Extraction in a multilingual setting. The SMiLER dataset consists of 1.1M annotated sentences covering 36 relations and 14 languages. To the best of our knowledge, it is currently both the largest and the most comprehensive dataset of this type. We introduce HERBERTa, a pipeline that combines two independent BERT models: one for sequence classification and the other for entity tagging. The model achieves a micro F1 of 81.49 for English on this dataset, which is close to the score of SpERT, the current state of the art on CoNLL.
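For readers unfamiliar with the metric, the micro F1 quoted above pools true positives, false positives, and false negatives over all sentences before computing precision and recall. The sketch below is only an illustration, not the authors' evaluation script: it assumes exact-match scoring over (head, relation, tail) triples, which the abstract does not specify, and the function name `micro_f1` and the toy triples are hypothetical.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-sentence sets of (head, relation, tail) triples.

    Assumption: a predicted triple counts as correct only on an exact match
    with a gold triple from the same sentence.
    """
    tp = fp = fn = 0
    for gold_triples, pred_triples in zip(gold, predicted):
        tp += len(gold_triples & pred_triples)   # predicted and correct
        fp += len(pred_triples - gold_triples)   # predicted but not in gold
        fn += len(gold_triples - pred_triples)   # in gold but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): one sentence extracted correctly, one with a wrong tail.
gold = [{("Marie Curie", "born-in", "Warsaw")},
        {("Chopin", "born-in", "Zelazowa Wola")}]
predicted = [{("Marie Curie", "born-in", "Warsaw")},
             {("Chopin", "born-in", "Warsaw")}]
score = micro_f1(gold, predicted)
```

Because the counts are pooled before averaging, frequent relations weigh more heavily in the final score than rare ones.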

Authors

Alessandro Seganti, Klaudia Firląg, Helena Skowronska, Michał Satława, Piotr Andruszkiewicz

Topics

Natural Language Processing > Applications > Information Extraction
Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

multilingual NLP, relation extraction, named entity recognition, sequence classification, entity extraction
