IDRISI-D: Arabic and English Datasets and Benchmarks for Location Mention Disambiguation over Disaster Microblogs

Reem Suwaileh; Tamer Elsayed; Muhammad Imran

EMNLP 2023

Abstract

Extracting and disambiguating geolocation information from social media data enables effective disaster management, as it helps response authorities with tasks such as locating incidents to plan rescue activities and locating affected people for evacuation. Nevertheless, the dearth of resources and tools hinders the development and evaluation of Location Mention Disambiguation (LMD) models in the disaster management domain. Consequently, the LMD task is greatly understudied, especially for low-resource languages such as Arabic. To fill this gap, we introduce IDRISI-D, the largest public English LMD dataset to date and the first public Arabic one. Additionally, we introduce a modified hierarchical evaluation framework that offers a lenient and nuanced evaluation of LMD systems. We further benchmark the IDRISI-D datasets using representative baselines and show the competitiveness of BERT-based models.

Topics

Natural Language Processing > Understanding > Named Entity Recognition
Natural Language Processing > Applications > Information Extraction
Natural Language Processing > Resources & Methods > Multilingual NLP
Artificial Intelligence > Core AI > Information Retrieval
Data Science & Analytics > Applications > Social Media Analysis

Keywords

social media analysis, benchmark dataset, social media, BERT-based model, Arabic natural language processing, disaster management, location mention disambiguation, geolocation extraction, location disambiguation
