ACL 2022
Towards Responsible Natural Language Annotation for the Varieties of Arabic
Abstract
When building NLP models, there is a tendency to aim for broader coverage, often overlooking cultural and (socio)linguistic nuance. In this position paper, we make the case for care and attention to such nuances, particularly in dataset annotation, as well as the inclusion of cultural and linguistic expertise in the process. We present a playbook for responsible dataset creation for polyglossic, multidialectal languages. This work is informed by a study on Arabic annotation of social media content.
Topics
Machine Learning > Learning Types > Weakly Supervised Learning
Natural Language Processing > Resources & Methods > Multilingual NLP
Natural Language Processing > Resources & Methods > Text Representation
Interdisciplinary > Linguistics > Computational Linguistics
Machine Learning > Learning Paradigms > Transfer Learning
Natural Language Processing > Applications > Natural Language Understanding