EMNLP 2021
Evaluation of Summarization Systems across Gender, Age, and Race
Abstract
Summarization systems are ultimately evaluated by human annotators and raters. Usually, annotators and raters do not reflect the demographics of end users, but are recruited through student populations or crowdsourcing platforms with skewed demographics. For two different evaluation scenarios – evaluation against gold summaries and system output ratings – we show that summary evaluation is sensitive to protected attributes. This can severely bias system development and evaluation, leading us to build models that cater for some groups rather than others.
Topics
Artificial Intelligence > Core AI > Responsible AI
Machine Learning > Application Areas > Fairness
Natural Language Processing > Generation > Summarization
Artificial Intelligence > Core AI > Fairness
Natural Language Processing > Applications > Summarization
Machine Learning > Learning Types > Fairness