Learning Subjective Label Distributions via Sociocultural Descriptors

Mohammed Fayiz Parappan; Ricardo Henao

EMNLP 2025

Abstract

Subjectivity in NLP tasks, e.g., toxicity classification, has emerged as a critical challenge precipitated by the increased deployment of NLP systems in content-sensitive domains. Conventional approaches aggregate annotator judgements (labels), ignoring minority perspectives and overlooking the influence of the sociocultural context behind such annotations. We propose a framework where subjectivity in binary labels is modeled as an empirical distribution accounting for the variation in annotators through human values extracted from sociocultural descriptors using a language model. The framework also supports downstream tasks such as population-level and sociocultural group-level majority label prediction. Experiments on three toxicity datasets, covering human-chatbot conversations and social media posts annotated by diverse annotator pools, demonstrate that our approach yields well-calibrated predictions of the distribution over binary toxicity labels, which are further used for majority label prediction across cultural subgroups, improving over existing methods.
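To make the abstract's central objects concrete, here is a minimal sketch assuming annotations arrive as hypothetical (item, annotator group, binary label) records. It computes the empirical per-item toxicity distribution, the population-level and group-level majority labels derived from it, and a Bernoulli cross-entropy as one possible way to score a predicted distribution against the empirical one. It does not implement the paper's language-model-based modeling of human values extracted from sociocultural descriptors. All function names, the data, the tie-breaking rule, and the choice of cross-entropy are illustrative assumptions, not the authors' method or metric.

```python
from collections import defaultdict
from math import log


def empirical_toxicity(annotations):
    """Empirical P(toxic) per item: fraction of annotators who labeled it toxic."""
    counts = defaultdict(lambda: [0, 0])  # item -> [toxic votes, total votes]
    for item, _group, label in annotations:
        counts[item][0] += label
        counts[item][1] += 1
    return {item: tox / total for item, (tox, total) in counts.items()}


def group_toxicity(annotations):
    """Empirical P(toxic) per (item, annotator group) pair."""
    counts = defaultdict(lambda: [0, 0])  # (item, group) -> [toxic, total]
    for item, group, label in annotations:
        counts[(item, group)][0] += label
        counts[(item, group)][1] += 1
    return {key: tox / total for key, (tox, total) in counts.items()}


def majority_label(p_toxic, threshold=0.5):
    """Majority label from a distribution; a tie (p == 0.5) maps to non-toxic (arbitrary choice)."""
    return int(p_toxic > threshold)


def bernoulli_cross_entropy(p_empirical, p_pred, eps=1e-12):
    """Cross-entropy between empirical and predicted Bernoulli distributions (illustrative score)."""
    p_pred = min(max(p_pred, eps), 1 - eps)  # clip to avoid log(0)
    return -(p_empirical * log(p_pred) + (1 - p_empirical) * log(1 - p_pred))


# Hypothetical toy data: label 1 = toxic, 0 = non-toxic.
annotations = [
    ("post1", "group_a", 1), ("post1", "group_a", 1),
    ("post1", "group_b", 0), ("post1", "group_b", 0), ("post1", "group_b", 0),
    ("post2", "group_a", 0), ("post2", "group_b", 1), ("post2", "group_b", 1),
]

population_p = empirical_toxicity(annotations)
group_p = group_toxicity(annotations)

for item, p in population_p.items():
    print(item, "P(toxic) =", round(p, 3), "majority =", majority_label(p))
for (item, group), p in sorted(group_p.items()):
    print(item, group, "P(toxic) =", round(p, 3), "majority =", majority_label(p))
```

In this toy data, aggregating all five votes on `post1` gives a non-toxic majority even though every `group_a` annotator labeled it toxic, which is the kind of minority perspective that plain label aggregation discards and that group-level majority prediction can recover.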

Keywords

probabilistic modeling, annotator aggregation, language model, toxicity classification, label distribution, annotator disagreement, annotation bias, cultural factor, subjective label, sociocultural descriptor