EMNLP 2022
Conditional Language Models for Community-Level Linguistic Variation
Abstract
Community-level linguistic variation is a core concept in sociolinguistics. In this paper, we use conditioned neural language models to learn vector representations for 510 online communities. We use these representations to measure linguistic variation between communities and investigate the degree to which linguistic variation corresponds with social connections between communities. We find that our sociolinguistic embeddings are highly correlated with a social network-based representation that does not use any linguistic input.
Topics
Machine Learning > Core Methods > Representation Learning
Machine Learning > Core Methods > Embedding Learning
Natural Language Processing > Generation > Language Modeling
Interdisciplinary > Linguistics > Computational Linguistics
Machine Learning > Learning Types > Representation Learning
Natural Language Processing > Resources & Methods > Language Modeling