ACL 2017
Unsupervised Text Segmentation Based on Native Language Characteristics
Abstract
Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian model that incorporates appropriately compact language models and alternating asymmetric priors can achieve scores on the standard metrics around halfway to perfect segmentation.
Topics
Machine Learning > Learning Types > Unsupervised Learning
Machine Learning > Optimization & Theory > Bayesian Inference
Natural Language Processing > Resources & Methods > Text Representation
Artificial Intelligence > Bayesian & Probabilistic > Bayesian Inference
Machine Learning > Bayesian & Probabilistic > Bayesian Inference
Natural Language Processing > Applications > Text Processing