ACL 2022
TopWORDS-Seg: Simultaneous Text Segmentation and Word Discovery for Open-Domain Chinese Texts via Bayesian Inference
Abstract
Processing open-domain Chinese texts has been a critical bottleneck in computational linguistics for decades, partly because text segmentation and word discovery are often entangled with each other in this challenging scenario. No existing method can yet achieve effective text segmentation and word discovery simultaneously in the open domain. This study fills this gap by proposing a novel method called TopWORDS-Seg, based on Bayesian inference, which offers robust performance and transparent interpretation when no training corpus or domain vocabulary is available. The advantages of TopWORDS-Seg are demonstrated by a series of experimental studies.
Topics
Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Learning Types > Unsupervised Learning
Natural Language Processing > Understanding > Syntax
Interdisciplinary > Linguistics > Computational Linguistics
Artificial Intelligence > Bayesian & Probabilistic > Bayesian Inference
Machine Learning > Bayesian & Probabilistic > Bayesian Inference
Artificial Intelligence > Core AI > Language