2020
EMNLP
EMNLP 2020
Truecasing German user-generated conversational text
Abstract
AbstractTrue-casing, the task of restoring proper case to (generally) lower case input, is important in downstream tasks and for screen display. In this paper, we investigate truecasing as an in- trinsic task and present several experiments on noisy user queries to a voice-controlled dia- log system. In particular, we compare a rule- based, an n-gram language model (LM) and a recurrent neural network (RNN) approaches, evaluating the results on a German Q&A cor- pus and reporting accuracy for different case categories. We show that while RNNs reach higher accuracy especially on large datasets, character n-gram models with interpolation are still competitive, in particular on mixed- case words where their fall-back mechanisms come into play.
🌉
Interdisciplinary Bridge
— Deep Learning and Machine Learning and Natural Language Processing
🧭
Keyword Pioneer
— case restoration
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Authors
Topics
Machine Learning > Core Methods > Classification
Natural Language Processing > Applications > Text Generation
Deep Learning > Learning Types > Deep Learning
Deep Learning > Learning Types > Representation Learning
Deep Learning > Architectures > Recurrent Neural Networks
Natural Language Processing > Applications > Text Processing