EMNLP 2018
Code-switched Language Models Using Dual RNNs and Same-Source Pretraining
Abstract
This work focuses on building language models (LMs) for code-switched text. We propose two techniques that significantly improve these LMs: (1) a novel recurrent neural network unit with dual components, each of which focuses separately on one language in the code-switched text, and (2) pretraining the LM on synthetic text drawn from a generative model estimated on the training data. We demonstrate the effectiveness of these techniques on a Mandarin-English task, where they yield significant reductions in perplexity.
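The abstract does not spell out the internals of the dual-component unit, so the following is only a minimal sketch under stated assumptions. Each token is assumed to carry a language tag (the constants `MANDARIN` and `ENGLISH` are hypothetical), both components are assumed to share one hidden state, and the tag of the current token is assumed to select which component computes the update. Plain tanh RNN components stand in for whatever recurrent cell the paper actually uses, and names such as `DualRNNCell` are illustrative, not taken from the paper.

```python
import math
import random

# Hypothetical language tags (an assumption for illustration).
MANDARIN, ENGLISH = 0, 1


def rand_matrix(rng, rows, cols, scale=0.1):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SimpleRNNComponent:
    """One Elman-style component: h' = tanh(W x + U h + b)."""

    def __init__(self, rng, input_dim, hidden_dim):
        self.W = rand_matrix(rng, hidden_dim, input_dim)
        self.U = rand_matrix(rng, hidden_dim, hidden_dim)
        self.b = [0.0] * hidden_dim

    def step(self, x, h):
        pre = [a + c + d for a, c, d in zip(matvec(self.W, x), matvec(self.U, h), self.b)]
        return [math.tanh(p) for p in pre]


class DualRNNCell:
    """Assumed dual-component unit: one component per language, a shared
    hidden state, and the current token's language tag choosing the
    component that performs the recurrent update."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.components = {
            MANDARIN: SimpleRNNComponent(rng, input_dim, hidden_dim),
            ENGLISH: SimpleRNNComponent(rng, input_dim, hidden_dim),
        }

    def step(self, x, h, lang):
        # Route the update through the component for this token's language.
        return self.components[lang].step(x, h)

    def run(self, inputs, langs):
        """Process a tagged sequence and return every hidden state."""
        h = [0.0] * self.hidden_dim
        states = []
        for x, lang in zip(inputs, langs):
            h = self.step(x, h, lang)
            states.append(h)
        return states


# Usage: a three-token code-switched sequence with one-hot toy embeddings.
cell = DualRNNCell(input_dim=3, hidden_dim=4)
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tags = [MANDARIN, ENGLISH, MANDARIN]
states = cell.run(xs, tags)
```

Because the hidden state is shared, context carries across a language switch, while each component's weights only ever see updates for tokens of its own language.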
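Same-source pretraining can be pictured as a two-stage schedule: fit a generative model to the training sentences, sample a synthetic corpus from it, pretrain the LM on that corpus, and then train on the real data. The sketch below uses a bigram model as the generator; this is a swapped-in assumption chosen for brevity, not necessarily the generative model used in the paper. The helpers `fit_bigram`, `sample_sentence`, and `same_source_schedule` are hypothetical, the toy sentences are invented, and the LM training itself is omitted.

```python
import random
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


def fit_bigram(sentences):
    """Estimate bigram successor counts from tokenized training sentences."""
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = [BOS] + sent + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_sentence(counts, rng, max_len=20):
    """Draw one synthetic sentence by sampling successors in proportion to counts."""
    out, prev = [], BOS
    for _ in range(max_len):
        successors = counts[prev]
        words = list(successors)
        nxt = rng.choices(words, weights=[successors[w] for w in words], k=1)[0]
        if nxt == EOS:
            break
        out.append(nxt)
        prev = nxt
    return out


def same_source_schedule(train_sentences, n_synthetic, seed=0):
    """Hypothetical schedule: pretrain on synthetic text sampled from a model
    estimated on the training data, then train on the real sentences."""
    counts = fit_bigram(train_sentences)
    rng = random.Random(seed)
    synthetic = [sample_sentence(counts, rng) for _ in range(n_synthetic)]
    return [("pretrain", synthetic), ("finetune", train_sentences)]


# Usage with invented toy Mandarin-English sentences.
train = [
    ["我", "喜欢", "machine", "learning"],
    ["我们", "用", "python"],
    ["machine", "learning", "很", "有趣"],
]
stages = same_source_schedule(train, n_synthetic=5)
```

Since the generator is estimated only from the training data, the synthetic text shares its vocabulary and code-switching patterns, which is the intuition behind calling the pretraining data "same-source".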
Topics
Deep Learning > Architectures > Neural Networks
Deep Learning > Techniques > Pretraining
Natural Language Processing > Generation > Language Modeling
Natural Language Processing > Resources & Methods > Multilingual NLP
Natural Language Processing > Resources & Methods > Language Modeling
Artificial Intelligence > Core AI > Language