Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes

Yunsu Kim; Julian Schamper; Hermann Ney

EACL 2017

Abstract

We address, for the first time, unsupervised training for a translation task with a vocabulary of hundreds of thousands of words. We scale up the expectation-maximization (EM) algorithm to learn a large translation table without any parallel text or seed lexicon. First, we solve the memory bottleneck and enforce sparsity with a simple thresholding scheme for the lexicon. Second, we initialize the lexicon training with word classes, which efficiently improves performance. Our methods produced promising results on two large-scale unsupervised translation tasks.
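The abstract names a simple thresholding scheme that keeps the lexicon sparse during EM training. A minimal sketch of such pruning follows, assuming a dictionary-of-dictionaries lexicon that stores only nonzero entries of p(f | e), a hypothetical threshold `tau`, and renormalization of each row after pruning. The function name, the fallback for rows that would become empty, and the renormalization step are illustrative assumptions, not the paper's exact procedure.

```python
from typing import Dict

# Hypothetical sparse lexicon: for each word e, a dict mapping word f -> p(f | e).
# Only entries that survive pruning are stored, so memory grows with the number
# of kept pairs rather than with the full vocabulary-by-vocabulary product.
Lexicon = Dict[str, Dict[str, float]]


def threshold_lexicon(lexicon: Lexicon, tau: float) -> Lexicon:
    """Drop entries with p(f | e) < tau, then renormalize each row.

    Assumption: if every entry of a row falls below tau, the single most
    probable entry is kept so that the row remains a valid distribution.
    """
    pruned: Lexicon = {}
    for e, row in lexicon.items():
        # Keep only entries at or above the (hypothetical) threshold.
        kept = {f: p for f, p in row.items() if p >= tau}
        if not kept and row:
            # Fallback: retain the best entry instead of emptying the row.
            best = max(row, key=row.get)
            kept = {best: row[best]}
        total = sum(kept.values())
        # Renormalize so the surviving probabilities sum to 1.
        pruned[e] = {f: p / total for f, p in kept.items()}
    return pruned


if __name__ == "__main__":
    lex = {"haus": {"house": 0.7, "home": 0.25, "the": 0.05}}
    print(threshold_lexicon(lex, 0.1))
```

In this sketch, pruning would be applied after each EM iteration, so later iterations only touch the retained entries; how often and with which threshold the paper prunes is not stated in the abstract.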

Topics

Machine Learning > Learning Types > Unsupervised Learning
Machine Learning > Optimization & Theory > Optimization

Keywords

unsupervised learning, sparse coding, machine translation, expectation maximization, word embedding
