Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages

Shyam Upadhyay; Jordan Kodner; Dan Roth

EMNLP 2018

Abstract

Generating the English transliteration of a name written in a foreign script is an important and challenging step in multilingual knowledge acquisition and information extraction. Existing approaches to transliteration generation require a large number (>5000) of training examples. This difficulty contrasts with transliteration discovery, a somewhat easier task that involves picking a plausible transliteration from a given list. In this work, we present a bootstrapping algorithm that uses constrained discovery to improve generation, and can be used with as few as 500 training examples, which we show can be sourced from annotators in a matter of hours. This opens the task to languages for which a large number of training examples is unavailable. We evaluate transliteration generation performance itself, as well as the improvement it brings to cross-lingual candidate generation for entity linking, a typical downstream task. We present a comprehensive evaluation of our approach on nine languages, each written in a unique script.
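The interplay between generation and discovery described in the abstract can be illustrated with a toy sketch. This is not the paper's model: the `ToyGenerator` class, the `difflib`-based `similarity` score, the `threshold` value, and the example names are all assumptions made for illustration. The sketch only shows the general shape of such a loop: train a generator on a small seed set, use it to pick the best match for each unlabeled name from a fixed candidate list (discovery), and add confident matches back to the training data.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a real model score."""
    return SequenceMatcher(None, a, b).ratio()


class ToyGenerator:
    """Hypothetical generator that memorizes one-to-one character mappings."""

    def __init__(self):
        self.table = {}

    def fit(self, pairs):
        self.table = {}
        for src, tgt in pairs:
            # Toy alignment: only learn from pairs of equal length.
            if len(src) == len(tgt):
                for s, t in zip(src, tgt):
                    self.table.setdefault(s, t)

    def generate(self, src):
        # Unknown source characters are emitted as "?".
        return "".join(self.table.get(ch, "?") for ch in src)


def discover(generator, src, candidates):
    """Discovery: choose the candidate closest to the generator's guess."""
    guess = generator.generate(src)
    best = max(candidates, key=lambda c: similarity(guess, c))
    return best, similarity(guess, best)


def bootstrap(seed_pairs, unlabeled, candidates, rounds=3, threshold=0.6):
    """Grow the training set with confident discovery results, then retrain."""
    pairs = list(seed_pairs)
    remaining = list(unlabeled)
    gen = ToyGenerator()
    for _ in range(rounds):
        gen.fit(pairs)
        still_unlabeled = []
        for src in remaining:
            best, score = discover(gen, src, candidates)
            if score >= threshold:
                pairs.append((src, best))
            else:
                still_unlabeled.append(src)
        if len(still_unlabeled) == len(remaining):
            break  # nothing new was added this round
        remaining = still_unlabeled
    gen.fit(pairs)
    return gen, pairs


# Illustrative data (made up): two seed pairs and one unlabeled name.
seed = [("дан", "dan"), ("рот", "rot")]
gen, pairs = bootstrap(seed, ["тома"], ["toma", "tara", "dora"])
```

In this toy run the seed pairs do not cover the character "м", so the generator alone cannot transliterate it. Discovery still matches "тома" to "toma" from the candidate list, and retraining on that pair lets the generator handle the new character afterwards.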

Topics

Machine Learning > Learning Types > Semi-Supervised Learning
Natural Language Processing > Generation > Text Generation
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Learning Paradigms > Few-Shot Learning
Natural Language Processing > Applications > Named Entity Recognition
Artificial Intelligence > Core AI > Information Extraction

Keywords

entity linking, text generation, low-resource language, bootstrapping algorithm, cross-lingual candidate generation, transliteration generation, cross-lingual candidate
