HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation

Brian Thompson; Rebecca Knowles; Xuan Zhang; Huda Khayrallah; Kevin Duh; Philipp Koehn

EMNLP 2019

Abstract

Bilingual lexicons are valuable resources used by professional human translators. While these resources can be easily incorporated into statistical machine translation, it is unclear how best to do so in the neural framework. In this work, we present the HABLex dataset, designed to test methods for integrating bilingual lexicons into neural machine translation. Our data consists of human-generated alignments of words and phrases in machine translation test sets in three language pairs (Russian-English, Chinese-English, and Korean-English), resulting in clean bilingual lexicons that are well matched to the reference. We also present two simple baselines, constrained decoding and continued training, as well as an improvement to continued training that addresses overfitting.
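As a rough illustration of what testing lexicon integration can involve, the sketch below looks up which lexicon entries apply to a source sentence and then measures how many of the corresponding target phrases appear in a system output. It is a minimal sketch under stated assumptions, not the authors' evaluation procedure: the toy lexicon, the whitespace tokenization, and the function names `find_constraints` and `constraint_recall` are all hypothetical and do not come from HABLex.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (source phrase -> target phrase); not taken from HABLex.
LEXICON: Dict[str, str] = {
    "машинный перевод": "machine translation",
    "словарь": "lexicon",
}


def contains_phrase(tokens: List[str], phrase: List[str]) -> bool:
    """Return True if `phrase` occurs as a contiguous token span in `tokens`."""
    n = len(phrase)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def find_constraints(source: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Collect lexicon entries whose source side appears in the source sentence."""
    src_tokens = source.lower().split()  # assumption: whitespace tokenization
    return [
        (src, tgt)
        for src, tgt in lexicon.items()
        if contains_phrase(src_tokens, src.lower().split())
    ]


def constraint_recall(hypothesis: str, constraints: List[Tuple[str, str]]) -> float:
    """Fraction of applicable target phrases that appear in the hypothesis."""
    if not constraints:
        return 1.0  # nothing to satisfy
    hyp_tokens = hypothesis.lower().split()
    hits = sum(
        contains_phrase(hyp_tokens, tgt.lower().split()) for _, tgt in constraints
    )
    return hits / len(constraints)


constraints = find_constraints("это машинный перевод", LEXICON)
print(constraints)
print(constraint_recall("this is machine translation", constraints))
```

Exact token matching is the simplest choice here; a real setup would likely need language-specific tokenization (especially for Chinese and Korean) and some handling of inflected forms.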

Topics

Natural Language Processing > Applications > Machine Translation
Machine Learning > Core Methods > Multi-Task Learning

Keywords

neural machine translation, constrained decoding, continued training, lexicon integration, bilingual lexicon