ACL 2017
Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings
Abstract
We propose a new method for extracting pseudo-parallel sentences from a pair of large monolingual corpora, without relying on any document-level information. Our method first exploits word embeddings in order to efficiently evaluate trillions of candidate sentence pairs and then a classifier to find the most reliable ones. We report significant improvements in domain adaptation for statistical machine translation when using a translation model trained on the sentence pairs extracted from in-domain monolingual corpora.
Authors
Topics
Machine Learning > Core Methods > Embedding Learning
Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Resources & Methods > Text Representation
Machine Learning > Learning Types > Transfer Learning
Machine Learning > Learning Paradigms > Domain Adaptation