EMNLP 2018
UTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation
Abstract
We present the UTFPR systems at the WMT 2018 parallel corpus filtering task. Our supervised approach discerns between good and bad translations by training classic binary classification models over an artificially produced binary classification dataset derived from a high-quality translation set, and a minimalistic set of 6 semantic distance features that rely only on easy-to-gather resources. We rank translations by their probability for the “good” label. Our results show that logistic regression pairs best with our approach, yielding more consistent results throughout the different settings evaluated.
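The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "bad" examples are produced by pairing each source sentence with a mismatched target from the same clean set, and it replaces the six semantic distance features, which the abstract does not list, with two toy stand-ins (length ratio and shared-token overlap). All function names are hypothetical.

```python
import math
import random

def features(src, tgt):
    """Two toy stand-in features (NOT the paper's six semantic distance features)."""
    s, t = src.split(), tgt.split()
    length_ratio = min(len(s), len(t)) / max(len(s), len(t), 1)
    overlap = len(set(s) & set(t)) / max(len(set(s) | set(t)), 1)
    return [length_ratio, overlap]

def make_dataset(clean_pairs, rng):
    """Label clean pairs 1 and pairs with a mismatched target 0.

    Assumption: negatives come from swapping in another pair's target.
    Needs at least two clean pairs.
    """
    targets = [t for _, t in clean_pairs]
    data = []
    for i, (src, tgt) in enumerate(clean_pairs):
        data.append((features(src, tgt), 1))
        j = rng.choice([k for k in range(len(targets)) if k != i])
        data.append((features(src, targets[j]), 0))
    return data

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(data, epochs=500, lr=0.5):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            g = p - y  # gradient of the loss w.r.t. the linear score
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def prob_good(model, src, tgt):
    """Estimated probability of the "good" label for one sentence pair."""
    w, b = model
    x = features(src, tgt)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def rank_pairs(model, pairs):
    """Sort candidate pairs from most to least likely to be good."""
    return sorted(pairs, key=lambda p: prob_good(model, *p), reverse=True)
```

To filter a noisy corpus under these assumptions, one would train on the clean set with `train_logreg(make_dataset(clean_pairs, random.Random(0)))`, call `rank_pairs` on the noisy pairs, and keep the top of the ranking.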
Authors
Topics
Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Applications > Machine Translation
Machine Learning > Learning Types > Supervised Learning
Machine Learning > Learning Types > Classification