2018
EMNLP
Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering
Abstract
We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities participated in this task.