EMNLP 2020
Filtering Noisy Parallel Corpus using Transformers with Proxy Task Learning
Abstract
This paper describes Huawei’s submission to the WMT20 low-resource parallel corpus filtering shared task. Our approach develops a proxy task learner on top of a transformer-based multilingual pre-trained language model to improve the filtering of noisy parallel corpora. This supervised task also lets us iterate much more quickly than using an existing neural machine translation system for the same purpose. After empirical analyses of the fine-tuning task, we benchmark our approach against previous years’ state-of-the-art results. The paper closes with a discussion of limitations and future work. The scripts for this study will be made publicly available.
Topics
Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Architectures > Transformers
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Generation > Machine Translation
Machine Learning > Learning Types > Classification
Deep Learning > Learning Types > Transfer Learning