ACL 2019
Dual Monolingual Cross-Entropy Delta Filtering of Noisy Parallel Data
Abstract
We introduce a purely monolingual approach to filtering for parallel data from a noisy corpus in a low-resource scenario. Our work is inspired by Junczys-Dowmunt (2018), but we relax the requirements to allow for cases where no parallel data is available. Our primary contribution is a dual monolingual cross-entropy delta criterion, modified from Cynical data selection (Axelrod, 2017), that is competitive (within 1.8 BLEU) with the best bilingual filtering method when used to train SMT systems. Our approach is featherweight and runs end-to-end on a standard laptop in three hours.
Topics
Machine Learning > Learning Types > Unsupervised Learning
Machine Learning > Optimization & Theory > Optimization
Natural Language Processing > Applications > Machine Translation
Data Science & Analytics > Methods > Data Mining
Interdisciplinary > Linguistics > Computational Linguistics
Machine Learning > Learning Types > Data Augmentation