2020
EMNLP
EMNLP 2020
Exploiting Sentence Order in Document Alignment
Abstract
AbstractWe present a simple document alignment method that incorporates sentence order information in both candidate generation and candidate re-scoring. Our method results in 61% relative reduction in error compared to the best previously published result on the WMT16 document alignment shared task. Our method improves downstream MT performance on web-scraped Sinhala–English documents from ParaCrawl, outperforming the document alignment method used in the most recent ParaCrawl release. It also outperforms a comparable corpora method which uses the same multilingual embeddings, demonstrating that exploiting sentence order is beneficial even if the end goal is sentence-level bitext.
🌉
Interdisciplinary Bridge
— Machine Learning and Natural Language Processing
🧭
Keyword Pioneer
— sentence order
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio
Authors
Topics
Machine Learning > Core Methods > Embedding Learning
Natural Language Processing > Applications > Information Extraction
Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Applications > Machine Translation
Machine Learning > Learning Types > Representation Learning
Natural Language Processing > Generation > Machine Translation