ACL 2019
JW300: A Wide-Coverage Parallel Corpus for Low-Resource Languages
Abstract
Viable cross-lingual transfer critically depends on the availability of parallel texts. Shortage of such resources imposes a development and evaluation bottleneck in multilingual processing. We introduce JW300, a parallel corpus of over 300 languages with around 100 thousand parallel sentences per language pair on average. In this paper, we present the resource and showcase its utility in experiments with cross-lingual word embedding induction and multi-source part-of-speech projection.
Topics
Natural Language Processing > Understanding > Part-of-Speech Tagging
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Resources & Methods > Multilingual NLP
Interdisciplinary > Linguistics > Computational Linguistics
Machine Learning > Learning Paradigms > Transfer Learning