NAACL 2018

Cross-Lingual Learning-to-Rank with Shared Representations

Abstract

Cross-lingual information retrieval (CLIR) is a document retrieval task where the documents are written in a language different from that of the user's query. This is a challenging problem for data-driven approaches due to the general lack of labeled training data. We introduce a large-scale dataset derived from Wikipedia to support CLIR research in 25 languages. Further, we present a simple yet effective neural learning-to-rank model that shares representations across languages and reduces the data requirement. This model can exploit training data in, for example, Japanese-English CLIR to improve the results of Swahili-English CLIR.
