ACL 2018

Subcharacter Information in Japanese Embeddings: When Is It Worth It?

Abstract

Languages with logographic writing systems present a difficulty for traditional character-level models. Leveraging subcharacter information was recently shown to benefit a number of intrinsic and extrinsic tasks in Chinese. We examine whether the same strategies can be applied to Japanese, and contribute a new analogy dataset for this language.
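Analogy datasets such as the one contributed here are typically scored by asking an embedding model to complete questions of the form "a is to b as c is to ?". The sketch below shows one common scoring scheme, the vector-offset (3CosAdd) method, which picks the word whose vector is most cosine-similar to b - a + c. It is an illustration under stated assumptions, not the paper's evaluation code: the toy vectors and the two questions are hypothetical values chosen by hand, and the function names (`solve_analogy`, `analogy_accuracy`) are our own.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the 3CosAdd vector-offset method.

    Returns the vocabulary word (excluding a, b and c themselves) whose
    vector has the highest cosine similarity to b - a + c.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, d) questions whose predicted answer equals d."""
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)


# Hypothetical hand-made 3-dimensional vectors, for illustration only;
# real embeddings would come from a trained model.
toy = {
    "男": [1.0, 0.0, 0.2],    # man
    "女": [1.0, 1.0, 0.2],    # woman
    "王": [0.0, 0.0, 1.0],    # king
    "女王": [0.0, 1.0, 1.0],  # queen
    "猫": [0.5, -1.0, -0.5],  # cat (distractor)
}

# Hypothetical questions in (a, b, c, expected d) form.
questions = [
    ("男", "女", "王", "女王"),
    ("女王", "王", "女", "男"),
]

print(analogy_accuracy(toy, questions))  # 1.0
```

Excluding the three question words from the candidate set is a common convention in analogy evaluation, since the nearest vector to b - a + c is often one of the inputs.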
