Subcharacter Information in Japanese Embeddings: When Is It Worth It?

Marzena Karpinska; Bofang Li; Anna Rogers; Aleksandr Drozd

ACL 2018

Abstract

Languages with logographic writing systems present a difficulty for traditional character-level models. Leveraging subcharacter information was recently shown to be beneficial for a number of intrinsic and extrinsic tasks in Chinese. We examine whether the same strategies could be applied to Japanese, and contribute a new analogy dataset for this language.

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Core Methods > Embedding Learning
Interdisciplinary > Linguistics > Computational Linguistics

Keywords

word embedding, character-level model, Japanese language, analogy dataset, subcharacter information
