2018
EMNLP
Juman++: A Morphological Analysis Toolkit for Scriptio Continua
Abstract
We present a three-part toolkit for developing morphological analyzers for languages without natural word boundaries. The first part is a C++11/14 lattice-based morphological analysis library that combines linear and recurrent neural network language models for analysis. The other parts are a tool for exposing problems in the trained model and a partial annotation tool. Our morphological analyzer for Japanese achieves a new state of the art on Jumandic-based corpora while being 250 times faster than the previous one. We also report a small experiment, a quantitative analysis, and our experience of using the development tools. All components of the toolkit are open source and available under the permissive Apache 2 License.
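To make the term "lattice-based" concrete, the following is a minimal Python sketch of the general technique: build a lattice of dictionary candidates over an unsegmented string, then pick the lowest-cost path with a Viterbi-style pass. It is not the toolkit's API or its model. The dictionary `TOY_DICT`, the costs, `UNKNOWN_COST`, and the function names are all hypothetical, and the scoring is a plain per-word cost with no recurrent neural network component.

```python
from math import inf
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: surface form -> cost (lower is better).
# Entries and costs are invented for illustration only.
TOY_DICT: Dict[str, float] = {
    "東京": 1.0,
    "都": 2.0,
    "東": 2.0,
    "京都": 1.5,
}
UNKNOWN_COST = 10.0  # assumed fallback cost for one unknown character


def build_lattice(text: str, lexicon: Dict[str, float]) -> List[List[Tuple[int, float]]]:
    """For each start offset, list (end offset, cost) of candidate words."""
    lattice: List[List[Tuple[int, float]]] = [[] for _ in text]
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                lattice[start].append((end, lexicon[word]))
        if not lattice[start]:
            # No dictionary word starts here: emit a one-character unknown node.
            lattice[start].append((start + 1, UNKNOWN_COST))
    return lattice


def segment(text: str, lexicon: Dict[str, float] = TOY_DICT) -> Tuple[List[str], float]:
    """Return the lowest-cost segmentation and its total cost."""
    lattice = build_lattice(text, lexicon)
    n = len(text)
    best = [inf] * (n + 1)   # best[i]: minimal cost of segmenting text[:i]
    back = [-1] * (n + 1)    # back[i]: start offset of the last word ending at i
    best[0] = 0.0
    for start in range(n):
        if best[start] == inf:
            continue  # offset not reachable by any path
        for end, cost in lattice[start]:
            candidate = best[start] + cost
            if candidate < best[end]:
                best[end] = candidate
                back[end] = start
    # Follow back-pointers from the end of the string to recover the words.
    words: List[str] = []
    pos = n
    while pos > 0:
        prev = back[pos]
        words.append(text[prev:pos])
        pos = prev
    words.reverse()
    return words, best[n]


if __name__ == "__main__":
    print(segment("東京都"))
```

With these invented costs, "東京都" has two competing paths, 東京 + 都 and 東 + 京都, and the search keeps the cheaper one. A real analyzer would score paths with trained feature weights rather than fixed per-word costs.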
Topics
Natural Language Processing > Generation > Language Modeling
Natural Language Processing > Resources & Methods > Text Representation
Artificial Intelligence > Core AI > Natural Language Processing
Deep Learning > Architectures > Recurrent Neural Networks
Natural Language Processing > Applications > Natural Language Processing