What’s Wrong with Hebrew NLP? And How to Make it Right

Reut Tsarfaty; Shoval Sadde; Stav Klein; Amit Seker

IJCNLP 2019

Abstract

For languages with simple morphology such as English, automatic annotation pipelines such as spaCy or Stanford’s CoreNLP successfully serve projects in academia and industry. For many morphologically-rich languages (MRLs), similar pipelines show sub-optimal performance that limits their applicability for text analysis in research and industry. This sub-optimal performance is mainly due to errors in early morphological disambiguation decisions, which cannot be corrected later in the pipeline and yield incoherent annotations overall. This paper describes the design and use of the ONLP suite, a joint morpho-syntactic infrastructure for processing Modern Hebrew texts. Joint inference over morphology and syntax substantially limits error propagation and leads to high accuracy. ONLP provides rich and expressive annotations that already serve diverse academic and commercial needs. Its accompanying demo further serves educational activities, introducing the intricacies of Hebrew NLP to researchers and non-researchers alike.
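The contrast between pipeline and joint inference can be illustrated with a toy sketch. It is not the ONLP implementation. The score tables, function names, and numbers below are all invented for illustration. The only linguistic assumption is that a single Hebrew token can be read either as one noun ("onion") or as a preposition plus a noun ("in" + "shade").

```python
# Toy illustration of error propagation; every score here is hypothetical.

# Context-free morphological preference for each candidate analysis.
MORPH_SCORES = {
    "onion": 0.6,      # single-noun reading
    "in+shade": 0.4,   # preposition + noun reading
}

# How well each analysis fits the surrounding syntax (invented values).
SYNTAX_SCORES = {
    "onion": 0.1,
    "in+shade": 0.9,
}


def pipeline_decode(morph, syntax):
    """Commit to the best morphological analysis first, then score syntax.

    The early decision cannot be revised once syntax is considered.
    """
    best = max(morph, key=morph.get)
    return best, morph[best] * syntax[best]


def joint_decode(morph, syntax):
    """Choose the analysis that maximizes the combined score."""
    best = max(morph, key=lambda a: morph[a] * syntax[a])
    return best, morph[best] * syntax[best]


pipeline_choice, pipeline_score = pipeline_decode(MORPH_SCORES, SYNTAX_SCORES)
joint_choice, joint_score = joint_decode(MORPH_SCORES, SYNTAX_SCORES)
print("pipeline:", pipeline_choice)
print("joint:   ", joint_choice)
```

With these invented numbers, the pipeline locks in the reading that morphology prefers in isolation, while the joint decoder selects the reading that fits the syntactic context better.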

Topics

Natural Language Processing > Understanding > Parsing
Interdisciplinary > Linguistics > Computational Linguistics

Keywords

dependency parsing, annotation pipeline, morphological disambiguation, Hebrew language, morpho-syntactic parsing
