Sanskrit Word Segmentation Using Character-level Recurrent and Convolutional Neural Networks

Oliver Hellwig; Sebastian Nehrdich

EMNLP 2018

Abstract

The paper introduces end-to-end neural network models that tokenize Sanskrit by jointly splitting compounds and resolving phonetic merges (Sandhi). Tokenization of Sanskrit depends on local phonetic features and distant semantic features, which the models incorporate through convolutional and recurrent elements. Unlike most previous systems, our models require neither feature engineering nor external linguistic resources; they operate solely on parallel versions of raw and segmented text. The models discussed in this paper clearly improve over previous approaches to Sanskrit word segmentation. Because they are language-agnostic, we also demonstrate that they outperform the state of the art on the related task of German compound splitting.
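The abstract says the models learn only from parallel raw and segmented text. One plausible way to turn such a pair into character-level training targets is to map each raw character to the output substring it produces, so that a Sandhi merge like ca + iti → ceti and a compound split like bahnhof → bahn hof both become per-character labels. The sketch below assumes that labelling scheme; the `char_labels` helper and the use of Python's `difflib.SequenceMatcher` for alignment are illustrative assumptions, not the authors' documented preprocessing.

```python
from difflib import SequenceMatcher
from typing import List


def char_labels(raw: str, segmented: str) -> List[str]:
    """Map each character of `raw` to the substring of `segmented` it produces.

    Hypothetical preprocessing helper: the alignment comes from difflib's
    generic SequenceMatcher heuristic, not from the paper. For non-empty
    `raw`, joining the returned labels reproduces `segmented`.
    """
    labels = list(raw)  # by default every raw character maps to itself
    prefix = ""  # material inserted before the first raw character
    matcher = SequenceMatcher(None, raw, segmented, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        target = segmented[j1:j2]
        if tag == "equal":
            continue
        if tag == "insert":
            # Pure insertion (e.g. a new word boundary): attach it to the
            # preceding raw character, or keep it as a prefix at position 0.
            if i1 > 0:
                labels[i1 - 1] += target
            else:
                prefix += target
        else:
            # "replace" or "delete": the first raw character of the span
            # emits the whole target (possibly empty), the rest emit nothing.
            labels[i1] = target
            for k in range(i1 + 1, i2):
                labels[k] = ""
    if labels:
        labels[0] = prefix + labels[0]
    return labels


print(char_labels("ceti", "ca iti"))       # ['c', 'a i', 't', 'i']
print(char_labels("bahnhof", "bahn hof"))  # ['b', 'a', 'h', 'n ', 'h', 'o', 'f']
```

Under this assumed scheme, segmentation reduces to predicting one label per input character, and a predicted label sequence is decoded by simple concatenation. A real system would also need a consistent alignment policy across the whole corpus, since `SequenceMatcher` picks a single alignment per pair and can be ambiguous on longer strings.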

Keywords

natural language processing, word segmentation, convolutional neural network, recurrent neural network, character-level model, character-level neural network, sandhi splitting