Sanskrit Sandhi Splitting using seq2(seq)2

Rahul Aralikatte; Neelamadhav Gantayat; Naveen Panwar; Anush Sankaran; Senthil Mani

EMNLP 2018


Abstract

In Sanskrit, small words (morphemes) are combined to form compound words through a process known as Sandhi. Sandhi splitting is the process of splitting a given compound word into its constituent morphemes. Although the language has rules governing word splitting, identifying the location of the splits in a compound word is highly challenging. Existing Sandhi splitting systems incorporate these predefined splitting rules, but their accuracy is low because the same compound word can be broken down in multiple ways that all yield syntactically correct splits. In this research, we propose a novel deep learning architecture called Double Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95% accuracy, and (ii) predicts the constituent words (learning the Sandhi splitting rules) with 79.5% accuracy, outperforming the state of the art by 20%. Additionally, we demonstrate the generalization capability of our deep learning model by showing competitive results on the problem of Chinese word segmentation.
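The ambiguity described in the abstract can be illustrated with a small rule-based sketch. This is not the DD-RNN and not the authors' code. It is a toy that assumes an SLP1-style ASCII transliteration (A, I, U for the long vowels) and covers only a few well-known vowel-merging rules. The names `MERGE_RULES`, `join`, and `candidate_splits` are hypothetical.

```python
# Toy vowel-sandhi table in SLP1-style ASCII (A = long a, I = long i, U = long u).
# Only a handful of well-known merge rules are listed; the real rule set is far larger.
MERGE_RULES = {
    ("a", "a"): "A", ("a", "A"): "A", ("A", "a"): "A", ("A", "A"): "A",
    ("a", "i"): "e", ("a", "I"): "e", ("A", "i"): "e", ("A", "I"): "e",
    ("a", "u"): "o", ("a", "U"): "o", ("A", "u"): "o", ("A", "U"): "o",
}


def join(left, right):
    """Merge two non-empty words, applying a rule at the junction if one matches."""
    key = (left[-1], right[0])
    if key in MERGE_RULES:
        return left[:-1] + MERGE_RULES[key] + right[1:]
    return left + right


def candidate_splits(compound):
    """List every (position, left, right) that the toy rules could have merged."""
    candidates = []
    for pos, char in enumerate(compound):
        for (end, start), merged in MERGE_RULES.items():
            if merged == char:
                # Undo the merge: restore the two vowels that produced `char`.
                left = compound[:pos] + end
                right = start + compound[pos + 1:]
                candidates.append((pos, left, right))
    return candidates


if __name__ == "__main__":
    compound = join("sUrya", "udaya")  # a + u merges into o
    print(compound)
    for pos, left, right in candidate_splits(compound):
        print(pos, left, "+", right)
```

Even with this tiny table, the single merged vowel in `sUryodaya` can be undone in four ways that all re-join to the same compound. Rules alone therefore do not pick out the intended split, which is the gap a learned model is meant to close.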



Topics

Deep Learning > Architectures > Neural Networks

Machine Learning > Core Methods > Sequence Labeling

Artificial Intelligence > Core AI > Natural Language Processing

Deep Learning > Architectures > Recurrent Neural Networks

Deep Learning > Learning Types > Sequence Modeling

Keywords

Chinese word segmentation, recurrent neural network, sequence-to-sequence model, compound word, morpheme analysis, Sandhi splitting, morpheme splitting, Sanskrit processing

