All that is English may be Hindi: Enhancing language identification through automatic ranking of the likeliness of word borrowing in social media

Jasabanta Patro; Bidisha Samanta; Saurabh Singh; Abhipsa Basu; Prithwish Mukherjee; Monojit Choudhury; Animesh Mukherjee

EMNLP 2017

Abstract

In this paper, we present a set of computational methods to identify the likeliness of a word being borrowed, based on signals from social media. In terms of Spearman's correlation, our methods perform more than two times better (∼0.62) in predicting borrowing likeliness than the best-performing baseline reported in the literature (∼0.26). Based on this likeliness estimate, we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts. In 88% of cases the annotators felt that the foreign language tag should be replaced by the native language tag, indicating substantial scope for improving automatic language identification systems.
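The evaluation metric in the abstract is Spearman's rank correlation between a method's predicted borrowing-likeliness scores and a reference (gold) ranking of the same words. As an illustration only, here is a minimal Python sketch of that metric: ranks are computed with tied values receiving the average of their positions, and rho is the Pearson correlation of those ranks. The function names, the word list, and all scores are hypothetical toy values, not the paper's code or data.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / (vx * vy) ** 0.5


# Hypothetical toy example: predicted likeliness vs. a made-up gold score.
words = ["bus", "glass", "thanks", "network", "episode"]
predicted = [0.9, 0.7, 0.4, 0.2, 0.1]
gold = [5, 4, 3, 1, 2]
print(round(spearman_rho(predicted, gold), 2))  # → 0.9
```

Only the relative order of scores matters here, so a method can be compared against a gold ranking even when the two use different scales; a value near 1 means the predicted ordering of words closely matches the reference ordering.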

Keywords

text classification, computational linguistics, language identification, social media, word borrowing