Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text

Vivek Srivastava; Mayank Singh

NAACL 2021

Abstract

Code-mixing is a frequent communication style among multilingual speakers where they mix words and phrases from two different languages in the same utterance of text or speech. Identifying and filtering code-mixed text is a challenging task due to its co-existence with monolingual and noisy text. Over the years, several code-mixing metrics have been extensively used to identify and validate code-mixed text quality. This paper demonstrates several inherent limitations of code-mixing metrics with examples from the already existing datasets that are popularly used across various experiments.

Topics

Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

text complexity; code-mixed text; multilingual text processing; code-mixing metrics
