SentNoB: A Dataset for Analysing Sentiment on Noisy Bangla Texts

Khondoker Ittehadul Islam; Sudipta Kar; Md Saiful Islam; Mohammad Ruhul Amin

EMNLP 2021

Abstract

In this paper, we propose an annotated sentiment analysis dataset of informally written Bangla texts. The dataset comprises public comments on news and videos collected from social media, covering 13 different domains, including politics, education, and agriculture. Each comment is labeled with one of three polarity labels: positive, negative, or neutral. One significant characteristic of the dataset is that every comment is noisy, mixing dialects and containing grammatical errors. Our experiments to develop a benchmark classification system show that hand-crafted lexical features outperform neural network and pretrained language models. We have made the dataset and accompanying models presented in this paper publicly available at https://git.io/JuuNB.
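The abstract does not say which hand-crafted lexical features were used. As an illustration only, one common choice for noisy text is counts of word and character n-grams, since character n-grams tolerate spelling variation. The sketch below is a minimal version under that assumption; the function names, the n-gram ranges, and the `w:`/`c:` prefixes are hypothetical and are not taken from the paper.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Yield whitespace-token n-grams for n in [n_min, n_max]."""
    tokens = text.split()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def char_ngrams(text, n_min=1, n_max=3):
    """Yield character n-grams (by Unicode code point) for n in [n_min, n_max]."""
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def lexical_features(text):
    """Return a sparse bag of word and character n-gram counts.

    Keys are prefixed ("w:" or "c:") so that word and character
    features never collide. The ranges are illustrative defaults.
    """
    feats = Counter()
    for gram in word_ngrams(text):
        feats["w:" + gram] += 1
    for gram in char_ngrams(text):
        feats["c:" + gram] += 1
    return feats
```

Feature dictionaries like these would typically be vectorized and passed to a linear classifier. Note that slicing a Python string splits Bangla text by code point, so a base letter and its dependent vowel sign can land in different n-grams; whether that matters for this dataset is not something the abstract addresses.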

Topics

Natural Language Processing > Understanding > Sentiment Analysis
Natural Language Processing > Applications > Sentiment Analysis
Machine Learning > Learning Types > Classification

Keywords

sentiment analysis, text classification, social media, polarity classification, noisy text, Bangla language, Bangla text
