KanCMD: Kannada CodeMixed Dataset for Sentiment Analysis and Offensive Language Detection

Adeep Hande; Ruba Priyadharshini; Bharathi Raja Chakravarthi

COLING 2020

Abstract

We introduce the Kannada CodeMixed Dataset (KanCMD), a multi-task learning dataset for sentiment analysis and offensive language identification. KanCMD highlights two real-world issues in social media text. First, it contains actual code-mixed comments posted by users on YouTube, rather than monolingual text drawn from textbooks. Second, each comment is annotated for two tasks, sentiment analysis and offensive language detection, in the under-resourced Kannada language. KanCMD is therefore meant to stimulate research on real-world code-mixed social media text and multi-task learning for Kannada. The comments were collected by crawling YouTube, and each comment was annotated by at least three annotators. We release KanCMD, comprising 7,671 comments, for multi-task learning research.
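Because every comment carries one label per task and is seen by at least three annotators, a record naturally holds one text with two labels derived from several votes. The abstract does not say how the votes are combined, so the sketch below assumes a simple majority vote per task and discards comments with a tie on either task. The class `Comment`, the functions `majority_label` and `aggregate`, and any label strings you pass in are hypothetical illustrations, not the dataset's actual schema or label set.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comment:
    """Hypothetical multi-task record: one code-mixed comment, one label per task."""
    text: str
    sentiment: str  # task 1: sentiment analysis label
    offensive: str  # task 2: offensive language detection label


def majority_label(votes: List[str]) -> Optional[str]:
    """Return the most frequent label among annotator votes, or None on a tie.

    Majority voting is an assumption made for illustration only.
    """
    if len(votes) < 3:
        raise ValueError("expected at least three annotator votes")
    counts = Counter(votes).most_common()
    # A tie between the two most frequent labels means there is no majority.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def aggregate(text: str,
              sentiment_votes: List[str],
              offensive_votes: List[str]) -> Optional[Comment]:
    """Build a multi-task record, or None if either task lacks a majority label."""
    sentiment = majority_label(sentiment_votes)
    offensive = majority_label(offensive_votes)
    if sentiment is None or offensive is None:
        return None
    return Comment(text, sentiment, offensive)
```

For example, `aggregate("some comment", ["positive", "positive", "negative"], ["not_offensive"] * 3)` yields a single record with both labels, which a multi-task model could consume as one input with two targets. A real pipeline might instead send tied comments back for further annotation rather than dropping them.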

Topics

Natural Language Processing > Applications > Text Classification
Natural Language Processing > Applications > Sentiment Analysis
Artificial Intelligence > Learning Paradigms > Multi-Agent Systems

Keywords

multi-task learning, sentiment analysis, offensive language detection, code-mixed text, under-resourced language
