CUET_Novice@DravidianLangTech 2025: Abusive Comment Detection in Malayalam Text Targeting Women on Social Media Using Transformer-Based Models

Farjana Alam Tofa; Khadiza Sultana Sayma; Md Osama; Ashim Dey

NAACL 2025


Abstract

Social media has become a widely used platform for communication and entertainment, but it has also become a space where abuse and harassment can thrive. Women, in particular, face hateful and abusive comments that reflect gender inequality. This paper discusses our participation in the Abusive Text Targeting Women in Dravidian Languages shared task at DravidianLangTech@NAACL 2025, which focuses on detecting abusive text targeting women in Malayalam social media comments. The shared task provided a dataset of YouTube comments in Tamil and Malayalam, focusing on sensitive and controversial topics where abusive behavior is prevalent. Our participation focused on the Malayalam dataset, where the goal was to classify each comment accurately as abusive or not. Malayalam-BERT achieved the best performance on the subtask, securing 3rd place with a macro F1-score of 0.7083, highlighting the effectiveness of transformer models for low-resource languages. These results contribute to tackling gender-based abuse and improving online content moderation for underrepresented languages.
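Because the reported result is a macro F1-score, a short sketch of how that metric is computed may help: it is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The function name `macro_f1`, the label strings, and the toy predictions below are assumptions made for illustration; they are not taken from the shared-task data or from our system.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_label, pred_label in zip(y_true, y_pred):
        if gold_label == pred_label:
            tp[gold_label] += 1
        else:
            fp[pred_label] += 1  # predicted this class, but it was wrong
            fn[gold_label] += 1  # missed an instance of the true class
    per_class = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy labels, invented for illustration only (hypothetical label names).
gold = ["Abusive", "Abusive", "Non-Abusive", "Non-Abusive", "Non-Abusive"]
pred = ["Abusive", "Non-Abusive", "Non-Abusive", "Non-Abusive", "Abusive"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

In this toy case the per-class F1 values are 0.5 and about 0.667, so the macro average falls between them regardless of how many comments each class contains.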
