byteSizedLLM@DravidianLangTech 2025: Multimodal Misogyny Meme Detection in Low-Resource Dravidian Languages Using Transliteration-Aware XLM-RoBERTa, ResNet-50, and Attention-BiLSTM

Durga Prasad Manukonda; Rohith Gowtham Kodali

NAACL 2025

Abstract

Detecting misogyny in memes is challenging due to their multimodal nature, especially in low-resource languages like Tamil and Malayalam. This paper presents our work in the Misogyny Meme Detection task, utilizing both textual and visual features. We propose an Attention-Driven BiLSTM-XLM-RoBERTa-ResNet model, combining a transliteration-aware fine-tuned XLM-RoBERTa for text analysis and ResNet-50 for image feature extraction. Our model achieved Macro-F1 scores of 0.8805 for Malayalam and 0.8081 for Tamil, demonstrating competitive performance. However, challenges such as class imbalance and domain-specific image representation persist. Our findings highlight the need for better dataset curation, task-specific fine-tuning, and advanced fusion techniques to enhance multimodal hate speech detection in Dravidian languages.
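The abstract reports Macro-F1 and names class imbalance as an open challenge. The sketch below is an illustration only, not the authors' evaluation code: it computes Macro-F1 as the unweighted mean of per-class F1 scores, which is why the metric penalizes a model that ignores the minority class. The function name `macro_f1` and the toy labels (1 = misogynistic, 0 = not) are assumptions made for this example.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy imbalanced data (hypothetical): one positive meme among eight.
y_true = [1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0]  # a classifier that always predicts "not misogynistic"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

On this toy data the always-negative classifier reaches an accuracy of 0.875 but a Macro-F1 of only about 0.467, because the missed positive class contributes an F1 of 0 and each class counts equally in the average.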

Topics

Deep Learning > Architectures > Transformers
Computer Vision > Analysis > Object Detection
Interdisciplinary > Social > Social Media Analysis

Keywords

multimodal learning, low-resource language, misogyny detection, multilingual transformer, image-text fusion
