kubapok@LT-EDI 2024: Evaluating Transformer Models for Hate Speech Detection in Tamil

Jakub Pokrywka; Krzysztof Jassem

EACL 2024

Abstract

We describe the second-place submission to the shared task organized at the Fourth Workshop on Language Technology for Equality, Diversity, and Inclusion (LT-EDI-2024). The task focuses on detecting caste and migration hate speech in Tamil. The texts are written in Tamil, both in Tamil script and transliterated into Latin script, and some are in English. Given the different scripts, we compared the performance of 12 transformer language models on the dev set. Our analysis showed that google/muril-large-cased performs best on the whole dataset. For the final challenge submission we used an ensemble of several models, which achieved a score of 0.81 on the test set.
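The abstract does not say how the ensemble was formed. As one common possibility, the sketch below uses soft voting: class probabilities from several fine-tuned classifiers are averaged and the most probable class is chosen. The function names, the two-class layout `[P(not hate), P(hate)]`, and the probability values are all hypothetical and serve only as illustration; they are not results from the paper.

```python
from typing import List, Sequence


def average_probabilities(
    per_model_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average class probabilities over models (soft voting).

    per_model_probs[m][i][c] is the probability that model m assigns
    to class c for example i. All models must score the same examples.
    """
    n_models = len(per_model_probs)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_examples = len(per_model_probs[0])
    averaged = []
    for i in range(n_examples):
        rows = [model[i] for model in per_model_probs]
        # zip(*rows) groups the per-model probabilities class by class.
        averaged.append([sum(col) / n_models for col in zip(*rows)])
    return averaged


def predict_labels(avg_probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the index of the highest averaged probability per example."""
    return [max(range(len(row)), key=row.__getitem__) for row in avg_probs]


# Hypothetical outputs of three classifiers on three dev-set texts,
# each row being [P(not hate), P(hate)]. Made-up numbers.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
model_b = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]
model_c = [[0.7, 0.3], [0.6, 0.4], [0.4, 0.6]]

ensemble_probs = average_probabilities([model_a, model_b, model_c])
ensemble_labels = predict_labels(ensemble_probs)
print(ensemble_labels)  # [0, 0, 1]
```

In the second example the models disagree, and the averaged probabilities side with the two models that predict class 0. Averaging probabilities rather than counting votes lets a confident model outweigh uncertain ones; whether the submission used this or another combination scheme is not stated here.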

Topics

Deep Learning > Architectures > Transformers
Natural Language Processing > Applications > Text Classification
Interdisciplinary > Social > Social Media Analysis
Machine Learning > Learning Types > Ensemble Learning
Machine Learning > Learning Types > Deep Learning

Keywords

ensemble learning, text classification, model evaluation, ensemble method, transformer language model, hate speech detection, Tamil language, hate speech classification, transformer model
