UnityAI Guard: Pioneering Toxicity Detection Across Low-Resource Indian Languages

Himanshu Beniwal; Reddybathuni Venkat; Rohit Kumar; Birudugadda Srivibhav; Daksh Jain; Pavan Deekshith Doddi; Eshwar Dhande; Adithya Ananth; Kuldeep; Mayank Singh

EMNLP 2025

Abstract

This work introduces UnityAI-Guard, a framework for binary toxicity classification targeting low-resource Indian languages. While existing systems predominantly cater to high-resource languages, UnityAI-Guard addresses this critical gap by developing state-of-the-art models for identifying toxic content across diverse Brahmic/Indic scripts. Our approach achieves an average F1-score of 84.23% across seven languages, leveraging a dataset of 567k training instances and 30k manually verified test instances. UnityAI-Guard advances multilingual content moderation for linguistically diverse regions and provides public API access to foster broader adoption and application.
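To make the headline metric concrete, the following is a minimal sketch of how a per-language binary F1-score and its average across languages could be computed. It assumes an unweighted mean over languages with the toxic class as the positive label; the abstract does not state the exact averaging scheme. The function names, language keys, and toy labels are hypothetical and are not taken from the paper or its API.

```python
from typing import Dict, List, Tuple


def binary_f1(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive (toxic = 1) class of a binary classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def average_f1(
    per_language: Dict[str, Tuple[List[int], List[int]]]
) -> Tuple[float, Dict[str, float]]:
    """Unweighted mean of per-language F1 (an assumed averaging scheme)."""
    scores = {lang: binary_f1(t, p) for lang, (t, p) in per_language.items()}
    return sum(scores.values()) / len(scores), scores


# Toy, made-up labels for two placeholder languages (not real data).
toy = {
    "lang_a": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "lang_b": ([1, 0, 1, 0], [1, 0, 1, 1]),
}
mean_f1, per_lang = average_f1(toy)
print(per_lang, mean_f1)
```

An unweighted mean treats every language equally regardless of test-set size; a size-weighted mean would instead favor languages with more test instances.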

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Core Methods > Classification
Artificial Intelligence > Core AI > Natural Language Processing
Machine Learning > Learning Types > Multi-Lingual Learning
Machine Learning > Application Areas > Text Classification

Keywords

binary classification, text classification, multilingual NLP, toxicity detection, content moderation, low-resource language, Indian language, multilingual model, Indic language
