CNLP-NITS-PP at GenAI Detection Task 2: Leveraging DistilBERT and XLM-RoBERTa for Multilingual AI-Generated Text Detection

Annepaka Yadagiri; Reddi Mohana Krishna; Partha Pakray

2025 COLING COLING 2025

CNLP-NITS-PP at GenAI Detection Task 2: Leveraging DistilBERT and XLM-RoBERTa for Multilingual AI-Generated Text Detection

Abstract

AbstractIn today’s digital landscape, distinguishing between human-authored essays and content generated by advanced Large Language Models such as ChatGPT, GPT-4, Gemini, and LLaMa has become increasingly complex. This differentiation is essential across sectors like academia, cybersecurity, social media, and education, where the authenticity of written material is often crucial. Addressing this challenge, the COLING 2025 competition introduced Task 2, a binary classification task to separate AI-generated text from human-authored content. Using a benchmark dataset for English and Arabic, developing a methodology that fine-tuned various transformer-based neural networks, including CNN-LSTM, RNN, Bi-GRU, BERT, DistilBERT, GPT-2, and RoBERTa. Our Team CNLP-NITS-PP achieved competitive performance through meticulous hyperparameter optimization, reaching a Recall score of 0.825. Specifically, we ranked 18th in the English sub-task A with an accuracy of 0.77 and 20th in the Arabic sub-task B with an accuracy of 0.59. These results underscore the potential of transformer-based models in academic settings to detect AI-generated content effectively, laying a foundation for more advanced methods in essay authenticity verification.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Annepaka Yadagiri , Reddi Mohana Krishna , Partha Pakray

Topics

Deep Learning > Architectures > Transformers Natural Language Processing > Applications > Text Classification Artificial Intelligence > Core AI > Large Language Models Deep Learning > Models > Transformers

Keywords

binary classification text classification multilingual nlp ai-generated text detection ai-generated text academic integrity transformer model transformer-based neural network

Download PDF

Related papers

Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection 2025

TaCIE: Enhancing Instruction Comprehension in Large Language Models through Task-Centred Instruction Evolution 2025

Positive Text Reframing under Multi-strategy Optimization 2025

RAM2C: A Liberal Arts Educational Chatbot based on Retrieval-augmented Multi-role Multi-expert Collaboration 2025

Two-stage Incomplete Utterance Rewriting on Editing Operation 2025