2025 COLING COLING 2025

TurQUaz at GenAI Detection Task 1:Dr. Perplexity or: How I Learned to Stop Worrying and Love the Finetuning

Abstract

AbstractThis paper details our methods for addressing Task 1 of the GenAI Content Detection shared tasks, which focus on distinguishing AI-generated text from human-written content. The task comprises two subtasks: Subtask A, centered on English-only datasets, and Subtask B, which extends the challenge to multilingual data. Our approach uses a fine-tuned XLM-RoBERTa model for classification, complemented by features including perplexity and TF-IDF. While perplexity is commonly regarded as a useful indicator for identifying machine-generated text, our findings suggest its limitations in multi-model and multilingual contexts. Our approach ranked 6th in Subtask A, but a submission issue left our Subtask B unranked, where it would have placed 23rd.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio