nits_teja_srikar at GenAI Detection Task 2: Distinguishing Human and AI-Generated Essays Using Machine Learning and Transformer Models

Sai Teja Lekkala; Annepaka Yadagiri; Mangadoddi Srikar Vardhan; Partha Pakray

2025 COLING COLING 2025

nits_teja_srikar at GenAI Detection Task 2: Distinguishing Human and AI-Generated Essays Using Machine Learning and Transformer Models

Abstract

AbstractThis paper presents models to differentiate between human-written and AI-generated essays, addressing challenges posed by advanced AI models like ChatGPT and Claude. Using a structured dataset, we fine-tune multiple machine learning models, including XGBoost and Logistic Regression, along with ensemble learning and k-fold cross-validation. The dataset is processed through TF-IDF vectorization, followed by text cleaning, lemmatization, stemming, and part-of-speech tagging before training. Our team nits_teja_srikar achieves high accuracy, with DistilBERT performing at 77.3% accuracy, standing at 20th position for English, and XLM-RoBERTa excelling in Arabic at 92.2%, standing at 14th position in the official leaderboard, demonstrating the model’s potential for real-world applications.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — essay classification

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Sai Teja Lekkala , Annepaka Yadagiri , Mangadoddi Srikar Vardhan , Partha Pakray

Topics

Machine Learning > Core Methods > Classification Natural Language Processing > Applications > Text Classification

Keywords

ensemble learning logistic regression ai-generated text detection tf-idf vectorization essay classification transformer model

Download PDF

Related papers

Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection 2025

TaCIE: Enhancing Instruction Comprehension in Large Language Models through Task-Centred Instruction Evolution 2025

Positive Text Reframing under Multi-strategy Optimization 2025

RAM2C: A Liberal Arts Educational Chatbot based on Retrieval-augmented Multi-role Multi-expert Collaboration 2025

Two-stage Incomplete Utterance Rewriting on Editing Operation 2025