Grape at GenAI Detection Task 1: Leveraging Compact Models and Linguistic Features for Robust Machine-Generated Text Detection

Nhi Hoai Doan; Kentaro Inui

2025 COLING COLING 2025

Grape at GenAI Detection Task 1: Leveraging Compact Models and Linguistic Features for Robust Machine-Generated Text Detection

Abstract

AbstractIn this project, we aim to address two subtasks of Task 1: Binary Multilingual Machine-Generated Text (MGT) Detection (Human vs. Machine) as part of the COLING 2025 Workshop on MGT Detection (Wang et al., 2025) using different approaches. The first method involves separately fine-tuning small language models tailored to the specific subtask. The second approach builds on this methodology by incorporating linguistic, syntactic, and semantic features, leveraging ensemble learning to integrate these features with model predictions for more robust classification. By evaluating and comparing these approaches, we aim to identify the most effective techniques for detecting machine-generated content across languages, providing insights into improving automated verification tools amidst the rapid growth of LLM-generated text in digital spaces.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Nhi Hoai Doan , Kentaro Inui

Topics

Natural Language Processing > Applications > Text Classification Machine Learning > Learning Types > Supervised Learning Machine Learning > Learning Types > Ensemble Learning Machine Learning > Core Methods > Ensemble Methods Machine Learning > Core Methods > Ensemble Learning Deep Learning > Models > Transformers Artificial Intelligence > Core AI > Language

Keywords

ensemble learning machine-generated text detection syntactic analysis linguistic feature semantic feature small language model syntactic feature binary text classification

Download PDF

Related papers

Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection 2025

TaCIE: Enhancing Instruction Comprehension in Large Language Models through Task-Centred Instruction Evolution 2025

Positive Text Reframing under Multi-strategy Optimization 2025

RAM2C: A Liberal Arts Educational Chatbot based on Retrieval-augmented Multi-role Multi-expert Collaboration 2025

Two-stage Incomplete Utterance Rewriting on Editing Operation 2025