Detecting LLM-Assisted Cheating on Open-Ended Writing Tasks on Language Proficiency Tests

Chenhao Niu; Kevin P. Yancey; Ruidong Liu; Mirza Basim Baig; André Kenji Horie; James Sharpnack

EMNLP 2024

Abstract

The high capability of recent Large Language Models (LLMs) has led to concerns about their possible misuse as cheating assistants in open-ended writing tasks in assessments. Although various detection methods have been proposed, most of them have not been evaluated on or optimized for real-world samples of LLM-assisted cheating, where the generated text is often copy-typed imperfectly by the test-taker. In this paper, we present a framework for training LLM-generated text detectors that can effectively detect LLM-generated samples even after they have been copy-typed. We enhance the existing transformer-based classifier training process with contrastive learning on constructed pairwise data and self-training on unlabeled data, and evaluate the improvements on a real-world dataset from the Duolingo English Test (DET), a high-stakes online English proficiency test. Our experiments demonstrate that the improved model outperforms the original transformer-based classifier and other baselines.
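The abstract does not specify how the pairwise data are constructed or which contrastive objective is used, so the following is only a minimal sketch of the general idea, under stated assumptions. The noise model in `simulate_copy_typing` (random drops, doublings, and neighbour swaps), the cosine-based `pairwise_contrastive_loss`, and all parameter names and defaults are hypothetical illustrations, not the authors' method.

```python
import math
import random


def simulate_copy_typing(text, typo_rate=0.05, seed=0):
    """Return a noisy copy of `text`, imitating imperfect manual retyping.

    Hypothetical noise model (an assumption, not from the paper): each letter
    is, with probability `typo_rate`, dropped, doubled, or swapped with the
    character that follows it.
    """
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < typo_rate:
            op = rng.choice(["drop", "double", "swap"])
            if op == "drop":
                pass  # omit the character entirely
            elif op == "double":
                out.extend([c, c])
            elif i + 1 < len(chars):
                # swap with the next character and skip over it
                out.extend([chars[i + 1], c])
                i += 1
            else:
                out.append(c)  # nothing to swap with at the end of the text
        else:
            out.append(c)
        i += 1
    return "".join(out)


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pairwise_contrastive_loss(u, v, same_source, margin=0.5):
    """Contrastive loss for one pair of embeddings (illustrative form only).

    Pairs from the same source (e.g. an LLM text and its copy-typed version)
    are pulled together; other pairs are penalised only when their cosine
    similarity exceeds `margin`.
    """
    sim = cosine(u, v)
    if same_source:
        return 1.0 - sim
    return max(0.0, sim - margin)
```

In such a setup, an LLM-generated response and its `simulate_copy_typing` output would form a same-source pair, and the embeddings passed to the loss would come from the transformer classifier's encoder. That wiring is omitted here because the abstract gives no details about it.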

Topics

Machine Learning > Learning Types > Contrastive Learning
Machine Learning > Application Areas > Fairness
Natural Language Processing > Applications > Text Classification
Deep Learning > Learning Types > Contrastive Learning
Deep Learning > Learning Types > Classification
Artificial Intelligence > Core AI > Natural Language Processing

Keywords

contrastive learning, educational assessment, LLM-generated text detection, large language model generated text detection, cheating detection, language proficiency test
