RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

Liam Dugan; Alyssa Hwang; Filip Trhlík; Andrew Zhu; Josh Magnus Ludan; Hainiu Xu; Daphne Ippolito; Chris Callison-Burch

ACL 2024

Abstract

Many commercial and open-source models claim to detect machine-generated text with extremely high accuracy (99% or more). However, very few of these detectors are evaluated on shared benchmark datasets, and even when they are, the datasets used for evaluation are insufficiently challenging: they lack variations in sampling strategy, adversarial attacks, and open-source generative models. In this work, we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks, and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open-source and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models. We release our data along with a leaderboard to encourage future research.

Topics

Artificial Intelligence > Core AI > AI Safety; Machine Learning > Core Methods > Classification; Machine Learning > Application Areas > Fairness

Keywords

adversarial robustness; text classification; machine-generated text detection; benchmark dataset; detector evaluation
