NAACL 2024

Fine-tuning Language Models for AI vs Human Generated Text detection

Abstract

In this paper, we introduce a machine-generated text detection system designed to tackle the challenges posed by the proliferation of large language models (LLMs). With the rise of LLMs such as ChatGPT and GPT-4, there is a growing concern regarding the potential misuse of machine-generated content, including misinformation dissemination. Our system addresses this issue by automating the identification of machine-generated text across multiple subtasks: binary human-written vs. machine-generated text classification, multi-way machine-generated text classification, and human-machine mixed text detection. We employ the RoBERTa Base model and fine-tune it on a diverse dataset encompassing various domains, languages, and sources. Through rigorous evaluation, we demonstrate the effectiveness of our system in accurately detecting machine-generated text, contributing to efforts aimed at mitigating its potential misuse.
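
As an illustration of the binary subtask, the following is a minimal sketch of one fine-tuning step for a RoBERTa sequence classifier with two labels (human-written vs. machine-generated), written with PyTorch and the Hugging Face `transformers` library. It is not the system described above: the tiny configuration, the random token IDs, the learning rate, and the helper names `build_tiny_classifier` and `train_step` are all assumptions chosen so the example runs without downloading weights. A real setup would presumably load the pretrained `roberta-base` checkpoint and tokenize actual text.

```python
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification


def build_tiny_classifier(num_labels: int = 2) -> RobertaForSequenceClassification:
    """Build a small, randomly initialised RoBERTa classifier (hypothetical sizes).

    Assumption: a real run would instead call
    RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=num_labels).
    """
    config = RobertaConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
        max_position_embeddings=64,
        num_labels=num_labels,
    )
    return RobertaForSequenceClassification(config)


def train_step(model, optimizer, input_ids, attention_mask, labels):
    """Run one gradient update and return the loss value and the logits."""
    model.train()
    optimizer.zero_grad()
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    outputs.loss.backward()  # cross-entropy loss over the label set
    optimizer.step()
    return outputs.loss.item(), outputs.logits.detach()


torch.manual_seed(0)
model = build_tiny_classifier(num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed learning rate

# Stand-in batch: 4 sequences of 16 random token IDs (IDs 0-2 are reserved tokens).
input_ids = torch.randint(3, 100, (4, 16))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1, 0, 1])  # assumed mapping: 0 = human, 1 = machine

loss, logits = train_step(model, optimizer, input_ids, attention_mask, labels)
print(f"loss={loss:.4f}, logits shape={tuple(logits.shape)}")
```

Under the same assumptions, the multi-way subtask would only change `num_labels` to the number of candidate generators. Mixed human-machine text detection would likely need a different output head, which this sketch does not cover.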
