PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?

Kseniia Petukhova; Roman Kazakov; Ekaterina Kochmar

2024 NAACL NAACL 2024

PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?

Abstract

AbstractIn this paper, we present our submission to the SemEval-2024 Task 8 “Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection”, focusing on the detection of machine-generated texts (MGTs) in English. Specifically, our approach relies on combining embeddings from the RoBERTa-base with diversity features and uses a resampled training set. We score 16th from 139 in the ranking for Subtask A, and our results show that our approach is generalizable across unseen models and domains, achieving an accuracy of 0.91.

❓ The Questioner

🌉 Interdisciplinary Bridge — Computer Science and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Kseniia Petukhova , Roman Kazakov , Ekaterina Kochmar

Topics

Machine Learning > Core Methods > Classification Computer Science > Applications > Cybersecurity

Keywords

binary classification text classification machine-generated text detection embedding extraction

Download PDF

Related papers

Working Alliance Transformer for Psychotherapy Dialogue Classification 2024

Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences 2024

Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study 2024

TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation 2024

Extractive Summarization with Text Generator 2024