Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood

Yang Xu; Yu Wang; Hao An; Zhichen Liu; Yongyuan Li

2024 EMNLP EMNLP 2024

Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood

Abstract

AbstractHuman and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, it is becoming increasingly difficult as language model’s capabilities of generating human-like texts keep evolving. This study provides a new perspective by using the relative likelihood values instead of absolute ones, and extracting useful features from the spectrum-view of likelihood for the human-model text detection task. We propose a detection procedure with two classification methods, supervised and heuristic-based, respectively, which results in competitive performances with previous zero-shot detection methods and a new state-of-the-art on short-text detection. Our method can also reveal subtle differences between human and model languages, which find theoretical roots in psycholinguistics studies.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — relative likelihood

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yang Xu , Yu Wang , Hao An , Zhichen Liu , Yongyuan Li

Topics

Artificial Intelligence > Core AI > Interpretability Machine Learning > Core Methods > Classification Machine Learning > Optimization & Theory > Stochastic Processes Natural Language Processing > Applications > Text Classification Machine Learning > Learning Types > Classification Artificial Intelligence > Core AI > Natural Language Processing

Keywords

text classification likelihood estimation language model zero-shot detection language detection relative likelihood spectrum analysis human-model text detection human model detection

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024