COLING 2025
Pangram at GenAI Detection Task 3: An Active Learning Approach to Machine-Generated Text Detection
Abstract
We pretrain an autoregressive LLM-based detector on a wide variety of datasets, domains, languages, prompt schemes, and LLMs used to generate the AI portion of the dataset. We aggressively employ several augmentation and preprocessing strategies to improve robustness. We then mine the RAID train set for the AI examples with the largest error under the original classifier, and mix those examples and their human-written counterparts back into the training set. We then retrain the detector until convergence.
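The mining step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the example schema (`text`, `label`, `pair_id`), the function names, the `top_k` cutoff, and the error definition (one minus the predicted AI probability for an AI-written example) are all hypothetical, since the abstract does not specify them.

```python
from typing import Callable, Dict, List

# Hypothetical schema: {"text": str, "label": 1 for AI / 0 for human,
# "pair_id": id linking an AI example to its human-written counterpart}.
Example = Dict[str, object]


def mine_hard_examples(
    pool: List[Example],
    predict_ai_prob: Callable[[str], float],
    top_k: int,
) -> List[Example]:
    """Return the top_k highest-error AI examples plus their human counterparts."""
    ai_examples = [ex for ex in pool if ex["label"] == 1]
    # Assumed error measure: for an AI-written text, error = 1 - P(AI).
    ranked = sorted(
        ai_examples,
        key=lambda ex: 1.0 - predict_ai_prob(ex["text"]),
        reverse=True,
    )
    hard_ai = ranked[:top_k]
    hard_ids = {ex["pair_id"] for ex in hard_ai}
    # Pull in the human-written counterparts of the mined AI examples.
    humans = [
        ex for ex in pool if ex["label"] == 0 and ex["pair_id"] in hard_ids
    ]
    return hard_ai + humans


def augment_training_set(
    train_set: List[Example],
    pool: List[Example],
    predict_ai_prob: Callable[[str], float],
    top_k: int,
) -> List[Example]:
    """Mix the mined examples back into the training set before retraining."""
    return list(train_set) + mine_hard_examples(pool, predict_ai_prob, top_k)
```

In this sketch, `predict_ai_prob` stands in for the pretrained detector and `pool` for the RAID train set; the detector would then be retrained on the output of `augment_training_set` until convergence.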
Authors
Topics
Machine Learning > Learning Types > Active Learning
Deep Learning > Architectures > Transformers
Natural Language Processing > Applications > Text Classification
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Models > Large Language Models
Machine Learning > Learning Paradigms > Active Learning