2023 EACL EACL 2023

Improving Long-Text Authorship Verification via Model Selection and Data Tuning

Abstract

AbstractAuthorship verification is used to link texts written by the same author without needing a model per author, making it useful to deanonymizing users spreading text with malicious intent. In this work, we evaluated our Cross-Encoder system with four Transformers using differently tuned variants of fanfiction data and found that our BigBird pipeline outperformed Longformer, RoBERTa, and ELECTRA and performed competitively against the official top ranked system from the PAN evaluation. We also examined the effect of authors and fandoms not seen in training on model performance. Through this, we found fandom has the greatest influence on true trials, and that a balanced training dataset in terms of class and fandom performed the most consistently.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing
🧭 Keyword Pioneer — bigbird pipeline
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio