2024 NAACL NAACL 2024

Simple LLM based Approach to Counter Algospeak

Abstract

AbstractWith the use of algorithmic moderation on online communication platforms, an increase in adaptive language aiming to evade the automatic detection of problematic content has been observed. One form of this adapted language is known as “Algospeak” and is most commonly associated with large social media platforms, e.g., TikTok. It builds upon Leetspeak or online slang with its explicit intention to avoid machine readability. The machine-learning algorithms employed to automate the process of content moderation mostly rely on human-annotated datasets and supervised learning, often not adjusted for a wide variety of languages and changes in language. This work uses linguistic examples identified in research literature to introduce a taxonomy for Algospeak and shows that with the use of an LLM (GPT-4), 79.4% of the established terms can be corrected to their true form, or if needed, their underlying associated concepts. With an example sentence, 98.5% of terms are correctly identified. This research demonstrates that LLMs are the future in solving the current problem of moderation avoidance by Algospeak.

🧭 Keyword Pioneer — algorithmic moderation
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio