Smart Lexical Search for Label Flipping Adversial Attack

Alberto Gutiérrez-Megías; Salud María Jiménez-Zafra; L. Alfonso Ureña; Eugenio Martínez-Cámara

ACL 2024

Abstract

Language models are vulnerable to adversarial attacks, which manipulate the input data to disrupt their performance; this represents a cybersecurity risk. Such manipulations are intended to go unnoticed by both the learning model and human readers, yet small changes can alter the final label of a classification task. Hence, we propose a novel attack built upon explainability methods that identifies the salient lexical units to alter in order to flip the classification label. We assess our proposal on a disinformation dataset and show that our attack achieves a good balance between stealthiness and efficiency.
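The abstract does not say which explainability method, substitution source, or classifier the attack uses, so the following is only a minimal sketch of the general idea under stated assumptions. It swaps in leave-one-out occlusion as the explainer, uses a hypothetical bag-of-words scorer (WEIGHTS) as the victim model and a hypothetical substitution table (SUBSTITUTES), and greedily replaces the most salient tokens until the predicted label flips. All names (score, label_of, occlusion_saliency, label_flip_attack, max_changes) are illustrative, not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy "disinformation" scorer: a bag-of-words linear model.
# Positive total score => label 1 (e.g. "disinformation"), otherwise label 0.
WEIGHTS: Dict[str, float] = {
    "shocking": 2.0, "miracle": 1.5, "cure": 1.0,
    "study": -1.0, "reported": -0.5,
}

# Hypothetical substitution lexicon; a real attack might draw on synonyms or embeddings.
SUBSTITUTES: Dict[str, List[str]] = {
    "shocking": ["surprising"],
    "miracle": ["remarkable"],
    "cure": ["treatment"],
}


def score(tokens: List[str]) -> float:
    return sum(WEIGHTS.get(t.lower(), 0.0) for t in tokens)


def label_of(score_fn: Callable[[List[str]], float], tokens: List[str]) -> int:
    return 1 if score_fn(tokens) > 0 else 0


def occlusion_saliency(tokens: List[str], score_fn) -> List[Tuple[int, float]]:
    """Leave-one-out saliency: how much the score drops when token i is removed."""
    base = score_fn(tokens)
    return [(i, base - score_fn(tokens[:i] + tokens[i + 1:])) for i in range(len(tokens))]


def label_flip_attack(tokens, score_fn, substitutes, max_changes=3) -> Optional[List[str]]:
    """Greedily replace the most salient tokens until the predicted label flips.

    Returns the perturbed tokens, or None if the change budget is exhausted.
    """
    original = label_of(score_fn, tokens)
    sign = 1.0 if original == 1 else -1.0  # score direction that supports the original label
    adv = list(tokens)  # never mutate the caller's list
    changes = 0
    # Rank positions by how strongly each token supports the original label.
    ranked = sorted(occlusion_saliency(adv, score_fn), key=lambda p: -sign * p[1])
    for i, contribution in ranked:
        if changes >= max_changes or sign * contribution <= 0:
            break  # budget spent, or remaining tokens do not support the label
        candidates = substitutes.get(adv[i].lower(), [])
        if not candidates:
            continue
        current = sign * score_fn(adv)
        # Choose the substitute that pushes the score furthest from the original label.
        best = min(candidates, key=lambda w: sign * score_fn(adv[:i] + [w] + adv[i + 1:]))
        trial = adv[:i] + [best] + adv[i + 1:]
        if sign * score_fn(trial) >= current:
            continue  # substitution does not help; keep the original token
        adv = trial
        changes += 1
        if label_of(score_fn, adv) != original:
            return adv
    return None


text = "shocking miracle cure found".split()
print(label_of(score, text))  # 1
print(label_flip_attack(text, score, SUBSTITUTES))
# ['surprising', 'remarkable', 'treatment', 'found']
print(label_flip_attack(text, score, SUBSTITUTES, max_changes=2))  # None
```

Here max_changes caps the number of edited words as a rough stand-in for stealthiness, while flipping within that budget stands in for efficiency. A realistic attack would also need constraints on meaning and fluency, which this sketch omits.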

Topics

Machine Learning > Learning Types > Adversarial Learning
Machine Learning > Optimization & Theory > Theory

Keywords

text classification, adversarial attack, disinformation detection, explainability method, lexical manipulation