From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking

Siyuan Wang; Zhuohan Long; Zhihao Fan; Zhongyu Wei

EMNLP 2024

Abstract

The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting both LLMs and MLLMs, highlighting recent advancements in evaluation benchmarks, attack techniques, and defense strategies. Compared to the more advanced state of unimodal jailbreaking, the multimodal domain remains underexplored. We summarize the limitations and potential research directions of multimodal jailbreaking, aiming to inspire future research and further enhance the robustness and security of MLLMs.

Topics

Artificial Intelligence > Core AI > AI Safety; Artificial Intelligence > Core AI > Multimodal Learning; Artificial Intelligence > Core AI > Responsible AI; Artificial Intelligence > Core AI > Large Language Models; Deep Learning > Learning Types > Multi-Modal Learning; Artificial Intelligence > Core AI > Safety

Keywords

model robustness; adversarial attack; multimodal large language model; evaluation benchmark; security evaluation; defense strategy; multimodal jailbreaking
