Research Explorer
Safety
317 directly classified papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 2
Papers
Certification of Speaker Recognition Models to Additive Perturbations (AAAI 2025)
Investigating the Security Threat Arising from “Yes-No” Implicit Bias in Large Language Models (AAAI 2025)
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts (AAAI 2025)
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? (AAAI 2025)
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning (AAAI 2025)
Quantitative Predictive Monitoring and Control for Safe Human-Machine Interaction (AAAI 2025)
SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models (AAAI 2025)
Scaling Trends for Data Poisoning in LLMs (AAAI 2025)
Verification of Neural Networks Against Convolutional Perturbations via Parameterised Kernels (AAAI 2025)
LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets (AAAI 2025)
SMLE: Safe Machine Learning via Embedded Overapproximation (AAAI 2025)
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems (AAAI 2025)
SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety (AAAI 2025)
MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models (AAAI 2025)
Multimodal Pragmatic Jailbreak on Text-to-image Models (ACL 2025)
Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation (ACL 2025)
Jailbreak Large Vision-Language Models Through Multi-Modal Linkage (ACL 2025)
MPO: Multilingual Safety Alignment via Reward Gap Optimization (ACL 2025)
Exploring the Impact of Instruction-Tuning on LLM’s Susceptibility to Misinformation (ACL 2025)
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? (CVPR 2025)
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs (ACL 2025)
Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models (ACL 2025)
Combining Domain and Alignment Vectors Provides Better Knowledge-Safety Trade-offs in LLMs (ACL 2025)
Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks (ACL 2025)
PL-Guard: Benchmarking Language Model Safety for Polish (ACL 2025)