Research Explorer
Safety
317 directly classified papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 2
Papers
Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning
AAAI 2025
COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems Against Semantic Attacks
AAAI 2025
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
AAAI 2025
Investigating the Security Threat Arising from “Yes-No” Implicit Bias in Large Language Models
AAAI 2025
Certification of Speaker Recognition Models to Additive Perturbations
AAAI 2025
Scaling Trends for Data Poisoning in LLMs
AAAI 2025
Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
ACL 2025
PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks
AAAI 2025
Scalable Surrogate Verification of Image-Based Neural Network Control Systems Using Composition and Unrolling
AAAI 2025
Leveraging Constraint Violation Signals for Action Constrained Reinforcement Learning
AAAI 2025
Rethinking Byzantine Robustness in Federated Recommendation from Sparse Aggregation Perspective
AAAI 2025
Quantitative Predictive Monitoring and Control for Safe Human-Machine Interaction
AAAI 2025
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
AAAI 2025
SMLE: Safe Machine Learning via Embedded Overapproximation
AAAI 2025
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
ACL 2025
LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
AAAI 2025
Verification of Neural Networks Against Convolutional Perturbations via Parameterised Kernels
AAAI 2025
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
AAAI 2025
Stepwise Reasoning Disruption Attack of LLMs
ACL 2025
MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models
AAAI 2025
SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
AAAI 2025
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
CVPR 2025
Enhancing Robustness in Incremental Learning with Adversarial Training
AAAI 2025
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
AAAI 2025
Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models
ACL 2025