Safety
317 directly classified papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 2
Papers
TVChain: Leveraging Textual-Visual Prompt Chains for Jailbreaking Large Vision-Language Models
AAAI 2026
Runtime Safety and Reach-avoid Prediction of Stochastic Systems via Observation-aware Barrier Functions
AAAI 2026
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
AAAI 2025
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
AAAI 2025
Scaling Trends for Data Poisoning in LLMs
AAAI 2025
SMLE: Safe Machine Learning via Embedded Overapproximation
AAAI 2025
Probabilistic Shielding for Safe Reinforcement Learning
AAAI 2025
Enhancing Robustness in Incremental Learning with Adversarial Training
AAAI 2025
Certification of Speaker Recognition Models to Additive Perturbations
AAAI 2025
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
AAAI 2025
Rethinking Byzantine Robustness in Federated Recommendation from Sparse Aggregation Perspective
AAAI 2025
SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models
AAAI 2025
Verification of Neural Networks Against Convolutional Perturbations via Parameterised Kernels
AAAI 2025
LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
AAAI 2025
VestaBench: An Embodied Benchmark for Safe Long-Horizon Planning Under Multi-Constraint and Adversarial Settings
EMNLP 2025
Shield Synthesis for LTL Modulo Theories
AAAI 2025
Leveraging Constraint Violation Signals for Action Constrained Reinforcement Learning
AAAI 2025
Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning
AAAI 2025
Scalable Surrogate Verification of Image-Based Neural Network Control Systems Using Composition and Unrolling
AAAI 2025
COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems Against Semantic Attacks
AAAI 2025
Investigating the Security Threat Arising from “Yes-No” Implicit Bias in Large Language Models
AAAI 2025
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
AAAI 2025
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
CVPR 2025
PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks
AAAI 2025
Offline Safe Reinforcement Learning Using Trajectory Classification
AAAI 2025