Safety
317 directly classified papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 2
Papers
Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning
AAAI 2023
Safety Verification of Nonlinear Systems with Bayesian Neural Network Controllers
AAAI 2023
Evaluating Model-Free Reinforcement Learning toward Safety-Critical Tasks
AAAI 2023
Rethinking Safe Control in the Presence of Self-Seeking Humans
AAAI 2023
Safety Validation of Learning-Based Autonomous Systems: A Multi-Fidelity Approach
AAAI 2023
Targeted Knowledge Infusion To Make Conversational AI Explainable and Safe
AAAI 2023
Advances in AI for Safety, Equity, and Well-Being on Web and Social Media: Detection, Robustness, Attribution, and Mitigation
AAAI 2023
Combining Runtime Monitoring and Machine Learning with Human Feedback
AAAI 2023
Towards Safe and Resilient Autonomy in Multi-Robot Systems
AAAI 2023
MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning
ACL 2023
Text Adversarial Purification as Defense against Adversarial Attacks
ACL 2023
Language Detoxification with Attribute-Discriminative Latent Space
ACL 2023
TextVerifier: Robustness Verification for Textual Classifiers with Certifiable Guarantees
ACL 2023
Defending against Insertion-based Textual Backdoor Attacks via Attribution
ACL 2023
Can Large Language Models Safely Address Patient Questions Following Cataract Surgery?
ACL 2023
Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities
NIPS 2023
Characterizing the Optimal $0-1$ Loss for Multi-class Classification with a Test-time Attacker
NIPS 2023
Enhancing Safe Exploration Using Safety State Augmentation
NIPS 2022
Provable Defense against Backdoor Policies in Reinforcement Learning
NIPS 2022
Shield Decentralization for Safe Multi-Agent Reinforcement Learning
NIPS 2022
Increasing Confidence in Adversarial Robustness Evaluations
NIPS 2022
A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP
NIPS 2022
Toward Robust Spiking Neural Network Against Adversarial Perturbation
NIPS 2022
Risk-Driven Design of Perception Systems
NIPS 2022
On the Safety of Interpretable Machine Learning: A Maximum Deviation Approach
NIPS 2022