AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Detection as Regression: Certified Object Detection with Median Smoothing
NIPS 2020
Attack of the Tails: Yes, You Really Can Backdoor Federated Learning
NIPS 2020
Towards Safe Policy Improvement for Non-Stationary MDPs
NIPS 2020
Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free
NIPS 2020
Network Error Logging: Client-side measurement of end-to-end web service reliability
NSDI 2020
A Geometry-Inspired Attack for Generating Natural Language Adversarial Examples
COLING 2020
Imitation Attacks and Defenses for Black-box Machine Translation Systems
EMNLP 2020
Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms
NIPS 2020
Hidden Risks of Machine Learning Applied to Healthcare: Unintended Feedback Loops Between Models and Future Data Causing Model Degradation
MLHC 2020
Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations
NIPS 2020
Safe Reinforcement Learning via Curriculum Induction
NIPS 2020
Attacks Which Do Not Kill Training Make Adversarial Learning Stronger
ICML 2020
Defense Through Diverse Directions
ICML 2020
Hierarchical Verification for Adversarial Robustness
ICML 2020
Safe Exploration for Interactive Machine Learning
NIPS 2019
Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks
NIPS 2019
Unlabeled Data Improves Adversarial Robustness
NIPS 2019
Uniform Error Bounds for Gaussian Process Regression with Application to Safe Control
NIPS 2019
Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks
ICCV 2019
Protecting Elections by Recounting Ballots
IJCAI 2019
Ask not what AI can do, but what AI should do: Towards a framework of task delegability
NIPS 2019
Quality Control Attack Schemes in Crowdsourcing
IJCAI 2019
Attribution-Based Confidence Metric For Deep Neural Networks
NIPS 2019
Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks
CVPR 2019
Adversarial Training and Robustness for Multiple Perturbations
NIPS 2019