AI Safety

2972 directly classified papers

Papers

Detection as Regression: Certified Object Detection with Median Smoothing NIPS 2020

Attack of the Tails: Yes, You Really Can Backdoor Federated Learning NIPS 2020

Towards Safe Policy Improvement for Non-Stationary MDPs NIPS 2020

Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free NIPS 2020

Network Error Logging: Client-side measurement of end-to-end web service reliability NSDI 2020

A Geometry-Inspired Attack for Generating Natural Language Adversarial Examples COLING 2020

Imitation Attacks and Defenses for Black-box Machine Translation Systems EMNLP 2020

Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms NIPS 2020

Hidden Risks of Machine Learning Applied to Healthcare: Unintended Feedback Loops Between Models and Future Data Causing Model Degradation MLHC 2020

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations NIPS 2020

Safe Reinforcement Learning via Curriculum Induction NIPS 2020

Attacks Which Do Not Kill Training Make Adversarial Learning Stronger ICML 2020

Defense Through Diverse Directions ICML 2020

Hierarchical Verification for Adversarial Robustness ICML 2020

Safe Exploration for Interactive Machine Learning NIPS 2019

Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks NIPS 2019

Unlabeled Data Improves Adversarial Robustness NIPS 2019

Uniform Error Bounds for Gaussian Process Regression with Application to Safe Control NIPS 2019

Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks ICCV 2019

Protecting Elections by Recounting Ballots IJCAI 2019

Ask not what AI can do, but what AI should do: Towards a framework of task delegability NIPS 2019

Quality Control Attack Schemes in Crowdsourcing IJCAI 2019

Attribution-Based Confidence Metric For Deep Neural Networks NIPS 2019

Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks CVPR 2019

Adversarial Training and Robustness for Multiple Perturbations NIPS 2019