Research Explorer
Safety
317 directly classified papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 2
Papers
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
NIPS 2024
CMD: a framework for Context-aware Model self-Detoxification
EMNLP 2024
Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models
EMNLP 2024
Reward (Mis)design for Autonomous Driving (Abstract Reprint)
AAAI 2024
Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees (Abstract Reprint)
AAAI 2024
Monitoring of Perception Systems: Deterministic, Probabilistic, and Learning-Based Fault Detection and Identification (Abstract Reprint)
AAAI 2024
Multi-Agent First Order Constrained Optimization in Policy Space
NIPS 2023
The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks
CVPR 2023
Towards Test-Time Refusals via Concept Negation
NIPS 2023
Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms
NIPS 2023
Unveiling the Implicit Toxicity in Large Language Models
EMNLP 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
EMNLP 2023
Self-Detoxifying Language Models via Toxification Reversal
EMNLP 2023
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition
EMNLP 2023
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
EMNLP 2023
ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models
EMNLP 2023
Towards Detecting Contextual Real-Time Toxicity for In-Game Chat
EMNLP 2023
InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning
EMNLP 2023
GTA: Gated Toxicity Avoidance for LM Performance Preservation
EMNLP 2023
Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints
AAAI 2023
CEM: Constrained Entropy Maximization for Task-Agnostic Safe Exploration
AAAI 2023
Safe Reinforcement Learning via Shielding under Partial Observability
AAAI 2023
Correct-by-Construction Reinforcement Learning of Cardiac Pacemakers from Duration Calculus Requirements
AAAI 2023
SafeLight: A Reinforcement Learning Method toward Collision-Free Traffic Signal Control
AAAI 2023
AutoCost: Evolving Intrinsic Cost for Zero-Violation Reinforcement Learning
AAAI 2023