Safety

317 directly classified papers

Papers

T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models NIPS 2024

CMD: a framework for Context-aware Model self-Detoxification EMNLP 2024

Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models EMNLP 2024

Reward (Mis)design for Autonomous Driving (Abstract Reprint) AAAI 2024

Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees (Abstract Reprint) AAAI 2024

Monitoring of Perception Systems: Deterministic, Probabilistic, and Learning-Based Fault Detection and Identification (Abstract Reprint) AAAI 2024

Multi-Agent First Order Constrained Optimization in Policy Space NIPS 2023

The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks CVPR 2023

Towards Test-Time Refusals via Concept Negation NIPS 2023

Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms NIPS 2023

Unveiling the Implicit Toxicity in Large Language Models EMNLP 2023

ToViLaG: Your Visual-Language Generative Model is Also An Evildoer EMNLP 2023

Self-Detoxifying Language Models via Toxification Reversal EMNLP 2023

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition EMNLP 2023

Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models EMNLP 2023

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models EMNLP 2023

Towards Detecting Contextual Real-Time Toxicity for In-Game Chat EMNLP 2023

InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning EMNLP 2023

GTA: Gated Toxicity Avoidance for LM Performance Preservation EMNLP 2023

Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints AAAI 2023

CEM: Constrained Entropy Maximization for Task-Agnostic Safe Exploration AAAI 2023

Safe Reinforcement Learning via Shielding under Partial Observability AAAI 2023

Correct-by-Construction Reinforcement Learning of Cardiac Pacemakers from Duration Calculus Requirements AAAI 2023

SafeLight: A Reinforcement Learning Method toward Collision-Free Traffic Signal Control AAAI 2023

AutoCost: Evolving Intrinsic Cost for Zero-Violation Reinforcement Learning AAAI 2023