Research Explorer
Safety
317 directly classified papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 2
Papers
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
AAAI 2025
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
ACL 2025
LongSafety: Evaluating Long-Context Safety of Large Language Models
ACL 2025
Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights
ACL 2025
SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models
AAAI 2025
Guardrails and Security for LLMs: Safe, Secure and Controllable Steering of LLM Applications
ACL 2025
Scaling Trends for Data Poisoning in LLMs
AAAI 2025
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
ACL 2025
Quantitative Predictive Monitoring and Control for Safe Human-Machine Interaction
AAAI 2025
Rewrite to Jailbreak: Discover Learnable and Transferable Implicit Harmfulness Instruction
ACL 2025
Tell Me What You Don’t Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing
ACL 2025
Why Not Act on What You Know? Unleashing Safety Potential of LLMs via Self-Aware Guard Enhancement
ACL 2025
Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs
ACL 2025
Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
ACL 2025
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
AAAI 2025
Scalable Surrogate Verification of Image-Based Neural Network Control Systems Using Composition and Unrolling
AAAI 2025
Enhancing Robustness in Incremental Learning with Adversarial Training
AAAI 2025
PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks
AAAI 2025
Rethinking Byzantine Robustness in Federated Recommendation from Sparse Aggregation Perspective
AAAI 2025
Shield Synthesis for LTL Modulo Theories
AAAI 2025
Leveraging Constraint Violation Signals for Action Constrained Reinforcement Learning
AAAI 2025
Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning
AAAI 2025
Probabilistic Shielding for Safe Reinforcement Learning
AAAI 2025
Offline Safe Reinforcement Learning Using Trajectory Classification
AAAI 2025
Defense Against Prompt Injection Attack by Leveraging Attack Techniques
ACL 2025