Research Explorer
Safety
317 directly classified papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 2
Papers
Hyperbolic Safety-Aware Vision-Language Models
CVPR 2025
SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs
EMNLP 2025
Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs
ACL 2025
Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks
ACL 2025
Hallucination Detection in LLMs Using Spectral Features of Attention Maps
EMNLP 2025
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
EMNLP 2025
Guardrails and Security for LLMs: Safe, Secure and Controllable Steering of LLM Applications
ACL 2025
Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
EMNLP 2025
Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
EMNLP 2025
Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study
EMNLP 2025
Automating Steering for Safe Multimodal Large Language Models
EMNLP 2025
VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration
ACL 2025
How to Fine-Tune Safely on a Budget: Model Adaptation Using Minimal Resources
EMNLP 2025
sudoLLM: On Multi-role Alignment of Language Models
EMNLP 2025
Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
ACL 2025
Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
ACL 2025
Root Defense Strategies: Ensuring Safety of LLM at the Decoding Level
ACL 2025
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
ACL 2025
Rescorla-Wagner Steering of LLMs for Undesired Behaviors over Disproportionate Inappropriate Context
EMNLP 2025
Monitoring of Perception Systems: Deterministic, Probabilistic, and Learning-Based Fault Detection and Identification (Abstract Reprint)
AAAI 2024
Towards Trustworthy Deep Learning
AAAI 2024
A Huber Loss Minimization Approach to Byzantine Robust Federated Learning
AAAI 2024
GaLileo: General Linear Relaxation Framework for Tightening Robustness Certification of Transformers
AAAI 2024
Long-Term Safe Reinforcement Learning with Binary Feedback
AAAI 2024
Pure-Past Action Masking
AAAI 2024