Responsible AI

1991 directly classified papers

Papers

ARGH-Mark: Anchor-Synchronized Watermarking with Hamming Correction for Robust and Quality-Preserving LLM Attribution AAAI 2026

Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning? AAAI 2026

How Much Do Large Language Model Cheat on Evaluation? Benchmarking Overestimation Under the One-Time-Pad-Based Framework AAAI 2026

On the Alignment of Large Language Models with Global Human Opinion AAAI 2026

DarkBench+: An Extended Benchmark for Evaluating Dark Patterns in Large Language Models AAAI 2026

Towards Benchmarking Privacy Vulnerabilities in Selective Forgetting with Large Language Models AAAI 2026

Detecting Compute Structuring in AI Governance Is Likely Feasible AAAI 2026

Designing Incident Reporting Systems for Harms from General-Purpose AI AAAI 2026

Fine-Grained Interpretation of Political Opinions in Large Language Models AAAI 2026

The Confidence Trap: Gender Bias and Predictive Certainty in LLMs AAAI 2026

Robust Learning from Noisily Labeled Long-Tailed Data via Fairness Regularizer AAAI 2026

iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification AAAI 2026

SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge AAAI 2026

T2I-RiskyPrompt: A Benchmark for Safety Evaluation, Attack, and Defense on Text-to-Image Model AAAI 2026

AURA: Affordance-Understanding and Risk-aware Alignment Technique for Large Language Models AAAI 2026

ACID Test: A Benchmark for Cultural Safety and Alignment in LALMs AAAI 2026

Identifying Features Associated with Bias Against 93 Stigmatized Groups in Language Models and Guardrail Model Safety Mitigation AAAI 2026

Silenced Biases: The Dark Side LLMs Learned to Refuse AAAI 2026

Reducing the Scope of Language Models AAAI 2026

Steering Representations, Safeguarding Privacy: A Cross-Modal Privacy Protection Method for Generative AI AAAI 2026

ShadeEdit: A Utility-Preserving and Defense-Evasive Knowledge Manipulation Attack in Federated LLMs AAAI 2026

SCOPE: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs AAAI 2026

ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs AAAI 2026

Fairness Perceptions of Large Language Models AAAI 2026

Beyond World Models: Rethinking Understanding in AI Models AAAI 2026