Responsible AI

1991 directly classified papers

Papers

Situating Youth Agency in Designing AI & Art Policies AAAI 2026

FAIR-SIGHT: Fairness Assurance in Image Recognition via Simultaneous Conformal Thresholding and Dynamic Output Repair WACV 2026

CAAC: Confidence-Aware Attention Calibration to Reduce Hallucinations in Large Vision-Language Models WACV 2026

Digital Forensic AI You Can Explain: A Case Study on Video Source Camera Identification WACV 2026

BAFIS: Dataset + Framework to Assess Occupational Bias and Human Preference in Modern Text-to-image Models WACV 2026

Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation WACV 2026

Safe Vision-Language Models via Unsafe Weights Manipulation WACV 2026

JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models’ Detection of Human risky health behavior Content in Jirai Community EACL 2026

Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models EACL 2026

Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework EACL 2026

Improving LLM Domain Certification with Pretrained Guide Models EACL 2026

Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions EACL 2026

ARREST: Adversarial Resilient Regulation Enhancing Safety and Truth in Large Language Models EACL 2026

Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs EACL 2026

Knowing When to Abstain: Medical LLMs Under Clinical Uncertainty EACL 2026

Detecting Subtle Biases: An Ethical Lens on Underexplored Areas in AI Language Models Biases EACL 2026

Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMs EACL 2026

CAIRE: Cultural Attribution of Images with Retrieval EACL 2026

When Words Wear Masks: Detecting Malicious Intents and Hostile Impacts of Online Hate Speech EACL 2026

Integrity Shield: A System for Ethical AI Use & Authorship Transparency in Assessments EACL 2026

Rethinking the Evaluation of Alignment Methods: Insights into Diversity, Generalisation, and Safety EACL 2026

Analysing LLM Persona Generation and Fairness Interpretation in Polarised Geopolitical Contexts EACL 2026

Beyond Bias Scores: Unmasking Vacuous Neutrality in Small Language Models EACL 2026

VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy EACL 2026

The Shepherd Test: How Will Super Intelligent Agents Balance Care and Control in Asymmetric Relationships? AAAI 2026