Responsible AI

1991 directly classified papers

Papers

EuroGEST: Investigating gender stereotypes in multilingual language models EMNLP 2025

Multiple LLM Agents Debate for Equitable Cultural Alignment ACL 2025

LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts ACL 2025

HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America EMNLP 2025

Alleviating Hallucinations from Knowledge Misalignment in Large Language Models via Selective Abstention Learning ACL 2025

HalluLens: LLM Hallucination Benchmark ACL 2025

Evaluation of Medical Large Language Models: Taxonomy, Review, and Directions IJCAI 2025

Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers EMNLP 2025

Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases ACL 2025

PapersPlease: A Benchmark for Evaluating Motivational Values of Large Language Models Based on ERG Theory ACL 2025

Representation Bending for Large Language Model Safety ACL 2025

Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora ACL 2025

Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context ACL 2025

MPO: Multilingual Safety Alignment via Reward Gap Optimization ACL 2025

UTF: Under-trained Tokens as Fingerprints —— a Novel Approach to LLM Identification ACL 2025

Guardians of Trust: Risks and Opportunities for LLMs in Mental Health ACL 2025

Multilingual Large Language Models Leak Human Stereotypes across Language Boundaries ACL 2025

ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging ACL 2025

MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection ACL 2025

GUIR at SemEval-2025 Task 4: Adaptive Weight Tuning with Gradual Negative Matching for LLM Unlearning ACL 2025

RaggedyFive at SemEval-2025 Task 3: Hallucination Span Detection Using Unverifiable Answer Detection ACL 2025

JU-CSE-NLP’25 at SemEval-2025 Task 4: Learning to Unlearn LLMs ACL 2025

PROTECT: Policy-Related Organizational Value Taxonomy for Ethical Compliance and Trust ACL 2025

UAlign: LLM Alignment Benchmark for the Ukrainian Language ACL 2025

Understanding PII Leakage in Large Language Models: A Systematic Survey IJCAI 2025