Responsible AI

1991 directly classified papers

Papers

Responsible AI Considerations in Text Summarization Research: A Review of Current Practices EMNLP 2023

JUST_ONE at SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS) ACL 2023

Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI ACL 2023

Not The End of Story: An Evaluation of ChatGPT-Driven Vulnerability Description Mappings ACL 2023

Uncurated Image-Text Datasets: Shedding Light on Demographic Bias CVPR 2023

Co2PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning EMNLP 2023

Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision CVPR 2023

Values, Ethics, Morals? On the Use of Moral Concepts in NLP Research EMNLP 2023

Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models EMNLP 2023

“Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters EMNLP 2023

WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models ACL 2023

SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created through Human-Machine Collaboration ACL 2023

FairPrism: Evaluating Fairness-Related Harms in Text Generation ACL 2023

DeepMed: Semiparametric Causal Mediation Analysis with Debiased Deep Learning NIPS 2022

Development and Validation of ML-DQA – a Machine Learning Data Quality Assurance Framework for Healthcare MLHC 2022

Aligning to Social Norms and Values in Interactive Narratives NAACL 2022

Aligning Generative Language Models with Human Values NAACL 2022

Implications of Model Indeterminacy for Explanations of Automated Decisions NIPS 2022

Washing The Unwashable: On The (Im)possibility of Fairwashing Detection NIPS 2022

Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting NIPS 2022

The Limits of Word Level Differential Privacy NAACL 2022

Targeted Identity Group Prediction in Hate Speech Corpora NAACL 2022

Users Hate Blondes: Detecting Sexism in User Comments on Online Romanian News NAACL 2022

Free speech or Free Hate Speech? Analyzing the Proliferation of Hate Speech in Parler NAACL 2022

Privacy Leakage in Text Classification: A Data Extraction Approach NAACL 2022