Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
Responsible AI Considerations in Text Summarization Research: A Review of Current Practices
EMNLP 2023
JUST_ONE at SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS)
ACL 2023
Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI
ACL 2023
Not The End of Story: An Evaluation of ChatGPT-Driven Vulnerability Description Mappings
ACL 2023
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
CVPR 2023
Co2PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning
EMNLP 2023
Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision
CVPR 2023
Values, Ethics, Morals? On the Use of Moral Concepts in NLP Research
EMNLP 2023
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
EMNLP 2023
“Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters
EMNLP 2023
WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models
ACL 2023
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created through Human-Machine Collaboration
ACL 2023
FairPrism: Evaluating Fairness-Related Harms in Text Generation
ACL 2023
DeepMed: Semiparametric Causal Mediation Analysis with Debiased Deep Learning
NIPS 2022
Development and Validation of ML-DQA – a Machine Learning Data Quality Assurance Framework for Healthcare
MLHC 2022
Aligning to Social Norms and Values in Interactive Narratives
NAACL 2022
Aligning Generative Language Models with Human Values
NAACL 2022
Implications of Model Indeterminacy for Explanations of Automated Decisions
NIPS 2022
Washing The Unwashable: On The (Im)possibility of Fairwashing Detection
NIPS 2022
Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting
NIPS 2022
The Limits of Word Level Differential Privacy
NAACL 2022
Targeted Identity Group Prediction in Hate Speech Corpora
NAACL 2022
Users Hate Blondes: Detecting Sexism in User Comments on Online Romanian News
NAACL 2022
Free speech or Free Hate Speech? Analyzing the Proliferation of Hate Speech in Parler
NAACL 2022
Privacy Leakage in Text Classification: A Data Extraction Approach
NAACL 2022