Responsible AI

1991 directly classified papers

Papers

Private Set Generation with Discriminative Information NIPS 2022

A Major Obstacle for NLP Research: Let’s Talk about Time Allocation! EMNLP 2022

Debiasing Masks: A New Framework for Shortcut Mitigation in NLU EMNLP 2022

Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits NIPS 2022

QUARK: Controllable Text Generation with Reinforced Unlearning NIPS 2022

Training language models to follow instructions with human feedback NIPS 2022

ProsocialDialog: A Prosocial Backbone for Conversational Agents EMNLP 2022

Delivering Trustworthy AI through Formal XAI AAAI 2022

SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis NIPS 2022

SafeText: A Benchmark for Exploring Physical Safety in Language Models EMNLP 2022

Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models NIPS 2022

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset NIPS 2022

Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence EMNLP 2022

Gendered Mental Health Stigma in Masked Language Models EMNLP 2022

Preparing High School Teachers to Integrate AI Methods into STEM Classrooms AAAI 2022

DeepAuth: A DNN Authentication Framework by Model-Unique and Fragile Signature Embedding AAAI 2022

ExSum: From Local Explanations to Model Understanding NAACL 2022

Measure and Improve Robustness in NLP Models: A Survey NAACL 2022

How Gender Debiasing Affects Internal Model Representations, and Why It Matters NAACL 2022

Theory-Grounded Measurement of U.S. Social Stereotypes in English Language Models NAACL 2022

Cross-Domain Detection of GPT-2-Generated Technical Text NAACL 2022

Debiasing Pre-Trained Language Models via Efficient Fine-Tuning ACL 2022

Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals ACL 2022

Pipelines for Social Bias Testing of Large Language Models ACL 2022

Towards Automating Model Explanations with Certified Robustness Guarantees AAAI 2022