Evaluation

345 directly classified papers

Papers

Probing Across Time: What Does RoBERTa Know and When? EMNLP 2021

Discretized Integrated Gradients for Explaining Language Models EMNLP 2021

What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think EMNLP 2021

Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation EMNLP 2021

We Need to Talk About train-dev-test Splits EMNLP 2021

TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation EMNLP 2021

On the Limits of Minimal Pairs in Contrastive Evaluation EMNLP 2021

How Does BERT Rerank Passages? An Attribution Analysis with Information Bottlenecks EMNLP 2021

Controlled tasks for model analysis: Retrieving discrete information from sequences EMNLP 2021

HateCheck: Functional Tests for Hate Speech Detection Models ACL 2021

When Do You Need Billions of Words of Pretraining Data? ACL 2021

Lower Perplexity is Not Always Human-Like ACL 2021

Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution ACL 2021

Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions ACL 2021

Not all parameters are born equal: Attention is mostly what you need EMNLP 2021

Not All Models Localize Linguistic Knowledge in the Same Place: A Layer-wise Probing on BERToids’ Representations EMNLP 2021

A Novel Visual Interpretability for Deep Neural Networks by Optimizing Activation Maps with Perturbation AAAI 2021

Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models NeurIPS 2021

CIDEr-R: Robust Consensus-based Image Description Evaluation EMNLP 2021

DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues EMNLP 2021

To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation EMNLP 2021

The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis Resolution in Dialogue: A Cross-Team Analysis EMNLP 2021

Anaphora Resolution in Dialogue: Cross-Team Analysis of the DFKI-TalkingRobots Team Submissions for the CODI-CRAC 2021 Shared-Task EMNLP 2021

Exploratory Model Analysis Using Data-Driven Neuron Representations EMNLP 2021

On the Difficulty of Membership Inference Attacks CVPR 2021