Evaluation

345 directly classified papers

Papers

On the Sensitivity and Stability of Model Interpretations in NLP ACL 2022

Down and Across: Introducing Crossword-Solving as a New NLP Benchmark ACL 2022

BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation ACL 2022

Impact of Evaluation Methodologies on Code Summarization ACL 2022

A Comparative Study of Faithfulness Metrics for Model Interpretability Methods ACL 2022

Pass off Fish Eyes for Pearls: Attacking Model Selection of Pre-trained Models ACL 2022

Logic Traps in Evaluating Attribution Scores ACL 2022

Rethinking and Refining the Distinct Metric ACL 2022

On the Importance of Data Size in Probing Fine-tuned Models ACL 2022

Richer Countries and Richer Representations ACL 2022

Measuring Compositional Consistency for Video Question Answering CVPR 2022

SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering CVPR 2022

The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting CVPR 2022

VisCUIT: Visual Auditor for Bias in CNN Image Classifier CVPR 2022

Merry Go Round: Rotate a Frame and Fool a DNN CVPR 2022

Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation EMNLP 2022

SEM-F1: an Automatic Way for Semantic Evaluation of Multi-Narrative Overlap Summaries at Scale EMNLP 2022

Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets EMNLP 2022

Calibration Meets Explanation: A Simple and Effective Approach for Model Confidence Estimates EMNLP 2022

Reproducibility Issues for BERT-based Evaluation Metrics EMNLP 2022

Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models EMNLP 2022

On Measuring the Intrinsic Few-Shot Hardness of Datasets EMNLP 2022

Revisiting Grammatical Error Correction Evaluation and Beyond EMNLP 2022

Entropy- and Distance-Based Predictors From GPT-2 Attention Patterns Predict Reading Times Over and Above GPT-2 Surprisal EMNLP 2022

DEMETR: Diagnosing Evaluation Metrics for Translation EMNLP 2022