Research Explorer
Evaluation
345 directly classified papers
Papers per year
2014: 1
2016: 3
2017: 1
2018: 9
2019: 21
2020: 34
2021: 32
2022: 50
2023: 28
2024: 90
2025: 76
Papers
On the Sensitivity and Stability of Model Interpretations in NLP
ACL 2022
Down and Across: Introducing Crossword-Solving as a New NLP Benchmark
ACL 2022
BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation
ACL 2022
Impact of Evaluation Methodologies on Code Summarization
ACL 2022
A Comparative Study of Faithfulness Metrics for Model Interpretability Methods
ACL 2022
Pass off Fish Eyes for Pearls: Attacking Model Selection of Pre-trained Models
ACL 2022
Logic Traps in Evaluating Attribution Scores
ACL 2022
Rethinking and Refining the Distinct Metric
ACL 2022
On the Importance of Data Size in Probing Fine-tuned Models
ACL 2022
Richer Countries and Richer Representations
ACL 2022
Measuring Compositional Consistency for Video Question Answering
CVPR 2022
SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering
CVPR 2022
The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting
CVPR 2022
VisCUIT: Visual Auditor for Bias in CNN Image Classifier
CVPR 2022
Merry Go Round: Rotate a Frame and Fool a DNN
CVPR 2022
Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation
EMNLP 2022
SEM-F1: an Automatic Way for Semantic Evaluation of Multi-Narrative Overlap Summaries at Scale
EMNLP 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
EMNLP 2022
Calibration Meets Explanation: A Simple and Effective Approach for Model Confidence Estimates
EMNLP 2022
Reproducibility Issues for BERT-based Evaluation Metrics
EMNLP 2022
Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models
EMNLP 2022
On Measuring the Intrinsic Few-Shot Hardness of Datasets
EMNLP 2022
Revisiting Grammatical Error Correction Evaluation and Beyond
EMNLP 2022
Entropy- and Distance-Based Predictors From GPT-2 Attention Patterns Predict Reading Times Over and Above GPT-2 Surprisal
EMNLP 2022
DEMETR: Diagnosing Evaluation Metrics for Translation
EMNLP 2022