Research Explorer
Evaluation
515 directly classified papers
Papers per year
2003: 1
2004: 1
2005: 1
2006: 1
2008: 2
2009: 1
2010: 1
2013: 5
2016: 3
2017: 8
2018: 11
2019: 24
2020: 25
2021: 34
2022: 68
2023: 74
2024: 105
2025: 147
2026: 3
Papers
Expected Validation Performance and Estimation of a Random Variable’s Maximum
EMNLP 2021
Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations
EMNLP 2021
Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain
EMNLP 2021
SOPE: Spectrum of Off-Policy Estimators
NeurIPS 2021
Predicting Deep Neural Network Generalization with Perturbation Response Curves
NeurIPS 2021
Towards a more Robust Evaluation for Conversational Question Answering
ACL 2021
Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation
EMNLP 2020
On the Same Page? Comparing Inter-Annotator Agreement in Sentence and Document Level Human Machine Translation Evaluation
EMNLP 2020
Intrinsic Evaluation of Summarization Datasets
EMNLP 2020
Let’s Stop Incorrect Comparisons in End-to-end Relation Extraction!
EMNLP 2020
KeypointNet: A Large-Scale 3D Keypoint Dataset Aggregated From Numerous Human Annotations
CVPR 2020
Using PRMSE to evaluate automated scoring systems in the presence of label noise
ACL 2020
Predicting Performance for Natural Language Processing Tasks
ACL 2020
Evaluating the Performance of Reinforcement Learning Algorithms
ICML 2020
R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason
ACL 2020
An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results
ACL 2020
Evaluating Explanation Methods for Neural Machine Translation
ACL 2020
Dataless Model Selection With the Deep Frame Potential
CVPR 2020
Evaluation of Causal Structure Learning Algorithms via Risk Estimation
UAI 2020
Effectively Unbiased FID and Inception Score and Where to Find Them
CVPR 2020
Probing Task-Oriented Dialogue Representation from Language Models
EMNLP 2020
Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning
EMNLP 2020
Semi-Supervised Learning for Maximizing the Partial AUC
AAAI 2020
Assessing Human-Parity in Machine Translation on the Segment Level
EMNLP 2020
Item Response Theory for Efficient Human Evaluation of Chatbots
EMNLP 2020