Evaluation

515 directly classified papers

Papers

Expected Validation Performance and Estimation of a Random Variable’s Maximum EMNLP 2021

Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations EMNLP 2021

Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain EMNLP 2021

SOPE: Spectrum of Off-Policy Estimators NeurIPS 2021

Predicting Deep Neural Network Generalization with Perturbation Response Curves NeurIPS 2021

Towards a more Robust Evaluation for Conversational Question Answering ACL 2021

Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation EMNLP 2020

On the Same Page? Comparing Inter-Annotator Agreement in Sentence and Document Level Human Machine Translation Evaluation EMNLP 2020

Intrinsic Evaluation of Summarization Datasets EMNLP 2020

Let’s Stop Incorrect Comparisons in End-to-end Relation Extraction! EMNLP 2020

KeypointNet: A Large-Scale 3D Keypoint Dataset Aggregated From Numerous Human Annotations CVPR 2020

Using PRMSE to evaluate automated scoring systems in the presence of label noise ACL 2020

Predicting Performance for Natural Language Processing Tasks ACL 2020

Evaluating the Performance of Reinforcement Learning Algorithms ICML 2020

R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason ACL 2020

An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results ACL 2020

Evaluating Explanation Methods for Neural Machine Translation ACL 2020

Dataless Model Selection With the Deep Frame Potential CVPR 2020

Evaluation of Causal Structure Learning Algorithms via Risk Estimation UAI 2020

Effectively Unbiased FID and Inception Score and Where to Find Them CVPR 2020

Probing Task-Oriented Dialogue Representation from Language Models EMNLP 2020

Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning EMNLP 2020

Semi-Supervised Learning for Maximizing the Partial AUC AAAI 2020

Assessing Human-Parity in Machine Translation on the Segment Level EMNLP 2020

Item Response Theory for Efficient Human Evaluation of Chatbots EMNLP 2020