Evaluation

167 directly classified papers

Papers

Dscorer: A Fast Evaluation Metric for Discourse Representation Structure Parsing ACL 2020

A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers ACL 2020

Approximate Cross-Validation for Structured Models NIPS 2020

Improving Confidence Estimates for Unfamiliar Examples CVPR 2020

Overview of the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies ACL 2020

uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems ACL 2020

MultiWOZ 2.2 : A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines ACL 2020

SyntaxGym: An Online Platform for Targeted Evaluation of Language Models ACL 2020

A Re-evaluation of Knowledge Graph Completion Methods ACL 2020

Morpho-MNIST: Quantitative Assessment and Diagnostics for Representation Learning JMLR 2019

MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion INTERSPEECH 2019

Misleading Failures of Partial-input Baselines ACL 2019

WiRe57 : A Fine-Grained Benchmark for Open Information Extraction ACL 2019

Are Red Roses Red? Evaluating Consistency of Question-Answering Models ACL 2019

The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization EMNLP 2019

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance EMNLP 2019

Accurate Layerwise Interpretable Competence Estimation NIPS 2019

Tightness-Aware Evaluation Protocol for Scene Text Detection CVPR 2019

HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models NIPS 2019

Multiclass Performance Metric Elicitation NIPS 2019

Minimizers of the Empirical Risk and Risk Monotonicity NIPS 2019

GEval: Tool for Debugging NLP Datasets and Models ACL 2019

A Repository of Conversational Datasets ACL 2019

A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity ACL 2019

EvalD Reference-Less Discourse Evaluation for WMT18 EMNLP 2018