Research Explorer
Evaluation
167 directly classified papers
Papers per year (years with no papers are not listed)
2007: 1
2009: 1
2010: 1
2011: 2
2012: 1
2013: 2
2014: 1
2015: 1
2017: 1
2018: 7
2019: 15
2020: 14
2021: 11
2022: 25
2023: 31
2024: 24
2025: 29
Papers
Dscorer: A Fast Evaluation Metric for Discourse Representation Structure Parsing
ACL 2020
A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers
ACL 2020
Approximate Cross-Validation for Structured Models
NIPS 2020
Improving Confidence Estimates for Unfamiliar Examples
CVPR 2020
Overview of the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies
ACL 2020
uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems
ACL 2020
MultiWOZ 2.2 : A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines
ACL 2020
SyntaxGym: An Online Platform for Targeted Evaluation of Language Models
ACL 2020
A Re-evaluation of Knowledge Graph Completion Methods
ACL 2020
Morpho-MNIST: Quantitative Assessment and Diagnostics for Representation Learning
JMLR 2019
MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion
INTERSPEECH 2019
Misleading Failures of Partial-input Baselines
ACL 2019
WiRe57 : A Fine-Grained Benchmark for Open Information Extraction
ACL 2019
Are Red Roses Red? Evaluating Consistency of Question-Answering Models
ACL 2019
The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization
EMNLP 2019
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
EMNLP 2019
Accurate Layerwise Interpretable Competence Estimation
NIPS 2019
Tightness-Aware Evaluation Protocol for Scene Text Detection
CVPR 2019
HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models
NIPS 2019
Multiclass Performance Metric Elicitation
NIPS 2019
Minimizers of the Empirical Risk and Risk Monotonicity
NIPS 2019
GEval: Tool for Debugging NLP Datasets and Models
ACL 2019
A Repository of Conversational Datasets
ACL 2019
A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity
ACL 2019
EvalD Reference-Less Discourse Evaluation for WMT18
EMNLP 2018