Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
A Meta-Analysis of Overfitting in Machine Learning
NeurIPS 2019
Evaluating Question Answering Evaluation
EMNLP 2019
A Closer Look at Data Bias in Neural Extractive Summarization Models
EMNLP 2019
Evaluating Research Novelty Detection: Counterfactual Approaches
EMNLP 2019
Findings of the WMT 2019 Shared Tasks on Quality Estimation
ACL 2019
The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation
ACL 2019
Narrative Generation in the Wild: Methods from NaNoGenMo
ACL 2019
Evaluating Automatic Term Extraction Methods on Individual Documents
ACL 2019
Confirming the Non-compositionality of Idioms for Sentiment Analysis
ACL 2019
Explaining Simple Natural Language Inference
ACL 2019
Are Red Roses Red? Evaluating Consistency of Question-Answering Models
ACL 2019
Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets
ACL 2019
Aiming beyond the Obvious: Identifying Non-Obvious Cases in Semantic Similarity Datasets
ACL 2019
Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study
ACL 2019
Interpretable Predictive Modeling for Climate Variables with Weighted Lasso
AAAI 2019
On the Efficiency of Data Collection for Crowdsourced Classification
IJCAI 2018
Breaking NLI Systems with Sentences that Require Simple Lexical Inferences
ACL 2018
Tackling the Story Ending Biases in The Story Cloze Test
ACL 2018
Intersection-Validation: A Method for Evaluating Structure Learning without Ground Truth
AISTATS 2018
MeSH-based dataset for measuring the relevance of text retrieval
ACL 2018
Towards a Better Metric for Evaluating Question Generation Systems
EMNLP 2018
Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation
EMNLP 2018
Semantic Structural Evaluation for Text Simplification
NAACL 2018
Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation
NAACL 2018
Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research
JMLR 2018