Research Explorer
Evaluation
167 directly classified papers
Papers per year
2007: 1
2009: 1
2010: 1
2011: 2
2012: 1
2013: 2
2014: 1
2015: 1
2017: 1
2018: 7
2019: 15
2020: 14
2021: 11
2022: 25
2023: 31
2024: 24
2025: 29
Papers
ViM: Out-of-Distribution With Virtual-Logit Matching
CVPR 2022
KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models
EMNLP 2022
IsoScore: Measuring the Uniformity of Embedding Space Utilization
ACL 2022
Revisiting Automatic Evaluation of Extractive Summarization Task: Can We Do Better than ROUGE?
ACL 2022
Assessing a Single Image in Reference-Guided Image Synthesis
AAAI 2022
Unbiased IoU for Spherical Image Object Detection
AAAI 2022
Azimuth: Systematic Error Analysis for Text Classification
EMNLP 2022
SQuALITY: Building a Long-Document Summarization Dataset the Hard Way
EMNLP 2022
GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation
EMNLP 2022
CoPHE: A Count-Preserving Hierarchical Evaluation Metric in Large-Scale Multi-Label Text Classification
EMNLP 2021
Surrogate Regret Bounds for Polyhedral Losses
NIPS 2021
Active Assessment of Prediction Services as Accuracy Surface Over Attribute Combinations
NIPS 2021
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
EMNLP 2021
FAST: A carefully sampled and cognitively motivated dataset for distributional semantic evaluation
EMNLP 2021
MultiLexNorm: A Shared Task on Multilingual Lexical Normalization
EMNLP 2021
TabPert: An Effective Platform for Tabular Perturbation
EMNLP 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
NAACL 2021
Better than Average: Paired Evaluation of NLP systems
IJCNLP 2021
Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions
ACL 2021
Cross-replication Reliability - An Empirical Approach to Interpreting Inter-rater Reliability
ACL 2021
A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation
ACL 2020
Multi-Hypothesis Machine Translation Evaluation
ACL 2020
An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results
ACL 2020
Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models
EMNLP 2020
Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks
AAAI 2020