Evaluation

167 directly classified papers

Papers

ViM: Out-of-Distribution With Virtual-Logit Matching CVPR 2022

KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models EMNLP 2022

IsoScore: Measuring the Uniformity of Embedding Space Utilization ACL 2022

Revisiting Automatic Evaluation of Extractive Summarization Task: Can We Do Better than ROUGE? ACL 2022

Assessing a Single Image in Reference-Guided Image Synthesis AAAI 2022

Unbiased IoU for Spherical Image Object Detection AAAI 2022

Azimuth: Systematic Error Analysis for Text Classification EMNLP 2022

SQuALITY: Building a Long-Document Summarization Dataset the Hard Way EMNLP 2022

GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation EMNLP 2022

CoPHE: A Count-Preserving Hierarchical Evaluation Metric in Large-Scale Multi-Label Text Classification EMNLP 2021

Surrogate Regret Bounds for Polyhedral Losses NeurIPS 2021

Active Assessment of Prediction Services as Accuracy Surface Over Attribute Combinations NeurIPS 2021

The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation EMNLP 2021

FAST: A carefully sampled and cognitively motivated dataset for distributional semantic evaluation EMNLP 2021

MultiLexNorm: A Shared Task on Multilingual Lexical Normalization EMNLP 2021

TabPert: An Effective Platform for Tabular Perturbation EMNLP 2021

Robustness Gym: Unifying the NLP Evaluation Landscape NAACL 2021

Better than Average: Paired Evaluation of NLP systems IJCNLP 2021

Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions ACL 2021

Cross-replication Reliability - An Empirical Approach to Interpreting Inter-rater Reliability ACL 2021

A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation ACL 2020

Multi-Hypothesis Machine Translation Evaluation ACL 2020

An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results ACL 2020

Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models EMNLP 2020

Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks AAAI 2020