Research Explorer
Evaluation
167 directly classified papers
Papers per year
2007: 1
2009: 1
2010: 1
2011: 2
2012: 1
2013: 2
2014: 1
2015: 1
2017: 1
2018: 7
2019: 15
2020: 14
2021: 11
2022: 25
2023: 31
2024: 24
2025: 29
Papers
ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision
EMNLP 2023
DeltaScore: Fine-Grained Story Evaluation with Perturbations
EMNLP 2023
A Closer Look into Using Large Language Models for Automatic Evaluation
EMNLP 2023
FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation
EMNLP 2023
Arabic dialect identification: An in-depth error analysis on the MADAR parallel corpus
EMNLP 2023
$p$-value Adjustment for Monotonous, Unbiased, and Fast Clustering Comparison
NIPS 2023
OpenDataVal: a Unified Benchmark for Data Valuation
NIPS 2023
GSLB: The Graph Structure Learning Benchmark
NIPS 2023
Automated Classification of Model Errors on ImageNet
NIPS 2023
Evidence > Intuition: Transferability Estimation for Encoder Selection
EMNLP 2022
A Multifaceted Framework to Evaluate Evasion, Content Preservation, and Misattribution in Authorship Obfuscation Techniques
EMNLP 2022
Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation
EMNLP 2022
The patient is more dead than alive: exploring the current state of the multi-document summarisation of the biomedical literature
ACL 2022
Dataset Geography: Mapping Language Data to Language Users
ACL 2022
Quantified Reproducibility Assessment of NLP Results
ACL 2022
BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation
NAACL 2022
BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation
ACL 2022
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics
ACL 2022
Revisiting text decomposition methods for NLI-based factuality scoring of summaries
EMNLP 2022
Towards a Rigorous Evaluation of Time-Series Anomaly Detection
AAAI 2022
CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification
NIPS 2022
Pythae: Unifying Generative Autoencoders in Python - A Benchmarking Use Case
NIPS 2022
CGLB: Benchmark Tasks for Continual Graph Learning
NIPS 2022
Estimating and Explaining Model Performance When Both Covariates and Labels Shift
NIPS 2022
Better Uncertainty Calibration via Proper Scores for Classification and Beyond
NIPS 2022