MetaLead: A Comprehensive Human-Curated Leaderboard Dataset for Transparent Reporting of Machine Learning Experiments

Roelien C. Timmer; Necva Bölücü; Stephen Wan

EACL 2026

Abstract

Leaderboards are crucial in machine learning (ML) for benchmarking and tracking progress, but creating them has traditionally demanded significant manual effort. Recent work has tried to automate leaderboard generation, yet the existing datasets for this purpose capture only the best result from each paper and carry limited metadata. We present MetaLead, a fully human-annotated ML leaderboard dataset that captures all experimental results from each paper for result transparency. It also contains extra metadata: the experiment type of each result (baseline, proposed method, or variation of the proposed method), which supports experiment-type-guided comparisons, and an explicit separation of training and test datasets, which supports cross-domain assessment. This enriched structure makes MetaLead a resource for more transparent and nuanced evaluation across ML research. MetaLead dataset and code repository: https://github.com/RoelTim/metalead
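The structure described in the abstract can be pictured as one record per reported result. The following is a minimal sketch in Python, not the actual MetaLead schema: the class, field, and function names (`ResultRecord`, `experiment_type`, `train_dataset`, `test_dataset`, `filter_by_type`, `cross_domain`) are assumptions made for illustration, and the sample values are invented placeholders rather than entries from the dataset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ExperimentType(Enum):
    """The three experiment types named in the abstract."""
    BASELINE = "baseline"
    PROPOSED = "proposed method"
    VARIATION = "variation of proposed method"


@dataclass(frozen=True)
class ResultRecord:
    """Hypothetical record for a single reported result (field names are assumptions)."""
    paper_id: str
    task: str
    metric: str
    value: float
    method: str
    experiment_type: ExperimentType
    train_dataset: str
    test_dataset: str

    @property
    def is_cross_domain(self) -> bool:
        # Train and test datasets are stored separately, so a mismatch
        # marks the result as a cross-domain evaluation.
        return self.train_dataset != self.test_dataset


def filter_by_type(records: List[ResultRecord],
                   experiment_type: ExperimentType) -> List[ResultRecord]:
    """Keep only results of one experiment type, e.g. all baselines."""
    return [r for r in records if r.experiment_type is experiment_type]


def cross_domain(records: List[ResultRecord]) -> List[ResultRecord]:
    """Keep only results whose train and test datasets differ."""
    return [r for r in records if r.is_cross_domain]


# Invented placeholder rows, not values taken from MetaLead.
records = [
    ResultRecord("paper-A", "task-X", "F1", 0.70, "method-1",
                 ExperimentType.BASELINE, "dataset-1", "dataset-1"),
    ResultRecord("paper-A", "task-X", "F1", 0.80, "method-2",
                 ExperimentType.PROPOSED, "dataset-1", "dataset-1"),
    ResultRecord("paper-A", "task-X", "F1", 0.60, "method-3",
                 ExperimentType.VARIATION, "dataset-1", "dataset-2"),
]

baselines = filter_by_type(records, ExperimentType.BASELINE)
transfer_results = cross_domain(records)
```

Keeping every result, rather than only the best one per paper, is what makes filters like these possible: baselines can be compared with baselines, and cross-domain results can be examined separately from in-domain ones.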


Authors

Roelien C. Timmer, Necva Bölücü, Stephen Wan

Topics

Machine Learning > Optimization & Theory > Statistical Learning
Machine Learning > Application Areas > Efficient Computing

Keywords

experiment tracking, leaderboard dataset, transparent reporting
