LEGOBench: Scientific Leaderboard Generation Benchmark

Shruti Singh; Shoaib Alam; Husain Malwat; Mayank Singh

2024 EMNLP EMNLP 2024

LEGOBench: Scientific Leaderboard Generation Benchmark

Abstract

AbstractThe ever-increasing volume of paper submissions makes it difficult to stay informed about the latest state-of-the-art research. To address this challenge, we introduce LEGOBench, a benchmark for evaluating systems that generate scientific leaderboards. LEGOBench is curated from 22 years of preprint submission data on arXiv and more than 11k machine learning leaderboards on the PapersWithCode portal. We present a language model-based and four graph-based leaderboard generation task configuration. We evaluate popular encoder-only scientific language models as well as decoder-only large language models across these task configurations. State-of-the-art models showcase significant performance gaps in automatic leaderboard generation on LEGOBench. The code is available on GitHub and the dataset is hosted on OSF.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — leaderboard generation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Shruti Singh , Shoaib Alam , Husain Malwat , Mayank Singh

Topics

Natural Language Processing > Generation > Text Generation Machine Learning > Learning Types > Evaluation Machine Learning > Application Areas > Information Retrieval Artificial Intelligence > Core AI > Information Retrieval

Keywords

benchmark evaluation information retrieval machine learning language model scientific literature scientific paper graph neural network leaderboard generation

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024