ICML 2025
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Authors
Ian Magnusson, Nguyen Tai, Ben Bogin, David Heineman, Jena D. Hwang, Luca Soldaini, Akshita Bhagia, Jiacheng Liu, Dirk Groeneveld, Oyvind Tafjord, Noah A. Smith, Pang Wei Koh, Jesse Dodge