Scaling Up Models and Data with t5x and seqio

Adam Roberts; Hyung Won Chung; Gaurav Mishra; Anselm Levskaya; James Bradbury; Daniel Andor; Sharan Narang; Brian Lester; Colin Gaffney; Afroz Mohiuddin; Curtis Hawthorne; Aitor Lewkowycz; Alex Salcianu; Marc van Zee; Jacob Austin; Sebastian Goodman; Livio Baldini Soares; Haitang Hu; Sasha Tsvyashchenko; Aakanksha Chowdhery; Jasmijn Bastings; Jannis Bulian; Xavier Garcia; Jianmo Ni; Andrew Chen; Kathleen Kenealy; Kehang Han; Michelle Casbon; Jonathan H. Clark; Stephan Lee; Dan Garrette; James Lee-Thorp; Colin Raffel; Noam Shazeer; Marvin Ritter; Maarten Bosma; Alexandre Passos; Jeremy Maitin-Shepard; Noah Fiedel; Mark Omernick; Brennan Saeta; Ryan Sepassi; Alexander Spiridonov; Joshua Newlan; Andrea Gesmundo

JMLR, 2023

Abstract

Scaling up training datasets and model parameters has benefited neural network-based language models, but it also presents challenges such as distributed compute, input data bottlenecks, and reproducibility of results. We introduce two simple and scalable software libraries that address these issues: t5x enables training large language models at scale, while seqio enables reproducible input and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on multi-terabyte datasets. Configurations and instructions for T5-like and GPT-like models are also provided. The libraries can be found at https://github.com/google-research/t5x and https://github.com/google/seqio. © JMLR 2023.

Topics

Artificial Intelligence > Core AI > Foundation Models
Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Resources & Methods > Large Language Models
Computer Science > Applications > Software Engineering

Keywords

distributed computing, model scaling, data pipeline, large language model, neural network
