Scaling Up Models and Data with t5x and seqio
Abstract
Scaling up training datasets and model parameters have benefited neural network-based language models, but also present challenges like distributed compute, input data bottlenecks and reproducibility of results. We introduce two simple and scalable software libraries that simplify these issues: t5x enables training large language models at scale, while seqio enables reproducible input and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on multi-terabyte datasets. Configurations and instructions for T5-like and GPT-like models are also provided. The libraries can be found at https://github.com/google-research/t5x and https://github.com/google/seqio. [abs] [ pdf ][ bib ] [ code ] © JMLR 2023. (edit, beta)