TabLLM: Few-shot Classification of Tabular Data with Large Language Models

Stefan Hegselmann; Alejandro Buendia; Hunter Lang; Monica Agrawal; Xiaoyi Jiang; David Sontag

AISTATS 2023

Abstract

We study the application of large language models to zero-shot and few-shot classification of tabular data. We prompt the large language model with a serialization of the tabular data to a natural-language string, together with a short description of the classification problem. In the few-shot setting, we fine-tune the large language model using some labeled examples. We evaluate several serialization methods including templates, table-to-text models, and large language models. Despite its simplicity, we find that this technique outperforms prior deep-learning-based tabular classification methods on several benchmark datasets. In most cases, even zero-shot classification obtains non-trivial performance, illustrating the method’s ability to exploit prior knowledge encoded in large language models. Unlike many deep learning methods for tabular datasets, this approach is also competitive with strong traditional baselines like gradient-boosted trees, especially in the very-few-shot setting.
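The pipeline the abstract describes (serialize a table row into text, append a short description of the classification task, and let the language model pick an answer) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the "The <column> is <value>." template, the prompt layout, the helper names (`serialize_row`, `build_prompt`, `zero_shot_classify`, `make_finetuning_pairs`), and the scorer interface are all hypothetical, and `keyword_scorer` is a toy stand-in for a real model's likelihood of each answer.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer interface: given a prompt and a candidate answer, return a
# score such as an LLM's log-likelihood of that answer. No real model is used here.
Scorer = Callable[[str, str], float]


def serialize_row(row: Dict[str, object]) -> str:
    """Serialize one row with an assumed 'The <column> is <value>.' template."""
    return " ".join(f"The {column} is {value}." for column, value in row.items())


def build_prompt(row: Dict[str, object], question: str) -> str:
    """Append the task description (question) to the serialized row."""
    return f"{serialize_row(row)}\n{question}\nAnswer:"


def zero_shot_classify(
    row: Dict[str, object],
    question: str,
    answer_choices: Sequence[str],
    score: Scorer,
) -> str:
    """Return the answer choice that the scorer rates highest for this row."""
    prompt = build_prompt(row, question)
    return max(answer_choices, key=lambda choice: score(prompt, choice))


def make_finetuning_pairs(
    rows: Sequence[Dict[str, object]], labels: Sequence[str], question: str
) -> List[Tuple[str, str]]:
    """Few-shot setting: turn labeled rows into (prompt, target) text pairs."""
    return [(build_prompt(row, question), label) for row, label in zip(rows, labels)]


def keyword_scorer(prompt: str, answer: str) -> float:
    """Toy stand-in for an LLM (hypothetical): favors 'Yes' if 'Masters' appears."""
    if answer == "Yes":
        return 1.0 if "Masters" in prompt else 0.0
    return 0.5


if __name__ == "__main__":
    question = "Does this person earn more than 50000 dollars per year? Yes or no?"
    row = {"age": 42, "education": "Masters"}
    print(build_prompt(row, question))
    print(zero_shot_classify(row, question, ["Yes", "No"], keyword_scorer))
```

Scoring a fixed set of answer strings keeps the prediction inside the label set, which is one common way to read a class label out of a generative model. In the few-shot setting, the pairs from `make_finetuning_pairs` would be fed to whatever fine-tuning routine is used; that step depends on the model and is not shown.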

Keywords

zero-shot learning, few-shot learning, tabular data, zero-shot classification, text serialization, large language model
