What Really Matters for Table LLMs? A Meta-Evaluation of Model and Data Effects

Naihao Deng; Sheng Zhang; Henghui Zhu; Shuaichen Chang; Jiani Zhang; Alexander Hanbo Li; Chung-Wei Hang; Hideo Kobayashi; Yiqun Hu; Patrick Ng

2026 EACL EACL 2026

What Really Matters for Table LLMs? A Meta-Evaluation of Model and Data Effects

Abstract

AbstractTable modeling has progressed for decades. In this work, we revisit this trajectory and highlight emerging challenges in the LLM era, particularly the paradox of choice: the difficulty of attributing performance gains amid diverse base models and training sets in the context of table instruction tuning. We replicate four table LLMs by instruction-tuning three foundation models on four existing datasets, yielding 12 models. We then evaluate these models across 16 table benchmarks. Our study is the first to quantitatively disentangle the effects of training data and base model selection, revealing that base model choice plays a more dominant role than the training data itself. Generalization and reasoning remain challenging, inviting future effort on table modeling. Based on our findings, we share our thoughts on the future directions for table modeling.

❓ The Questioner

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Naihao Deng , Sheng Zhang , Henghui Zhu , Shuaichen Chang , Jiani Zhang , Alexander Hanbo Li , Chung-Wei Hang , Hideo Kobayashi , Yiqun Hu , Patrick Ng

Topics

Artificial Intelligence > Core AI > Foundation Models Artificial Intelligence > Learning Paradigms > Transfer Learning Deep Learning > Techniques > Pretraining

Keywords

transfer learning instruction tuning foundation model table processing

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026