RetrySQL: Text-to-SQL Training with Retry Data for Self-Correcting Query Generation

Alicja Rączkowska; Riccardo Belluzzo; Piotr Zieliński; Joanna Baran; Paweł Olszewski

AAAI 2026

Abstract

The text-to-SQL task is an active challenge in Natural Language Processing. Many existing solutions rely on black-box language models extended with specialized components within customized end-to-end text-to-SQL pipelines. While these solutions use both closed-source proprietary language models and coding-oriented open-source models, there is a lack of research on small generative models that are specific to SQL. At the same time, recent advances in self-correcting generation strategies show promise for improving the capabilities of existing architectures, but the application of these concepts to the text-to-SQL task remains unexplored. In this paper, we introduce RetrySQL, a new approach to training text-to-SQL generation models. We prepare reasoning steps for reference SQL queries and then corrupt them to create retry data that contains both incorrect and corrected steps, separated by a special token. We continuously pre-train open-source coding models with this data and demonstrate that retry steps yield an improvement of up to 4 and 9 percentage points in the overall and challenging execution accuracy metrics, respectively, compared to pre-training without retry data. We show that the model learns the self-correcting behavior and that the increase in downstream accuracy metrics is a result of this additional skill. Finally, we incorporate RetrySQL-trained models into a full text-to-SQL pipeline and show that they are competitive in execution accuracy with proprietary models that contain orders of magnitude more parameters. RetrySQL demonstrates that self-correction can be learned in the text-to-SQL task and provides a novel way of improving generation accuracy for small SQL-oriented language models.
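The retry-data construction described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the token string "[BACK]", the function names, the per-step corruption probability, and the choice to sample incorrect steps from a fixed pool. The abstract only states that incorrect and corrected steps are divided by a special token.

```python
import random

# Hypothetical token string; the abstract only mentions "a special token".
RETRY_TOKEN = "[BACK]"


def make_retry_example(steps, error_pool, p_retry, rng):
    """Build a retry sequence from correct reasoning steps.

    Before each correct step, with probability p_retry, emit an incorrect
    step drawn from error_pool followed by RETRY_TOKEN, then the correct step.
    This corruption scheme is an illustrative assumption.
    """
    out = []
    for step in steps:
        if rng.random() < p_retry:
            candidates = [s for s in error_pool if s != step]
            if candidates:
                out.append(rng.choice(candidates))  # incorrect step
                out.append(RETRY_TOKEN)             # marks the step above as retracted
        out.append(step)                            # corrected step
    return out


def strip_retries(sequence):
    """Recover the clean steps by dropping every step that a RETRY_TOKEN retracts."""
    clean = []
    for item in sequence:
        if item == RETRY_TOKEN:
            clean.pop()  # discard the incorrect step that precedes the token
        else:
            clean.append(item)
    return clean


if __name__ == "__main__":
    steps = ["FROM orders", "WHERE status = 'paid'", "SELECT COUNT(*)"]
    error_pool = ["FROM customers", "WHERE status = 'open'"]
    sequence = make_retry_example(steps, error_pool, 0.5, random.Random(0))
    print(" ".join(sequence))
    print(strip_retries(sequence) == steps)
```

Under these assumptions, a model trained on such sequences sees both the mistake and its retraction, while removing the retracted steps always yields the original reference steps.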

Topics

Natural Language Processing > Generation > Text Generation
Natural Language Processing > Applications > Text Classification

Keywords

code generation, language model, text-to-SQL generation, self-correcting generation, retry data
