2024 EMNLP EMNLP 2024

Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL

Abstract

AbstractIn-context learning with large language models (LLMs) is the current mainstream method for text-to-SQL. Previous studies have explored selecting relevant demonstrations from a human-labeled demonstration pool, but these methods lack diversity and incur high labeling costs. In this work, we address measuring and enhancing the diversity of the text-to-SQL demonstration pool. First, we introduce a diversity metric and present that the diversity of the existing labeling data can be further enhanced. Motivated by these findings, we propose Fused that iteratively fuses demonstrations to create a diverse demonstration pool based on human labeling or even from scratch with LLMs, reducing labeling costs. Fused achieves an average improvement of 2.1% based on existing labeling and 5.5% from scratch on several mainstream datasets, demonstrating its effectiveness.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio