Natural SQL: Making SQL Easier to Infer from Natural Language Specifications

Yujian Gan; Xinyun Chen; Jinxia Xie; Matthew Purver; John R. Woodward; John Drake; Qiaofu Zhang

EMNLP 2021

Abstract

Addressing the mismatch between natural language descriptions and the corresponding SQL queries is a key challenge for text-to-SQL translation. To bridge this gap, we propose an SQL intermediate representation (IR) called Natural SQL (NatSQL). Specifically, NatSQL preserves the core functionalities of SQL while simplifying queries as follows: (1) dispensing with operators and keywords such as GROUP BY, HAVING, FROM, and JOIN ON, which usually have no direct counterpart in the text description; (2) removing the need for nested subqueries and set operators; and (3) making schema linking easier by reducing the number of schema items required. On Spider, a challenging text-to-SQL benchmark that contains complex and nested SQL queries, we demonstrate that NatSQL outperforms other IRs and significantly improves the performance of several previous state-of-the-art (SOTA) models. Furthermore, for existing models that do not support executable SQL generation, NatSQL easily enables them to generate executable SQL queries and achieves new state-of-the-art execution accuracy.
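To make point (1) concrete, the sketch below runs an ordinary SQL query against a toy database and lists the structural keywords it uses that never appear in the question. It is only an illustration of the mismatch the abstract describes: the schema, data, question, and the helper names `unmentioned_keywords` and `run_query` are all hypothetical, nothing here is taken from Spider, and no NatSQL syntax is shown.

```python
import sqlite3

# Hypothetical question and query over a toy schema (assumptions, not from Spider).
QUESTION = "Which singers have performed in more than one concert?"
SQL = """
SELECT s.name
FROM singer AS s
JOIN singer_in_concert AS sc ON s.singer_id = sc.singer_id
GROUP BY s.singer_id, s.name
HAVING COUNT(*) > 1
"""

# The structural keywords the abstract names as hard to ground in the text.
STRUCTURAL_KEYWORDS = ["FROM", "JOIN", "ON", "GROUP BY", "HAVING"]


def unmentioned_keywords(question, sql):
    """Return structural keywords used in `sql` whose words do not occur in `question`."""
    question_words = question.lower().replace("?", "").split()
    sql_flat = " " + " ".join(sql.upper().split()) + " "
    missing = []
    for keyword in STRUCTURAL_KEYWORDS:
        used = f" {keyword} " in sql_flat
        mentioned = all(part.lower() in question_words for part in keyword.split())
        if used and not mentioned:
            missing.append(keyword)
    return missing


def run_query(sql):
    """Execute `sql` on a small in-memory database and return all rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE singer_in_concert (concert_id INTEGER, singer_id INTEGER);
        INSERT INTO singer VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO singer_in_concert VALUES (10, 1), (11, 1), (10, 2);
        """
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print("Keywords with no counterpart in the question:",
          unmentioned_keywords(QUESTION, SQL))
    print("Query result:", run_query(SQL))
```

In this toy case every structural keyword is absent from the question, even though the query is short. A model that predicts SQL directly has to infer all of them from context, which is the burden an intermediate representation aims to reduce.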

Topics

Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Resources & Methods > Text Representation
Machine Learning > Learning Types > Representation Learning
Natural Language Processing > Applications > Semantic Parsing

Keywords

semantic parsing, SQL generation, intermediate representation, natural language interface, SQL query, schema linking, text-to-SQL
