Dataset for a Neural Natural Language Interface for Databases (NNLIDB)

Florin Brad; Radu Cristian Alexandru Iacob; Ionel Alexandru Hosu; Traian Rebedea

IJCNLP 2017

Abstract

Progress in natural language interfaces to databases (NLIDB) has been slow mainly due to linguistic issues (such as language ambiguity) and domain portability. Moreover, the lack of a large corpus to be used as a standard benchmark has made data-driven approaches difficult to develop and compare. In this paper, we revisit the problem of NLIDBs and recast it as a sequence translation problem. To this end, we introduce a large dataset extracted from the Stack Exchange Data Explorer website, which can be used for training neural natural language interfaces for databases. We also report encouraging baseline results on a smaller manually annotated test corpus, obtained using an attention-based sequence-to-sequence neural network.

Topics

Artificial Intelligence > Core AI > Multimodal Learning

Deep Learning > Architectures > Neural Networks

Natural Language Processing > Resources & Methods > Language Modeling

Computer Science > Applications > Databases

Keywords

attention mechanism, semantic parsing, database query, natural language interface, neural network

Related papers

Procedural Text Generation from an Execution Video 2017

DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset 2017

Roles and Success in Wikipedia Talk Pages: Identifying Latent Patterns of Behavior 2017

PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts 2017

Alibaba at IJCNLP-2017 Task 1: Embedding Grammatical Features into LSTMs for Chinese Grammatical Error Diagnosis Task 2017