EMNLP 2025
SQUiD: Synthesizing Relational Databases from Unstructured Text
Abstract
Relational databases are central to modern data management, yet most data exists in unstructured forms like text documents. To bridge this gap, we leverage large language models (LLMs) to automatically synthesize a relational database by generating its schema and populating its tables from raw text. We introduce SQUiD, a novel neurosymbolic framework that decomposes this task into four stages, each with specialized techniques. Our experiments show that SQUiD consistently outperforms baselines across diverse datasets. Our code and datasets are publicly available at https://github.com/Mushtari-Sadia/SQUiD.
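To make the target output concrete, the sketch below shows what "a schema plus populated tables" can look like for a two-sentence toy text. It is not SQUiD's pipeline and does not reproduce its four stages: the example text, the `EXTRACTED` records (a hard-coded stand-in for what an LLM might extract), the `person`/`department` schema, and the helper names `build_database` and `people_with_departments` are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical source text (not from the SQUiD datasets):
#   "Alice Kim works in the Research department.
#    Bob Lee works in the Sales department."

# Stand-in for model output: records an LLM might extract from the text above.
# They are hard-coded here; no model is called in this sketch.
EXTRACTED = [
    {"person": "Alice Kim", "department": "Research"},
    {"person": "Bob Lee", "department": "Sales"},
]

# Assumed two-table schema with a foreign key linking people to departments.
SCHEMA = """
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE person (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(id)
);
"""


def build_database(records):
    """Create an in-memory SQLite database and populate it from the records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for rec in records:
        # Insert each department once; the UNIQUE constraint makes repeats no-ops.
        conn.execute(
            "INSERT OR IGNORE INTO department (name) VALUES (?)",
            (rec["department"],),
        )
        (dept_id,) = conn.execute(
            "SELECT id FROM department WHERE name = ?",
            (rec["department"],),
        ).fetchone()
        conn.execute(
            "INSERT INTO person (name, department_id) VALUES (?, ?)",
            (rec["person"], dept_id),
        )
    conn.commit()
    return conn


def people_with_departments(conn):
    """Join the two tables back together to recover the facts in the text."""
    return conn.execute(
        "SELECT person.name, department.name "
        "FROM person JOIN department ON person.department_id = department.id "
        "ORDER BY person.name"
    ).fetchall()


if __name__ == "__main__":
    for row in people_with_departments(build_database(EXTRACTED)):
        print(row)
```

Splitting the facts across two joined tables, rather than one flat table, is the part that distinguishes relational synthesis from plain tuple extraction: shared entities such as a department are stored once and referenced by key.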
Authors
Topics
Artificial Intelligence > Core AI > Causal Inference
Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Applications > Information Extraction
Machine Learning > Learning Types > In-Context Learning
Artificial Intelligence > Core AI > Knowledge Graphs
Natural Language Processing > Applications > Semantic Parsing
Deep Learning > Learning Types > Retrieval-Augmented Generation