SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications

Zexuan Zhong; Jiaqi Guo; Wei Yang; Jian Peng; Tao Xie; Jian-Guang Lou; Ting Liu; Dongmei Zhang

EMNLP 2018

Abstract

Recent research proposes syntax-based approaches to the problem of generating programs from natural language specifications. These approaches typically train a sequence-to-sequence model with a syntax-based objective: maximum likelihood estimation (MLE). Such approaches do not effectively address the goal of generating semantically correct programs because they fail to handle Program Aliasing, i.e., the fact that semantically equivalent programs may have many syntactically different forms. To address this issue, we propose a semantics-based approach named SemRegex. SemRegex targets a subtask of the program-synthesis problem: generating regular expressions from natural language. Unlike existing syntax-based approaches, SemRegex trains the model by maximizing the expected semantic correctness of the generated regular expressions. Semantic correctness is measured using the DFA-equivalence oracle, random test cases, and distinguishing test cases. Experiments on three public datasets demonstrate the superiority of SemRegex over existing state-of-the-art approaches.
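To make Program Aliasing concrete, the following is a minimal sketch and not the paper's implementation. It assumes Python's `re` syntax and replaces the DFA-equivalence oracle with a bounded exhaustive check over short strings; the function name `agree_on_bounded_strings`, the alphabet, and the length bound are all hypothetical choices for illustration. Two regular expressions that differ syntactically but accept the same strings pass the check, while a string accepted by only one of them plays the role of a distinguishing test case.

```python
import itertools
import re


def agree_on_bounded_strings(regex_a, regex_b, alphabet="ab", max_len=4):
    """Compare two regexes on every string over `alphabet` up to `max_len`.

    Returns (True, None) if both accept exactly the same strings in that
    bounded set, otherwise (False, witness), where `witness` is the first
    string accepted by one regex but not the other. This is only an
    approximation of semantic equivalence, not a DFA-equivalence proof.
    """
    pattern_a = re.compile(regex_a)
    pattern_b = re.compile(regex_b)
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            accepted_a = pattern_a.fullmatch(candidate) is not None
            accepted_b = pattern_b.fullmatch(candidate) is not None
            if accepted_a != accepted_b:
                return False, candidate
    return True, None


# Syntactically different, semantically the same on the bounded set.
same, _ = agree_on_bounded_strings("a+", "aa*")

# Semantically different: the empty string distinguishes them.
different, witness = agree_on_bounded_strings("a*", "a+")
```

A syntax-based objective such as MLE would penalize a model for producing `aa*` when the reference is `a+`, even though a check like this one treats the two as interchangeable. That gap is what a semantics-based training signal is meant to close.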

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Optimization & Theory > Optimization
Deep Learning > Models > Generative Models
Computer Science > Applications > Software Engineering
Machine Learning > Learning Types > Representation Learning
Artificial Intelligence > Core AI > Reasoning
Natural Language Processing > Applications > Semantic Parsing

Keywords

program synthesis, sequence-to-sequence learning, natural language specification, semantic correctness, regular expression generation, test case, DFA equivalence
