One-to-many testing for code generation from (just) natural language

Mansi Uniyal; Mukul Singh; Gust Verbruggen; Sumit Gulwani; Vu Le

2024 EMNLP EMNLP 2024

One-to-many testing for code generation from (just) natural language

Abstract

AbstractMBPP is a popular dataset for evaluating the task of code generation from natural language. Despite its popularity, there are three problems: (1) it relies on providing test cases to generate the right signature, (2) there is poor alignment between instruction and evaluation test cases, and (3) contamination of the exact phrasing being present in training datasets. We adapt MBPP to emphasize on generating code from just natural language by (1) removing ambiguity about the semantics of the task from the descriptions, and (2) evaluating generated code on multiple sets of assertions to account for ambiguity in the syntax. We compare popular open and closed weight models on the original (MBPP) and adapted (MBUPP) datasets.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Mansi Uniyal , Mukul Singh , Gust Verbruggen , Sumit Gulwani , Vu Le

Topics

Artificial Intelligence > Core AI > Foundation Models Natural Language Processing > Generation > Text Generation Natural Language Processing > Applications > Text Generation Artificial Intelligence > Core AI > Natural Language Processing Deep Learning > Learning Types > Code Generation

Keywords

text generation code generation natural language language model unit test test case

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024