Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code

Hyungjoo Chae; Taeyoon Kwon; Seungjun Moon; Yongho Song; Dongjin Kang; Kai Tzu-iunn Ong; Beong-woo Kwak; Seonghyeon Bae; Seung-won Hwang; Jinyoung Yeo

EMNLP 2024

Abstract

This paper presents Coffee-Gym, a comprehensive RL environment for training models that provide feedback on code editing. Coffee-Gym includes two major components: (1) Coffee, a dataset containing humans' code edit traces for coding questions and human-written feedback for editing erroneous code; (2) CoffeeEval, a reward function that faithfully reflects the helpfulness of feedback by assessing the performance of the revised code in unit tests. With them, Coffee-Gym addresses the unavailability of high-quality datasets for training feedback models with RL, and provides more accurate rewards than the SOTA reward model (i.e., GPT-4). By applying Coffee-Gym, we elicit feedback models that outperform baselines in enhancing open-source code LLMs' code editing, making them comparable with closed-source LLMs. We make the dataset and the model checkpoint publicly available at https://huggingface.co/spaces/Coffee-Gym/Project-Coffee-Gym.
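The idea behind a unit-test-based reward such as CoffeeEval can be illustrated with a minimal sketch. This is not the authors' implementation: the names `pass_rate`, `feedback_reward`, and `toy_editor` are hypothetical, solutions are modeled as plain Python functions, and the editor model that applies feedback is replaced by a hand-written stub. The sketch only shows the general pattern of scoring feedback by the fraction of unit tests the revised code passes.

```python
from typing import Callable, List, Tuple

# Hypothetical test-case format: (positional args, expected return value).
TestCase = Tuple[tuple, object]


def pass_rate(solution: Callable, tests: List[TestCase]) -> float:
    """Return the fraction of unit tests that `solution` passes."""
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if solution(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test
    return passed / len(tests)


def feedback_reward(erroneous: Callable, feedback: str,
                    editor: Callable[[Callable, str], Callable],
                    tests: List[TestCase]) -> float:
    """Score feedback by how well the code revised under it performs."""
    revised = editor(erroneous, feedback)
    return pass_rate(revised, tests)


# Toy example (assumption): a buggy "sum of 1..n" with an off-by-one error.
def buggy_sum(n: int) -> int:
    return sum(range(1, n))


def fixed_sum(n: int) -> int:
    return sum(range(1, n + 1))


def toy_editor(code: Callable, feedback: str) -> Callable:
    """Stand-in for an editor model: only helpful feedback leads to a fix."""
    return fixed_sum if "off-by-one" in feedback else code


tests = [((1,), 1), ((3,), 6), ((0,), 0)]
helpful = feedback_reward(buggy_sum, "Fix the off-by-one in range.", toy_editor, tests)
unhelpful = feedback_reward(buggy_sum, "The code looks fine.", toy_editor, tests)
print(helpful, unhelpful)
```

In this toy setup the helpful feedback earns a higher reward than the unhelpful one, because only the former leads the stub editor to a revision that passes every test.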

Topics

Artificial Intelligence > Core AI > Foundation Models
Reinforcement Learning > Methods > Policy Learning
Machine Learning > Learning Types > Reinforcement Learning
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Applications > Text Generation
Deep Learning > Learning Types > Reinforcement Learning
Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

reinforcement learning, reward function, code language model, unit test, code editing, natural language feedback, code editing feedback, unit test evaluation
