Learning to Ideate for Machine Learning Engineering Agents

Yunxiang Zhang; Kang Zhou; Zhichao Xu; Kiran Ramnath; Yun Zhou; Sangmin Woo; Haibo Ding; Lin Lee Cheong

EACL 2026


Abstract

Existing machine learning engineering (MLE) agents struggle to iteratively optimize their implemented algorithms for effectiveness. To address this, we introduce MLE-Ideator, a dual-agent framework that separates ideation from implementation. In our system, an implementation agent can request strategic help from a dedicated Ideator. We show this approach is effective in two ways. First, in a training-free setup, our framework significantly outperforms implementation-only agent baselines on MLE-Bench. Second, we demonstrate that the Ideator can be trained with reinforcement learning (RL) to generate more effective ideas. With only 1K training samples from 10 MLE tasks, our RL-trained Qwen3-8B Ideator achieves an 11.5% relative improvement compared to its untrained counterpart and surpasses Claude Sonnet 3.5. These results highlight a promising path toward training strategic AI systems for scientific discovery.
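The separation of ideation from implementation described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not specify an interface, so `run_dual_agent`, `ideate`, `implement`, and `RunLog` are hypothetical names, the rule of asking the Ideator after every attempt is a simplification (in the paper the implementation agent decides when to request help), and the toy scores are made-up numbers, not results from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces (not taken from the paper):
#   ideate(task, history)  -> a natural-language idea to try next
#   implement(task, idea)  -> a validation score, higher is better
IdeateFn = Callable[[str, List[str]], str]
ImplementFn = Callable[[str, str], float]


@dataclass
class RunLog:
    best_score: float = float("-inf")
    best_idea: str = ""
    history: List[str] = field(default_factory=list)


def run_dual_agent(task: str, ideate: IdeateFn, implement: ImplementFn,
                   steps: int = 4) -> RunLog:
    """Alternate between implementing an idea and asking the Ideator for the next one."""
    log = RunLog()
    idea = "baseline"
    for _ in range(steps):
        # Implementation agent: turn the current idea into code and score it.
        score = implement(task, idea)
        log.history.append(f"{idea} -> {score:.2f}")
        if score > log.best_score:
            log.best_score, log.best_idea = score, idea
        # Assumed trigger: request a new idea after every attempt.
        idea = ideate(task, log.history)
    return log


# Toy stand-ins for the two agents; the scores are invented for the example.
def toy_ideator(task: str, history: List[str]) -> str:
    ideas = ["add feature scaling", "try gradient boosting", "ensemble models"]
    return ideas[(len(history) - 1) % len(ideas)]


def toy_implementer(task: str, idea: str) -> float:
    scores = {
        "baseline": 0.70,
        "add feature scaling": 0.72,
        "try gradient boosting": 0.80,
        "ensemble models": 0.78,
    }
    return scores[idea]


log = run_dual_agent("toy-task", toy_ideator, toy_implementer, steps=4)
print(log.best_idea, log.best_score)
```

In a real system both callables would wrap language-model calls, and an RL-trained Ideator would replace `toy_ideator` without any change to the loop, which is the practical appeal of keeping the two roles separate.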


Authors

Yunxiang Zhang, Kang Zhou, Zhichao Xu, Kiran Ramnath, Yun Zhou, Sangmin Woo, Haibo Ding, Lin Lee Cheong

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Efficient Computing

Keywords

reinforcement learning, strategic reasoning, machine learning engineering, agent optimization, dual-agent framework

