TO-GATE: Clarifying Questions and Summarizing Responses with Trajectory Optimization for Eliciting Human Preference
Abstract
Abstract Humans increasingly query Large Language Models (LLMs) to accomplish personal tasks according to their individual preferences. However, these preferences are often unconsciously veiled during conversation. To address this, LLMs have to elicit human preferences through multi-turn dialogue, where tasks are accomplished via iterative clarifying questions and final response generated by LLMs as effective questioners. Existing approaches based on self-taught reasoning have two limitations: 1) they struggle to avoid generating irrelevant questions and 2) the final responses to tasks are misled by the conversations. To overcome these limitations, we propose TO-GATE, a novel framework that enhances question generation through trajectory optimization. TO-GATE comprises two key components: a clarification resolver, which generates optimal questioning trajectories to produce effective elicitation questions, and a summarizer, which ensures task-aligned final responses. Experimental results show that TO-GATE significantly outperforms baseline methods, achieving a 9.32% improvement on standard preference elicitation benchmarks.