LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization

Qi Zhang; Shouqing Yang; Lirong Gao; Hao Chen; Xiaomeng Hu; Jinglei Chen; Jiexiang Wang; Sheng Guo; Bo Zheng; Haobo Wang; Junbo Zhao

2025 EMNLP EMNLP 2025

LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization

Abstract

AbstractLarge language models (LLMs) have demonstrated impressive capabilities in reasoning with the emergence of reasoning models like OpenAI-o1 and DeepSeek-R1. Recent research focuses on integrating reasoning capabilities into the realm of retrieval-augmented generation (RAG) via outcome-supervised reinforcement learning (RL) approaches, while the correctness of intermediate think-and-search steps is usually neglected. To address this issue, we design a process-level reward module to mitigate the unawareness of intermediate reasoning steps in outcome-level supervision without additional annotation. Grounded on this, we propose **Le**arning to **T**hink-and-**S**earch (**LeTS**), a novel framework that hybridizes stepwise process reward and outcome-based reward to current RL methods for RAG. Extensive experiments demonstrate the generalization and inference efficiency of **LeTS** across various RAG benchmarks. In addition, these results reveal the potential of process- and outcome-level reward hybridization in boosting LLMs’ reasoning ability via RL under other scenarios.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — outcome reward

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Qi Zhang , Shouqing Yang , Lirong Gao , Hao Chen , Xiaomeng Hu , Jinglei Chen , Jiexiang Wang , Sheng Guo , Bo Zheng , Haobo Wang , Junbo Zhao

Topics

Artificial Intelligence > Core AI > Foundation Models Machine Learning > Optimization & Theory > Optimization Natural Language Processing > Resources & Methods > Large Language Models Machine Learning > Learning Types > Reinforcement Learning Artificial Intelligence > Core AI > Large Language Models Machine Learning > Learning Types > Retrieval-Augmented Generation Natural Language Processing > Generation > Retrieval-Augmented Generation

Keywords

reinforcement learning retrieval-augmented generation reasoning model process reward large language model outcome reward

Download PDF

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025