PRACT: Optimizing Principled Reasoning and Acting of LLM Agent

Zhiwei Liu; Weiran Yao; Jianguo Zhang; Rithesh Murthy; Liangwei Yang; Zuxin Liu; Tian Lan; Ming Zhu; Juntao Tan; Shirley Kokane; Thai Hoang; Juan Carlos Niebles; Shelby Heinecke; Huan Wang; Silvio Savarese; Caiming Xiong

CoNLL 2024

Abstract

We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique current action principles and an optimizer to update them accordingly. We investigate the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, we developed two RPO methods, RPO-Traj and RPO-Batch, to adapt to different settings. Experimental results across four environments demonstrate that the PRAct agent, leveraging the RPO framework, can effectively learn and apply action principles to enhance performance.
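The execute, reflect, optimize cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `Trajectory`, `rpo`, `toy_execute`, `toy_reflect`, and `toy_optimize` are hypothetical, and the toy functions stand in for what would be LLM calls. The `batch_size` parameter is likewise an assumption about how a per-trajectory variant and a batched variant might differ; the abstract does not define RPO-Traj or RPO-Batch in detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Trajectory:
    task: str
    steps: List[str]
    # None models a setting without external reward (as in Self-RPO).
    reward: Optional[float] = None


# Hypothetical signatures; in practice each would wrap an LLM call.
Executor = Callable[[str, str], Trajectory]      # (task, principles) -> trajectory
Reflector = Callable[[str, Trajectory], str]     # (principles, trajectory) -> critique
Optimizer = Callable[[str, Sequence[str]], str]  # (principles, critiques) -> principles


def rpo(tasks: Sequence[str], principles: str, execute: Executor,
        reflect: Reflector, optimize: Optimizer, batch_size: int = 1) -> str:
    """Run tasks, critique the principles after each run, and update them.

    batch_size=1 updates after every trajectory; a larger value collects
    several critiques before one update (an assumed reading of the
    per-trajectory vs. batch variants).
    """
    pending: List[str] = []
    for task in tasks:
        trajectory = execute(task, principles)
        pending.append(reflect(principles, trajectory))
        if len(pending) >= batch_size:
            principles = optimize(principles, pending)
            pending = []
    if pending:  # flush critiques left over from an incomplete batch
        principles = optimize(principles, pending)
    return principles


# Toy stand-ins so the sketch runs without any model.
def toy_execute(task: str, principles: str) -> Trajectory:
    reward = 1.0 if "verify" in principles else 0.0
    return Trajectory(task=task, steps=[f"act on {task}"], reward=reward)


def toy_reflect(principles: str, trajectory: Trajectory) -> str:
    if trajectory.reward is not None and trajectory.reward < 1.0:
        return "add: verify the result before finishing"
    return ""


def toy_optimize(principles: str, critiques: Sequence[str]) -> str:
    for critique in critiques:
        if critique.startswith("add: "):
            rule = critique[len("add: "):]
            if rule not in principles:
                principles += "\n- " + rule
    return principles


if __name__ == "__main__":
    print(rpo(["t1", "t2"], "- think step by step",
              toy_execute, toy_reflect, toy_optimize))
```

Passing the reflector and optimizer as plain callables keeps the loop independent of whether the critique uses an environment reward or self-reflection only; that choice lives entirely inside the `reflect` function.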

Topics

Mathematics & Optimization > Optimization > Stochastic Methods

Keywords

reinforcement learning, LLM agent, principled reasoning, reflective optimization, action principle
