QGFN: Controllable Greediness with Action Values

Elaine Lau; Stephen Zhewen Lu; Ling Pan; Doina Precup; Emmanuel Bengio

NeurIPS 2024

Abstract

Generative Flow Networks (GFlowNets; GFNs) are a family of energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, consistently biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate, $Q$, to create greedier sampling policies which can be controlled by a mixing parameter. We show that several variants of the proposed method, QGFN, are able to improve on the number of high-reward samples generated in a variety of tasks without sacrificing diversity.
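As a rough illustration of what "controlled by a mixing parameter" could mean, the sketch below mixes a GFN forward policy with a greedy choice over action values: with weight `p` the probability mass goes to the action with the highest $Q$, and with weight `1 - p` it follows the GFN policy. This is only one plausible reading of the abstract, not necessarily any of the variants the paper evaluates. The names `mix_policy` and `sample_action`, the list-based interface, and the example numbers are all assumptions made for illustration.

```python
import random


def mix_policy(pf_probs, q_values, p):
    """Mix a GFN policy with a greedy policy over Q (hypothetical helper).

    Returns (1 - p) * pf_probs + p * one_hot(argmax(q_values)).
    p = 0 recovers the GFN policy; p = 1 is fully greedy with respect to Q.
    """
    if len(pf_probs) != len(q_values):
        raise ValueError("pf_probs and q_values must have the same length")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Index of the action with the largest action-value estimate.
    greedy = max(range(len(q_values)), key=q_values.__getitem__)
    mixed = [(1.0 - p) * prob for prob in pf_probs]
    mixed[greedy] += p
    return mixed


def sample_action(pf_probs, q_values, p, rng=random):
    """Sample one action index from the mixed distribution."""
    mixed = mix_policy(pf_probs, q_values, p)
    return rng.choices(range(len(mixed)), weights=mixed, k=1)[0]


if __name__ == "__main__":
    pf = [0.5, 0.3, 0.2]   # illustrative GFN forward-policy probabilities
    q = [0.1, 0.9, 0.4]    # illustrative action-value estimates
    for p in (0.0, 0.5, 1.0):
        print(p, mix_policy(pf, q, p))
```

Raising `p` shifts probability toward the action that $Q$ rates highest, while any `p < 1` keeps some mass on every action the GFN policy supports, which is one intuition for how greediness might be increased without collapsing diversity.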

Topics

Artificial Intelligence > Learning Paradigms > Meta-Learning

Machine Learning > Core Methods > Representation Learning

Deep Learning > Models > Generative Models

Mathematics & Optimization > Optimization > Stochastic Methods

Machine Learning > Learning Types > Reinforcement Learning

Artificial Intelligence > Core AI > Reinforcement Learning

Deep Learning > Learning Types > Generative Models

Keywords

reinforcement learning action value energy-based model generative flow network combinatorial object reward sampling sampling policy action-value estimate action-value estimation
