Rémi Munos

117 papers · 2006–2025 · 7 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (43) 🌈 Renaissance Researcher (9) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌍 Conference Polyglot (7) 🏃 Academic Marathon (19) 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (13) 🌟 Keyword Trendsetter Combo (6) 🏠 Conference Loyalist (46) 🐺 Lone Wolf (3) 🤝 Dynamic Duo (35) 👑 Triple Crown 🌱 Topic Pioneer 🔬 Deep Specialist (15) 🏆 Keyword Champion 💎 Century Club (117) 🔥 Unstoppable (20) 🗃️ Keyword Collector (212) 📈 Trend Setter 🚀 Conference Pioneer ⚡ Prolific Year (5)

Conferences

NIPS (46) ICML (41) JMLR (11) AISTATS (10) ICLR (7) ACML (1) COLT (1)

Top co-authors

Michal Valko (35) Mark Rowland (27) Will Dabney (23) Yunhao Tang (23) Mohammad Gheshlaghi Azar (14) Bilal Piot (12) Daniele Calandriello (10) Bernardo Avila Pires (9) Zhaohan Daniel Guo (8) Pierre Menard (8)

Research topics

Applications (2) Statistics (1)

Keywords

reinforcement learning (23) multi-armed bandit (17) regret bound (17) markov decision process (12) value function (11) stochastic optimization (10) variance reduction (9) deep reinforcement learning (9) distributional reinforcement learning (9) sample complexity (9) value iteration (7) policy gradient (7) off-policy learning (7) online algorithm (7) policy optimization (7) representation learning (6) online learning (6) stratified sampling (5) game theory (5) nash equilibrium (5)

Papers

Optimizing Return Distributions with Distributional Dynamic Programming JMLR 2025

Temporal Difference Flows ICML 2025

Optimizing Language Models for Inference Time Objectives using Reinforcement Learning ICML 2025

Human Alignment of Large Language Models through Online Preference Optimisation ICML 2024

An Analysis of Quantile Temporal-Difference Learning JMLR 2024

A General Theoretical Paradigm to Understand Learning from Human Preferences AISTATS 2024

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model NIPS 2024

Multi-turn Reinforcement Learning with Preference Human Feedback NIPS 2024

Local and Adaptive Mirror Descents in Extensive-Form Games NIPS 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment ICML 2024

Nash Learning from Human Feedback ICML 2024

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition ICML 2023

Towards a better understanding of representation dynamics under TD-learning ICML 2023

Fast Rates for Maximum Entropy Exploration ICML 2023

VA-learning as a more efficient alternative to Q-learning ICML 2023

Model-free Posterior Sampling via Learning Rate Randomization NIPS 2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm ICML 2023

Understanding Self-Predictive Learning for Reinforcement Learning ICML 2023

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation ICML 2023

Quantile Credit Assignment ICML 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice ICML 2023

Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments ICML 2023

Adapting to game trees in zero-sum imperfect information games ICML 2023

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees NIPS 2022

Marginalized Operators for Off-policy Reinforcement Learning AISTATS 2022

Generalised Policy Improvement with Geometric Policy Composition ICML 2022

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning NIPS 2022

BYOL-Explore: Exploration by Bootstrapped Prediction NIPS 2022

Large-Scale Representation Learning on Graphs via Bootstrapping ICLR 2022

Taylor Expansion of Discount Factors ICML 2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation NIPS 2021

Learning in two-player zero-sum partially observable Markov games with perfect recall NIPS 2021

Revisiting Peng’s Q($λ$) for Modern Reinforcement Learning ICML 2021

Counterfactual Credit Assignment in Model-Free Reinforcement Learning ICML 2021

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization ICML 2021

Monte-Carlo Tree Search as Regularized Policy Optimization ICML 2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning NIPS 2020

Taylor Expansion Policy Optimization ICML 2020

Adaptive Trade-Offs in Off-Policy Learning AISTATS 2020

Conditional Importance Sampling for Off-Policy Learning AISTATS 2020

Spectral bandits JMLR 2020

A Generalized Training Approach for Multiagent Learning ICLR 2020

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning NIPS 2020

Fast computation of Nash Equilibria in Imperfect Information Games ICML 2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning ICML 2020

Recurrent Experience Replay in Distributed Reinforcement Learning ICLR 2019

Hindsight Credit Assignment NIPS 2019

Planning in entropy-regularized Markov decision processes and games NIPS 2019

Multiagent Evaluation under Incomplete Information NIPS 2019

The Termination Critic AISTATS 2019

Statistics and Samples in Distributional Reinforcement Learning ICML 2019

Universal Successor Features Approximators ICLR 2019

Maximum a Posteriori Policy Optimisation ICLR 2018

Optimistic optimization of a Brownian NIPS 2018

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments NIPS 2018

An Analysis of Categorical Distributional Reinforcement Learning AISTATS 2018

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning ICLR 2018

Noisy Networks For Exploration ICLR 2018

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement ICML 2018

Implicit Quantile Networks for Distributional Reinforcement Learning ICML 2018

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures ICML 2018

Learning to search with MCTSnets ICML 2018

The Uncertainty Bellman Equation and Exploration ICML 2018

Autoregressive Quantile Networks for Generative Modeling ICML 2018

A Distributional Perspective on Reinforcement Learning ICML 2017

Minimax Regret Bounds for Reinforcement Learning ICML 2017

Successor Features for Transfer in Reinforcement Learning NIPS 2017

Count-Based Exploration with Neural Density Models ICML 2017

Automated Curriculum Learning for Neural Networks ICML 2017

Memory-Efficient Backpropagation Through Time NIPS 2016

Safe and Efficient Off-Policy Reinforcement Learning NIPS 2016

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning NIPS 2016

Analysis of Classification-based Policy Iteration Algorithms JMLR 2016

Unifying Count-Based Exploration and Intrinsic Motivation NIPS 2016

Adaptive Strategy for Stratified Monte Carlo Sampling JMLR 2015

Cheap Bandits ICML 2015

Black-box optimization of noisy functions with unknown smoothness NIPS 2015

Toward Minimax Off-policy Value Estimation AISTATS 2015

Active Regression by Stratification NIPS 2014

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem ICML 2014

Spectral Bandits for Smooth Graph Functions ICML 2014

Best-Arm Identification in Linear Bandits NIPS 2014

Optimistic Planning in Markov Decision Processes Using a Generative Model NIPS 2014

Bounded Regret for Finite-Armed Structured Bandits NIPS 2014

Efficient learning by implicit exploration in bandit problems with side observations NIPS 2014

Thompson Sampling for 1-Dimensional Exponential Family Bandits NIPS 2013

Toward Optimal Stratification for Stratified Monte-Carlo Integration ICML 2013

Stochastic Simultaneous Optimistic Optimization ICML 2013

Aggregating Optimistic Planning Trees for Solving Markov Decision Processes NIPS 2013

Risk-Aversion in Multi-armed Bandits NIPS 2012

Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button NIPS 2012

Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions NIPS 2012

Finite-Sample Analysis of Least-Squares Policy Iteration JMLR 2012

Linear Regression With Random Projections JMLR 2012

Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit AISTATS 2012

Optimistic planning for Markov decision processes AISTATS 2012

Speedy Q-Learning NIPS 2011

Selecting the State-Representation in Reinforcement Learning NIPS 2011

Finite Time Analysis of Stratified Sampling for Monte Carlo NIPS 2011

Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness NIPS 2011

Sparse Recovery with Brownian Sensing NIPS 2011

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences COLT 2011

X-Armed Bandits JMLR 2011

Adaptive Bandits: Towards the best history-dependent strategy AISTATS 2011

LSTD with Random Projections NIPS 2010

Finite-sample Analysis of Bellman Residual Minimization ACML 2010

Scrambled Objects for Least-Squares Regression NIPS 2010

Error Propagation for Approximate Policy and Value Iteration NIPS 2010

Sensitivity analysis in HMMs with application to likelihood maximization NIPS 2009

Compressed Least-Squares Regression NIPS 2009

Particle Filter-based Policy Gradient in POMDPs NIPS 2008

Online Optimization in X-Armed Bandits NIPS 2008

Algorithms for Infinitely Many-Armed Bandits NIPS 2008

Finite-Time Bounds for Fitted Value Iteration JMLR 2008

Fitted Q-iteration in continuous action-space MDPs NIPS 2007

Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation JMLR 2006

Policy Gradient in Continuous Time JMLR 2006