Csaba Szepesvári

158 papers · 2007–2025 · 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🗺️ Taxonomy Completionist (48) 🧭 Keyword Pioneer 🐝 Cross-Pollinator (10) 🏃 Academic Marathon (18) 🐣 Hot Topic Early Bird 🌈 Renaissance Researcher (8) 🌉 Interdisciplinary Bridge 🏠 Conference Loyalist (56) 🌟 Keyword Trendsetter Combo (3) 🌱 Topic Pioneer 👑 Triple Crown 🔬 Deep Specialist (11) 🏆 Keyword Champion (3) 🧬 Topic Evolution 🏆 Grand Slam 🤝 Dynamic Duo (33) ❓ The Questioner (3) 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (19) ⚡ Prolific Year (12) 💎 Century Club (158) 🗃️ Keyword Collector (201)

Conferences

NIPS (56) ICML (37) AISTATS (26) COLT (14) ALT (7) JMLR (6) UAI (4) ICLR (3) IJCAI (3) AAAI (1) L4DC (1)

Top co-authors

András György (33) Tor Lattimore (22) Branislav Kveton (16) Dale Schuurmans (15) Gellert Weisz (13) Bo Dai (11) Jincheng Mei (10) Yasin Abbasi-Yadkori (10) Mohammad Ghavamzadeh (9) Mengdi Wang (8)

Keywords

regret bound (49) online learning (31) multi-armed bandit (25) markov decision process (20) stochastic optimization (20) reinforcement learning (16) sample complexity (13) linear function approximation (12) policy iteration (9) regret analysis (8) partial monitoring (8) function approximation (8) regret minimization (7) online algorithm (7) thompson sampling (7) value function (7) contextual bandit (7) learning to rank (6) policy optimization (6) stochastic bandit (6)

Papers

Thompson Sampling for Bandit Convex Optimisation COLT 2025

Almost Free: Self-concordance in Natural Exponential Families and an Application to Bandits NIPS 2024

To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty NIPS 2024

Ensemble sampling for linear bandits: small ensembles suffice NIPS 2024

Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates NIPS 2024

Switching the Loss Reduces the Cost in Batch Reinforcement Learning ICML 2024

Exploration via linearly perturbed loss minimisation AISTATS 2024

Stochastic Gradient Descent for Gaussian Processes Done Right ICLR 2024

Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^\pi$-Realizability and Concentrability NIPS 2024

Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs NIPS 2024

Context-lumpable stochastic bandits NIPS 2023

Exponential Hardness of Reinforcement Learning with Linear Function Approximation COLT 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice ICML 2023

Revisiting Simple Regret: Fast Rates for Returning a Good Arm ICML 2023

Stochastic Gradient Succeeds for Bandits ICML 2023

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation ICML 2023

Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning AISTATS 2023

Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore NIPS 2023

Regret Minimization via Saddle Point Optimization NIPS 2023

Ordering-based Conditions for Global Convergence of Policy Gradient Methods NIPS 2023

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL NIPS 2023

Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics ICLR 2023

The Role of Baselines in Policy Gradient Optimization NIPS 2022

A free lunch from the noise: Provable and practical exploration for representation learning UAI 2022

Towards painless policy optimization for constrained MDPs UAI 2022

When Is Partially Observable Reinforcement Learning Not Scary? COLT 2022

Efficient local planning with linear function approximation ALT 2022

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions ALT 2022

The Curse of Passive Data Collection in Batch Reinforcement Learning AISTATS 2022

Faster Rates, Adaptive Algorithms, and Finite-Time Bounds for Linear Composition Optimization and Gradient TD Learning AISTATS 2022

Confident Least Square Value Iteration with Local Access to a Simulator AISTATS 2022

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization NIPS 2022

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs NIPS 2022

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games NIPS 2022

Near-Optimal Sample Complexity Bounds for Constrained MDPs NIPS 2022

On the Optimality of Batch Policy Optimization Algorithms ICML 2021

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method NIPS 2021

Understanding the Effect of Stochasticity in Policy Optimization NIPS 2021

No Regrets for Learning the Prior in Bandits NIPS 2021

On the Role of Optimization in Double Descent: A Least Squares Study NIPS 2021

Online Sparse Reinforcement Learning AISTATS 2021

Adaptive Approximate Policy Iteration AISTATS 2021

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting AISTATS 2021

Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions ALT 2021

Asymptotically Optimal Information-Directed Sampling COLT 2021

**Paper retracted by author request (see pdf for retraction notice from the authors)** Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping COLT 2021

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function COLT 2021

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes COLT 2021

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient ICML 2021

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference ICML 2021

A Distribution-dependent Analysis of Meta Learning ICML 2021

Meta-Thompson Sampling ICML 2021

Improved Regret Bound and Experience Replay in Regularized Policy Iteration ICML 2021

Leveraging Non-uniformity in First-order Non-convex Optimization ICML 2021

Tighter Risk Certificates for Neural Networks JMLR 2021

A simpler approach to accelerated optimization: iterative averaging meets optimism ICML 2020

Online Algorithm for Unsupervised Sequential Selection with Contextual Information NIPS 2020

Differentiable Meta-Learning of Bandit Policies NIPS 2020

Variational Policy Gradient Method for Reinforcement Learning with General Utilities NIPS 2020

CoinDICE: Off-Policy Confidence Interval Estimation NIPS 2020

Randomized Exploration in Generalized Linear Bandits AISTATS 2020

Adaptive Exploration in Linear Contextual Bandit AISTATS 2020

Model Selection in Contextual Stochastic Bandit Problems NIPS 2020

PAC-Bayes Analysis Beyond the Usual Bounds NIPS 2020

ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool NIPS 2020

Efficient Planning in Large MDPs with Weak Linear Function Approximation NIPS 2020

Escaping the Gravitational Pull of Softmax NIPS 2020

Model-Based Reinforcement Learning with Value-Targeted Regression L4DC 2020

Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers JMLR 2020

On the Global Convergence Rates of Softmax Policy Gradient Methods ICML 2020

Learning with Good Feature Representations in Bandits and in RL with a Generative Model ICML 2020

Exploration by Optimisation in Partial Monitoring COLT 2020

Model-Based Reinforcement Learning with Value-Targeted Regression ICML 2020

Behaviour Suite for Reinforcement Learning ICLR 2020

Online Algorithm for Unsupervised Sensor Selection AISTATS 2019

Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging NIPS 2019

Detecting Overfitting via Adversarial Examples NIPS 2019

Perturbed-History Exploration in Stochastic Multi-Armed Bandits IJCAI 2019

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring COLT 2019

POLITEX: Regret Bounds for Policy Iteration using Expert Prediction ICML 2019

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits ICML 2019

Perturbed-History Exploration in Stochastic Linear Bandits UAI 2019

BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback UAI 2019

Model-Free Linear Quadratic Control via Reduction to Expert Prediction AISTATS 2019

An Exponential Tail Bound for the Deleted Estimate AAAI 2019

An Exponential Efron-Stein Inequality for $L_q$ Stable Learning Rules ALT 2019

Cleaning up the neighborhood: A full classification for adversarial partial monitoring ALT 2019

Online Learning to Rank with Features ICML 2019

Distribution-Dependent Analysis of Gibbs-ERM Principle COLT 2019

CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration ICML 2019

LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration ICML 2018

Bandits with Delayed, Aggregated Anonymous Feedback ICML 2018

Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers ICML 2018

TopRank: A practical algorithm for online stochastic ranking NIPS 2018

PAC-Bayes bounds for stable algorithms with instance-dependent priors NIPS 2018

Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go? AISTATS 2018

Following the Leader and Fast Rates in Online Linear Prediction: Curved Constraint Sets and Other Regularities JMLR 2017

A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds ALT 2017

Stochastic Rank-1 Bandits AISTATS 2017

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits AISTATS 2017

Unsupervised Sequential Sensor Acquisition AISTATS 2017

Multi-view Matrix Factorization for Linear Dynamical System Estimation NIPS 2017

Structured Best Arm Identification with Fixed Confidence ALT 2017

Online Learning to Rank in Stochastic Click Models ICML 2017

Bernoulli Rank-1 Bandits for Click Feedback IJCAI 2017

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities NIPS 2016

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles AISTATS 2016

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models COLT 2016

Regularized Policy Iteration with Nonparametric Function Spaces JMLR 2016

DCM Bandits: Learning to Rank with Multiple Clicks ICML 2016

Conservative Bandits ICML 2016

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control ICML 2016

Shifting Regret, Mirror Descent, and Matrices ICML 2016

SDP Relaxation with Randomized Rounding for Energy Disaggregation NIPS 2016

Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM AISTATS 2015

Fast Cross-Validation for Incremental Learning IJCAI 2015

Online Learning with Gaussian Payoffs and Side Observations NIPS 2015

Linear Multi-Resource Allocation with Semi-Bandit Feedback NIPS 2015

Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path NIPS 2015

Deterministic Independent Component Analysis ICML 2015

Cascading Bandits: Learning to Rank in the Cascade Model ICML 2015

On Identifying Good Options under Combinatorially Structured Feedback in Finite Noisy Environments ICML 2015

Near-optimal max-affine estimators for convex regression AISTATS 2015

Combinatorial Cascading Bandits NIPS 2015

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits AISTATS 2015

Toward Minimax Off-policy Value Estimation AISTATS 2015

Universal Option Models NIPS 2014

Online Learning in Markov Decision Processes with Changing Cost Sequences ICML 2014

A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models AISTATS 2014

Adaptive Monte Carlo via Bandit Allocation ICML 2014

Online Learning with Costly Features and Labels NIPS 2013

Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions NIPS 2013

A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning ICML 2013

Online Learning under Delayed Feedback ICML 2013

Cost-sensitive Multiclass Classification Risk Bounds ICML 2013

Characterizing the Representer Theorem ICML 2013

Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits AISTATS 2012

The adversarial stochastic shortest path problem with unknown transition probabilities AISTATS 2012

Deep Representations and Codes for Image Auto-Annotation NIPS 2012

Regret Bounds for the Adaptive Control of Linear Quadratic Systems COLT 2011

Improved Algorithms for Linear Stochastic Bandits NIPS 2011

X-Armed Bandits JMLR 2011

Agnostic KWIK learning and efficient approximate reinforcement learning COLT 2011

Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments COLT 2011

Error Propagation for Approximate Policy and Value Iteration NIPS 2010

Online Markov Decision Processes under Bandit Feedback NIPS 2010

Parametric Bandits: The Generalized Linear Case NIPS 2010

A Markov-Chain Monte Carlo Approach to Simultaneous Localization and Mapping AISTATS 2010

Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs NIPS 2010

REGO: Rank-based Estimation of Rényi Information using Euclidean Graph Optimization AISTATS 2010

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation NIPS 2009

A General Projection Property for Distribution Families NIPS 2009

Multi-Step Dyna Planning for Policy Evaluation and Control NIPS 2009

Finite-Time Bounds for Fitted Value Iteration JMLR 2008

Regularized Policy Iteration NIPS 2008

A Convergent $O(n)$ Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation NIPS 2008

Online Optimization in X-Armed Bandits NIPS 2008

Fitted Q-iteration in continuous action-space MDPs NIPS 2007