Research Explorer
Offline RL
725 directly classified papers
Papers per year
2007: 2
2008: 1
2009: 1
2011: 1
2012: 2
2014: 3
2015: 2
2016: 6
2017: 4
2018: 8
2019: 29
2020: 60
2021: 105
2022: 129
2023: 187
2024: 126
2025: 37
2026: 22
Papers
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
COLT 2021
A Workflow for Offline Model-Free Robotic Reinforcement Learning
CORL 2021
Safe Driving via Expert Guided Policy Optimization
CORL 2021
Learning Off-Policy with Online Planning
CORL 2021
Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
NIPS 2021
Control Variates for Slate Off-Policy Evaluation
NIPS 2021
Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
NIPS 2020
Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
UAI 2020
Private Reinforcement Learning with PAC and Regret Guarantees
ICML 2020
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
ICML 2020
Counterfactual Data Augmentation using Locally Factored Dynamics
NIPS 2020
Multi-task Batch Reinforcement Learning with Metric Learning
NIPS 2020
RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning
NIPS 2020
Critic Regularized Regression
NIPS 2020
Off-Policy Interval Estimation with Lipschitz Value Iteration
NIPS 2020
Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes
NIPS 2020
Offline Imitation Learning with a Misspecified Simulator
NIPS 2020
CoinDICE: Off-Policy Confidence Interval Estimation
NIPS 2020
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
NIPS 2020
Provably Efficient Neural GTD for Off-Policy Learning
NIPS 2020
Off-Policy Imitation Learning from Observations
NIPS 2020
A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
NIPS 2020
Self-Imitation Learning via Generalized Lower Bound Q-learning
NIPS 2020
MOPO: Model-based Offline Policy Optimization
NIPS 2020
BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
NIPS 2020