Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Goal-Conditioned On-Policy Reinforcement Learning
NIPS 2024
Hierarchical Planning and Learning for Robots in Stochastic Settings Using Zero-Shot Option Invention
AAAI 2024
Simplifying Constraint Inference with Inverse Reinforcement Learning
NIPS 2024
Rating-Based Reinforcement Learning
AAAI 2024
Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables
AAAI 2024
XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX
NIPS 2024
Learning Formal Mathematics From Intrinsic Motivation
NIPS 2024
TRIP NEGOTIATOR: A Travel Persona-aware Reinforced Dialogue Generation Model for Personalized Integrative Negotiation in Tourism
EMNLP 2024
Solving Minimum-Cost Reach Avoid using Reinforcement Learning
NIPS 2024
Enhancing Alignment using Curriculum Learning & Ranked Preferences
EMNLP 2024
Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue
EMNLP 2024
Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation
NIPS 2024
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
EMNLP 2024
Exploiting Careful Design of SVM Solution for Aspect-term Sentiment Analysis
EMNLP 2024
C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory
NIPS 2024
E2CL: Exploration-based Error Correction Learning for Embodied Agents
EMNLP 2024
Latent Learning Progress Drives Autonomous Goal Selection in Human Reinforcement Learning
NIPS 2024
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization
EMNLP 2024
Towards Pareto-Efficient RLHF: Paying Attention to a Few High-Reward Samples with Reward Dropout
EMNLP 2024
A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective
NIPS 2024
Reward Modeling Requires Automatic Adjustment Based on Data Quality
EMNLP 2024
Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment
NIPS 2024
A Fairness-Driven Method for Learning Human-Compatible Negotiation Strategies
EMNLP 2024
MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization
EMNLP 2024
Rethinking the Role of Proxy Rewards in Language Model Alignment
EMNLP 2024