Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability
JMLR 2025
Augmented Lagrangian Risk-constrained Reinforcement Learning for Portfolio Optimization (Student Abstract)
AAAI 2025
Rule-Guided Reinforcement Learning Policy Evaluation and Improvement
IJCAI 2025
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
JMLR 2025
Imitation Learning via Focused Satisficing
IJCAI 2025
Concurrent Planning and Execution Using Dispatch-Dependent Values
IJCAI 2025
A Case for Validation Buffer in Pessimistic Actor-Critic
IJCAI 2025
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
NIPS 2024
Sample-Efficient Constrained Reinforcement Learning with General Parameterization
NIPS 2024
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
AAAI 2024
Policy Aggregation
NIPS 2024
Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning
NIPS 2024
Aligner²: Enhancing Joint Multiple Intent Detection and Slot Filling via Adjustive and Forced Cross-Task Alignment
AAAI 2024
ABLE: Personalized Disability Support with Politeness and Empathy Integration
EMNLP 2024
Flipping-based Policy for Chance-Constrained Markov Decision Processes
NIPS 2024
Periodic agent-state based Q-learning for POMDPs
NIPS 2024
P2BPO: Permeable Penalty Barrier-Based Policy Optimization for Safe RL
AAAI 2024
RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning
NIPS 2024
Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
NIPS 2024
Dialogue for Prompting: A Policy-Gradient-Based Discrete Prompt Generation for Few-Shot Learning
AAAI 2024
Rethinking the Role of Proxy Rewards in Language Model Alignment
EMNLP 2024
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
EMNLP 2024
Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow
NIPS 2024
How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach
NIPS 2024
An Implicit Trust Region Approach to Behavior Regularized Offline Reinforcement Learning
AAAI 2024