Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Proactive Constrained Policy Optimization with Preemptive Penalty
AAAI 2026
MTRL-CG: Multi-Task Reinforcement Learning Method with Spectral Clustering-Based Task Grouping
AAAI 2026
Policy Zooming: Adaptive Discretization-based Infinite-Horizon Average-Reward Reinforcement Learning
AAAI 2026
Reward Model Evaluation via Automatically-Ranked Policy Alignment
AAAI 2026
Qualitative Analysis of ω-Regular Objectives on Robust MDPs
AAAI 2026
Learning to Optimize Job Shop Scheduling Under Structural Uncertainty
AAAI 2026
UNO! UNified Offline Training Paradigm for Learning Path Recommendation
AAAI 2026
FD-MAGRPO: Functionality-Driven Multi-Agent Group Relative Policy Optimization for Analog-LDO Sizing
AAAI 2026
Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions
AAAI 2026
Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions
AAAI 2026
MARPO: A Reflective Policy Optimization for Multi-Agent Reinforcement Learning
AAAI 2026
Stability-Aware Reinforcement Learning for Robust Class Integration Test Order Generation
AAAI 2026
Revealing POMDPs: Qualitative and Quantitative Analysis for Parity Objectives
AAAI 2026
Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking
AAAI 2026
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
EACL 2026
ReasonAct: Progressive Training for Fine-Grained Video Reasoning in Small Models
AAAI 2026
Expressive Temporal Specifications for Reward Monitoring
AAAI 2026
Stabilizing Policy Gradient Methods via Reward Profiling
AAAI 2026
State Proficiency-Based Adaptive Fine-Tuning for Offline-to-Online Reinforcement Learning
AAAI 2026
T4NMTD: Transition-Centric Reinforcement Learning for Non-Markovian Task Decomposition
AAAI 2026
Policy Newton Methods for Distortion Riskmetrics
AAAI 2026
LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning with Subproblem Exploration
AAAI 2026
Test-driven Reinforcement Learning in Continuous Control
AAAI 2026
HCPO: Hierarchical Conductor-Based Policy Optimization in Multi-Agent Reinforcement Learning
AAAI 2026
Misalignment from Treating Means as Ends
AAAI 2026