Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
End-Effector Cartesian Velocity Control for Redundant Loader Cranes Using Reinforcement Learning (Abstract Reprint)
AAAI 2026
Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping
AAAI 2026
SharedRep-RLHF: A Shared Representation Approach to RLHF with Diverse Preferences
AAAI 2026
Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking
AAAI 2026
Learning to Optimize Job Shop Scheduling Under Structural Uncertainty
AAAI 2026
Stability-Aware Reinforcement Learning for Robust Class Integration Test Order Generation
AAAI 2026
HCPO: Hierarchical Conductor-Based Policy Optimization in Multi-Agent Reinforcement Learning
AAAI 2026
Proactive Constrained Policy Optimization with Preemptive Penalty
AAAI 2026
Test-driven Reinforcement Learning in Continuous Control
AAAI 2026
Reward Model Evaluation via Automatically-Ranked Policy Alignment
AAAI 2026
State Proficiency-Based Adaptive Fine-Tuning for Offline-to-Online Reinforcement Learning
AAAI 2026
Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions
AAAI 2026
Policy Newton Methods for Distortion Riskmetrics
AAAI 2026
Policy Zooming: Adaptive Discretization-based Infinite-Horizon Average-Reward Reinforcement Learning
AAAI 2026
UNO! UNified Offline Training Paradigm for Learning Path Recommendation
AAAI 2026
ReasonAct: Progressive Training for Fine-Grained Video Reasoning in Small Models
AAAI 2026
Expressive Temporal Specifications for Reward Monitoring
AAAI 2026
MARPO: A Reflective Policy Optimization for Multi-Agent Reinforcement Learning
AAAI 2026
Misalignment from Treating Means as Ends
AAAI 2026
Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions
AAAI 2026
FD-MAGRPO: Functionality-Driven Multi-Agent Group Relative Policy Optimization for Analog-LDO Sizing
AAAI 2026
T4NMTD: Transition-Centric Reinforcement Learning for Non-Markovian Task Decomposition
AAAI 2026
Stabilizing Policy Gradient Methods via Reward Profiling
AAAI 2026
Qualitative Analysis of ω-Regular Objectives on Robust MDPs
AAAI 2026
Revealing POMDPs: Qualitative and Quantitative Analysis for Parity Objectives
AAAI 2026