Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Towards A Better Initial Policy Model For Scalable Long-CoT Reinforcement Learning
ACL 2025
VLP: Vision-Language Preference Learning for Embodied Manipulation
EMNLP 2025
Token-level Proximal Policy Optimization for Query Generation
EMNLP 2025
One Planner To Guide Them All ! Learning Adaptive Conversational Planners for Goal-oriented Dialogues
EMNLP 2025
Reinforcement Learning for Large Language Models via Group Preference Reward Shaping
EMNLP 2025
RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement Learning
EMNLP 2025
MuTIS: Enhancing Reasoning Efficiency through Multi Turn Intervention Sampling in Reinforcement Learning
EMNLP 2025
Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization
EMNLP 2025
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
EMNLP 2025
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
EMNLP 2025
Group-Aware Reinforcement Learning for Output Diversity in Large Language Models
EMNLP 2025
RLLTE: Long-Term Evolution Project of Reinforcement Learning
AAAI 2025
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
ICCV 2025
Thinking Out Loud: Do Reasoning Models Know When They’re Right?
EMNLP 2025
On-Policy Algorithms for Continual Reinforcement Learning (Student Abstract)
AAAI 2025
A Deployed Online Reinforcement Learning Algorithm in an Oral Health Clinical Trial
AAAI 2025
Augmented Lagrangian Risk-constrained Reinforcement Learning for Portfolio Optimization (Student Abstract)
AAAI 2025
T-REG: Preference Optimization with Token-Level Reward Regularization
ACL 2025
Selective Preference Optimization via Token-Level Reward Function Estimation
EMNLP 2025
Can GRPO Boost Complex Multimodal Table Understanding?
EMNLP 2025
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
EMNLP 2025
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
EMNLP 2025
MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning
EMNLP 2025
StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization
EMNLP 2025
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning
EMNLP 2025