Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?
ACL 2025
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
ICCV 2025
Formally Verified Approximate Policy Iteration
AAAI 2025
Leveraging Human Input to Enable Robust, Interactive, and Aligned AI Systems
AAAI 2025
Representation-driven Option Discovery in Reinforcement Learning
AAAI 2025
The POWER of Ikigai: Optimizing Life Fulfillment with an Integrated User Simulator and Adaptive Hobby Recommender
AAAI 2025
Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions
AAAI 2025
Continuously evolving rewards in an open-ended environment
JMLR 2025
Statistical field theory for Markov decision processes under uncertainty
JMLR 2025
Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability
JMLR 2025
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
JMLR 2025
A Deployed Online Reinforcement Learning Algorithm in an Oral Health Clinical Trial
AAAI 2025
On-Policy Algorithms for Continual Reinforcement Learning (Student Abstract)
AAAI 2025
RLLTE: Long-Term Evolution Project of Reinforcement Learning
AAAI 2025
Thinking Out Loud: Do Reasoning Models Know When They’re Right?
EMNLP 2025
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
EMNLP 2025
Selective Preference Optimization via Token-Level Reward Function Estimation
EMNLP 2025
Can GRPO Boost Complex Multimodal Table Understanding?
EMNLP 2025
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
EMNLP 2025
StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization
EMNLP 2025
One Planner To Guide Them All ! Learning Adaptive Conversational Planners for Goal-oriented Dialogues
EMNLP 2025
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
EMNLP 2025
IntentionFrame: A Semi-Structured, Multi-Aspect Framework for Fine-Grained Conversational Intention Understanding
EMNLP 2025
Enhancing Study-Level Inference from Clinical Trial Papers via Reinforcement Learning-Based Numeric Reasoning
EMNLP 2025
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
ICCV 2025