Policy Learning

2068 directly classified papers

Papers

Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data? ACL 2025

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO ICCV 2025

Formally Verified Approximate Policy Iteration AAAI 2025

Leveraging Human Input to Enable Robust, Interactive, and Aligned AI Systems AAAI 2025

Representation-driven Option Discovery in Reinforcement Learning AAAI 2025

The POWER of Ikigai: Optimizing Life Fulfillment with an Integrated User Simulator and Adaptive Hobby Recommender AAAI 2025

Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions AAAI 2025

Continuously evolving rewards in an open-ended environment JMLR 2025

Statistical field theory for Markov decision processes under uncertainty JMLR 2025

Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability JMLR 2025

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes JMLR 2025

A Deployed Online Reinforcement Learning Algorithm in an Oral Health Clinical Trial AAAI 2025

On-Policy Algorithms for Continual Reinforcement Learning (Student Abstract) AAAI 2025

RLLTE: Long-Term Evolution Project of Reinforcement Learning AAAI 2025

Thinking Out Loud: Do Reasoning Models Know When They’re Right? EMNLP 2025

Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation EMNLP 2025

Selective Preference Optimization via Token-Level Reward Function Estimation EMNLP 2025

Can GRPO Boost Complex Multimodal Table Understanding? EMNLP 2025

RLAE: Reinforcement Learning-Assisted Ensemble for LLMs EMNLP 2025

StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization EMNLP 2025

One Planner To Guide Them All ! Learning Adaptive Conversational Planners for Goal-oriented Dialogues EMNLP 2025

Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening EMNLP 2025

IntentionFrame: A Semi-Structured, Multi-Aspect Framework for Fine-Grained Conversational Intention Understanding EMNLP 2025

Enhancing Study-Level Inference from Clinical Trial Papers via Reinforcement Learning-Based Numeric Reasoning EMNLP 2025

RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation ICCV 2025