Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
CVPR 2025
SeqMvRL: A Sequential Fusion Framework for Multi-view Representation Learning
CVPR 2025
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
CVPR 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
CVPR 2025
Incorporating Review-missing Interactions for Generative Explainable Recommendation
COLING 2025
SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal Learning
RSS 2025
Leveraging LLM-based sentiment analysis for portfolio optimization with proximal policy optimization
ACL 2025
bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning
ACL 2025
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback
ACL 2025
PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization
ACL 2025
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving
ACL 2025
Towards Medical Complex Reasoning with LLMs through Medical Verifiable Problems
ACL 2025
CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization
EMNLP 2025
DOGlove: Dexterous Manipulation with a Low-Cost Open-Source Haptic Force Feedback Glove
RSS 2025
Offline Reinforcement Learning for LLM Multi-step Reasoning
ACL 2025
Imitation Learning via Focused Satisficing
IJCAI 2025
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
ACL 2025
Proactive Guidance of Multi-Turn Conversation in Industrial Search
ACL 2025
T-REG: Preference Optimization with Token-Level Reward Regularization
ACL 2025
Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models
ACL 2025
SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation
ACL 2025
YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
ACL 2025
Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
ACL 2025
Image Difference Captioning via Adversarial Preference Optimization
EMNLP 2025
In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents
ACL 2025