Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Group-Aware Reinforcement Learning for Output Diversity in Large Language Models
EMNLP 2025
Image Difference Captioning via Adversarial Preference Optimization
EMNLP 2025
Simple Policy Optimization
ICML 2025
Mapping Smarter, Not Harder: A Test-Time Reinforcement Learning Agent That Improves Without Labels or Model Updates
EMNLP 2025
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
CVPR 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
CVPR 2025
SeqMvRL: A Sequential Fusion Framework for Multi-view Representation Learning
CVPR 2025
Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging
CVPR 2025
LLaVA-Critic: Learning to Evaluate Multimodal Models
CVPR 2025
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
CVPR 2025
Minority-Aware Satisfaction Estimation in Dialogue Systems via Preference-Adaptive Reinforcement Learning
IJCNLP 2025
Categorical Semantics of Compositional Reinforcement Learning
JMLR 2025
Incorporating Review-missing Interactions for Generative Explainable Recommendation
COLING 2025
Counterfactual Strategies for Markov Decision Processes
IJCAI 2025
Maximum Entropy Softmax Policy Gradient via Entropy Advantage Estimation
IJCAI 2025
SPoRt - Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL
IJCAI 2025
Incentivizing Safer Actions in Policy Optimization for Constrained Reinforcement Learning
IJCAI 2025
S-EPOA: Overcoming the Indistinguishability of Segments with Skill-Driven Preference-Based Reinforcement Learning
IJCAI 2025
Imitation Learning via Focused Satisficing
IJCAI 2025
Reward Models in Deep Reinforcement Learning: A Survey
IJCAI 2025
A Case for Validation Buffer in Pessimistic Actor-Critic
IJCAI 2025
Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data
AAAI 2025
Learning to Generate Structured Output with Schema Reinforcement Learning
ACL 2025
Optimizing Decomposition for Optimal Claim Verification
ACL 2025
To Measure or Not: A Cost-Sensitive, Selective Measuring Environment for Agricultural Management Decisions with Reinforcement Learning
AAAI 2025