Research Explorer
Reinforcement Learning
1263 directly classified papers
Papers per year
2006: 1
2007: 2
2008: 3
2009: 2
2010: 1
2011: 2
2012: 3
2013: 2
2014: 3
2015: 2
2016: 8
2017: 44
2018: 95
2019: 134
2020: 123
2021: 131
2022: 143
2023: 127
2024: 194
2025: 240
2026: 3
Papers
Understanding Individual Agent Importance in Multi-Agent System via Counterfactual Reasoning
AAAI 2025
Thinking Out Loud: Do Reasoning Models Know When They’re Right?
EMNLP 2025
DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models
AAAI 2025
SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models
EMNLP 2025
Deep Implicit Imitation Reinforcement Learning in Heterogeneous Action Settings
AAAI 2025
Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment
EMNLP 2025
Towards Efficient Collaboration via Graph Modeling in Reinforcement Learning
AAAI 2025
FedAA: A Reinforcement Learning Perspective on Adaptive Aggregation for Fair and Robust Federated Learning
AAAI 2025
TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making
EMNLP 2025
Agentic-R1: Distilled Dual-Strategy Reasoning
EMNLP 2025
Dynamic Collaboration of Multi-Language Models based on Minimal Complete Semantic Units
EMNLP 2025
OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework
EMNLP 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
ICCV 2025
Teaching Models to Improve on Tape
AAAI 2025
Token-Level Accept or Reject: A Micro Alignment Approach for Large Language Models
IJCAI 2025
VCA: Video Curious Agent for Long Video Understanding
ICCV 2025
Dense Policy: Bidirectional Autoregressive Learning of Actions
ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
Finite Expression Method for Solving High-Dimensional Partial Differential Equations
JMLR 2025
MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning
WACV 2025
ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model
WACV 2025
Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing
ACL 2025
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
ACL 2025
DiaLLMs: EHR-Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction
ACL 2025
Accelerating LLM Reasoning via Early Rejection with Partial Reward Modeling
EMNLP 2025