Research Explorer
Reinforcement Learning
1263 directly classified papers
Papers per year
2006: 1
2007: 2
2008: 3
2009: 2
2010: 1
2011: 2
2012: 3
2013: 2
2014: 3
2015: 2
2016: 8
2017: 44
2018: 95
2019: 134
2020: 123
2021: 131
2022: 143
2023: 127
2024: 194
2025: 240
2026: 3
Papers
Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
AAAI 2025
Deep Reinforcement Learning with Time-Scale Invariant Memory
AAAI 2025
Walk Wisely on Graph: Knowledge Graph Reasoning with Dual Agents via Efficient Guidance-Exploration
AAAI 2025
More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
ACL 2025
LLM-Enhanced Self-Evolving Reinforcement Learning for Multi-Step E-Commerce Payment Fraud Risk Detection
ACL 2025
ASTRO: Automatic Strategy Optimization For Non-Cooperative Dialogues
ACL 2025
Enhancing Predictive Healthcare Using AI-Driven Early Warning Systems
AAAI 2025
A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
EMNLP 2025
OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework
EMNLP 2025
When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
EMNLP 2025
RLHF Algorithms Ranked: An Extensive Evaluation Across Diverse Tasks, Rewards, and Hyperparameters
EMNLP 2025
FedAA: A Reinforcement Learning Perspective on Adaptive Aggregation for Fair and Robust Federated Learning
AAAI 2025
COPR: Continual Human Preference Learning via Optimal Policy Regularization
ACL 2025
Robust Preference Optimization via Dynamic Target Margins
ACL 2025
Token-Level Accept or Reject: A Micro Alignment Approach for Large Language Models
IJCAI 2025
RMultiplex200K: Toward Reliable Multimodal Process Supervision for Visual Language Models on Telecommunications
ICCV 2025
Sample Efficient Alignment Learning With Episodic Control
EMNLP 2025
Accelerating LLM Reasoning via Early Rejection with Partial Reward Modeling
EMNLP 2025
R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning
EMNLP 2025
Dense Policy: Bidirectional Autoregressive Learning of Actions
ICCV 2025
MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
ICCV 2025
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
ICCV 2025
Playpen: An Environment for Exploring Learning From Dialogue Game Feedback
EMNLP 2025