Research Explorer
Reinforcement Learning
1263 directly classified papers
Papers per year
2006: 1
2007: 2
2008: 3
2009: 2
2010: 1
2011: 2
2012: 3
2013: 2
2014: 3
2015: 2
2016: 8
2017: 44
2018: 95
2019: 134
2020: 123
2021: 131
2022: 143
2023: 127
2024: 194
2025: 240
2026: 3
Papers
DiaLLMs: EHR-Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction
ACL 2025
Can GRPO Boost Complex Multimodal Table Understanding?
EMNLP 2025
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
CVPR 2025
The Distributional Reward Critic Framework for Reinforcement Learning Under Perturbed Rewards
AAAI 2025
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
ICCV 2025
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
AAAI 2025
Dense Policy: Bidirectional Autoregressive Learning of Actions
ICCV 2025
DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models
AAAI 2025
Teaching Models to Improve on Tape
AAAI 2025
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
Agentic-R1: Distilled Dual-Strategy Reasoning
EMNLP 2025
Towards Efficient Collaboration via Graph Modeling in Reinforcement Learning
AAAI 2025
Finite Expression Method for Solving High-Dimensional Partial Differential Equations
JMLR 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
ICCV 2025
MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning
WACV 2025
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
EMNLP 2025
ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model
WACV 2025
OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework
EMNLP 2025
Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater
CVPR 2025
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
ICCV 2025
APIRL: Deep Reinforcement Learning for REST API Fuzzing
AAAI 2025
A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
EMNLP 2025
CTD4 – a Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics
AAAI 2025
Learning to Generate Structured Output with Schema Reinforcement Learning
ACL 2025