Research Explorer
Reinforcement Learning
1263 directly classified papers
Papers per year
2006: 1
2007: 2
2008: 3
2009: 2
2010: 1
2011: 2
2012: 3
2013: 2
2014: 3
2015: 2
2016: 8
2017: 44
2018: 95
2019: 134
2020: 123
2021: 131
2022: 143
2023: 127
2024: 194
2025: 240
2026: 3
Papers
LookAlike: Consistent Distractor Generation in Math MCQs
ACL 2025
Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback
ACL 2025
Selective Preference Optimization via Token-Level Reward Function Estimation
EMNLP 2025
Contrastive Representation for Interactive Recommendation
AAAI 2025
Highly Imperceptible Black-Box Graph Injection Attacks with Reinforcement Learning
AAAI 2025
Leveraging Constraint Violation Signals for Action Constrained Reinforcement Learning
AAAI 2025
DreamAlign: Dynamic Text-to-3D Optimization with Human Preference Alignment
AAAI 2025
Thinking Out Loud: Do Reasoning Models Know When They’re Right?
EMNLP 2025
RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution
EMNLP 2025
SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models
EMNLP 2025
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
ICCV 2025
MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
ICCV 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
Diffusion Guided Adaptive Augmentation for Generalization in Visual Reinforcement Learning
ICCV 2025
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
ICCV 2025
FlowPolicy: Enabling Fast and Robust 3D Flow-Based Policy via Consistency Flow Matching for Robot Manipulation
AAAI 2025
POI Recommendation via Multi-Objective Adversarial Imitation Learning
AAAI 2025
The Distributional Reward Critic Framework for Reinforcement Learning Under Perturbed Rewards
AAAI 2025
Teaching Models to Improve on Tape
AAAI 2025
Understanding Individual Agent Importance in Multi-Agent System via Counterfactual Reasoning
AAAI 2025
DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models
AAAI 2025
Deep Implicit Imitation Reinforcement Learning in Heterogeneous Action Settings
AAAI 2025
VCA: Video Curious Agent for Long Video Understanding
ICCV 2025
Intelligent OPC Engineer Assistant for Semiconductor Manufacturing
AAAI 2025