Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy
AAAI 2026
Pseudo-Likelihood Training for Reasoning Diffusion Language Models
EACL 2026
Rectify Evaluation Preference: Improving LLMs’ Critique on Math Reasoning via Perplexity-aware Reinforcement Learning
AAAI 2026
RemoteReasoner: Towards Unifying Geospatial Reasoning Workflow
AAAI 2026
SHARE: Synthesizing Heterogeneous Autism-support Records into Evidence-based Recommendations
AAAI 2026
A Reinforcement Learning Framework for Cross-Lingual Stance Detection Using Chain-of-Thought Alignment
ACL 2025
Google Translate’s Research Submission to WMT2025
EMNLP 2025
Rethinking DPO-style Diffusion Aligning Frameworks
ICCV 2025
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
ICCV 2025
PUER: Boosting Few-shot Positive-Unlabeled Entity Resolution with Reinforcement Learning
EMNLP 2025
Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment
EMNLP 2025
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
CVPR 2025
Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty
EMNLP 2025
Trial-Oriented Visual Rearrangement
ICCV 2025
Reinforcement Learning-Guided Data Selection via Redundancy Assessment
ICCV 2025
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
ICCV 2025
Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation
EMNLP 2025
Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment
ICCV 2025
DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data
EMNLP 2025
ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models
EMNLP 2025
Hierarchical Reward Modeling for Fault Localization in Large Code Repositories
EMNLP 2025
INREACT: An Inspire-Then-Reinforce Training Framework For Multimodal GUI Agent
EMNLP 2025
RAISE: Reinforced Adaptive Instruction Selection For Large Language Models
EMNLP 2025