Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Reflect, Rewrite, Repeat: How Simple Arithmetic Enables Advanced Reasoning in Small Language Models
EACL 2026
FAST-EQA: Efficient Embodied Question Answering with Global and Local Region Relevancy
WACV 2026
Think Just Enough: Leveraging Self-Assessed Confidence for Adaptive Reasoning in Language Models
EACL 2026
QueryGym: Step-by-Step Interaction with Relational Databases
AAAI 2026
Social Influence-Based Mutual Acknowledgement Token Exchange (Student Abstract)
AAAI 2026
USPR: Learning a Unified Solver for Profiled Routing
AAAI 2026
MoralReason: Generalizable Moral Decision Alignment for LLM Agents Using Reasoning-Level Reinforcement Learning
AAAI 2026
ToolACE-R: Model-aware Iterative Training and Adaptive Refinement for Tool learning
AAAI 2026
LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction
AAAI 2026
AI-Salesman: Towards Reliable Large Language Model Driven Telemarketing
AAAI 2026
Re-SpS: A Reinforcement Learning Approach to Speculative Sampling
AAAI 2026
Decoupling Understanding from Reasoning via Problem Space Mapping for Small-Scale Model Reasoning
AAAI 2026
DPRM: A Dual Implicit Process Reward Model in Multi-Hop Question Answering
AAAI 2026
Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering
AAAI 2026
Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach
AAAI 2026
MetaAct-RL: Training Language Models for Reasoning Through Meta-Action-Based Reinforcement Learning
AAAI 2026
Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective
AAAI 2026
TaREx: Reinforcement Learning for Code-Driven Table Reasoning
AAAI 2026
URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
AAAI 2026
SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling
AAAI 2026
RESTL: Reinforcement Learning Guided by Multi-Aspect Rewards for Signal Temporal Logic Transformation
AAAI 2026
Towards Better Correctness and Efficiency in Code Generation
AAAI 2026
MMhops-R1: Multimodal Multi-hop Reasoning
AAAI 2026
VCGD: Visual Clue Guided Decoding with Caption Model for Mitigating Hallucination in Multimodal Large Language Models
AAAI 2026
Aligning Cross-View Visual Geometries in LVLMs Through Human-Like Reasoning Learning
AAAI 2026