Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
DeMAC: Enhancing Multi-Agent Coordination with Dynamic DAG and Manager-Player Feedback
EMNLP 2025
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving
ACL 2025
When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
EMNLP 2025
CodePRM: Execution Feedback-enhanced Process Reward Model for Code Generation
ACL 2025
Focused-DPO: Enhancing Code Generation Through Focused Preference Optimization on Error-Prone Points
ACL 2025
Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision
EMNLP 2025
MiniELM: A Lightweight and Adaptive Query Rewriting Framework for E-Commerce Search Optimization
ACL 2025
Breaking the Reasoning Barrier: A Survey on LLM Complex Reasoning through the Lens of Self-Evolution
ACL 2025
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information
ACL 2025
On-Policy Self-Alignment with Fine-grained Knowledge Feedback for Hallucination Mitigation
ACL 2025
To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization
ACL 2025
Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively
ACL 2025
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
ACL 2025
Proactive Guidance of Multi-Turn Conversation in Industrial Search
ACL 2025
One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL
ACL 2025
BLCU-ICALL at BEA 2025 Shared Task: Multi-Strategy Evaluation of AI Tutors
ACL 2025
Henry at BEA 2025 Shared Task: Improving AI Tutor’s Guidance Evaluation Through Context-Aware Distillation
ACL 2025
Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment
EMNLP 2025
Steering LLM Reasoning Through Bias-Only Adaptation
EMNLP 2025
RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution
EMNLP 2025
Identification of Multiple Logical Interpretations in Counter-Arguments
EMNLP 2025
CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback
ACL 2025
T-REG: Preference Optimization with Token-Level Reward Regularization
ACL 2025
Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory
ACL 2025
Are the Values of LLMs Structurally Aligned with Humans? A Causal Perspective
ACL 2025