Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty
EMNLP 2025
FRACTAL: Fine-Grained Scoring from Aggregate Text Labels
ACL 2025
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
EMNLP 2025
Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
EMNLP 2025
Enhancing Predictive Healthcare Using AI-Driven Early Warning Systems
AAAI 2025
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
EMNLP 2025
LLMSR@XLLM25: A Language Model-Based Pipeline for Structured Reasoning Data Construction
ACL 2025
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
EMNLP 2025
Mutual-Taught for Co-adapting Policy and Reward Models
ACL 2025
A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
EMNLP 2025
Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning
ACL 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
ACL 2025
Procedural Environment Generation for Tool-Use Agents
EMNLP 2025
ACING: Actor-Critic for Instruction Learning in Black-Box LLMs
EMNLP 2025
LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models
EMNLP 2025
Sparse Rewards Can Self-Train Dialogue Agents
ACL 2025
Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment
EMNLP 2025
Beyond Online Sampling: Bridging Offline-to-Online Alignment via Dynamic Data Transformation for LLMs
EMNLP 2025
Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning
EMNLP 2025
CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis
ACL 2025
Improving Neutral Point-of-View Generation with Data- and Parameter-Efficient RL
EMNLP 2025
In-Context Policy Adaptation via Cross-Domain Skill Diffusion
AAAI 2025
Prior Prompt Engineering for Reinforcement Fine-Tuning
EMNLP 2025
AgentRM: Enhancing Agent Generalization with Reward Modeling
ACL 2025
MarkovType: A Markov Decision Process Strategy for Non-Invasive Brain-Computer Interfaces Typing Systems
AAAI 2025