Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
EMNLP 2025
Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty
EMNLP 2025
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
EMNLP 2025
A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
EMNLP 2025
A Practical Analysis of Human Alignment with *PO
NAACL 2025
InstructionCP: A Simple yet Effective Approach for Transferring Large Language Models to Target Languages
ACL 2025
CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards
EMNLP 2025
OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning
IJCNLP 2025
Rejected Dialects: Biases Against African American Language in Reward Models
NAACL 2025
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
NAACL 2025
LLMSR@XLLM25: A Language Model-Based Pipeline for Structured Reasoning Data Construction
ACL 2025
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting
NAACL 2025
Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning
NAACL 2025
R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning
EMNLP 2025
MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
EMNLP 2025
Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models
NAACL 2025
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
NAACL 2025
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
EMNLP 2025
Adapting LLM Agents with Universal Communication Feedback
NAACL 2025
Ask Optimal Questions: Aligning Large Language Models with Retriever’s Preference in Conversation
NAACL 2025
Dialogue Systems for Emotional Support via Value Reinforcement
ACL 2025
Reinforced Query Reasoners for Reasoning-intensive Retrieval Tasks
EMNLP 2025
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
CVPR 2025
NHK Submission to WAT 2025: Leveraging Preference Optimization for Article-level Japanese–English News Translation
IJCNLP 2025
Improving Reward Models with Synthetic Critiques
NAACL 2025