Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models
EMNLP 2025
Atomic Consistency Preference Optimization for Long-Form Question Answering
IJCNLP 2025
Imitation Learning Backoff: Reinforcement Learning-based Channel Access for Guaranteeing Fairness (Student Abstract)
AAAI 2025
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization
IJCNLP 2025
Enhancing Predictive Healthcare Using AI-Driven Early Warning Systems
AAAI 2025
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling
IJCNLP 2025
Mutual-Taught for Co-adapting Policy and Reward Models
ACL 2025
Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference
EMNLP 2025
Optimising Factual Consistency in Summarisation via Preference Learning from Multiple Imperfect Metrics
EMNLP 2025
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
EMNLP 2025
Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety
EMNLP 2025
Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards
EMNLP 2025
Dynamic Retriever for In-Context Knowledge Editing via Policy Optimization
EMNLP 2025
AgentRM: Enhancing Agent Generalization with Reward Modeling
ACL 2025
Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems
EMNLP 2025
sDPO: Don’t Use Your Data All at Once
COLING 2025
DSG-MCTS: A Dynamic Strategy-Guided Monte Carlo Tree Search for Diversified Reasoning in Large Language Models
EMNLP 2025
BackMATH: Towards Backward Reasoning for Solving Math Problems Step by Step
COLING 2025
ACING: Actor-Critic for Instruction Learning in Black-Box LLMs
EMNLP 2025
Sources of Disagreement in Data for LLM Instruction Tuning
COLING 2025
LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models
EMNLP 2025
Investigating the effectiveness of length based rewards in DPO for building Conversational Financial Question Answering Systems
COLING 2025
HAF-RM: A Hybrid Alignment Framework for Reward Model Training
ACL 2025
FinNLP-FNP-LLMFinLegal @ COLING 2025 Shared Task: Agent-Based Single Cryptocurrency Trading Challenge
COLING 2025
A Practical Analysis of Human Alignment with *PO
NAACL 2025