Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Representation-driven Option Discovery in Reinforcement Learning
AAAI 2025
MarkovType: A Markov Decision Process Strategy for Non-Invasive Brain-Computer Interfaces Typing Systems
AAAI 2025
Reinforcement Learning-Guided Data Selection via Redundancy Assessment
ICCV 2025
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
EMNLP 2025
Debiasing Online Preference Learning via Preference Feature Preservation
ACL 2025
Improving Reward Models with Synthetic Critiques
NAACL 2025
ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning
ICCV 2025
Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction
IJCNLP 2025
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
IJCNLP 2025
Structured Document Translation via Format Reinforcement Learning
IJCNLP 2025
On the Convergence of Moral Self-Correction in Large Language Models
IJCNLP 2025
KERLQA: Knowledge-Enhanced Reinforcement Learning for Question Answering in Low-resource Languages
IJCNLP 2025
Atomic Consistency Preference Optimization for Long-Form Question Answering
IJCNLP 2025
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization
IJCNLP 2025
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling
IJCNLP 2025
A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
EMNLP 2025
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
ICCV 2025
Mitigating Object Hallucinations via Sentence-Level Early Intervention
ICCV 2025
Towards Robust, Efficient, and Practical Decision-Making: From Reward-Maximizing Deep Reinforcement Learning to Reward-Matching GFlowNets
AAAI 2025
sDPO: Don’t Use Your Data All at Once
COLING 2025
BackMATH: Towards Backward Reasoning for Solving Math Problems Step by Step
COLING 2025
Sources of Disagreement in Data for LLM Instruction Tuning
COLING 2025
Investigating the effectiveness of length based rewards in DPO for building Conversational Financial Question Answering Systems
COLING 2025
FinNLP-FNP-LLMFinLegal @ COLING 2025 Shared Task: Agent-Based Single Cryptocurrency Trading Challenge
COLING 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025