Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
ACL 2025
InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating
ACL 2025
Structured Document Translation via Format Reinforcement Learning
IJCNLP 2025
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
ICCV 2025
HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks
ACL 2025
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
ICCV 2025
Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
EMNLP 2025
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
ACL 2025
Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning
ACL 2025
CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback
ACL 2025
CARMO: Dynamic Criteria Generation for Context Aware Reward Modelling
ACL 2025
CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis
ACL 2025
Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction
IJCNLP 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
ACL 2025
Enhancing Machine Translation with Self-Supervised Preference Data
ACL 2025
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
IJCNLP 2025
Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment
ICCV 2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
ACL 2025
KERLQA: Knowledge-Enhanced Reinforcement Learning for Question Answering in Low-resource Languages
IJCNLP 2025
On the Convergence of Moral Self-Correction in Large Language Models
IJCNLP 2025
Atomic Consistency Preference Optimization for Long-Form Question Answering
IJCNLP 2025
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
NAACL 2025
Understanding Reference Policies in Direct Preference Optimization
NAACL 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
ICCV 2025