Research Explorer
Reinforcement Learning
1263 directly classified papers
Papers per year
2006: 1
2007: 2
2008: 3
2009: 2
2010: 1
2011: 2
2012: 3
2013: 2
2014: 3
2015: 2
2016: 8
2017: 44
2018: 95
2019: 134
2020: 123
2021: 131
2022: 143
2023: 127
2024: 194
2025: 240
2026: 3
Papers
Agentic-R1: Distilled Dual-Strategy Reasoning
EMNLP 2025
Marco Large Translation Model at WMT2025: Transforming Translation Capability in LLMs via Quality-Aware Training and Decoding
EMNLP 2025
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
EMNLP 2025
Alignment with Fill-In-the-Middle for Enhancing Code Generation
EMNLP 2025
Dynamic Collaboration of Multi-Language Models based on Minimal Complete Semantic Units
EMNLP 2025
CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning
EMNLP 2025
Can GRPO Boost Complex Multimodal Table Understanding?
EMNLP 2025
OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework
EMNLP 2025
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
ICCV 2025
RadQA-DPO: A Radiology Question Answering System with Encoder-Decoder Models Enhanced by Direct Preference Optimization
ACL 2025
Guaranteeing Out-Of-Distribution Detection in Deep RL via Transition Estimation
AAAI 2025
When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
EMNLP 2025
World Models with Hints of Large Language Models for Goal Achieving
NAACL 2025
ReNeg: Learning Negative Embedding with Reward Guidance
CVPR 2025
When2Call: When (not) to Call Tools
NAACL 2025
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
CVPR 2025
MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time
NAACL 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
ICCV 2025
A Practical Analysis of Human Alignment with *PO
NAACL 2025
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
CVPR 2025
Understanding Reference Policies in Direct Preference Optimization
NAACL 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction
AACL 2025
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
AAAI 2025
MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
ACL 2025