Research Explorer
Deep RL
3861 directly classified papers
Papers per year
2005: 1
2006: 9
2007: 14
2008: 15
2009: 9
2010: 21
2011: 27
2012: 32
2013: 21
2014: 17
2015: 10
2016: 33
2017: 102
2018: 222
2019: 399
2020: 450
2021: 533
2022: 478
2023: 532
2024: 513
2025: 326
2026: 97
Papers
Leveraging LLM-based sentiment analysis for portfolio optimization with proximal policy optimization
ACL 2025
Team XSZ at BioLaySumm2025: Section-Wise Summarization, Retrieval-Augmented LLM, and Reinforcement Learning Fine-Tuning for Lay Summaries
ACL 2025
Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback
ACL 2025
bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning
ACL 2025
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
ACL 2025
Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
EMNLP 2025
Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
ACL 2025
Adversarial Preference Learning for Robust LLM Alignment
ACL 2025
Reward Generalization in RLHF: A Topological Perspective
ACL 2025
Sketch-to-Skill: Bootstrapping Robot Learning with Human Drawn Trajectory Sketches
RSS 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
RSS 2025
Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments
RSS 2025
CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization
EMNLP 2025
A low-cost and lightweight 6 DoF bimanual arm for dynamic and contact-rich manipulation
RSS 2025
BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds
RSS 2025
HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit
RSS 2025
From Foresight to Forethought: VLM-In-the-Loop Policy Steering via Latent Alignment
RSS 2025
Safety with Agency: Human-Centered Safety Filter with Application to AI-Assisted Motorsports
RSS 2025
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
RSS 2025
Action Flow Matching for Lifelong Learning
RSS 2025
Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation
ACL 2025
Implicit Neural-Representation Learning for Elastic Deformable-Object Manipulations
RSS 2025
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
ACL 2025
RAVEN: Robust Advertisement Video Violation Temporal Grounding via Reinforcement Reasoning
ACL 2025
LLM-Enhanced Self-Evolving Reinforcement Learning for Multi-Step E-Commerce Payment Fraud Risk Detection
ACL 2025