Reinforcement Learning

1263 directly classified papers

Papers

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition AAAI 2025

Deep Reinforcement Learning with Time-Scale Invariant Memory AAAI 2025

Walk Wisely on Graph: Knowledge Graph Reasoning with Dual Agents via Efficient Guidance-Exploration AAAI 2025

More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives ACL 2025

LLM-Enhanced Self-Evolving Reinforcement Learning for Multi-Step E-Commerce Payment Fraud Risk Detection ACL 2025

ASTRO: Automatic Strategy Optimization For Non-Cooperative Dialogues ACL 2025

Enhancing Predictive Healthcare Using AI-Driven Early Warning Systems AAAI 2025

A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy EMNLP 2025

OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework EMNLP 2025

When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning EMNLP 2025

RLHF Algorithms Ranked: An Extensive Evaluation Across Diverse Tasks, Rewards, and Hyperparameters EMNLP 2025

FedAA: A Reinforcement Learning Perspective on Adaptive Aggregation for Fair and Robust Federated Learning AAAI 2025

COPR: Continual Human Preference Learning via Optimal Policy Regularization ACL 2025

Robust Preference Optimization via Dynamic Target Margins ACL 2025

Token-Level Accept or Reject: A Micro Alignment Approach for Large Language Models IJCAI 2025

RMultiplex200K: Toward Reliable Multimodal Process Supervision for Visual Language Models on Telecommunications ICCV 2025

Sample Efficient Alignment Learning With Episodic Control EMNLP 2025

Accelerating LLM Reasoning via Early Rejection with Partial Reward Modeling EMNLP 2025

R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning EMNLP 2025

Dense Policy: Bidirectional Autoregressive Learning of Actions ICCV 2025

MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization ICCV 2025

Visual-RFT: Visual Reinforcement Fine-Tuning ICCV 2025

Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences ICCV 2025

VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers ICCV 2025

Playpen: An Environment for Exploring Learning From Dialogue Game Feedback EMNLP 2025