Reinforcement Learning

2932 directly classified papers

Papers

Representation-driven Option Discovery in Reinforcement Learning AAAI 2025

MarkovType: A Markov Decision Process Strategy for Non-Invasive Brain-Computer Interfaces Typing Systems AAAI 2025

Reinforcement Learning-Guided Data Selection via Redundancy Assessment ICCV 2025

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning EMNLP 2025

Debiasing Online Preference Learning via Preference Feature Preservation ACL 2025

Improving Reward Models with Synthetic Critiques NAACL 2025

ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning ICCV 2025

Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction IJCNLP 2025

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs IJCNLP 2025

Structured Document Translation via Format Reinforcement Learning IJCNLP 2025

On the Convergence of Moral Self-Correction in Large Language Models IJCNLP 2025

KERLQA: Knowledge-Enhanced Reinforcement Learning for Question Answering in Low-resource Languages IJCNLP 2025

Atomic Consistency Preference Optimization for Long-Form Question Answering IJCNLP 2025

Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization IJCNLP 2025

Learning a Continue-Thinking Token for Enhanced Test-Time Scaling IJCNLP 2025

A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy EMNLP 2025

DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness ICCV 2025

Mitigating Object Hallucinations via Sentence-Level Early Intervention ICCV 2025

Towards Robust, Efficient, and Practical Decision-Making: From Reward-Maximizing Deep Reinforcement Learning to Reward-Matching GFlowNets AAAI 2025

sDPO: Don’t Use Your Data All at Once COLING 2025

BackMATH: Towards Backward Reasoning for Solving Math Problems Step by Step COLING 2025

Sources of Disagreement in Data for LLM Instruction Tuning COLING 2025

Investigating the effectiveness of length based rewards in DPO for building Conversational Financial Question Answering Systems COLING 2025

FinNLP-FNP-LLMFinLegal @ COLING 2025 Shared Task: Agent-Based Single Cryptocurrency Trading Challenge COLING 2025

Visual-RFT: Visual Reinforcement Fine-Tuning ICCV 2025