Reinforcement Learning

2932 directly classified papers

Papers

MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy AAAI 2026

Pseudo-Likelihood Training for Reasoning Diffusion Language Models EACL 2026

Rectify Evaluation Preference: Improving LLMs’ Critique on Math Reasoning via Perplexity-aware Reinforcement Learning AAAI 2026

RemoteReasoner: Towards Unifying Geospatial Reasoning Workflow AAAI 2026

SHARE: Synthesizing Heterogeneous Autism-support Records into Evidence-based Recommendations AAAI 2026

A Reinforcement Learning Framework for Cross-Lingual Stance Detection Using Chain-of-Thought Alignment ACL 2025

Google Translate’s Research Submission to WMT2025 EMNLP 2025

Rethinking DPO-style Diffusion Aligning Frameworks ICCV 2025

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training ICCV 2025

PUER: Boosting Few-shot Positive-Unlabeled Entity Resolution with Reinforcement Learning EMNLP 2025

Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment EMNLP 2025

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill CVPR 2025

Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty EMNLP 2025

Trial-Oriented Visual Rearrangement ICCV 2025

Reinforcement Learning-Guided Data Selection via Redundancy Assessment ICCV 2025

EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device ICCV 2025

Visual-RFT: Visual Reinforcement Fine-Tuning ICCV 2025

Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences ICCV 2025

Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation EMNLP 2025

Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment ICCV 2025

DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data EMNLP 2025

ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models EMNLP 2025

Hierarchical Reward Modeling for Fault Localization in Large Code Repositories EMNLP 2025

INREACT: An Inspire-Then-Reinforce Training Framework For Multimodal GUI Agent EMNLP 2025

RAISE: Reinforced Adaptive Instruction Selection For Large Language Models EMNLP 2025