Reinforcement Learning from Human Feedback

129 directly classified papers

Papers

Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization ACL 2025

Mutual-Taught for Co-adapting Policy and Reward Models ACL 2025

UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models ACL 2025

Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization ACL 2025

Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning ACL 2025

How to Mitigate Overfitting in Weak-to-strong Generalization? ACL 2025

Model Extrapolation Expedites Alignment ACL 2025

Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference Modeling ACL 2025

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback ACL 2025

InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes Under Herd Behavior ACL 2025

Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch ACL 2025

LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets AAAI 2025

Robust Multi-Objective Preference Alignment with Online DPO AAAI 2025

Improve LLM-as-a-Judge Ability as a General Ability EMNLP 2025

MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences AAAI 2025

CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences IJCNLP 2025

DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback AAAI 2025

Towards Reward Fairness in RLHF: From a Resource Allocation Perspective ACL 2025

Alleviating Shifted Distribution in Human Preference Alignment through Meta-Learning AAAI 2025

AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling ACL 2025

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback AAAI 2025

Aligning Large Language Models with Implicit Preferences from User-Generated Content ACL 2025

Aligning Language Models Using Follow-up Likelihood as Reward Signal AAAI 2025

Debiasing Online Preference Learning via Preference Feature Preservation ACL 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference ACL 2025