Reinforcement Learning from Human Feedback

129 directly classified papers

Papers

Permutative Preference Alignment from Listwise Ranking of Human Judgments (EMNLP 2025)

Robust Multi-Objective Preference Alignment with Online DPO (AAAI 2025)

Logical Reasoning with Outcome Reward Models for Test-Time Scaling (EMNLP 2025)

Improve LLM-as-a-Judge Ability as a General Ability (EMNLP 2025)

Model Extrapolation Expedites Alignment (ACL 2025)

Towards Reward Fairness in RLHF: From a Resource Allocation Perspective (ACL 2025)

Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference Modeling (ACL 2025)

UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models (ACL 2025)

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback (ACL 2025)

Aligning Large Language Models with Implicit Preferences from User-Generated Content (ACL 2025)

InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes Under Herd Behavior (ACL 2025)

Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization (ACL 2025)

Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization (ACL 2025)

Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch (ACL 2025)

Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning (ACL 2025)

How to Mitigate Overfitting in Weak-to-strong Generalization? (ACL 2025)

Mutual-Taught for Co-adapting Policy and Reward Models (ACL 2025)

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference (ACL 2025)

From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time Alignment (ACL 2025)

Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up (ACL 2025)

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization (ACL 2025)

Generative Reward Modeling via Synthetic Criteria Preference Learning (ACL 2025)

PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment (ACL 2025)

Towards Better Value Principles for Large Language Model Alignment: A Systematic Evaluation and Enhancement (ACL 2025)

OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework (EMNLP 2025)