Reinforcement Learning from Human Feedback
129 directly classified papers
Papers per year: 1 (2020), 13 (2023), 60 (2024), 55 (2025)
Papers
Permutative Preference Alignment from Listwise Ranking of Human Judgments
EMNLP 2025
Robust Multi-Objective Preference Alignment with Online DPO
AAAI 2025
Logical Reasoning with Outcome Reward Models for Test-Time Scaling
EMNLP 2025
Improve LLM-as-a-Judge Ability as a General Ability
EMNLP 2025
Model Extrapolation Expedites Alignment
ACL 2025
Towards Reward Fairness in RLHF: From a Resource Allocation Perspective
ACL 2025
Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference Modeling
ACL 2025
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
ACL 2025
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
ACL 2025
Aligning Large Language Models with Implicit Preferences from User-Generated Content
ACL 2025
InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes Under Herd Behavior
ACL 2025
Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization
ACL 2025
Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
ACL 2025
Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch
ACL 2025
Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning
ACL 2025
How to Mitigate Overfitting in Weak-to-strong Generalization?
ACL 2025
Mutual-Taught for Co-adapting Policy and Reward Models
ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
ACL 2025
From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time Alignment
ACL 2025
Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
ACL 2025
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
ACL 2025
Generative Reward Modeling via Synthetic Criteria Preference Learning
ACL 2025
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
ACL 2025
Towards Better Value Principles for Large Language Model Alignment: A Systematic Evaluation and Enhancement
ACL 2025
OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework
EMNLP 2025