Research Explorer
Reinforcement Learning from Human Feedback
129 directly classified papers
Papers per year
2020: 1
2023: 13
2024: 60
2025: 55
Papers
Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
ACL 2025
Mutual-Taught for Co-adapting Policy and Reward Models
ACL 2025
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
ACL 2025
Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization
ACL 2025
Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning
ACL 2025
How to Mitigate Overfitting in Weak-to-strong Generalization?
ACL 2025
Model Extrapolation Expedites Alignment
ACL 2025
Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference Modeling
ACL 2025
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
ACL 2025
InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes Under Herd Behavior
ACL 2025
Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch
ACL 2025
LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
AAAI 2025
Robust Multi-Objective Preference Alignment with Online DPO
AAAI 2025
Improve LLM-as-a-Judge Ability as a General Ability
EMNLP 2025
MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences
AAAI 2025
CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences
IJCNLP 2025
DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback
AAAI 2025
Towards Reward Fairness in RLHF: From a Resource Allocation Perspective
ACL 2025
Alleviating Shifted Distribution in Human Preference Alignment through Meta-Learning
AAAI 2025
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
ACL 2025
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
AAAI 2025
Aligning Large Language Models with Implicit Preferences from User-Generated Content
ACL 2025
Aligning Language Models Using Follow-up Likelihood as Reward Signal
AAAI 2025
Debiasing Online Preference Learning via Preference Feature Preservation
ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
ACL 2025