Reinforcement Learning from Human Feedback
129 directly classified papers
Papers per year
2020: 1
2023: 13
2024: 60
2025: 55
Papers
Fast Best-of-N Decoding via Speculative Rejection
NIPS 2024
LeDex: Training LLMs to Better Self-Debug and Explain Code
NIPS 2024
Interpreting Learned Feedback Patterns in Large Language Models
NIPS 2024
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
NIPS 2024
Group Robust Preference Optimization in Reward-free RLHF
NIPS 2024
LACIE: Listener-Aware Finetuning for Calibration in Large Language Models
NIPS 2024
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
NIPS 2024
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
NIPS 2024
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
NIPS 2024
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
NIPS 2024
Aligning to Thousands of Preferences via System Message Generalization
NIPS 2024
Axioms for AI Alignment from Human Feedback
NIPS 2024
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
NIPS 2024
When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
NIPS 2024
On Softmax Direct Preference Optimization for Recommendation
NIPS 2024
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
NIPS 2023
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
EMNLP 2023
HuatuoGPT, Towards Taming Language Model to Be a Doctor
EMNLP 2023
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
EMNLP 2023
Improving Summarization with Human Edits
EMNLP 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
EMNLP 2023
Tuna: Instruction Tuning using Feedback from Large Language Models
EMNLP 2023
A Study on Annotation Interfaces for Summary Comparison
ACL 2023
Discovering Language Model Behaviors with Model-Written Evaluations
ACL 2023
Human-in-the-loop Abstractive Dialogue Summarization
ACL 2023