Reinforcement Learning from Human Feedback

129 directly classified papers

Papers

Fast Best-of-N Decoding via Speculative Rejection NIPS 2024

LeDex: Training LLMs to Better Self-Debug and Explain Code NIPS 2024

Interpreting Learned Feedback Patterns in Large Language Models NIPS 2024

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback NIPS 2024

Group Robust Preference Optimization in Reward-free RLHF NIPS 2024

LACIE: Listener-Aware Finetuning for Calibration in Large Language Models NIPS 2024

Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization NIPS 2024

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision NIPS 2024

Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels NIPS 2024

Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs NIPS 2024

Aligning to Thousands of Preferences via System Message Generalization NIPS 2024

Axioms for AI Alignment from Human Feedback NIPS 2024

One-Shot Safety Alignment for Large Language Models via Optimal Dualization NIPS 2024

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback NIPS 2024

On Softmax Direct Preference Optimization for Recommendation NIPS 2024

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback NIPS 2023

SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF EMNLP 2023

HuatuoGPT, Towards Taming Language Model to Be a Doctor EMNLP 2023

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback EMNLP 2023

Improving Summarization with Human Edits EMNLP 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values EMNLP 2023

Tuna: Instruction Tuning using Feedback from Large Language Models EMNLP 2023

A Study on Annotation Interfaces for Summary Comparison ACL 2023

Discovering Language Model Behaviors with Model-Written Evaluations ACL 2023

Human-in-the-loop Abstractive Dialogue Summarization ACL 2023