Reinforcement Learning from Human Feedback

129 directly classified papers

Papers

BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment EMNLP 2024

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs EMNLP 2024

Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation EMNLP 2024

Improving Discriminative Capability of Reward Models in RLHF Using Contrastive Learning EMNLP 2024

Not Everything is All You Need: Toward Low-Redundant Optimization for Large Language Model Alignment EMNLP 2024

Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents EMNLP 2024

Don’t Forget Your Reward Values: Language Model Alignment via Value-based Calibration EMNLP 2024

Towards Aligning Language Models with Textual Feedback EMNLP 2024

Rethinking the Role of Proxy Rewards in Language Model Alignment EMNLP 2024

Preference-Guided Reflective Sampling for Aligning Language Models EMNLP 2024

Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models EMNLP 2024

Filtered Direct Preference Optimization EMNLP 2024

Reward Modeling Requires Automatic Adjustment Based on Data Quality EMNLP 2024

Evolutionary Contrastive Distillation for Language Model Alignment EMNLP 2024

Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning EMNLP 2024

How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment EMNLP 2024

PURE: Aligning LLM via Pluggable Query Reformulation for Enhanced Helpfulness EMNLP 2024

TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models EMNLP 2024

On Diversified Preferences of Large Language Model Alignment EMNLP 2024

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts EMNLP 2024

Self-training Language Models for Arithmetic Reasoning EMNLP 2024

Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback EMNLP 2024

Pedagogical Alignment of Large Language Models EMNLP 2024

Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models EMNLP 2024

On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization EMNLP 2024