Reinforcement Learning from Human Feedback

90 directly classified papers

Papers

DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling ACL 2024

Hybrid Alignment Training for Large Language Models ACL 2024

Rich Human Feedback for Text-to-Image Generation CVPR 2024

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback CVPR 2024

Advancing Process Verification for Large Language Models via Tree-Based Preference Learning EMNLP 2024

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation EMNLP 2024

Word Alignment as Preference for Machine Translation EMNLP 2024

ORPO: Monolithic Preference Optimization without Reference Model EMNLP 2024

Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers EMNLP 2024

A SMART Mnemonic Sounds like “Glue Tonic”: Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick EMNLP 2024

DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging EMNLP 2024

Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation EMNLP 2024

GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets EMNLP 2024

Self-Training Large Language and Vision Assistant for Medical Question Answering EMNLP 2024

SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization EMNLP 2024

Enhancing Language Model Alignment: A Confidence-Based Approach to Label Smoothing EMNLP 2024

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion EMNLP 2024

Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data EMNLP 2024

ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline EMNLP 2024

Enhancing Alignment using Curriculum Learning & Ranked Preferences EMNLP 2024

V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization EMNLP 2024

Preference Tuning For Toxicity Mitigation Generalizes Across Languages EMNLP 2024

Aligners: Decoupling LLMs and Alignment EMNLP 2024

Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness EMNLP 2024

Geometric-Averaged Preference Optimization for Soft Preference Labels NeurIPS 2024