Research Explorer
Reinforcement Learning from Human Feedback
90 directly classified papers
Papers per year
2020: 1
2022: 1
2023: 2
2024: 40
2025: 46
Papers
DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling
ACL 2024
Hybrid Alignment Training for Large Language Models
ACL 2024
Rich Human Feedback for Text-to-Image Generation
CVPR 2024
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
CVPR 2024
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning
EMNLP 2024
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
EMNLP 2024
Word Alignment as Preference for Machine Translation
EMNLP 2024
ORPO: Monolithic Preference Optimization without Reference Model
EMNLP 2024
Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers
EMNLP 2024
A SMART Mnemonic Sounds like “Glue Tonic”: Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick
EMNLP 2024
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
EMNLP 2024
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
EMNLP 2024
GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets
EMNLP 2024
Self-Training Large Language and Vision Assistant for Medical Question Answering
EMNLP 2024
SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
EMNLP 2024
Enhancing Language Model Alignment: A Confidence-Based Approach to Label Smoothing
EMNLP 2024
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
EMNLP 2024
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
EMNLP 2024
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
EMNLP 2024
Enhancing Alignment using Curriculum Learning & Ranked Preferences
EMNLP 2024
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization
EMNLP 2024
Preference Tuning For Toxicity Mitigation Generalizes Across Languages
EMNLP 2024
Aligners: Decoupling LLMs and Alignment
EMNLP 2024
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
EMNLP 2024
Geometric-Averaged Preference Optimization for Soft Preference Labels
NeurIPS 2024