Reinforcement Learning from Human Feedback
90 directly classified papers
Papers per year
2020: 1
2022: 1
2023: 2
2024: 40
2025: 46
Papers
Expectation Preference Optimization: Reliable Preference Estimation for Improving the Reasoning Capability of Large Language Models
EMNLP 2025
Pluralistic Alignment for Healthcare: A Role-Driven Framework
EMNLP 2025
RLHF Algorithms Ranked: An Extensive Evaluation Across Diverse Tasks, Rewards, and Hyperparameters
EMNLP 2025
Multi-perspective Alignment for Increasing Naturalness in Neural Machine Translation
ACL 2025
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
ACL 2025
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings
ACL 2025
The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community
ACL 2025
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
ACL 2025
CARMO: Dynamic Criteria Generation for Context Aware Reward Modelling
ACL 2025
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
ACL 2025
ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning
ACL 2025
Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent
ACL 2025
RadQA-DPO: A Radiology Question Answering System with Encoder-Decoder Models Enhanced by Direct Preference Optimization
ACL 2025
The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation
ACL 2025
Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?
ACL 2025
Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction
AAAI 2025
Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling
AAAI 2025
Continual SFT Matches Multimodal RLHF with Negative Supervision
CVPR 2025
SEAL: Systematic Error Analysis for Value ALignment
AAAI 2025
Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models
EMNLP 2025
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation
AAAI 2025
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
NeurIPS 2024
Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models
ACL 2024
BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization
ACL 2024
A Grounded Preference Model for LLM Alignment
ACL 2024