Reinforcement Learning from Human Feedback
90 directly classified papers
Papers per year
2020: 1
2022: 1
2023: 2
2024: 40
2025: 46
Papers
SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
ACL 2025
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings
ACL 2025
CareBot: A Pioneering Full-Process Open-Source Medical Language Model
AAAI 2025
From Lists to Emojis: How Format Bias Affects Model Alignment
ACL 2025
T-REG: Preference Optimization with Token-Level Reward Regularization
ACL 2025
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
ACL 2025
Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling
AAAI 2025
SEAL: Systematic Error Analysis for Value ALignment
AAAI 2025
Debate Helps Weak-to-Strong Generalization
AAAI 2025
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation
AAAI 2025
Data with High and Consistent Preference Difference Are Better for Reward Model
AAAI 2025
Self-Evolutionary Large Language Models Through Uncertainty-Enhanced Preference Optimization
AAAI 2025
Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction
AAAI 2025
Multi-perspective Alignment for Increasing Naturalness in Neural Machine Translation
ACL 2025
Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
ACL 2025
Binary Classifier Optimization for Large Language Model Alignment
ACL 2025
Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas
ACL 2025
FloorPlan-LLaMa: Aligning Architects’ Feedback and Domain Knowledge in Architectural Floor Plan Generation
ACL 2025
GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models
ACL 2025
SDPO: Segment-Level Direct Preference Optimization for Social Agents
ACL 2025
FRACTAL: Fine-Grained Scoring from Aggregate Text Labels
ACL 2025
HAF-RM: A Hybrid Alignment Framework for Reward Model Training
ACL 2025
DiffPO: Diffusion-styled Preference Optimization for Inference Time Alignment of Large Language Models
ACL 2025
IPO: Your Language Model is Secretly a Preference Classifier
ACL 2025
The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community
ACL 2025