Research Explorer
Reinforcement Learning from Human Feedback
129 directly classified papers
Papers per year
2020: 1
2023: 13
2024: 60
2025: 55
Papers
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
ACL 2025
CARMO: Dynamic Criteria Generation for Context Aware Reward Modelling
ACL 2025
Arbiters of Ambivalence: Challenges of using LLMs in No-Consensus tasks
ACL 2025
LookAlike: Consistent Distractor Generation in Math MCQs
ACL 2025
BLCU-ICALL at BEA 2025 Shared Task: Multi-Strategy Evaluation of AI Tutors
ACL 2025
FactAlign: Long-form Factuality Alignment of Large Language Models
EMNLP 2024
On the Relationship between Truth and Political Bias in Language Models
EMNLP 2024
Improving Context-Aware Preference Modeling for Language Models
NeurIPS 2024
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
EMNLP 2024
Unintended Impacts of LLM Alignment on Global Representation
ACL 2024
The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts
ACL 2024
LIRE: listwise reward enhancement for preference alignment
ACL 2024
Disentangling Length from Quality in Direct Preference Optimization
ACL 2024
Teaching Language Models to Self-Improve by Learning from Language Feedback
ACL 2024
ALaRM: Align Language Models via Hierarchical Rewards Modeling
ACL 2024
Exploring Domain Robust Lightweight Reward Models based on Router Mechanism
ACL 2024
Evaluating Large Language Model Biases in Persona-Steered Generation
ACL 2024
Direct Preference Optimization with an Offset
ACL 2024
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
ACL 2024
Reasons to Reject? Aligning Language Models with Judgments
ACL 2024
Diffusion Model Alignment Using Direct Preference Optimization
CVPR 2024
The Accuracy Paradox in RLHF: When Better Reward Models Don’t Yield Better Language Models
EMNLP 2024
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
EMNLP 2024
WPO: Enhancing RLHF with Weighted Preference Optimization
EMNLP 2024
LIONs: An Empirically Optimized Approach to Align Language Models
EMNLP 2024