Papers
Self-Steering Optimization: Autonomous Preference Optimization for Large Language Models
Hao Xiang, Bowen Yu, Hongyu Lin et al.
Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu et al.
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
Fu-Yun Wang, Yunhao Shui, Jingtan Piao et al.
UCPO: A Universal Constrained Combinatorial Optimization Method via Preference Optimization
Zhanhong Fang, Debing Wang, Jinbiao Chen et al.
Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling
Xingzhou Lou, Junge Zhang, Jian Xie et al.
Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization
Yao Xiao, Hai Ye, Linyao Chen et al.
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Jian Li, Haojing Huang, Yujia Zhang et al.
Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization
Jian Li, Shenglin Yin, Yujia Zhang et al.
No Preference Left Behind: Group Distributional Preference Optimization
Binwei Yao, Zefan Cai, Yun-Shiuan Chuang et al.
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
Zhanhui Zhou, Jie Liu, Jing Shao et al.
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, Eric Mitchell et al.
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
Xuan Zhang, Chao Du, Tianyu Pang et al.
On Softmax Direct Preference Optimization for Recommendation
Yuxin Chen, Junfei Tan, An Zhang et al.
Group Robust Preference Optimization in Reward-free RLHF
Shyam Sundhar Ramesh, Yifan Hu, Iason Chaimalas et al.
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
Yuanpu Cao, Tianrong Zhang, Bochuan Cao et al.
Discovering Preference Optimization Algorithms with and for Large Language Models
Chris Lu, Samuel Holt, Claudio Fanconi et al.
3D Structure Prediction of Atomic Systems with Flow-based Direct Preference Optimization
Rui Jiao, Xiangzhe Kong, Wenbing Huang et al.
Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment
Teng Xiao, Yige Yuan, Huaisheng Zhu et al.
Iterative Reasoning Preference Optimization
Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho et al.
Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization
Xiangxin Zhou, Dongyu Xue, Ruizhe Chen et al.
SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng, Mengzhou Xia, Danqi Chen
$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.
Controllable Protein Sequence Generation with LLM Preference Optimization
Xiangyu Liu, Yi Liu, Silei Chen et al.
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation
Jingkun An, Yinghao Zhu, Zongjian Li et al.