Papers
16,557 papers found
Direct Multi-Turn Preference Optimization for Language Agents
Wentao Shi, Mengqi Yuan, Junkang Wu et al.
EPO: Hierarchical LLM Agents with Environment Preference Optimization
Qi Zhao, Haotian Fu, Chen Sun et al.
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Fei Wang, Wenxuan Zhou, James Y. Huang et al.
WPO: Enhancing RLHF with Weighted Preference Optimization
Wenxuan Zhou, Ravi Agrawal, Shujian Zhang et al.
ORPO: Monolithic Preference Optimization without Reference Model
Jiwoo Hong, Noah Lee, James Thorne
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang, Arash Ahmadian, Kelly Marchisio et al.
Model-based Preference Optimization in Abstractive Summarization without Human Feedback
Jaepill Choi, Kyubyung Chae, Jiwoo Song et al.
Filtered Direct Preference Optimization
Tetsuro Morimura, Mitsuki Sakamoto, Yuu Jinnai et al.
Knowledge Editing in Language Models via Adapted Direct Preference Optimization
Amit Rozner, Barak Battash, Lior Wolf et al.
Learning to Ask Informative Questions: Enhancing LLMs with Preference Optimization and Expected Information Gain
Davide Mazzaccara, Alberto Testoni, Raffaella Bernardi
Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring
Jiazheng Li, Hainiu Xu, Zhaoyue Sun et al.
BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization
Gihun Lee, Minchan Jeong, Yujin Kim et al.
Step-level Value Preference Optimization for Mathematical Reasoning
Guoxin Chen, Minpeng Liao, Chengxi Li et al.
Improving Factual Consistency of News Summarization by Contrastive Preference Optimization
Huawen Feng, Yan Fan, Xiong Liu et al.
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization
Yuxi Xie, Guanzhen Li, Xiao Xu et al.
Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback
Kyuyoung Kim, Ah Jeong Seo, Hao Liu et al.
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization
Yong Lin, Skyler Seto, Maartje Ter Hoeve et al.
Direct Judgement Preference Optimization
PeiFeng Wang, Austin Xu, Yilun Zhou et al.
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing, Peiran Li, Yuping Wang et al.
Co-Evolving LLMs and Embedding Models via Density-Guided Preference Optimization for Text Clustering
Zetong Li, Qinliang Su, Minhua Huang et al.
Selective Preference Optimization via Token-Level Reward Function Estimation
Kailai Yang, Zhiwei Liu, Qianqian Xie et al.
TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making
Kechen Jiao, Zhirui Fang, Jiahao Liu et al.
Structured Preference Optimization for Vision-Language Long-Horizon Task Planning
Xiwen Liang, Min Lin, Weiqi Ruan et al.
Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization
Jiulong Wu, Zhengliang Shi, Shuaiqiang Wang et al.
Weights-Rotated Preference Optimization for Large Language Models
Chenxu Yang, Ruipeng Jia, Mingyu Zheng et al.