Papers
16,557 papers found
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
Siyin Wang, Zhaoye Fei, Qinyuan Cheng et al.
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
Xinghua Zhang, Haiyang Yu, Cheng Fu et al.
Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation
Deokhyung Kang, Jeonghun Cho, Yejin Jeon et al.
Uncertainty-Aware Iterative Preference Optimization for Enhanced LLM Reasoning
Lei Li, Hehuan Liu, Yaxin Zhou et al.
Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
Mohammad Saqib Hasan, Saikat Chakraborty, Santu Karmaker et al.
LPOI: Listwise Preference Optimization for Vision Language Models
Fatemeh Pesaran Zadeh, Yoojin Oh, Gunhee Kim
T-REG: Preference Optimization with Token-Level Reward Regularization
Wenxuan Zhou, Shujian Zhang, Lingxiao Zhao et al.
CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation
Guofeng Cui, Pichao Wang, Yang Liu et al.
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
Hritik Bansal, Ashima Suvarna, Gantavya Bhatt et al.
K-order Ranking Preference Optimization for Large Language Models
Shihao Cai, Chongming Gao, Yang Zhang et al.
ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning
Yeyuan Wang, Dehong Gao, Rujiao Long et al.
Robust Preference Optimization via Dynamic Target Margins
Jie Sun, Junkang Wu, Jiancan Wu et al.
Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent
Xueyang Feng, Jingsen Zhang, Jiakai Tang et al.
Probability-Consistent Preference Optimization for Enhanced LLM Reasoning
Yunqiao Yang, Houxing Ren, Zimu Lu et al.
AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models
Qi Liu, Jingqing Ruan, Hao Li et al.
Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data
Xin-Cheng Wen, Yijun Yang, Cuiyun Gao et al.
Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement
Xiaofeng Zhou, Heyan Huang, Lizi Liao
Focused-DPO: Enhancing Code Generation Through Focused Preference Optimization on Error-Prone Points
Kechi Zhang, Ge Li, Jia Li et al.
SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment
Wenqiao Zhu, Ji Liu, Lulu Wang et al.
RoseRAG: Robust Retrieval-augmented Generation with Small-scale LLMs via Margin-aware Preference Optimization
Tianci Liu, Haoxiang Jiang, Tianze Wang et al.
Eeyore: Realistic Depression Simulation via Expert-in-the-Loop Supervised and Preference Optimization
Siyang Liu, Bianca Brie, Wenda Li et al.
PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization
Zouying Cao, Runze Wang, Yifei Yang et al.
Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization
Yuhan Fu, Ruobing Xie, Xingwu Sun et al.
Reverse Preference Optimization for Complex Instruction Following
Xiang Huang, Ting-En Lin, Feiteng Fang et al.
DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization
Amitava Das, Suranjana Trivedy, Danush Khanna et al.