How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective

Teng Xiao; Mingxiao Li; Yige Yuan; Huaisheng Zhu; Chao Cui; Vasant G Honavar

EMNLP 2024

Abstract

This paper introduces a novel generalized self-imitation learning (GSIL) framework, which effectively and efficiently aligns large language models with offline demonstration data. We develop GSIL by deriving a surrogate objective of imitation learning with density ratio estimates, which facilitates the use of self-generated data and allows the imitation learning objective to be optimized with simple classification losses. GSIL eliminates the need for the complex adversarial training used in standard imitation learning, achieving lightweight and efficient fine-tuning for large language models. In addition, GSIL encompasses a family of offline losses parameterized by a general class of convex functions for density ratio estimation and enables a unified view of alignment with demonstration data. Extensive experiments show that GSIL consistently and significantly outperforms baselines on many challenging benchmarks, such as coding (HumanEval), mathematical reasoning (GSM8K), and instruction following (MT-Bench). Code is publicly available at https://github.com/tengxiao1/GSIL.
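To make the idea of "density ratio estimation with simple classification losses" concrete, here is a minimal sketch, not the authors' implementation. It assumes the log density ratio is taken between the policy being trained and a fixed reference model, that demonstrations are labeled positive and self-generated responses negative, and that the logistic loss is used as one member of the convex-function family the abstract mentions. The names `logistic_ratio_loss` and `beta` are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def logistic_ratio_loss(demo_logratios, self_logratios, beta=1.0):
    """Binary classification loss on log density ratios (illustrative only).

    Each entry is assumed to be log pi_theta(y|x) - log pi_ref(y|x) for one
    response y. Demonstration responses are treated as the positive class and
    self-generated responses as the negative class. `beta` is a hypothetical
    scaling factor, not a parameter taken from the paper.
    """
    pos = [-log_sigmoid(beta * r) for r in demo_logratios]
    neg = [-log_sigmoid(-beta * r) for r in self_logratios]
    return sum(pos) / len(pos) + sum(neg) / len(neg)


# Before any training the policy equals the reference, so every log ratio is 0.
initial = logistic_ratio_loss([0.0, 0.0], [0.0, 0.0])

# Raising the ratio on demonstrations and lowering it on self-generated
# responses reduces the loss.
improved = logistic_ratio_loss([1.5, 2.0], [-1.0, -0.5])
```

Under these assumptions no discriminator network is trained: the policy's own log-probabilities supply the classifier logits, which is one way to read the abstract's claim that adversarial training is avoided. Swapping the logistic loss for another convex classification loss would give a different member of the same family.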

Topics

Artificial Intelligence > Core AI > Foundation Models
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Types > Imitation Learning
Deep Learning > Learning Types > Fine-Tuning

Keywords

imitation learning, instruction following, density ratio estimation, self-imitation learning, large language model, demonstration data, offline fine-tuning

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024