Teaching Small Language Models Reasoning through Counterfactual Distillation

Tao Feng; Yicheng Li; Chenglin Li; Hao Chen; Fei Yu; Yin Zhang

EMNLP 2024

Abstract

With the rise of large language models (LLMs), many studies have sought to transfer the reasoning capabilities of LLMs to small language models (SLMs). Previous distillation methods typically use LLMs to generate chain-of-thought (CoT) samples and then teach SLMs via fine-tuning. However, this standard distillation approach performs poorly on out-of-distribution (OOD) examples, and the generated CoT samples lack diversity. In this work, we propose a novel counterfactual distillation framework. First, we leverage LLMs to automatically generate high-quality counterfactual data: given an input text example, our method generates a counterfactual example that is very similar to the original input but whose task label has been changed to the desired one. Then, we use multi-view CoT to enhance the diversity of reasoning samples. Experiments on four NLP benchmarks show that our approach enhances the reasoning capabilities of SLMs and is more robust to OOD data. We also conduct extensive ablations and sample studies to understand the reasoning capabilities of SLMs.
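The counterfactual generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the names Example, build_counterfactual_prompt and is_valid_counterfactual, the use of difflib as a similarity measure, and the 0.6 threshold are all assumptions made for illustration. The sketch only shows the stated idea, a near-identical input whose label has been changed to a desired one, and leaves out the call to the teacher LLM and the multi-view CoT step.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Example:
    """A labelled text example (hypothetical container, not from the paper)."""
    text: str
    label: str


def build_counterfactual_prompt(example: Example, target_label: str) -> str:
    """Build a prompt asking a teacher LLM for a minimally edited input.

    The wording is an assumption; the paper's actual prompts are not given
    in the abstract.
    """
    return (
        "Edit the text as little as possible so that its label changes.\n"
        f"Text: {example.text}\n"
        f"Current label: {example.label}\n"
        f"Desired label: {target_label}\n"
        "Edited text:"
    )


def is_valid_counterfactual(
    original: Example,
    candidate: Example,
    target_label: str,
    min_similarity: float = 0.6,  # assumed threshold, chosen for illustration
) -> bool:
    """Accept a candidate only if it stays close to the original text
    while carrying the desired, different label."""
    similarity = SequenceMatcher(None, original.text, candidate.text).ratio()
    return (
        candidate.label == target_label
        and candidate.label != original.label
        and candidate.text != original.text
        and similarity >= min_similarity
    )


original = Example("The movie was great and I loved it.", "positive")
candidate = Example("The movie was awful and I hated it.", "negative")

prompt = build_counterfactual_prompt(original, "negative")
accepted = is_valid_counterfactual(original, candidate, "negative")
```

In this sketch a character-level similarity check stands in for "very similar to the original input"; a real pipeline might use a different similarity measure or rely on the teacher model to verify the new label.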

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Natural Language Processing > Generation > Language Modeling
Knowledge & Reasoning > Reasoning
Artificial Intelligence > Core AI > Reasoning
Machine Learning > Learning Types > Knowledge Distillation
Deep Learning > Learning Types > Knowledge Distillation

Keywords

knowledge distillation; chain-of-thought reasoning; out-of-distribution generalization; counterfactual reasoning; reasoning capability; small language model; counterfactual distillation; reasoning capabilities
