2025 EMNLP EMNLP 2025

ThinkAnswer Loss: Balancing Semantic Similarity and Exact Matching for LLM Reasoning Enhancement

Abstract

AbstractKnowledge distillation for large language models often uses Chain-of-Thought (CoT) and answer pairs, but existing methods struggle with appropriate supervision signals. Uniform constraints (e.g., cross-entropy) on CoT can enforce literal, verbose reasoning and suppress expressive diversity, while solely semantic constraints on answers can reduce accuracy in classification tasks. This paper proposes ThinkAnswer Loss, an information-theoretic differential supervision framework that decouples CoT and answer supervision. ThinkAnswer Loss applies semantic similarity constraints to the CoT portion while maintaining strict literal matching for the answer. We theoretically demonstrate its connection to mutual information maximization and derive a tight upper bound on generalization error. Experimental validation on text quality assessment and mathematical reasoning tasks shows that our method maintains answer accuracy while effectively reducing CoT length and preserving semantic content, thereby accelerating inference.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio