Answer is All You Need: Instruction-following Text Embedding via Answering the Question

Letian Peng; Yuwei Zhang; Zilong Wang; Jayanth Srinivasa; Gaowen Liu; Zihan Wang; Jingbo Shang

ACL 2024

Abstract

This work aims to build a text embedder that can capture characteristics of texts specified by user instructions clarifying the similarity criterion. While previous methods improve general task awareness by injecting the instruction information into encoding, they fail to be sensitive to clearer criteria like “evaluate similarity based on emotion”. We instead propose a different viewpoint, which treats the instruction as a “question” about the input text and encodes the expected answers to obtain the representation accordingly. Intuitively, texts with the same (implicit) semantics would share similar answers following the instruction, thus leading to more similar representations. Specifically, we propose InBedder, which instantiates this learning-to-answer idea by only fine-tuning language models on abstractive question answering tasks. Despite its simplicity, InBedder demonstrates significantly improved instruction-following capabilities according to our proposed instruction awareness tests and instruction robustness tests, when applied to both large language models (LLMs) (e.g., llama-2-7b) and smaller encoder-based LMs (e.g., roberta-large). Additionally, our qualitative analysis of clustering outcomes, obtained by applying diverse instructions to the same unlabeled corpus, demonstrates a high degree of interpretability in the clusters formed.
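The learning-to-answer idea can be illustrated with a toy sketch. This is not InBedder itself: the abstract describes fine-tuning a language model on abstractive question answering, whereas the sketch below swaps in a hypothetical keyword lookup (`TOY_ANSWERS`, `toy_answer`) for the answering model and a bag-of-words vector for the encoded answer. All names, instructions, and lexicon entries are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical keyword lexicons standing in for a fine-tuned answering model.
TOY_ANSWERS = {
    "What emotion does the text express?": {
        "love": "joy", "great": "joy", "hate": "anger", "awful": "anger",
    },
    "What topic is the text about?": {
        "movie": "film", "film": "film", "pizza": "food", "soup": "food",
    },
}

def toy_answer(instruction, text):
    """Stand-in for a model answering `instruction` about `text` (rule-based toy)."""
    lexicon = TOY_ANSWERS[instruction]
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    return " ".join(hits) if hits else "unknown"

def embed(instruction, text):
    """Represent the text by a bag-of-words vector of its expected answer."""
    return Counter(toy_answer(instruction, text).split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

a = "I love this movie!"
b = "I love this pizza!"
emotion = "What emotion does the text express?"
topic = "What topic is the text about?"

# Same pair of texts, two different instructions.
sim_emotion = cosine(embed(emotion, a), embed(emotion, b))
sim_topic = cosine(embed(topic, a), embed(topic, b))
print(sim_emotion, sim_topic)
```

In this toy setup the two texts receive the same answer under the emotion instruction and different answers under the topic instruction, so their similarity depends on which question is asked. That is the behavior the abstract attributes to answer-based embeddings, shown here only in miniature.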

Topics

Machine Learning > Core Methods > Embedding Learning
Natural Language Processing > Generation > Language Modeling
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Text Representation
Machine Learning > Learning Types > Transfer Learning

Keywords

representation learning, question answering, instruction following, semantic similarity, text embedding, encoder-based language model
