EMNLP 2025

Semantic Geometry of Sentence Embeddings

Abstract

Sentence embeddings are central to modern natural language processing, powering tasks such as clustering, semantic search, and retrieval-augmented generation. Yet, they remain largely opaque: their internal features are not directly interpretable, and users lack fine-grained control for downstream tasks. To address this issue, we introduce a formal framework to characterize the organization of features in sentence embeddings through information-theoretic means. Building on this foundation, we develop a method to identify interpretable feature directions and show how they can be composed to capture richer semantic structures. Experiments on both synthetic and real-world datasets confirm the presence of this semantic geometry and highlight the utility of our approach for enhancing interpretability and fine-grained control in sentence embeddings.
