Reversible Disentanglement of Meaning and Language Representations from Multilingual Sentence Encoders

Keita Fukushima; Tomoyuki Kajiwara; Takashi Ninomiya

EMNLP 2025

Abstract

We propose an unsupervised method to disentangle sentence embeddings from multilingual sentence encoders into language-specific and language-agnostic representations. The language-agnostic representations distilled by our method can be used to estimate cross-lingual semantic sentence similarity via cosine similarity. Previous studies have trained a separate extractor for each of the language-specific and language-agnostic representations. That approach loses information: the original sentence embedding cannot be fully reconstructed from the two representations, which degrades performance in estimating cross-lingual sentence similarity. We train only the extractor for language-agnostic representations and treat the language-specific representation as the difference from the original sentence embedding, so no information is lost. Experimental results on two tasks, quality estimation of machine translation and cross-lingual sentence similarity estimation, show that our proposed method outperforms existing unsupervised methods.
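The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the extractor's architecture or training objective, so `toy_extractor` is a hypothetical stand-in (a fixed random linear map), and the "embeddings" are random vectors rather than outputs of a real multilingual sentence encoder. The function names `disentangle` and `reconstruct` are likewise assumptions made for illustration. The sketch only shows the structural idea: the language-specific part is defined as the residual, so the original embedding is recovered by simple addition.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def disentangle(embedding, meaning_extractor):
    """Split an embedding into (language-agnostic, language-specific) parts.

    Only the meaning extractor is a model; the language-specific part is
    defined as the residual, as the abstract describes.
    """
    meaning = meaning_extractor(embedding)
    language = [e - m for e, m in zip(embedding, meaning)]
    return meaning, language


def reconstruct(meaning, language):
    """Recover the original embedding by adding the two parts back together."""
    return [m + l for m, l in zip(meaning, language)]


# Hypothetical stand-in for a trained extractor: a fixed random linear map.
random.seed(0)
dim = 8
weights = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]


def toy_extractor(embedding):
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]


# Random vectors standing in for embeddings of a source sentence and its translation.
src = [random.gauss(0.0, 1.0) for _ in range(dim)]
tgt = [random.gauss(0.0, 1.0) for _ in range(dim)]

src_meaning, src_language = disentangle(src, toy_extractor)
tgt_meaning, tgt_language = disentangle(tgt, toy_extractor)

# Cross-lingual similarity is estimated on the language-agnostic parts only.
similarity = cosine_similarity(src_meaning, tgt_meaning)

# Reversibility: the two parts sum back to the original embedding.
restored = reconstruct(src_meaning, src_language)
```

Because the language-specific part is computed as a difference rather than predicted by a second model, reconstruction holds up to floating-point rounding regardless of how well the extractor is trained. The similarity value here is meaningless, since the extractor is random.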

Topics

Machine Learning > Learning Types > Unsupervised Learning

Natural Language Processing > Resources & Methods > Multilingual NLP

Machine Learning > Learning Types > Transfer Learning

Natural Language Processing > Generation > Machine Translation

Deep Learning > Learning Types > Representation Learning

Keywords

unsupervised learning, representation learning, cross-lingual transfer, semantic similarity, sentence embedding, multilingual embedding, unsupervised disentanglement, language-agnostic representation, multilingual sentence encoder, machine translation quality estimation, cross-lingual semantic similarity, sentence embedding disentanglement
