The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

Anton Razzhigaev; Matvey Mikhalchuk; Elizaveta Goncharova; Ivan Oseledets; Denis Dimitrov; Andrey Kuznetsov

EACL 2024


Abstract

In this study, we investigate the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the contrast between encoders and decoders. Our findings reveal that the layer-wise anisotropy profile of transformer decoders follows a distinct bell-shaped curve, with the highest anisotropy in the middle layers. This pattern differs from the more uniformly distributed anisotropy observed in encoders. In addition, we find that the intrinsic dimension of embeddings increases in the early phases of training, indicating an expansion into a higher-dimensional space. This expansion is followed by a compression phase towards the end of training, in which the dimensionality decreases, suggesting a refinement into more compact representations. Our results provide new insights into the embedding properties of encoders and decoders.
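The abstract does not say how either quantity is computed. The sketch below is a rough illustration and assumes two common choices, which may differ from the paper's exact definitions.

- Anisotropy is taken as the share of variance captured by the top singular direction of the centered embeddings, σ₁² / Σᵢ σᵢ².
- Intrinsic dimension uses the maximum-likelihood form of the TwoNN estimator (Facco et al., 2017).

All function names are hypothetical. The code is pure Python so that it has no dependencies.

```python
# Illustrative sketch only. The definitions below are assumed, common choices,
# not necessarily the ones used in the paper.
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def center(points: Sequence[Vector]) -> List[List[float]]:
    """Subtract the per-coordinate mean from every embedding vector."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [[p[j] - mean[j] for j in range(d)] for p in points]


def anisotropy(points: Sequence[Vector], iters: int = 200) -> float:
    """Assumed measure: sigma_1^2 / sum_i sigma_i^2 of the centered embeddings.

    1.0 means all variance lies along one direction; for n > d points the
    minimum, 1/d, means variance is spread evenly over all d directions.
    """
    x = center(points)
    d = len(x[0])
    # Scatter matrix C = X^T X of the centered data; its eigenvalues are sigma_i^2.
    c = [[sum(row[a] * row[b] for row in x) for b in range(d)] for a in range(d)]
    total = sum(c[a][a] for a in range(d))  # trace(C) = sum_i sigma_i^2
    if total == 0.0:
        raise ValueError("all embeddings are identical")
    # Power iteration from a seeded random start to find the top eigenvector of C.
    rng = random.Random(0)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(c[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    # Rayleigh quotient v^T C v (v has unit norm) approximates sigma_1^2.
    top = sum(v[a] * sum(c[a][b] * v[b] for b in range(d)) for a in range(d))
    return top / total


def two_nn_dimension(points: Sequence[Vector]) -> float:
    """Maximum-likelihood TwoNN intrinsic-dimension estimate.

    For each point, mu = r2 / r1 is the ratio of its second- to first-nearest-
    neighbour distance. If the data locally resemble a uniform d-dimensional
    cloud, mu follows a Pareto law with density d * mu^(-d-1), whose MLE is
    d_hat = N / sum(log mu).
    """
    log_mus = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        # Brute-force O(N^2) neighbour search; fine for a small illustration.
        for j, q in enumerate(points):
            if i == j:
                continue
            r = math.dist(p, q)
            if r < r1:
                r1, r2 = r, r1
            elif r < r2:
                r2 = r
        if r1 > 0.0 and math.isfinite(r2):  # skip duplicate points and tiny sets
            log_mus.append(math.log(r2 / r1))
    total = sum(log_mus)
    if total == 0.0:
        raise ValueError("degenerate neighbourhood structure")
    return len(log_mus) / total


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: uniform points on a 2-D plane embedded in 3-D space.
    plane = [(rng.random(), rng.random(), 0.0) for _ in range(500)]
    print("anisotropy:", round(anisotropy(plane), 3))
    print("TwoNN ID:", round(two_nn_dimension(plane), 2))
```

To trace a layer-wise anisotropy profile like the one described for decoders, one could apply `anisotropy` to the token embeddings from each layer in turn and plot the values against layer index. A per-layer list of hidden-state matrices would be needed for this, and is assumed here. Tracking `two_nn_dimension` across training checkpoints would similarly show any expansion or compression of intrinsic dimension. For real model sizes, a vectorized SVD and a proper nearest-neighbour index would replace the brute-force loops used here.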

Topics

Machine Learning > Core Methods > Embedding Learning
Machine Learning > Optimization & Theory > Theory
Deep Learning > Architectures > Transformers

Keywords

transformer architecture, representation learning, embedding space, intrinsic dimension
