EMNLP 2024
TEMA: Token Embeddings Mapping for Enriching Low-Resource Language Models
Abstract
The objective of this research is to remedy the low quality of language models for low-resource languages. We introduce the Token Embedding Mapping Algorithm (TEMA), which maps the token embeddings of a richly pre-trained model L1 to a poorly trained model L2, thus creating a richer model L2’. Our experiments show that L2’ reduces perplexity with respect to the original monolingual model L2, and that on downstream tasks, including SuperGLUE, the results are state-of-the-art or better for the most semantic tasks. The models obtained with TEMA are also competitive with or better than the multilingual or extended models proposed as solutions for mitigating low-resource language problems.
Authors
Topics
Artificial Intelligence > Learning Paradigms > Transfer Learning
Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Learning Types > Transfer Learning
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Resources & Methods > Transfer Learning
Natural Language Processing > Resources & Methods > Language Modeling
Artificial Intelligence > Core AI > Knowledge Representation
Deep Learning > Techniques > Transfer Learning
Deep Learning > Learning Types > Transfer Learning