2024
EMNLP
RAR: Retrieval-augmented retrieval for code generation in low resource languages
Abstract
Language models struggle to generate code for low-resource programming languages, since these are underrepresented in training data. Either examples or documentation are commonly used to improve code generation. We propose using both types of information together and present retrieval-augmented retrieval (RAR), a two-step method for selecting relevant examples and documentation. Experiments on three low-resource languages (Power Query M, OfficeScript and Excel formulas) show that RAR outperforms example retrieval and grammar retrieval used independently (+2.81–26.14%). Interestingly, we show that two-step retrieval also selects better examples and documentation when each is used independently.
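The abstract describes RAR only as a two-step method, so the following is a minimal sketch of one possible reading rather than the authors' implementation. It assumes that step one retrieves examples for the query and that step two retrieves documentation using the query expanded with those examples. The bag-of-words cosine scorer, the function names (`retrieve`, `two_step_retrieve`), and the toy Power Query M corpus are all hypothetical stand-ins chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a toy tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_text, corpus, k):
    """Return the k corpus entries most similar to query_text."""
    q = Counter(tokenize(query_text))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(q, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def two_step_retrieve(query, examples, docs, k_examples=1, k_docs=1):
    """Hypothetical two-step retrieval (an assumption, not the paper's exact method)."""
    # Step 1: retrieve examples using the query alone.
    top_examples = retrieve(query, examples, k_examples)
    # Step 2 (assumed): expand the query with the retrieved examples,
    # then retrieve documentation with the expanded query.
    expanded = " ".join([query] + top_examples)
    top_docs = retrieve(expanded, docs, k_docs)
    return top_examples, top_docs


# Toy corpus, invented for illustration only.
examples = [
    "Remove duplicate rows from a table: Table.Distinct(Source)",
    'Sort a table by column: Table.Sort(Source, {"Name"})',
]
docs = [
    "Table.Sort sorts the table by one or more columns",
    "Table.Distinct removes duplicate rows from a table",
    "Text.Upper converts text to uppercase",
]

top_examples, top_docs = two_step_retrieve("drop repeated rows", examples, docs)
print(top_examples)
print(top_docs)
```

In this toy setup the retrieved example contributes tokens such as the function name to the second query, which is the intuition behind letting one retrieval step inform the other. A realistic system would likely replace the bag-of-words scorer with a learned embedding model.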
Topics
Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Applications > Machine Translation
Machine Learning > Learning Types > Retrieval-Augmented Generation
Natural Language Processing > Generation > Retrieval-Augmented Generation
Deep Learning > Learning Types > Retrieval-Augmented Generation