2025
AACL
AACL 2025
Does Synthetic Data Help Named Entity Recognition for Low-Resource Languages?
Abstract
AbstractWe explore whether synthetic datasets generated by large language models using a few high quality seed samples are useful for low-resource named entity recognition, considering 11 languages from three language families. Our results suggest that synthetic data created with such seed data is a reasonable choice when there is no available labeled data, and is better than using entirely automatically labeled data. However, a small amount of high-quality data, coupled with cross-lingual transfer from a related language, always offers better performance. Data and code available at: https://github.com/grvkamath/low-resource-syn-ner.
❓
The Questioner
🌉
Interdisciplinary Bridge
— Artificial Intelligence and Machine Learning and Natural Language Processing
🧭
Keyword Pioneer
— seed datum
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Authors
Topics
Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Understanding > Named Entity Recognition
Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Learning Types > Few-Shot Learning