2023
EMNLP
Diversifying language models for lesser-studied languages and language-usage contexts: A case of second language Korean
Abstract
This study investigates the extent to which currently available morpheme parsers/taggers apply to lesser-studied languages and language-usage contexts, with a focus on second language (L2) Korean. We pursue this inquiry by (1) training a neural-network model (pre-trained on first language [L1] Korean data) on varying L2 datasets and (2) measuring its morpheme parsing and part-of-speech (POS) tagging performance on L2 test sets drawn both from the same sources as the L2 training sets and from different ones. Results show that the L2-trained models generally outperform the L1 pre-trained baseline model in domain-specific tokenization and POS tagging. Interestingly, increasing the size of the L2 training data does not consistently improve model performance.
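The evaluation step described above, which scores predicted morpheme segmentation and POS tags against gold L2 annotations, can be illustrated with a minimal sketch. It assumes one simple scheme that is not necessarily the metric used in the study: morphemes are compared as multisets within each eojeol (space-delimited word), once by form only (segmentation) and once by form plus tag (tagging). The function names `score` and `prf` and the Sejong-style tags in the example are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical data layout: a morpheme is (form, POS tag);
# an eojeol (space-delimited word) is a list of morphemes.
Morpheme = Tuple[str, str]
Eojeol = List[Morpheme]


def prf(matched: int, n_gold: int, n_pred: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from match counts (0.0 when undefined)."""
    p = matched / n_pred if n_pred else 0.0
    r = matched / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def score(gold: List[Eojeol], pred: List[Eojeol]) -> Dict[str, Tuple[float, float, float]]:
    """Score segmentation (form only) and tagging (form + tag) eojeol by eojeol.

    Morphemes are matched as multisets inside each eojeol, because Korean
    morphemes need not be contiguous substrings of the surface form
    (e.g. 갔다 -> 가 + 았 + 다). This is an assumed scheme for illustration.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sentences must have the same eojeol count")
    seg_match = tag_match = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        # Segmentation: count shared morpheme forms, ignoring tags.
        seg_match += sum((Counter(f for f, _ in g) & Counter(f for f, _ in p)).values())
        # Tagging: a morpheme counts only if both form and tag match.
        tag_match += sum((Counter(g) & Counter(p)).values())
        n_gold += len(g)
        n_pred += len(p)
    return {
        "segmentation": prf(seg_match, n_gold, n_pred),
        "tagging": prf(tag_match, n_gold, n_pred),
    }


if __name__ == "__main__":
    # Toy example for 학교에 갔다 ("went to school"); the prediction leaves 갔 undecomposed
    # and mis-tags 에, the kind of error an L1-trained tagger might make on L2 text.
    gold = [[("학교", "NNG"), ("에", "JKB")], [("가", "VV"), ("았", "EP"), ("다", "EF")]]
    pred = [[("학교", "NNG"), ("에", "JKO")], [("갔", "VV"), ("다", "EF")]]
    print(score(gold, pred))
```

Scoring segmentation and tagging separately keeps the two error sources distinct: in the toy example the segmentation score reflects only the undecomposed 갔, while the tagging score also penalizes the wrong tag on 에.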
🌉
Interdisciplinary Bridge
— Artificial Intelligence, Deep Learning, Machine Learning, and Natural Language Processing
🧭
Keyword Pioneer
— morpheme parsing
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Topics
Deep Learning > Architectures > Neural Networks
Natural Language Processing > Understanding > Part-of-Speech Tagging
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Learning Paradigms > Transfer Learning
Machine Learning > Learning Types > Transfer Learning
Artificial Intelligence > Core AI > Language
Deep Learning > Learning Types > Transfer Learning