EMNLP 2021
High Frequent In-domain Words Segmentation and Forward Translation for the WMT21 Biomedical Task
Abstract
This paper reports on optimizing the use of out-of-domain data in the biomedical translation task. We first optimized our parallel training dataset using BabelNet in-domain terminology words. Then, to enlarge the training set, we studied the effect of out-of-domain data on biomedical translation, created a mixture of in-domain and out-of-domain training sets, and added more in-domain data using forward translation in the English-Spanish task. Finally, with a simple BPE optimization method, we increased the number of in-domain sub-words in our mixed training set and trained the Transformer model on the generated data. Results show improvements using our proposed method.
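The forward-translation and data-mixing steps mentioned in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `translate` callable, the toy lexicon, and the exact-duplicate filtering are all hypothetical, and a real system would call a trained English-Spanish model in place of the toy word-by-word lookup.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def forward_translate(
    in_domain_source: Iterable[str],
    translate: Callable[[str], str],
) -> List[Pair]:
    """Pair each monolingual in-domain source sentence with a machine translation.

    `translate` is a hypothetical stand-in for an existing source-to-target model.
    """
    synthetic: List[Pair] = []
    for sentence in in_domain_source:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines
        synthetic.append((sentence, translate(sentence)))
    return synthetic


def build_training_set(
    in_domain: List[Pair],
    out_of_domain: List[Pair],
    synthetic: List[Pair],
) -> List[Pair]:
    """Concatenate the three sources in order, dropping exact duplicate pairs.

    The duplicate filter is an assumption of this sketch, not a step from the paper.
    """
    seen = set()
    mixed: List[Pair] = []
    for pair in [*in_domain, *synthetic, *out_of_domain]:
        if pair not in seen:
            seen.add(pair)
            mixed.append(pair)
    return mixed


# Toy word-by-word "model", used only so the sketch runs end to end.
TOY_LEXICON = {
    "the": "el", "patient": "paciente", "has": "tiene",
    "fever": "fiebre", "cough": "tos",
}


def toy_translate(sentence: str) -> str:
    return " ".join(TOY_LEXICON.get(word, word) for word in sentence.split())


in_domain = [("the patient has fever", "el paciente tiene fiebre")]
out_of_domain = [("good morning", "buenos días")]
monolingual = ["the patient has cough", "", "the patient has fever"]

synthetic = forward_translate(monolingual, toy_translate)
mixed = build_training_set(in_domain, out_of_domain, synthetic)
```

Here the synthetic pairs come from translating source-side in-domain text, which is what distinguishes forward translation from back-translation, where target-side text is translated back into the source language.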
Topics
Artificial Intelligence > Core AI
Natural Language Processing > Applications > Machine Translation
Healthcare & Medicine > Research > Bioinformatics
Natural Language Processing > Generation > Machine Translation
Machine Learning > Learning Types > Domain Adaptation
Machine Learning > Learning Types > Data Augmentation
Deep Learning > Learning Types > Domain Adaptation