ACL 2020
Evaluating the Utility of Model Configurations and Data Augmentation on Clinical Semantic Textual Similarity
Abstract
In this paper, we apply pre-trained language models to the Semantic Textual Similarity (STS) task, with a specific focus on the clinical domain. In the low-resource setting of clinical STS, these large models tend to be impractical and prone to overfitting. Building on BERT, we study the impact of a number of model design choices, namely different fine-tuning and pooling strategies. We observe that the impact of domain-specific fine-tuning on clinical STS is much smaller than in the general domain, likely due to the concept richness of the domain. Based on this, we propose two data augmentation techniques. Experimental results on N2C2-STS demonstrate substantial improvements, validating the utility of the proposed methods.
Topics
Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Resources & Methods > Large Language Models
Healthcare & Medicine > Clinical > Clinical NLP
Machine Learning > Learning Types > Transfer Learning
Artificial Intelligence > Core AI > Natural Language Processing