Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer

Wenda Xu; Michael Saxon; Misha Sra; William Yang Wang

AAAI 2022

Abstract

Expert-layman text style transfer technologies have the potential to improve communication between members of scientific communities and the general public. High-quality information produced by experts is often filled with difficult jargon that laypeople struggle to understand. This is a particularly notable issue in the medical domain, where laypeople are often confused by medical text online. At present, two bottlenecks interfere with the goal of building high-quality medical expert-layman style transfer systems: a dearth of pretrained medical-domain language models spanning both expert and layman terminologies, and a lack of parallel corpora for training the transfer task itself. To mitigate the first issue, we propose a novel language model (LM) pretraining task, Knowledge Base Assimilation, which synthesizes pretraining data from the edges of a graph of expert- and layman-style medical terms so that an LM can assimilate them during self-supervised learning. To mitigate the second issue, we build a large-scale parallel corpus in the medical expert-layman domain using a margin-based criterion. Our experiments show that fine-tuning transformer-based models, pretrained on Knowledge Base Assimilation and other well-established pretraining tasks, on our new parallel corpus leads to considerable improvement on expert-layman transfer benchmarks, with an average relative improvement of 106% in our human evaluation metric, the Overall Success Rate (OSR).
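The abstract does not say which margin-based criterion is used to build the parallel corpus. As a rough illustration only, the sketch below assumes one common choice, the ratio margin over sentence embeddings, in which the cosine similarity of a candidate pair is divided by the average similarity of each sentence to its k nearest neighbours on the other side. The function names (`margin_scores`, `mine_pairs`), the threshold, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def margin_scores(expert_vecs, layman_vecs, k=2):
    """Ratio-margin score for every (expert, layman) pair (assumed criterion)."""
    sims = [[cosine(e, l) for l in layman_vecs] for e in expert_vecs]

    def topk_mean(values):
        # Mean similarity to the k nearest neighbours on the other side.
        top = sorted(values, reverse=True)[:k]
        return sum(top) / len(top)

    fwd = [topk_mean(row) for row in sims]
    bwd = [topk_mean([row[j] for row in sims]) for j in range(len(layman_vecs))]

    scores = []
    for i, row in enumerate(sims):
        scored_row = []
        for j, sim in enumerate(row):
            denom = (fwd[i] + bwd[j]) / 2
            scored_row.append(sim / denom if denom else 0.0)
        scores.append(scored_row)
    return scores


def mine_pairs(expert_vecs, layman_vecs, k=2, threshold=1.0):
    """Keep each expert sentence's best layman match if it clears the threshold."""
    scores = margin_scores(expert_vecs, layman_vecs, k)
    pairs = []
    for i, row in enumerate(scores):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            pairs.append((i, j, row[j]))
    return pairs


# Toy 2-d "embeddings" standing in for real sentence vectors (illustrative only).
expert = [[1.0, 0.0], [0.0, 1.0]]
layman = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
pairs = mine_pairs(expert, layman, k=2, threshold=1.0)
```

In this toy setup each expert vector is paired with the layman vector pointing in nearly the same direction, while the ambiguous third layman vector is penalised because it is similar to everything. A real pipeline would use embeddings from a sentence encoder and an approximate nearest-neighbour index rather than the exhaustive comparison shown here.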

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Natural Language Processing > Generation > Text Generation
Natural Language Processing > Applications > Text Generation
Deep Learning > Learning Types > Self-Supervised Learning
Healthcare & Medicine > Clinical > Medical NLP

Keywords

self-supervised learning, parallel corpus, text style transfer, knowledge base, language model, medical domain, language model pretraining, knowledge base assimilation
