Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain

Tomohiro Nishiyama; Lisa Raithel; Roland Roller; Pierre Zweigenbaum; Eiji Aramaki

EACL 2024

Abstract

Since medical text cannot be shared easily due to privacy concerns, synthetic data holds considerable potential for natural language processing applications. In the context of social media and user-generated messages about drug intake and adverse drug effects, this work presents different methods to examine the authenticity of synthetic text. We conclude that the generated tweets are untraceable and show enough authenticity from the medical point of view to be used as a replacement for a real Twitter corpus. However, original data might still be the preferred choice, as it contains much more diversity.

Topics

Machine Learning > Application Areas > Privacy
Healthcare & Medicine > Clinical > Clinical NLP
Security & Privacy > Privacy
Natural Language Processing > Applications > Named Entity Recognition
Natural Language Processing > Applications > Text Generation

Keywords

natural language processing; named entity recognition; privacy preservation; medical text; synthetic data generation; adverse drug reaction; synthetic text; text authenticity
