EACL 2024

Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain

Abstract

Since medical text cannot be shared easily due to privacy concerns, synthetic data bears much potential for natural language processing applications. In the context of social media and user-generated messages about drug intake and adverse drug effects, this work presents different methods to examine the authenticity of synthetic text. We conclude that the generated tweets are untraceable and show enough authenticity from the medical point of view to be used as a replacement for a real Twitter corpus. However, original data might still be the preferred choice as they contain much more diversity.
