2024 NAACL NAACL 2024

Large Language Models Encode the Practice of Medicine

Abstract

AbstractHealthcare tasks such as predicting clinical outcomes across medical and surgical populations, disease prediction, predicting patient health journeys, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of billions of administrative claims, which essentially encapsulates the practice of medicine, offering a unique perspective on patient care and treatment patterns. Our model, MediClaimGPT, a 125M parameter Transformer demonstrates strong zero-shot predictive capabilities, accurately forecasting patient health events across four evaluation datasets, with its capabilities further demonstrated in various downstream tasks. A significant application of MediClaimGPT is in generating high-quality, clinically plausible synthetic claims data, enhancing healthcare data utility while preserving patient privacy. This research underscores the potential of language models in handling complex datasets and their strategic application in healthcare and related fields.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio