Multimodal Machine Learning for Automated ICD Coding

Keyang Xu; Mike Lam; Jingzhi Pang; Xin Gao; Charlotte Band; Piyush Mathur; Frank Papay; Ashish K. Khanna; Jacek B. Cywinski; Kamal Maheshwari; Pengtao Xie; Eric P. Xing

MLHC 2019

Abstract

This study presents a multimodal machine learning model to predict ICD-10 diagnostic codes. We developed separate machine learning models to handle data from different modalities, including unstructured text, semi-structured text, and structured tabular data, and then employed an ensemble method to integrate all modality-specific models into a single ICD code prediction. We also extracted key evidence supporting each prediction to make our output more convincing and explainable. We validated our approach on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset. For ICD code prediction, our best-performing model (micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms baseline models, including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and Text-CNN (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability, our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text data and 0.3105 on tabular data, compared with 0.2780 and 0.5002, respectively, for well-trained physicians.
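The abstract combines modality-specific models with an ensemble and reports micro-averaged F1 and a Jaccard Similarity Coefficient. The sketch below illustrates those three ideas under loudly stated assumptions: the ensemble is a plain weighted average of per-code probabilities followed by a 0.5 decision threshold, which is not necessarily the scheme the authors used, and the function names (`ensemble_probs`, `micro_f1`, `jaccard`), toy probabilities, and evidence terms are hypothetical.

```python
import numpy as np


def ensemble_probs(prob_list, weights=None):
    """Combine per-code probabilities from several modality-specific models.

    Assumption: a plain weighted average; the paper's ensemble may differ.
    Each array in prob_list has shape (n_samples, n_codes).
    """
    probs = np.stack(prob_list)  # shape (n_models, n_samples, n_codes)
    if weights is None:
        weights = np.ones(len(prob_list))  # equal weight per modality
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so weights sum to 1
    # Contract the model axis: result has shape (n_samples, n_codes).
    return np.tensordot(weights, probs, axes=1)


def micro_f1(y_true, y_pred):
    """Micro-F1: pool TP, FP and FN over every (sample, code) pair."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def jaccard(pred_evidence, gold_evidence):
    """Jaccard similarity: size of intersection / size of union of two sets."""
    a, b = set(pred_evidence), set(gold_evidence)
    if not a and not b:
        return 1.0  # convention: two empty sets count as identical
    return len(a & b) / len(a | b)


# Toy example (hypothetical data): 2 admissions x 3 codes,
# one text model and one tabular model.
text_probs = np.array([[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]])
tab_probs = np.array([[0.7, 0.4, 0.2], [0.3, 0.6, 0.5]])
y_true = np.array([[1, 0, 1], [0, 1, 0]])

avg = ensemble_probs([text_probs, tab_probs])
y_pred = avg >= 0.5  # hypothetical 0.5 decision threshold
print(micro_f1(y_true, y_pred))  # 0.8
print(jaccard({"sepsis", "lactate"}, {"sepsis", "lactate", "hypotension"}))
```

Pooling counts across all (sample, code) pairs is what makes the F1 "micro": frequent codes dominate the score, which suits highly imbalanced ICD label sets. Micro-AUC could be computed on the same arrays with scikit-learn's `roc_auc_score(y_true, avg, average="micro")`; it is left out here to keep the sketch dependency-light.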

Topics

Artificial Intelligence > Core AI > Multimodal Learning; Natural Language Processing > Applications > Text Classification

Keywords

ensemble method, multimodal classification, ICD coding, medical text classification
