Foundation X: Integrating Classification, Localization, and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis
Abstract
Developing robust and versatile deep-learning models is essential for enhancing diagnostic accuracy and guiding clinical interventions in medical imaging, but it requires a large amount of annotated data. The advancement of deep learning has facilitated the creation of numerous medical datasets with diverse expert-level annotations. Aggregating these datasets can maximize data utilization and address the inadequacy of labeled data. However, the heterogeneity of expert-level annotations across tasks such as classification, localization, and segmentation presents a significant challenge for learning from these datasets. To this end, we introduce Foundation X, an end-to-end framework that utilizes diverse expert-level annotations from numerous public datasets to train a foundation model capable of multiple tasks, including classification, localization, and segmentation. To address the challenges of annotation and task heterogeneity, we propose a Lock-Release pretraining strategy to enhance cyclic learning from multiple datasets, combined with the student-teacher learning paradigm, ensuring that the model retains general knowledge for all tasks while preventing overfitting to any single task. To demonstrate the effectiveness of Foundation X, we trained a model using 11 chest X-ray datasets, covering annotations for classification, localization, and segmentation tasks. Our experimental results show that Foundation X achieves notable performance gains through extensive annotation utilization, excels in cross-dataset and cross-task learning, and further enhances performance in organ localization and segmentation tasks. All code and pretrained models are publicly accessible at GitHub.com/JLiangLab/Foundation_X.
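
To make the training idea above more concrete, the following is a minimal Python sketch of a cyclic schedule with a student-teacher update. It is illustrative only and is not taken from the released code. The abstract does not define the two phases, so the sketch assumes one plausible reading: a "lock" phase that freezes a shared backbone and trains only the current task head, followed by a "release" phase that trains both. The names `lock_release_schedule`, `ema_update`, `backbone`, and the `*_head` labels are hypothetical, and the exponential-moving-average teacher is a common student-teacher choice assumed here rather than confirmed by the abstract.

```python
# Task types named in the abstract; the per-dataset breakdown is not given there.
TASKS = ["classification", "localization", "segmentation"]


def lock_release_schedule(tasks, num_cycles):
    """Yield (task, phase, trainable_parts) for a cyclic multi-task schedule.

    Assumption (not stated in the abstract): "lock" freezes the shared
    backbone and trains only the task head; "release" trains both.
    """
    for _ in range(num_cycles):
        for task in tasks:
            head = f"{task}_head"  # hypothetical parameter-group name
            yield task, "lock", {head}
            yield task, "release", {"backbone", head}


def ema_update(teacher, student, momentum=0.9):
    """Move each teacher weight toward the matching student weight (EMA).

    Weights are plain floats keyed by name to keep the sketch dependency-free.
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


if __name__ == "__main__":
    for task, phase, trainable in lock_release_schedule(TASKS, num_cycles=1):
        print(f"{task:15s} {phase:8s} trainable={sorted(trainable)}")
```

In a real training loop, each yielded step would select which parameter groups receive gradients for that task's batches, and the teacher would be refreshed from the student after each step or cycle; the slowly moving teacher is one way to keep general knowledge across tasks while the student adapts to the current one.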