WACV 2023

Representation Recovering for Self-Supervised Pre-Training on Medical Images

Abstract

Advances in self-supervised learning, especially contrastive learning, have drawn attention to these techniques as a way to learn effective visual representations from unlabeled images. Contrastive learning enables models to extract highly consistent features by generating different views of the same input. Following the recent success of Masked Autoencoders (MAE), generative modeling in self-supervised learning has come back into the community's view. Generative approaches encode the input into a compact embedding and train the model to recover the original input. However, in our experiments, we found that vanilla MAE mainly recovers coarse high-level semantic information and barely recovers detailed low-level information. We show that directly applying MAE is not ideal for dense downstream prediction tasks such as multi-organ segmentation. In this paper, we propose RepRec, a hybrid visual representation learning framework for self-supervised pre-training on large-scale unlabeled medical datasets, which takes advantage of both contrastive and generative modeling. To address this limitation of MAE, a convolutional encoder is pre-trained in a contrastive way to provide low-level feature information, and a transformer encoder is pre-trained in a generative way to capture high-level semantic dependencies by recovering masked representations from the convolutional encoder. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.
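
To make the idea of "recovering masked representations" concrete, the following is a minimal NumPy sketch, not the paper's implementation. The abstract does not specify the masking ratio, the mask token, or the loss, so the 0.75 ratio, the zero-valued mask token, and the mean-squared-error objective restricted to masked positions are all assumptions made for illustration. The random `features` array is a stand-in for the output of the contrastively pre-trained convolutional encoder, and the helper names `mask_tokens` and `recovery_loss` are hypothetical.

```python
import numpy as np


def mask_tokens(features, mask_ratio, mask_token, rng):
    """Replace a random subset of feature tokens with a mask token.

    features: array of shape (num_tokens, dim), assumed here to come from
    a convolutional encoder. Returns the masked copy and a boolean mask
    that is True at the masked positions.
    """
    num_tokens = features.shape[0]
    num_masked = int(round(mask_ratio * num_tokens))
    masked_idx = rng.choice(num_tokens, size=num_masked, replace=False)
    mask = np.zeros(num_tokens, dtype=bool)
    mask[masked_idx] = True
    masked = features.copy()
    masked[mask] = mask_token
    return masked, mask


def recovery_loss(predicted, target, mask):
    """Mean squared error over masked positions only (an assumed objective)."""
    if not mask.any():
        raise ValueError("mask selects no tokens")
    diff = predicted[mask] - target[mask]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8))   # stand-in for conv-encoder representations
mask_token = np.zeros(8)              # assumed mask token (learnable in practice)
masked, mask = mask_tokens(features, 0.75, mask_token, rng)

# A transformer encoder would map `masked` to predictions; here the masked
# input itself serves as a trivial baseline prediction.
baseline_loss = recovery_loss(masked, features, mask)
perfect_loss = recovery_loss(features, features, mask)
```

In this sketch a perfect reconstruction gives a loss of zero, while the untrained baseline gives a positive loss; a trained transformer would be expected to fall between the two by inferring the masked representations from the visible ones.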
