Cyto-SSL: A Self-Supervised Pretraining Framework for Cytology Foundation Model

Yiming Zhang; Rui Yan; Xiaohua Wan; Yifan Zhao; Shuang Feng; Zhetao Xu; Ying Wang; Fa Zhang; Bin Hu

AAAI 2026

Abstract

Cytological images originate from exfoliated cells that are collected on liquid-based slides and digitized into whole slide images (WSIs). Unlike histological WSIs, which exhibit continuous and well-structured tissue, cytological WSIs are spatially sparse and lack structured relationships between cells. Typically, the nucleus serves as the primary diagnostic feature, while the surrounding cytoplasm plays a supporting role. These characteristics have limited the development of effective foundation models and hinder the transfer of histology-based models to cytopathology. To address this, we propose **Cyto-SSL**, the first self-supervised pretraining framework for cytological images. It introduces **Nuclei-Centered Perturbation**, which highlights individual nuclei by perturbing non-nuclear regions. We also design an SR-Transformer module that complements this perturbation: it uses sparse attention to concentrate on diagnostically relevant scattered cells, while image relative position encoding (iRPE) helps the model capture local spatial relationships and avoid unnecessary attention to irrelevant global structures. Experimental results show that **Cyto-SSL** improves performance across diverse cytological datasets and Multiple Instance Learning (MIL) methods. On a WSI-level dataset, it achieves 95.67% accuracy and outperforms an ImageNet-pretrained ResNet-50 by 11.33%, demonstrating superior feature representation for cytological analysis. Additionally, the **Cyto-SSL** modules are plug-and-play and integrate easily into other pretraining frameworks, yielding a 2.6% accuracy gain across different SSL methods.
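To make the idea of perturbing non-nuclear regions concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which perturbation Cyto-SSL applies or how nuclei are located, so this example assumes additive Gaussian noise and a precomputed binary nucleus mask; the function name `perturb_non_nuclear` and the parameter `noise_std` are hypothetical.

```python
import numpy as np


def perturb_non_nuclear(image, nucleus_mask, noise_std=0.2, rng=None):
    """Add Gaussian noise outside the nucleus mask (assumed perturbation).

    image:        float array in [0, 1], shape (H, W, C)
    nucleus_mask: boolean array, shape (H, W), True on nucleus pixels
    """
    rng = np.random.default_rng() if rng is None else rng
    image = image.astype(np.float32)
    noise = rng.normal(0.0, noise_std, size=image.shape).astype(np.float32)
    # Broadcast the 2-D mask over the channel axis.
    keep = nucleus_mask.astype(bool)[..., None]
    # Nucleus pixels are copied unchanged; all other pixels receive noise.
    perturbed = np.where(keep, image, image + noise)
    return np.clip(perturbed, 0.0, 1.0)


# Toy patch with a 3x3 "nucleus" in an 8x8 RGB image.
rng = np.random.default_rng(0)
patch = rng.random((8, 8, 3), dtype=np.float32)
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:5] = True
out = perturb_non_nuclear(patch, mask, rng=rng)
```

In a pretraining pipeline, a view produced this way would stand alongside the usual augmentations, so that the signal that stays consistent across views comes from the nuclei rather than the background.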

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Computer Vision > Domain-Specific > Medical Imaging

Keywords

self-supervised learning, feature representation, foundation model, whole slide image, nucleus detection
