Scalable Semi-supervised Community Search via Graph Transformer on Attributed Heterogeneous Information Networks
Abstract
Attributed heterogeneous information networks (AHINs) encode rich semantics through diverse node and edge types. Recent learning-based community search methods on AHINs have shown promising performance but face two major limitations: i) difficulty scaling to large graphs due to memory-intensive neighbor-based propagation (e.g., GNNs and node-level attention), and ii) reliance on explicit community-level labels, which are often unavailable or costly to obtain. To address these issues, we propose SCSAH, a scalable Semi-supervised Community Search framework on AHINs that eliminates the need for community-level labels by leveraging readily available node classification labels. Specifically, we devise MvSF2Token to extract Multi-view Semantic Features (MvSFs) as compact subgraph-level tokens before training, significantly reducing the model's propagation complexity. We then design a View-Aware Semantic Graph Transformer (VASGhormer) to encode MvSFs effectively by capturing cross-view dependencies and fusing semantic features. The combination of MvSF2Token and VASGhormer ensures scalability, efficiency, and robust performance. Furthermore, we design a View-Aware Contrastive Learner to train VASGhormer without community-level supervision. Extensive experiments on five real-world datasets show that SCSAH outperforms state-of-the-art methods, achieving 18.06% higher performance and 10.43 times faster training.