Can Pseudo-Label Be More Reliable? A Simple yet Effective Topology-Aware Graph Self-Training Method
Abstract
Graph Neural Networks (GNNs) have demonstrated impressive success across a range of graph-based tasks. However, their performance in node classification typically relies on sufficient high-quality labeled data, which is difficult to obtain in practice. Self-training has emerged as a promising solution to the problem of label scarcity. Most existing studies in this direction rely mainly on classification scores to select high-confidence unlabeled samples. Nevertheless, these methods often admit false-positive samples, which hinders the capability of GNNs. To this end, we propose a simple yet effective Topology-Aware Graph Self-Training (TA-GST) method. Specifically, we first explore the origin of false positives in pseudo-labeled samples. We then design a topology-aware scoring method that considers both the classification score and the connectivity pattern to enhance the reliability of pseudo-labeled samples. In addition, TA-GST departs from the traditional teacher-student pattern and is simplified into an end-to-end procedure. Extensive experiments on seven real-world datasets demonstrate the effectiveness of our method.
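To make the idea of combining a classification score with a connectivity pattern concrete, the following is a minimal illustrative sketch, not the TA-GST scoring function itself, which the abstract does not specify. It assumes a hypothetical score that blends a node's top class probability with the fraction of its neighbors sharing the same predicted class. The function names, the blending weight `alpha`, and the `threshold` are all assumptions introduced for illustration.

```python
def topology_aware_scores(probs, neighbors, alpha=0.5):
    """Blend classifier confidence with neighbor agreement (hypothetical scoring).

    probs:     list of per-node class-probability lists
    neighbors: dict mapping node index -> list of neighbor indices
    alpha:     assumed weight on the classification score
    """
    # Predicted class of each node = index of its largest probability.
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    scores = []
    for node, p in enumerate(probs):
        confidence = p[preds[node]]
        nbrs = neighbors.get(node, [])
        # Fraction of neighbors predicted to share this node's class;
        # isolated nodes get no topological support in this sketch.
        if nbrs:
            agreement = sum(preds[n] == preds[node] for n in nbrs) / len(nbrs)
        else:
            agreement = 0.0
        scores.append(alpha * confidence + (1 - alpha) * agreement)
    return preds, scores


def select_pseudo_labels(probs, neighbors, labeled, threshold=0.8, alpha=0.5):
    """Return {node: pseudo_label} for unlabeled nodes whose blended score passes the threshold."""
    preds, scores = topology_aware_scores(probs, neighbors, alpha)
    return {
        node: preds[node]
        for node, score in enumerate(scores)
        if node not in labeled and score >= threshold
    }


# Toy graph: node 2 has the highest classifier confidence, but its
# neighborhood is split between two predicted classes.
probs = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.1, 0.9]]
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
selected = select_pseudo_labels(probs, neighbors, labeled={0})
```

In this toy example, a confidence-only rule with the same threshold would accept node 2, whereas the blended score rejects it because only half of its neighbors agree with its predicted class. This is the kind of false positive that a topology-aware criterion is meant to filter out.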