Pareto-Based Heterogeneous Knowledge Distillation for MLPs on Graphs

Wenrui Zhao; Yijun Tian; Zhichao Xu; Yawei Wang; Chuxu Zhang

AAAI 2026

Abstract

Heterogeneous Graph Neural Networks (HGNNs) have demonstrated remarkable capabilities in capturing useful information in heterogeneous graphs, achieving outstanding performance in various learning tasks. However, the heavy dependency of HGNNs on neighbor information may result in high inference latency, which restricts their practicality in real-world applications. Recent studies have attempted to overcome such latency in Graph Neural Networks (GNNs) by distilling knowledge into student models that do not rely on graph structure. These approaches, however, primarily focus on replicating the teachers' predictive outcomes while neglecting the structural knowledge the teachers encode. This limitation makes such approaches less effective as graphs become more complex, particularly on heterogeneous graphs. Motivated by this challenge, we propose HGKD, a novel hierarchical knowledge distillation framework that transfers both structural knowledge and predictive outcomes from HGNN teachers to a multi-layer perceptron (MLP) student. Additionally, we provide two variants of HGKD: one helps the student learn from multiple teacher models through Pareto learning, and the other incorporates low-cost neighbor information. We evaluate HGKD and its variants on a range of heterogeneous graph datasets. The results demonstrate that our student model achieves performance comparable to or exceeding that of HGNN teachers, despite not relying on graph structure during inference.
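The abstract does not give the form of the HGKD objective, so the following is only a minimal sketch of the general idea of distilling both predictions and structure, not the authors' loss. It assumes a temperature-softened KL term for the predictive outcomes and a pairwise-similarity matching term as a stand-in for structural knowledge; the function names, the temperature, and the weight `alpha` are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) over nodes, on softened class distributions."""
    loss = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)
        q = softmax(s_row, temperature)
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return loss / len(teacher_logits)

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def structure_loss(teacher_emb, student_emb):
    """Mean squared gap between teacher and student pairwise node similarities.

    Assumed proxy for structural knowledge: the student is asked to keep
    nodes as similar to each other as the teacher does. Teacher and student
    embeddings may have different dimensions.
    """
    n = len(teacher_emb)
    gaps = [
        (cosine(teacher_emb[i], teacher_emb[j])
         - cosine(student_emb[i], student_emb[j])) ** 2
        for i in range(n) for j in range(i + 1, n)
    ]
    return sum(gaps) / len(gaps)

def distillation_loss(teacher_logits, student_logits,
                      teacher_emb, student_emb, alpha=0.5):
    """Weighted sum of the prediction term and the structure term."""
    return (alpha * prediction_loss(teacher_logits, student_logits)
            + (1 - alpha) * structure_loss(teacher_emb, student_emb))
```

In this sketch the student needs graph-derived signals only at training time, through the teacher's logits and embeddings; at inference it would consume node features alone, which is where the latency saving described above would come from.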

Authors

Wenrui Zhao, Yijun Tian, Zhichao Xu, Yawei Wang, Chuxu Zhang

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Application Areas > Knowledge Distillation
Deep Learning > Architectures > Graph Neural Networks

Keywords

knowledge distillation, student model, heterogeneous graph neural network, multi-layer perceptron, teacher model, structural knowledge
