Spatial Graph Attention Network Modeling for Neighborhood-Scale Lead Contamination Risk Prediction Using Publicly Available Data
Abstract
Lead contamination in urban water systems remains a prevalent public health threat, affecting millions of American households and disproportionately endangering vulnerable populations. Current municipal risk assessment and inspection strategies rely overwhelmingly on random sampling and complaint-driven protocols, which overlook spatial complexity, reinforce inequities, and squander limited resources, leaving critical exposure areas unidentified. This paper presents a first-of-its-kind framework that predicts lead contamination risk from socio-demographic and housing features, using partially anonymized residential testing data as ground truth and combining graph neural networks with gradient-boosted ensembles. Specifically, our method integrates spatial deep Graph Attention Network classifiers to capture inter-neighborhood contamination dependencies, fuse demographic and spatial evidence, and produce interpretable risk scores that municipal water authorities can act on at the intra-neighborhood level. In extensive experiments on newly constructed Chicago block-group-level datasets, our framework achieves a balanced accuracy of 84.8% and reduces false-positive contamination predictions by up to 44% relative to spatial-only baselines and by 21% relative to current practice, without sacrificing recall on contaminated blocks. Our approach advances spatial-ensemble learning and privacy-preserving urban health modeling, and it gives policymakers and public health officials a means to assess and address contamination risks, supporting efforts to protect community health and safety.