AAAI 2026

Semantic Embedding and Synthetic Augmentation for Longitudinal Survey Prediction (Student Abstract)

Abstract

Longitudinal surveys are a crucial component of behavioral research. Such surveys, however, suffer from two kinds of gaps: data gaps created by item and unit non-response, and semantic gaps that arise as questionnaires, assessed trends, and data collection methods evolve over time. Using 15 waves of vaccination surveys as a test-bed, we demonstrate how modern AI techniques can bridge both kinds of gaps through a two-component framework. First, we use LLM-generated semantic embeddings of survey questions to encode question meaning, enabling a Deep & Cross Network used for imputation to jointly model responses across item semantics, individual characteristics, and temporal dynamics. Because the model operates in a learned semantic space, it directly addresses survey evolution. Second, to overcome data scarcity, we use cluster-informed synthetic data generation via hierarchical prompting, which produces synthetic responses that preserve distributional properties and empirical cluster structure. Our approach achieves a strong improvement in semantic gap tasks and 80-90% synthetic data fidelity, providing practical solutions for evolving longitudinal studies.
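
To make the imputation component more concrete, the following is a minimal sketch of how a question embedding, respondent features, and a wave indicator might be combined and passed through explicit feature-crossing layers. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which Deep & Cross Network variant is used, so the sketch assumes the original cross-layer form x_{l+1} = x_0 (x_l · w) + b + x_l. The random arrays stand in for LLM-generated embeddings, and the dimensions, the one-hot wave encoding, and the names `build_input` and `cross_layer` are all hypothetical.

```python
import numpy as np

def build_input(question_emb, person_feats, wave_index, n_waves):
    """Concatenate item semantics, individual features, and a one-hot wave code.

    The one-hot wave encoding is an assumption made for illustration.
    """
    wave_onehot = np.eye(n_waves)[wave_index]
    return np.concatenate([question_emb, person_feats, wave_onehot], axis=1)

def cross_layer(x0, xl, w, b):
    """One cross layer (assumed DCN v1 form): x0 * (xl . w) + b + xl."""
    # (batch, d) @ (d,) gives one scalar per row, which rescales x0
    scale = xl @ w
    return x0 * scale[:, None] + b + xl

rng = np.random.default_rng(0)
batch, d_q, d_p, n_waves = 4, 8, 3, 15  # 15 waves, as in the abstract

# Random stand-ins for question embeddings and respondent features
question_emb = rng.normal(size=(batch, d_q))
person_feats = rng.normal(size=(batch, d_p))
wave_index = np.array([0, 3, 7, 14])

x0 = build_input(question_emb, person_feats, wave_index, n_waves)
d = x0.shape[1]

# Two stacked cross layers with separate (untrained) parameters
w1, w2 = rng.normal(size=d) * 0.1, rng.normal(size=d) * 0.1
b1, b2 = np.zeros(d), np.zeros(d)
x1 = cross_layer(x0, x0, w1, b1)
x2 = cross_layer(x0, x1, w2, b2)

print(x2.shape)  # (4, 26)
```

Because the question enters the model as an embedding rather than as a fixed column index, a question that is reworded or newly introduced in a later wave can still be represented in the same input space, which is the property the abstract relies on to handle instrument evolution.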
