AAAI 2026

Generating-Filtering-Ranking: A Three-Stage MultiModal Data Augmentation Framework Under Partial Modality Missing

Abstract

Multimodal data significantly improves the performance of pretrained models, but its practical application is often limited by missing or incomplete data across modalities. Existing methods for synthesizing missing data face two key challenges: (1) semantic inaccuracies caused by model hallucinations and (2) discrepancies in distribution preferences between generated and original data. To address these challenges, we propose GFR, a novel three-stage multimodal data augmentation framework that generates, filters, and ranks missing-modality data. The framework leverages multimodal large models for diverse data generation, uses a filtering algorithm based on scene graph matching to ensure semantic consistency, and constructs a preference-aware ranking model to align the generated data with both the original distribution and task relevance. GFR not only enhances semantic diversity and consistency in data generation but also captures the implicit characteristics of the original dataset and the target model. We demonstrate the effectiveness of GFR on multiple datasets under different missing-modality types and missing ratios.
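The three stages can be pictured with a minimal sketch. It is not the authors' implementation: the names `Candidate`, `graph_match_score`, and `gfr` are hypothetical, the overlap of (subject, relation, object) triples is only a toy stand-in for scene graph matching, the `generate` callable stands in for a multimodal large model, and the `preference` callable stands in for the learned preference-aware ranking model.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

# A scene graph is approximated here as a set of (subject, relation, object) triples.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Candidate:
    """One generated sample for the missing modality (hypothetical container)."""
    text: str
    graph: FrozenSet[Triple]


def graph_match_score(reference: FrozenSet[Triple], candidate: FrozenSet[Triple]) -> float:
    """Fraction of reference triples that also appear in the candidate graph.

    Assumption: a simple overlap ratio, used only to illustrate the idea of
    filtering by semantic consistency.
    """
    if not reference:
        return 0.0
    return len(reference & candidate) / len(reference)


def gfr(
    reference_graph: FrozenSet[Triple],
    generate: Callable[[], List[Candidate]],
    preference: Callable[[Candidate], float],
    threshold: float = 0.5,
    top_k: int = 1,
) -> List[Candidate]:
    """Generate candidates, filter by graph match, then rank by preference."""
    # Stage 1: generate diverse candidates for the missing modality.
    candidates = generate()
    # Stage 2: keep only candidates that are semantically consistent enough.
    kept = [
        c for c in candidates
        if graph_match_score(reference_graph, c.graph) >= threshold
    ]
    # Stage 3: order survivors by the preference score, highest first.
    ranked = sorted(kept, key=preference, reverse=True)
    return ranked[:top_k]
```

In this sketch the filter and the ranker are deliberately separate: the filter rejects candidates whose content contradicts the available modality, while the ranker only orders candidates that already passed that check. The threshold of 0.5 and the `top_k` cutoff are illustrative values, not settings taken from the paper.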
