AAAI 2026

Adaptive Coreset Selection via Uncertainty-Density for Efficient Spam Detection (Student Abstract)

Abstract

Efficient spam detection in resource-constrained environments remains challenging due to class imbalance, noisy text, and the computational demands of large Transformer models. We introduce a novel coreset selection framework based on a unified Entropy, Class-Balanced Uncertainty-Density Ranking (CBUDR) scheme. Our method prioritizes highly informative and uncertain samples while ensuring diversity and class balance within the selected subset. The framework flexibly supports multiple selection strategies, including Top-K, Bottom-K, and adaptive class-wise schemes, enabling robust performance even when training on as little as 5% of the dataset. Extensive experiments on benchmark datasets (UCI SMS, UTKML Twitter, LingSpam) show that our ranking scheme achieves competitive accuracy, precision, and recall while significantly reducing computational cost. These results demonstrate that carefully designed coreset strategies can surpass full-data performance in both balanced and imbalanced settings, highlighting the potential for deployment on low-power devices and mobile platforms.
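
The abstract does not state the CBUDR scoring formula, so the following is only a minimal sketch of how an uncertainty-density ranking with class-balanced Top-K or Bottom-K selection might look. It assumes the score is the product of predictive entropy and a local density estimate (mean cosine similarity to the k nearest neighbours), and that the budget is split evenly across classes. The function names, the product form of the score, and the parameters `fraction`, `mode`, and `k` are all hypothetical and not taken from the paper.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy (in nats) of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def local_density(features, k=5):
    """Mean cosine similarity of each sample to its k nearest neighbours."""
    n = len(features)
    if n < 2:
        return np.zeros(n)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    k = min(k, n - 1)
    nearest = np.sort(sim, axis=1)[:, -k:]  # k largest similarities per row
    return nearest.mean(axis=1)

def select_coreset(probs, features, labels, fraction=0.05, mode="top", k=5):
    """Return sorted indices of a class-balanced coreset.

    Assumed score: entropy * density (an illustrative choice, not the
    paper's definition). mode="top" keeps the highest-scoring samples
    per class, mode="bottom" the lowest-scoring ones.
    """
    if mode not in ("top", "bottom"):
        raise ValueError("mode must be 'top' or 'bottom'")
    score = predictive_entropy(probs) * local_density(features, k)
    classes = np.unique(labels)
    budget = max(1, int(round(fraction * len(labels))))
    per_class = max(1, budget // len(classes))  # equal share per class
    chosen = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        order = idx[np.argsort(score[idx], kind="stable")]  # ascending score
        take = min(per_class, len(idx))
        chosen.extend(order[-take:] if mode == "top" else order[:take])
    return np.sort(np.array(chosen))
```

Under these assumptions, calling `select_coreset(probs, features, labels, fraction=0.05)` on a two-class dataset returns roughly 5% of the indices, half from each class, regardless of how imbalanced the original labels are. The density term is only meaningful as a weight when similarities are non-negative, as with TF-IDF or other non-negative feature vectors.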
