2023 AAAI AAAI 2023

Diversity Maximization in the Presence of Outliers

Abstract

Abstract Given a set X of n points in a metric space, the problem of diversity maximization is to extract a set S of k points from X so that the diversity of S is maximized. This problem is essential in AI-related fields, such as web search, databases, recommender systems, and data mining. Although there have been extensive studies of this problem, these studies assume that X is clean. This usually does not hold, because real-world datasets usually contain outliers. The state-of-the-art algorithm for the diversity maximization problem is based on furthest point retrieval, which is too sensitive to outliers. We therefore address the problem of diversity maximization with outliers and propose two algorithms with performance guarantee. The first algorithm runs in O((k+z)n) time, guarantees 1/2-approximation, and returns no outliers, where z is the number of outliers. The second algorithm runs in O(kz) time (which is independent of n), guarantees 1/6(1+epsilon)-approximation, and returns no outliers with constant probability. We conduct experiments on real datasets to demonstrate the effectiveness and efficiency of our algorithms.

🌉 Interdisciplinary Bridge — Data Science & Analytics and Machine Learning and Mathematics & Optimization
🧭 Keyword Pioneer — furthest point retrieval
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors