2023 AAAI AAAI 2023

UEQMS: UMAP Embedded Quick Mean Shift Algorithm for High Dimensional Clustering

Abstract

Abstract The mean shift algorithm is a simple yet very effective clustering method widely used for image and video segmentation as well as other exploratory data analysis applications. Recently, a new algorithm called MeanShift++ (MS++) for low-dimensional clustering was proposed with a speedup of 4000 times over the vanilla mean shift. In this work, starting with a first-of-its-kind theoretical analysis of MS++, we extend its reach to high-dimensional data clustering by integrating the Uniform Manifold Approximation and Projection (UMAP) based dimensionality reduction in the same framework. Analytically, we show that MS++ can indeed converge to a non-critical point. Subsequently, we suggest modifications to MS++ to improve its convergence characteristics. In addition, we propose a way to further speed up MS++ by avoiding the execution of the MS++ iterations for every data point. By incorporating UMAP with modified MS++, we design a faster algorithm, named UMAP embedded quick mean shift (UEQMS), for partitioning data with a relatively large number of recorded features. Through extensive experiments, we showcase the efficacy of UEQMS over other state-of-the-art algorithms in terms of accuracy and runtime.

🌉 Interdisciplinary Bridge — Data Science & Analytics and Machine Learning
🧭 Keyword Pioneer — umap embedding
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio