NIPS 2013 (NeurIPS 2013)
Rapid Distance-Based Outlier Detection via Sampling
Abstract
Distance-based approaches to outlier detection are popular in data mining because they do not require modeling the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling-based scheme outperforms state-of-the-art techniques in terms of both efficiency and effectiveness. To better understand this phenomenon, we provide a theoretical analysis of why the sampling-based approach outperforms alternative methods based on k-nearest neighbor search.
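The abstract does not spell out the sampling-based scheme. As a rough illustration only, the sketch below assumes one plausible form: draw a single random subsample and score every point by its Euclidean distance to the nearest sampled point, so that larger scores suggest outliers. The function name `sampling_outlier_scores`, the `sample_size` and `seed` parameters, and the toy data are hypothetical and do not come from the paper.

```python
import math
import random


def sampling_outlier_scores(points, sample_size, seed=0):
    """Score each point by its distance to the nearest point in one random sample.

    Assumption (not stated in the abstract): a single subsample is drawn once,
    and a point drawn into the sample is not compared against itself.
    """
    if not 2 <= sample_size <= len(points):
        raise ValueError("sample_size must be between 2 and len(points)")
    rng = random.Random(seed)
    sample_idx = rng.sample(range(len(points)), sample_size)
    scores = []
    for i, p in enumerate(points):
        # Distance to the nearest sampled point, skipping the point itself.
        dists = [math.dist(p, points[j]) for j in sample_idx if j != i]
        scores.append(min(dists))
    return scores


# Toy data: a tight 5x4 grid near the origin plus one far-away point.
points = [(x / 10, y / 10) for x in range(5) for y in range(4)] + [(10.0, 10.0)]
scores = sampling_outlier_scores(points, sample_size=5)

# Index of the highest-scoring point, i.e. the strongest outlier candidate.
top = max(range(len(points)), key=scores.__getitem__)
print(top)  # → 20
```

Under this assumption, each point is compared against only `sample_size` points instead of the whole dataset, which is where the efficiency gain over a full k-nearest neighbor search would come from.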
Interdisciplinary Bridge: Data Science & Analytics and Machine Learning
Trend Setter: Data Mining
Keyword Pioneer: sampling
Cross-Pollinator: Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Speech & Audio
Hot Topic Early Bird: anomaly detection
Authors
Topics
Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Clustering
Machine Learning > Learning Types > Unsupervised Learning
Data Science & Analytics > Methods > Data Mining
Data Science & Analytics > Applications > Clustering
Machine Learning > Core Methods > Anomaly Detection
Machine Learning > Learning Types > Sampling