Mining Recurring Concept Drifts with Limited Labeled Streaming Data

Peipei Li; Xindong Wu; Xuegang Hu

ACML 2010

Abstract

Tracking recurring concept drifts is a significant issue in machine learning and data mining, and it arises frequently in real-world stream classification problems. Learning recurring concepts in a data stream environment with unlabeled data is a challenge for many streaming classification algorithms, and this challenge has received little attention from the research community. Motivated by this, the paper focuses on the problem of recurring contexts in streaming environments with limited labeled data. We propose a Semi-supervised classification algorithm for data streams with REcurring concept Drifts and Limited LAbeled data, called REDLLA, in which a decision tree is adopted as the classification model. While a tree is grown, a clustering algorithm based on k-Means is applied to produce concept clusters, and unlabeled data are labeled at the leaves. Based on the deviations between historical and new concept clusters, potential concept drifts are detected and recurring concepts are maintained. Extensive studies on both synthetic and real-world data confirm the advantages of REDLLA over two state-of-the-art online classification algorithms, CVFDT and CDRDT, and over several known online semi-supervised algorithms, even when more than 90% of the data are unlabeled.
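The labeling and drift-checking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`label_at_leaf`, `deviation`, `match_concept`), the majority-vote labeling rule, the centroid-distance deviation measure, and the threshold are all assumptions made for illustration, and the decision tree itself is omitted, so the code only shows what might happen with the data collected at a single leaf.

```python
import math
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    # Deterministic initialisation (assumption): use the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign

def label_at_leaf(points, labels, k):
    """Fill unlabeled entries (None) with the majority label of their cluster.

    Returns the filled labels and the concept clusters as
    (centroid, majority_label) pairs.
    """
    centroids, assign = kmeans(points, k)
    filled = list(labels)
    clusters = []
    for c in range(k):
        known = [l for l, a in zip(labels, assign) if a == c and l is not None]
        majority = Counter(known).most_common(1)[0][0] if known else None
        clusters.append((centroids[c], majority))
        for i, a in enumerate(assign):
            if a == c and filled[i] is None:
                filled[i] = majority
    return filled, clusters

def deviation(old_clusters, new_clusters):
    """Hypothetical deviation: mean distance from each new centroid to the
    nearest old centroid carrying the same majority label (inf if none)."""
    total = 0.0
    for centroid, label in new_clusters:
        same = [math.sqrt(sq_dist(centroid, oc))
                for oc, ol in old_clusters if ol == label]
        total += min(same) if same else math.inf
    return total / len(new_clusters)

def match_concept(history, new_clusters, threshold):
    """Index of the closest stored concept within the threshold, else None."""
    best = min(range(len(history)),
               key=lambda i: deviation(history[i], new_clusters),
               default=None)
    if best is not None and deviation(history[best], new_clusters) <= threshold:
        return best
    return None

# Two well-separated groups, one labeled example each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", None, None, "b", None, None]
filled, clusters = label_at_leaf(points, labels, k=2)

# A later batch shifted by (5, 5) is treated here as a potential drift.
shifted = [(x + 5, y + 5) for x, y in points]
_, shifted_clusters = label_at_leaf(shifted, labels, k=2)
history = [clusters]
```

In this sketch, `match_concept(history, clusters, 1.0)` returns the index of the stored concept, which corresponds to a recurring concept, while the shifted batch matches nothing in `history` and would be stored as a new concept.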

Authors

Peipei Li, Xindong Wu, Xuegang Hu

Topics

Machine Learning > Core Methods > Clustering

Machine Learning > Learning Types > Semi-Supervised Learning

Data Science & Analytics > Methods > Data Mining

Machine Learning > Learning Types > Supervised Learning

Keywords

concept drift, semi-supervised learning, clustering algorithm, data stream, decision tree, data stream mining, recurring drift

Related papers

Single versus Multiple Sorting in All Pairs Similarity Search 2010

Multi-task Learning for Recommender System 2010

Adaptive Step-size Policy Gradients with Average Reward Metric 2010

Content-based Image Retrieval with Multinomial Relevance Feedback 2010

Pairwise Measures of Causal Direction in Linear Non-Gaussian Acyclic Models 2010