Efficient Feature Selection via Analysis of Relevance and Redundancy

Lei Yu; Huan Liu

JMLR 2004

Abstract

Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness in comparison with representative methods.
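
As an illustration of the decoupled framework, the sketch below first filters features by relevance to the class and then removes redundant ones among the survivors. The abstract does not name the correlation measure, so symmetrical uncertainty over discrete values is assumed here. The function names `entropy`, `symmetrical_uncertainty`, and `select_features`, the `threshold` parameter, and the redundancy rule are likewise illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Assumed correlation measure: SU = 2 * I(x; y) / (H(x) + H(y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(columns, target, threshold=0.0):
    """Two-stage selection over a dict of feature name -> discrete values."""
    # Stage 1 (relevance): keep features whose correlation with the
    # target exceeds the threshold, ordered from most to least relevant.
    relevance = {
        name: symmetrical_uncertainty(col, target)
        for name, col in columns.items()
    }
    candidates = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )

    # Stage 2 (redundancy): drop a feature when an already selected,
    # more relevant feature correlates with it at least as strongly
    # as the target does (an assumed redundancy rule).
    selected = []
    for name in candidates:
        redundant = any(
            symmetrical_uncertainty(columns[kept], columns[name])
            >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


if __name__ == "__main__":
    target = [0, 0, 1, 1, 0, 0, 1, 1]
    columns = {
        "f1": list(target),               # relevant
        "f2": list(target),               # relevant but duplicates f1
        "f3": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the target
    }
    print(select_features(columns, target))
```

In this toy input, `f3` fails the relevance stage and `f2` is discarded in the redundancy stage because it duplicates `f1`, which shows why relevance filtering alone would keep more features than needed.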

Topics

Machine Learning > Core Methods > Classification; Mathematics & Optimization > Mathematics > Information Theory; Machine Learning > Core Methods > Dimensionality Reduction; Machine Learning > Core Methods > Feature Selection

Keywords

dimensionality reduction; feature selection; feature relevance; feature redundancy; correlation-based method
