Heterogeneous Visual Features Fusion via Sparse Multimodal Machine

Hua Wang; Feiping Nie; Heng Huang; Chris Ding

2013 CVPR CVPR 2013

Heterogeneous Visual Features Fusion via Sparse Multimodal Machine

Abstract

To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as the shape, the color, the texture, etc. How to integrate these heterogeneous visual features and identify the important ones from them for specific vision tasks has become an increasingly critical problem. In this paper, We propose a novel Sparse Multimodal Learning (SMML) approach to integrate such heterogeneous features by using the joint structured sparsity regularizations to learn the feature importance of for the vision tasks from both group-wise and individual point of views. A new optimization algorithm is also introduced to solve the non-smooth objective with rigorously proved global convergence. We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both singlelabel and multi-label image classification tasks. For each data set we integrate six different types of popularly used image features. Compared to existing scene and object categorization methods using either single modality or multimodalities of features, our approach always achieves better performances measured.

🚀 Conference Pioneer — CVPR 2013

🌱 Topic Pioneer — Multi-Modal Learning

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Machine Learning

🧭 Keyword Pioneer — sparse multimodal learning

🐣 Hot Topic Early Bird — multimodal learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Hua Wang , Feiping Nie , Heng Huang , Chris Ding

Topics

Artificial Intelligence > Core AI > Multimodal Learning Computer Vision > Analysis > Scene Understanding Machine Learning > Core Methods > Feature Learning Computer Vision > Core AI > Multi-Modal Learning

Keywords

sparse coding sparse learning scene understanding multimodal learning object categorization feature fusion sparse multimodal learning

Download PDF

Related papers

Nonlinearly Constrained MRFs: Exploring the Intrinsic Dimensions of Higher-Order Cliques 2013

An Approach to Pose-Based Action Recognition 2013

Modeling Actions through State Changes 2013

A Convex Regularizer for Reducing Color Artifact in Color Image Recovery 2013

Deformable Spatial Pyramid Matching for Fast Dense Correspondences 2013