Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification

Renchun You; Zhiyao Guo; Lei Cui; Xiang Long; Yingze Bao; Shilei Wen

AAAI 2020


Abstract

Multi-label image and video classification are fundamental yet challenging tasks in computer vision. The main challenges lie in capturing spatial or temporal dependencies between labels and in discovering the locations of discriminative features for each class. To overcome these challenges, we propose cross-modality attention with semantic graph embedding for multi-label classification. Based on a constructed label graph, we propose an adjacency-based similarity graph embedding method to learn semantic label embeddings that explicitly exploit label relationships. Our novel cross-modality attention maps are then generated under the guidance of the learned label embeddings. Experiments on two multi-label image classification datasets (MS-COCO and NUS-WIDE) show that our method outperforms existing state-of-the-art methods. In addition, we validate our method on a large multi-label video classification dataset (YouTube-8M Segments), and the evaluation results demonstrate the generalization capability of our method.
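The abstract names the two components (a label-graph embedding and embedding-guided attention) without giving their formulas. The following is a minimal NumPy sketch of how such a pipeline could fit together, not the authors' implementation. It rests on assumptions not stated in this summary: the label graph is built from training-set co-occurrence (symmetrized conditional probabilities), the label embeddings come from a closed-form eigendecomposition so that E Eᵀ approximates the adjacency (a swapped-in stand-in for the learned adjacency-based similarity embedding), and each class's attention map is a softmax over spatial locations of the cosine similarity between projected CNN features and that class's embedding. The function names `cooccurrence_adjacency`, `spectral_label_embedding`, and `cross_modality_attention`, and the random matrix `proj` standing in for a learned 1x1 convolution, are hypothetical.

```python
import numpy as np


def cooccurrence_adjacency(labels: np.ndarray) -> np.ndarray:
    """Build a symmetric label graph from a binary (num_samples, num_classes) matrix.

    Assumption (not from the paper): A[i, j] is the average of P(j | i) and
    P(i | j), estimated from co-occurrence counts, with self-loops set to 1.
    """
    counts = labels.T @ labels                          # (K, K) co-occurrence counts
    occ = np.diag(counts).astype(float)                 # how often each label occurs
    cond = counts / np.maximum(occ[:, None], 1.0)       # cond[i, j] = P(j | i)
    adj = 0.5 * (cond + cond.T)                         # symmetrize
    np.fill_diagonal(adj, 1.0)
    return adj


def spectral_label_embedding(adj: np.ndarray, dim: int) -> np.ndarray:
    """Closed-form stand-in for a learned embedding: E (K, dim) with E @ E.T ~ adj.

    Keeps the top-`dim` eigenpairs of the symmetric adjacency and clips
    negative eigenvalues, so E @ E.T is a low-rank PSD approximation of adj.
    """
    vals, vecs = np.linalg.eigh(adj)                    # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]                  # indices of the largest `dim`
    scale = np.sqrt(np.clip(vals[top], 0.0, None))      # E E^T cannot match negative parts
    return vecs[:, top] * scale


def cross_modality_attention(features, label_emb, proj):
    """Per-class spatial attention guided by label embeddings.

    features:  (H, W, C) CNN feature map for one image
    label_emb: (K, D) label embeddings
    proj:      (C, D) hypothetical projection into the embedding space
               (stands in for a learned 1x1 convolution)
    Returns attention maps (H, W, K) and attended class features (K, C).
    """
    h, w, c = features.shape
    x = features.reshape(h * w, c)                      # (HW, C), one row per location
    z = x @ proj                                        # (HW, D) projected features
    z = z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    e = label_emb / np.maximum(np.linalg.norm(label_emb, axis=1, keepdims=True), 1e-12)
    sim = z @ e.T                                       # (HW, K) cosine similarity
    att = np.exp(sim - sim.max(axis=0, keepdims=True))  # numerically stable softmax ...
    att /= att.sum(axis=0, keepdims=True)               # ... over locations, per class
    class_feats = att.T @ x                             # (K, C) attention-weighted pooling
    return att.reshape(h, w, -1), class_feats


# Toy usage with random data (no real dataset or trained weights involved).
rng = np.random.default_rng(0)
labels = (rng.random((200, 5)) < 0.3).astype(float)    # 200 samples, 5 labels
adj = cooccurrence_adjacency(labels)
emb = spectral_label_embedding(adj, dim=3)
feats = rng.standard_normal((7, 7, 16))                 # 7x7 feature map, 16 channels
proj = rng.standard_normal((16, 3))
att, class_feats = cross_modality_attention(feats, emb, proj)
print(att.shape, class_feats.shape)                     # (7, 7, 5) (5, 16)
```

Each class ends up with its own pooled feature vector (`class_feats[k]`), which a per-class linear classifier could then score. In a trained model, `proj` and the label embeddings would presumably be learned rather than fixed as in this toy example.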



Topics

Machine Learning > Core Methods > Classification
Computer Vision > Analysis > Object Detection
Deep Learning > Techniques > Attention
Computer Vision > Analysis > Image Classification
Deep Learning > Learning Types > Multi-Label Classification

Keywords

video classification, multi-label classification, label embedding, cross-modality attention, semantic graph embedding
