Saliency-based Sequential Image Attention with Multiset Prediction

Sean Welleck; Jialin Mao; Kyunghyun Cho; Zheng Zhang

NeurIPS (NIPS) 2017

Abstract

Humans process visual scenes selectively and sequentially using attention. Central to models of human visual attention is the saliency map. We propose a hierarchical visual architecture that operates on a saliency map and uses a novel attention mechanism to sequentially focus on salient regions and take additional glimpses within those regions. The architecture is motivated by human visual attention and is applied to multi-label image classification on a novel multiset task, where it achieves high precision and recall while localizing objects with its attention. Unlike conventional multi-label image classification models, the model supports multiset prediction, owing to a reinforcement-learning-based training process that allows for arbitrary label permutation and multiple instances per label.
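
To make the multiset setting concrete, here is a minimal Python sketch of how precision and recall could be computed when a label may occur several times and order is irrelevant, together with one possible per-step reward for a sequential predictor. The abstract does not specify the paper's reward or metric definitions, so the function names, the `hit` and `miss` values, and the matching rule below are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def multiset_precision_recall(predicted, target):
    """Precision and recall between two label multisets (order is ignored)."""
    # Counter intersection keeps the minimum count per label, so a repeated
    # label is credited only as many times as it occurs in both multisets.
    overlap = sum((Counter(predicted) & Counter(target)).values())
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(target) if target else 0.0
    return precision, recall


def sequential_rewards(predicted, target, hit=1.0, miss=-1.0):
    """Hypothetical per-step reward: a prediction scores `hit` if that label
    still has an unmatched instance in the target multiset, else `miss`."""
    remaining = Counter(target)
    rewards = []
    for label in predicted:
        if remaining[label] > 0:
            remaining[label] -= 1  # consume one instance of this label
            rewards.append(hit)
        else:
            rewards.append(miss)
    return rewards
```

Because the reward depends only on which target instances remain unmatched, any ordering of a correct multiset receives full reward, and a label that appears twice in the target can be rewarded twice. This is one way to express the "arbitrary label permutation and multiple instances per label" property under the assumptions stated above.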

Topics

Machine Learning > Core Methods > Classification
Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Object Detection
Deep Learning > Techniques > Attention
Computer Vision > Analysis > Image Classification
Deep Learning > Learning Types > Multi-Label Classification

Keywords

image classification, reinforcement learning, visual attention, saliency map, multiset prediction
