Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification

Zhizheng Zhang; Cuiling Lan; Wenjun Zeng; Zhibo Chen

2020 CVPR CVPR 2020

Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification

Abstract

Video-based person re-identification (reID) aims at matching the same person across video clips. It is a challenging task due to the existence of redundancy among frames, newly revealed appearance, occlusion, and motion blurs. In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA), to delicately aggregate spatio-temporal features into a discriminative video-level feature representation. In order to determine the contribution/importance of a spatial-temporal feature node, we propose to learn the attention from a global view with convolutional operations. Specifically, we stack its relations, i.e.no, pairwise correlations with respect to a representative set of reference feature nodes (S-RFNs) that represents global video information, together with the feature itself to infer the attention. Moreover, to exploit the semantics of different levels, we propose to learn multi-granularity attentions based on the relations captured at different granularities. Extensive ablation studies demonstrate the effectiveness of our attentive feature aggregation module MG-RAFA. Our framework achieves the state-of-the-art performance on three benchmark datasets.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — video-level representation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Zhizheng Zhang , Cuiling Lan , Wenjun Zeng , Zhibo Chen

Topics

Machine Learning > Core Methods > Metric Learning Computer Vision > Analysis > Person Re-Identification Computer Vision > Processing > Video Processing Computer Vision > Analysis > Video Understanding Deep Learning > Techniques > Attention

Keywords

attention mechanism video understanding person re-identification feature aggregation spatio-temporal feature video-based re-identification video-level representation multi-granularity attention

Download PDF

Related papers

Deep Polarization Cues for Transparent Object Segmentation 2020

HRank: Filter Pruning Using High-Rank Feature Map 2020

Panoptic-Based Image Synthesis 2020

Select, Supplement and Focus for RGB-D Saliency Detection 2020

ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings 2020