Identifying First-Person Camera Wearers in Third-Person Videos

Chenyou Fan; Jangwon Lee; Mingze Xu; Krishna Kumar Singh; Yong Jae Lee; David J. Crandall; Michael S. Ryoo

CVPR 2017

Abstract

We consider settings in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks when multiple people wear body-worn cameras while a static third-person camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos. This is challenging because the camera wearer is not visible in his/her own egocentric video, which rules out direct feature matching. In this paper, we propose a new semi-Siamese Convolutional Neural Network architecture to address this novel challenge. We formulate the problem as learning a joint embedding space for first- and third-person videos that considers both spatial- and motion-domain cues. A new triplet loss function is designed to minimize the distance between correct first- and third-person matches while maximizing the distance between incorrect ones. This end-to-end approach performs significantly better than several baselines, in part because it learns first- and third-person features optimized for matching jointly with the distance measure itself.
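The abstract does not spell out the exact form of the triplet loss, so the following is only a minimal sketch, assuming the standard margin-based triplet hinge loss over Euclidean distances in the joint embedding space. The embeddings are toy vectors standing in for the outputs of the first- and third-person network branches; the function names (`euclidean`, `triplet_loss`, `mean_triplet_loss`, `match_wearer`) and the default margin of 1.0 are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Margin-based triplet hinge loss (assumed form, not the paper's exact loss).

    anchor:   embedding of a first-person (egocentric) clip
    positive: embedding of the matching person in the third-person video
    negative: embedding of a non-matching person in the third-person video
    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive; otherwise it is positive, pushing them apart.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def mean_triplet_loss(anchor: Vector, positive: Vector,
                      negatives: List[Vector], margin: float = 1.0) -> float:
    """Average the hinge loss over every incorrect third-person candidate."""
    if not negatives:
        return 0.0
    return sum(triplet_loss(anchor, positive, n, margin) for n in negatives) / len(negatives)


def match_wearer(ego: Vector, candidates: List[Vector]) -> int:
    """Return the index of the third-person candidate closest to the ego embedding."""
    return min(range(len(candidates)), key=lambda i: euclidean(ego, candidates[i]))


if __name__ == "__main__":
    ego = [0.0, 1.0]                                # toy first-person embedding
    people = [[3.0, 4.0], [0.0, 1.2], [1.0, 1.0]]   # toy third-person embeddings
    print(match_wearer(ego, people))                # nearest candidate is index 1
    print(round(triplet_loss(ego, people[1], people[2]), 3))
```

At test time the same distance serves as the matching score: the third-person person whose embedding lies closest to the egocentric clip's embedding is taken as the camera wearer, which is why learning the features jointly with the distance measure matters.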

Topics

Machine Learning > Core Methods > Metric Learning
Computer Vision > Analysis > Activity Recognition
Computer Vision > Analysis > Person Re-Identification
Computer Vision > Domain-Specific > Egocentric Vision
Deep Learning > Learning Types > Representation Learning

Keywords

metric learning, egocentric vision, video understanding, person re-identification, activity recognition, convolutional neural network, egocentric video, triplet loss, semi-Siamese convolutional neural network

Related papers

Deep Outdoor Illumination Estimation 2017

SRN: Side-output Residual Network for Object Symmetry Detection in the Wild 2017

Weakly Supervised Semantic Segmentation Using Web-Crawled Videos 2017

FASON: First and Second Order Information Fusion Network for Texture Recognition 2017

Recurrent Convolutional Neural Networks for Continuous Sign Language Recognition by Staged Optimization 2017