Joint-task Self-supervised Learning for Temporal Correspondence

Xueting Li; Sifei Liu; Shalini De Mello; Xiaolong Wang; Jan Kautz; Ming-Hsuan Yang

NeurIPS 2019

Abstract

This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner. Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames. We exploit the synergy between the two tasks through a shared inter-frame affinity matrix, which simultaneously models transitions between video frames at both the region and pixel levels. Region-level localization reduces ambiguity in fine-grained matching by narrowing the search region, while fine-grained matching provides bottom-up features that facilitate region-level localization. Our method outperforms state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video-object and part-segmentation propagation, keypoint tracking, and object tracking. Our self-supervised method even surpasses the fully supervised affinity feature representation obtained from a ResNet-18 pre-trained on ImageNet.
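To make the idea of an inter-frame affinity matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes each frame is given as a list of per-pixel feature vectors, scores every pixel pair by a dot product, and normalizes each column with a softmax so that every pixel in the second frame holds a distribution over pixels in the first. The function names, the `temperature` parameter, and the column-wise normalization are illustrative assumptions; the abstract does not specify them.

```python
import math


def affinity_matrix(feats_a, feats_b, temperature=1.0):
    """Return A where A[i][j] is the affinity of pixel i in frame A
    to pixel j in frame B. Each column j is a softmax over i, so it sums to 1.
    (Hypothetical helper; the normalization axis is an assumption.)"""
    n_a, n_b = len(feats_a), len(feats_b)
    # Dot-product similarity between every pair of pixel features.
    scores = [
        [sum(x * y for x, y in zip(fa, fb)) / temperature for fb in feats_b]
        for fa in feats_a
    ]
    affinity = [[0.0] * n_b for _ in range(n_a)]
    for j in range(n_b):
        column = [scores[i][j] for i in range(n_a)]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in column]
        total = sum(exps)
        for i in range(n_a):
            affinity[i][j] = exps[i] / total
    return affinity


def propagate(affinity, labels_a):
    """Carry per-pixel label vectors from frame A to frame B as an
    affinity-weighted average (hypothetical helper)."""
    n_a, n_b = len(affinity), len(affinity[0])
    n_classes = len(labels_a[0])
    return [
        [sum(affinity[i][j] * labels_a[i][c] for i in range(n_a))
         for c in range(n_classes)]
        for j in range(n_b)
    ]


# Toy usage: the two pixels swap places between frame A and frame B.
feats_a = [[1.0, 0.0], [0.0, 1.0]]
feats_b = [[0.0, 1.0], [1.0, 0.0]]
labels_a = [[1.0, 0.0], [0.0, 1.0]]  # one-hot labels for frame A
A = affinity_matrix(feats_a, feats_b, temperature=0.05)
labels_b = propagate(A, labels_a)  # labels follow the pixels to frame B
```

In this toy case the propagated labels in frame B come out swapped relative to frame A, which is the behavior the propagation tasks named in the abstract (segmentation masks, keypoints) rely on. A practical version would operate on convolutional feature maps with a tensor library rather than Python lists.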

Topics

Machine Learning > Learning Types > Self-Supervised Learning

Computer Vision > Analysis > Object Tracking

Computer Vision > Analysis > Video Understanding

Deep Learning > Learning Types > Self-Supervised Learning

Keywords

self-supervised learning, visual correspondence, temporal correspondence, video object tracking, pixel-level matching, region tracking, pixel-level tracking
