Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Liulei Li; Tianfei Zhou; Wenguan Wang; Lu Yang; Jianwu Li; Yi Yang

2022 CVPR CVPR 2022

Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Abstract

Our target is to learn visual correspondence from unlabeled videos. We develop LIIR, a locality-aware inter-and intra-video reconstruction framework that fills in three missing pieces, i.e., instance discrimination, location awareness, and spatial compactness, of self-supervised correspondence learning puzzle. First, instead of most existing efforts focusing on intra-video self-supervision only, we exploit cross video affinities as extra negative samples within a unified, inter-and intra-video reconstruction scheme. This enables instance discriminative representation learning by contrasting desired intra-video pixel association against negative inter-video correspondence. Second, we merge position information into correspondence matching, and design a position shifting strategy to remove the side-effect of position encoding during inter-video affinity computation, making our LIIR location-sensitive. Third, to make full use of the spatial continuity nature of video data, we impose a compactness-based constraint on correspondence matching, yielding more sparse and reliable solutions. The learned representation surpasses self-supervised state-of-the-arts on label propagation tasks including objects, semantic parts, and keypoints.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — spatial compactness

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Liulei Li , Tianfei Zhou , Wenguan Wang , Lu Yang , Jianwu Li , Yi Yang

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Learning Types > Contrastive Learning Machine Learning > Learning Types > Self-Supervised Learning Computer Vision > Analysis > Object Tracking Computer Vision > Analysis > Video Understanding Deep Learning > Learning Types > Self-Supervised Learning Deep Learning > Learning Types > Representation Learning

Keywords

representation learning contrastive learning self-supervised learning video understanding visual correspondence position encoding instance discrimination video representation spatial compactness cross video affinity

Download PDF

Related papers

UniCoRN: A Unified Conditional Image Repainting Network 2022

Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis 2022

All-in-One Image Restoration for Unknown Corruption 2022

Stability-Driven Contact Reconstruction From Monocular Color Images 2022

Forecasting Characteristic 3D Poses of Human Actions 2022