Few-Shot Video Classification via Temporal Alignment

Kaidi Cao; Jingwei Ji; Zhangjie Cao; Chien-Yi Chang; Juan Carlos Niebles

CVPR 2020

Abstract

The difficulty of collecting and annotating large-scale video data has raised growing interest in models that can recognize novel classes from only a few training examples. In this paper, we propose the Ordered Temporal Alignment Module (OTAM), a novel few-shot learning framework that learns to classify previously unseen videos. While most previous work neglects long-term temporal ordering information, our model explicitly leverages the temporal ordering in video data through ordered temporal alignment, which leads to strong data efficiency for few-shot learning. Concretely, our pipeline learns a deep distance measure between the query video and novel-class proxies along their alignment path. We adopt an episode-based training scheme that directly optimizes the few-shot learning objective. We evaluate OTAM on two challenging real-world datasets, Kinetics and Something-Something-V2, and show that it significantly improves few-shot video classification over a wide range of competitive baselines and outperforms prior state-of-the-art results by a large margin.
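The abstract describes the alignment distance only at a high level. The following is a minimal NumPy sketch of what an order-preserving alignment distance with nearest-proxy classification can look like. It is not the authors' implementation and makes several assumptions. Per-frame embeddings are taken as given, with random vectors standing in for them. Frames are compared by cosine distance. The dynamic-time-warping minimum is relaxed with a soft-min, as in soft-DTW, so that the distance stays differentiable. OTAM's exact recurrence, boundary handling, and construction of class proxies from several support videos may differ. The names `soft_min`, `frame_distance_matrix`, `ordered_alignment_distance`, `classify`, and `gamma` are hypothetical.

```python
import numpy as np

# Sketch only: a soft-DTW-style ordered alignment, NOT OTAM's published recurrence.
# Frame embeddings and class proxies are assumed inputs (random stand-ins below).


def soft_min(values, gamma):
    """Smooth minimum -gamma * log(sum(exp(-v / gamma))), computed stably.

    Entries equal to +inf contribute nothing; at least one entry must be finite.
    """
    values = np.asarray(values, dtype=float)
    m = values.min()
    return m - gamma * np.log(np.sum(np.exp(-(values - m) / gamma)))


def frame_distance_matrix(query, proxy):
    """Pairwise cosine distance 1 - cos(q_i, p_j) between frame embeddings.

    query: (T, d) array, proxy: (S, d) array -> (T, S) array.
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    p = proxy / np.linalg.norm(proxy, axis=1, keepdims=True)
    return 1.0 - q @ p.T


def ordered_alignment_distance(query, proxy, gamma=0.1):
    """Soft cost of a monotonic (order-preserving) alignment path.

    R[i, j] = D[i-1, j-1] + soft_min(R[i-1, j-1], R[i-1, j], R[i, j-1]),
    so every path only moves forward in time in both videos.
    """
    D = frame_distance_matrix(query, proxy)
    T, S = D.shape
    R = np.full((T + 1, S + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, S + 1):
            prev = [R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]]
            R[i, j] = D[i - 1, j - 1] + soft_min(prev, gamma)
    return R[T, S]


def classify(query, proxies, gamma=0.1):
    """Assign the query to the class proxy with the smallest alignment distance."""
    dists = {label: ordered_alignment_distance(query, p, gamma)
             for label, p in proxies.items()}
    return min(dists, key=dists.get), dists


# Toy 2-way 1-shot episode with random "frame embeddings" (hypothetical data).
rng = np.random.default_rng(0)
proxies = {"a": rng.normal(size=(8, 16)), "b": rng.normal(size=(8, 16))}
query = proxies["a"] + 0.05 * rng.normal(size=(8, 16))
label, dists = classify(query, proxies)
print(label, dists)
```

Because the recurrence allows only forward moves in both sequences, reversing a query's frames raises its distance to the original proxy. A baseline that averages frame features before comparing them is order-invariant and cannot make this distinction. Smaller values of `gamma` bring the soft-min closer to the hard minimum of classic DTW, while larger values give smoother gradients.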

Keywords

metric learning, few-shot learning, video classification, temporal alignment, deep metric learning, episode-based training, ordered temporal alignment
