Spatiotemporal Initialization for 3D CNNs With Generated Motion Patterns

Hirokatsu Kataoka; Kensho Hara; Ryusuke Hayashi; Eisuke Yamagata; Nakamasa Inoue

2022 WACV WACV 2022

Spatiotemporal Initialization for 3D CNNs With Generated Motion Patterns

Abstract

The paper proposes a framework of Formula-Driven Supervised Learning (FDSL) for spatiotemporal initialization. Our FDSL approach enables to automatically and simultaneously generate motion patterns and their video labels with a simple formula which is based on Perlin noise. We designed a dataset of generated motion patterns adequate for the 3D CNNs to learn a better basis set of natural videos. The constructed Video Perlin Noise (VPN) dataset can be applied to initialize a model before pre-training with large-scale video datasets such as Kinetics-400/700, to enhance target task performance. Our spatiotemporal initialization with VPN dataset (VPN initialization) outperforms the previous initialization method with the inflated 3D ConvNet (I3D) using 2D ImageNet dataset. Our proposed method increased the top-1 video-level accuracy of Kinetics-400 pre-trained model on Kinetics-400, UCF-101, HMDB-51, ActivityNet datasets. Especially, the proposed method increased the performance rate of Kinetics-400 pre-trained model by 10.3 pt on ActivityNet. We also report that the relative performance improvements from the baseline are greater in 3D CNNs rather than other models.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Hirokatsu Kataoka , Kensho Hara , Ryusuke Hayashi , Eisuke Yamagata , Nakamasa Inoue

Topics

Machine Learning > Application Areas > Data Augmentation Deep Learning > Architectures > Neural Networks Deep Learning > Techniques > Pretraining

Keywords

video classification data augmentation convolutional neural network spatiotemporal representation motion pattern

Download PDF

Related papers

A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic Segmentation 2022

Unsupervised Sounding Object Localization With Bottom-Up and Top-Down Attention 2022

Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation 2022

Deep Photo Scan: Semi-Supervised Learning for Dealing With the Real-World Degradation in Smartphone Photo Scanning 2022

Let There Be a Clock on the Beach: Reducing Object Hallucination in Image Captioning 2022