Use Your Head: Improving Long-Tail Video Recognition

Toby Perrett; Saptarshi Sinha; Tilo Burghardt; Majid Mirmehdi; Dima Damen

CVPR 2023

Abstract

This paper presents an investigation into long-tail video recognition. We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties. Most critically, they lack few-shot classes in their tails. In response, we propose new video benchmarks that better assess long-tail recognition, by sampling subsets from two datasets: SSv2 and VideoLT. We then propose a method, Long-Tail Mixed Reconstruction (LMR), which reduces overfitting to instances from few-shot classes by reconstructing them as weighted combinations of samples from head classes. LMR then employs label mixing to learn robust decision boundaries. It achieves state-of-the-art average class accuracy on EPIC-KITCHENS and the proposed SSv2-LT and VideoLT-LT. Benchmarks and code at: github.com/tobyperrett/lmr
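The core idea in the abstract (reconstructing few-shot samples as weighted combinations of head-class samples, then mixing labels) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository above. The function name `mixed_reconstruction`, the cosine-similarity softmax weighting, and the parameters `head_min_count`, `temperature`, and `max_mix` are all assumptions made for illustration, as is the linear schedule that makes rarer classes rely more on the reconstruction.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def mixed_reconstruction(features, labels, class_counts, num_classes,
                         head_min_count=50, temperature=0.1, max_mix=0.5):
    """Illustrative sketch only (hypothetical names and schedule).

    features:     list of feature vectors (lists of floats), one per sample
    labels:       list of integer class ids, one per sample
    class_counts: dict mapping class id -> number of training samples
    Returns (mixed_features, soft_labels).
    """
    largest = max(class_counts.values())
    out_feats, out_labels = [], []
    for i, (f, y) in enumerate(zip(features, labels)):
        # Assumption: only other samples from head classes are used
        # as reconstruction candidates.
        cands = [j for j in range(len(features))
                 if j != i and class_counts[labels[j]] >= head_min_count]
        # Assumed schedule: the rarer the class, the larger the share
        # of the reconstruction in the final feature.
        alpha = max_mix * (1.0 - class_counts[y] / largest) if cands else 0.0
        weights = (softmax([cosine(f, features[j]) / temperature for j in cands])
                   if cands else [])
        # Weighted combination of head-class features.
        recon = [sum(w * features[j][d] for w, j in zip(weights, cands))
                 for d in range(len(f))]
        mixed = [(1.0 - alpha) * f[d] + alpha * recon[d] for d in range(len(f))]
        # Label mixing: the soft label uses the same weights as the features.
        soft = [0.0] * num_classes
        soft[y] += 1.0 - alpha
        for w, j in zip(weights, cands):
            soft[labels[j]] += alpha * w
        out_feats.append(mixed)
        out_labels.append(soft)
    return out_feats, out_labels


# Toy batch: two samples from a head class (0), one from a few-shot class (1).
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
labs = [0, 0, 1]
counts = {0: 100, 1: 5}
mixed_feats, soft_labels = mixed_reconstruction(feats, labs, counts, num_classes=2)
```

In this toy batch the head-class samples pass through unchanged, while the few-shot sample is pulled towards the head-class features and receives a soft label that splits its mass between its own class and the head class. Training against such soft labels is one plausible way to realise the label mixing the abstract describes.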

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Domain Adaptation
Computer Vision > Analysis > Action Recognition
Machine Learning > Learning Types > Representation Learning
Machine Learning > Learning Types > Classification
Computer Vision > Analysis > Video Understanding
Machine Learning > Learning Types > Imbalanced Learning

Keywords

representation learning, few-shot learning, video recognition, data augmentation, class imbalance, long-tail distribution, mixed reconstruction
