Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

Tianming Liang; Chaolei Tan; Beihao Xia; Wei-Shi Zheng; Jian-Fang Hu

CVPR 2024

Abstract

This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question. This is essentially a multi-label classification task, since a question may have multiple answers. However, due to annotation costs, the labels in existing benchmarks are always extremely insufficient, typically one answer per question. As a result, existing works tend to directly treat all the unlabeled answers as negative labels, leading to limited ability for generalization. In this work, we introduce a simple yet effective ranking distillation framework (RADI) to mitigate this problem without additional manual annotation. RADI employs a teacher model trained with incomplete labels to generate rankings for potential answers, which contain rich knowledge about label priority as well as label-associated visual cues, thereby enriching the insufficient labeling information. To avoid overconfidence in the imperfect teacher model, we further present two robust and parameter-free ranking distillation approaches: a pairwise approach, which introduces adaptive soft margins to dynamically refine the optimization constraints on various pairwise rankings, and a listwise approach, which adopts sampling-based partial listwise learning to resist the bias in teacher ranking. Extensive experiments on five popular benchmarks consistently show that both our pairwise and listwise RADIs outperform state-of-the-art methods. Further analysis demonstrates the effectiveness of our methods on the insufficient labeling problem.
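The abstract does not give the loss formulas, so the following is only a rough sketch of what the two distillation objectives could look like, not the authors' implementation. It assumes that the pairwise margin is the gap between the teacher's softmax probabilities for the two answers, and that the listwise loss is a Plackett-Luce (ListMLE-style) likelihood over a randomly sampled subset of answers ordered by the teacher. The function names and the subset size `k` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def pairwise_soft_margin_loss(student, teacher):
    """Hinge loss over every pair (i, j) where the teacher ranks i above j.

    Assumption: the margin for a pair is the teacher's probability gap,
    so confident teacher preferences impose stronger constraints.
    """
    t = softmax(teacher)
    total, count = 0.0, 0
    for i in range(len(student)):
        for j in range(len(student)):
            if t[i] > t[j]:
                margin = t[i] - t[j]
                total += max(0.0, margin - (student[i] - student[j]))
                count += 1
    return total / count if count else 0.0


def partial_listwise_loss(student, teacher, k, rng):
    """Negative Plackett-Luce log-likelihood on a sampled partial list.

    Assumption: k answers are sampled uniformly, ordered by teacher score,
    and the student is trained to reproduce only that partial ordering.
    """
    idx = rng.sample(range(len(student)), k)
    idx.sort(key=lambda a: teacher[a], reverse=True)
    loss = 0.0
    for pos in range(k):
        rest = [student[a] for a in idx[pos:]]
        m = max(rest)
        log_z = m + math.log(sum(math.exp(s - m) for s in rest))
        loss += log_z - student[idx[pos]]
    return loss


if __name__ == "__main__":
    teacher_scores = [2.0, 1.0, 0.0, -1.0]
    student_scores = [0.5, 1.5, 0.0, -0.5]
    print(pairwise_soft_margin_loss(student_scores, teacher_scores))
    print(partial_listwise_loss(student_scores, teacher_scores, 3, random.Random(0)))
```

In this sketch, a student whose score gaps already exceed the teacher-derived margins incurs zero pairwise loss, and resampling the subset at each step means that no single, possibly biased, full teacher ranking is ever enforced as a whole.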

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Learning Types > Semi-Supervised Learning
Machine Learning > Application Areas > Knowledge Distillation
Computer Vision > Processing > Video Understanding
Machine Learning > Core Methods > Ranking
Machine Learning > Learning Types > Knowledge Distillation
Machine Learning > Learning Types > Multi-Modal Learning
Machine Learning > Learning Types > Multi-Label Classification
Deep Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Knowledge Distillation
Artificial Intelligence > Core AI > Multi-Modal Learning

Keywords

knowledge distillation, multi-label classification, video question answering, teacher student learning, ranking distillation, insufficient labeling, insufficient label
