CVPR 2024

Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

Abstract

This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question. This is essentially a multi-label classification task, since a question may have multiple answers. However, due to annotation costs, the labels in existing benchmarks are always extremely insufficient, typically one answer per question. As a result, existing works tend to directly treat all the unlabeled answers as negative labels, which limits their ability to generalize. In this work, we introduce a simple yet effective ranking distillation framework (RADI) to mitigate this problem without additional manual annotation. RADI employs a teacher model trained with incomplete labels to generate rankings for potential answers. These rankings contain rich knowledge about label priority as well as label-associated visual cues, thereby enriching the insufficient labeling information. To avoid overconfidence in the imperfect teacher model, we further present two robust and parameter-free ranking distillation approaches: a pairwise approach, which introduces adaptive soft margins to dynamically refine the optimization constraints on various pairwise rankings, and a listwise approach, which adopts sampling-based partial listwise learning to resist the bias in the teacher ranking. Extensive experiments on five popular benchmarks consistently show that both our pairwise and listwise RADIs outperform state-of-the-art methods. Further analysis demonstrates the effectiveness of our methods on the insufficient labeling problem.
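
The abstract does not give the exact loss functions, so the following is only a minimal illustrative sketch of the two ideas it names, not the authors' implementation. It assumes that the adaptive soft margin for a pair of answers is the gap between their teacher scores, and that partial listwise learning means a Plackett-Luce likelihood over a randomly sampled subset of answers ordered by the teacher. The function names, the parameter `k`, and both of these choices are hypothetical.

```python
import math
import random


def pairwise_soft_margin_loss(student, teacher):
    """Hinge loss over all answer pairs the teacher ranks i above j.

    Assumption: the margin for each pair is the teacher's score gap,
    so pairs the teacher is unsure about impose a weaker constraint.
    """
    total, count = 0.0, 0
    n = len(teacher)
    for i in range(n):
        for j in range(n):
            if teacher[i] > teacher[j]:
                margin = teacher[i] - teacher[j]
                total += max(0.0, margin - (student[i] - student[j]))
                count += 1
    return total / count if count else 0.0


def partial_listwise_loss(student, teacher, k, rng):
    """Plackett-Luce negative log-likelihood on k sampled answers.

    Assumption: sampling a subset (rather than using the full teacher
    ranking) is what limits the influence of errors in that ranking.
    """
    idx = rng.sample(range(len(teacher)), k)
    # Order the sampled answers by the teacher's scores, best first.
    idx.sort(key=lambda a: teacher[a], reverse=True)
    loss = 0.0
    for pos in range(len(idx)):
        rest = [student[a] for a in idx[pos:]]
        m = max(rest)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in rest))
        loss += log_z - student[idx[pos]]
    return loss


teacher_scores = [0.9, 0.5, 0.1]   # hypothetical teacher outputs
agreeing = [0.9, 0.5, 0.1]         # student matching the teacher
reversed_ = [0.1, 0.5, 0.9]        # student with the opposite order
rng = random.Random(0)
print(pairwise_soft_margin_loss(agreeing, teacher_scores))
print(pairwise_soft_margin_loss(reversed_, teacher_scores))
print(partial_listwise_loss(agreeing, teacher_scores, 3, rng))
print(partial_listwise_loss(reversed_, teacher_scores, 3, rng))
```

Under these assumptions, a student that reproduces the teacher's ordering incurs a lower loss than one that reverses it, and neither function introduces a tunable hyperparameter beyond the sample size `k`.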
