End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation

Mingrui Wu; Jiaxin Gu; Yunhang Shen; Mingbao Lin; Chao Chen; Xiaoshuai Sun

AAAI 2023

Abstract

Most existing Human-Object Interaction (HOI) detection methods rely heavily on full annotations with predefined HOI categories, which are limited in diversity and costly to scale further. We aim to advance zero-shot HOI detection so that both seen and unseen HOIs are detected simultaneously. The fundamental challenges are to discover potential human-object pairs and to identify novel HOI categories. To overcome these challenges, we propose a novel End-to-end zero-shot HOI Detection (EoID) framework based on vision-language knowledge distillation. We first design an Interactive Score module combined with a Two-stage Bipartite Matching algorithm to distinguish interactive human-object pairs in an action-agnostic manner. We then transfer the distribution of action probabilities from the pretrained vision-language teacher, together with the seen ground truth, to the HOI model to attain zero-shot HOI classification. Extensive experiments on the HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs. Our method outperforms the previous SOTA under various zero-shot settings. Moreover, it generalizes to large-scale object detection data to further scale up the action sets. The source code is available at: https://github.com/mrwu-mac/EoID.
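The distillation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the EoID implementation: the function names, the rule of using ground truth for seen actions and teacher probabilities for the remaining ones, and the binary cross-entropy objective are all assumptions made here for illustration. See the linked repository for the authors' actual method.

```python
import math


def distillation_targets(teacher_probs, seen_labels, seen_mask):
    """Build per-action soft targets (hypothetical merging rule).

    For actions marked as seen, the ground-truth label is used;
    for all other actions, the teacher's probability is used.
    """
    return [
        float(label) if seen else float(prob)
        for prob, label, seen in zip(teacher_probs, seen_labels, seen_mask)
    ]


def binary_cross_entropy(student_probs, targets, eps=1e-7):
    """Mean binary cross-entropy between student probabilities and soft targets."""
    total = 0.0
    for q, t in zip(student_probs, targets):
        q = min(max(q, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(q) + (1.0 - t) * math.log(1.0 - q)
    return total / len(targets)


# Toy example with three actions: the first two are seen, the third is unseen.
teacher = [0.9, 0.2, 0.6]      # assumed teacher action probabilities
labels = [1, 0, 0]             # ground truth, only meaningful for seen actions
seen = [True, True, False]

targets = distillation_targets(teacher, labels, seen)
loss = binary_cross_entropy([0.5, 0.5, 0.5], targets)
```

In this toy setup the unseen action keeps the teacher's soft probability as its target, so the student receives a training signal for an action that has no annotation.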

Topics

Machine Learning > Application Areas > Knowledge Distillation
Computer Vision > Analysis > Object Detection
Artificial Intelligence > Learning Paradigms > Zero-Shot Learning
Deep Learning > Learning Types > Knowledge Distillation
Deep Learning > Learning Types > Zero-Shot Learning

Keywords

zero-shot learning, object detection, knowledge distillation, bipartite matching, human-object interaction, vision-language model, human-object interaction detection, distinguishable learning
