Exploring Pose-Aware Human-Object Interaction via Hybrid Learning

Eastman Z Y Wu; Yali Li; Yuan Wang; Shengjin Wang

CVPR 2024

Abstract

Human-Object Interaction (HOI) detection plays a crucial role in visual scene comprehension. In recent work, two-stage detectors have taken a prominent position. However, they face two primary challenges. First, the misalignment between feature representation and relation reasoning leads to a lack of the discriminative features that interaction detection depends on. Second, because annotation is sparse, the second-stage interaction head generates numerous candidate <human, object> pairs, of which only a small fraction receive supervision. To address these issues, we propose a hybrid learning method based on pose-aware HOI feature refinement. Specifically, we devise a pose-aware feature refinement that encodes spatial features by considering human body pose characteristics. It directs attention towards key regions, providing the fine-grained features needed for HOI detection. Further, we introduce a hybrid learning method that combines HOI triplet supervision with probabilistic soft-label supervision regenerated from decoupled verb-object pairs. This method explores the implicit connections between interactions, enhancing model generalization without requiring additional data. Our method establishes state-of-the-art performance on the HICO-DET benchmark and excels notably in detecting rare HOIs.
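The abstract does not specify how the soft labels are regenerated or how the two supervision signals are weighted. As a rough illustration only, the sketch below shows one way hard triplet labels could be mixed with soft labels derived from decoupled verb-object pairs. Everything in it is an assumption for illustration rather than the authors' formulation: the per-object verb prior, the mixing factor `alpha`, the loss weight `weight`, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def verb_prior_per_object(pairs, num_verbs):
    """Estimate P(verb | object) from decoupled (verb, object) annotations.

    Assumption: the soft labels come from simple co-occurrence counts.
    """
    counts = defaultdict(Counter)
    for verb, obj in pairs:
        counts[obj][verb] += 1
    prior = {}
    for obj, verb_counts in counts.items():
        total = sum(verb_counts.values())
        prior[obj] = [verb_counts[v] / total for v in range(num_verbs)]
    return prior


def soft_labels(hard, prior_row, alpha=0.2):
    """Blend a multi-hot verb label with the object's verb prior (hypothetical)."""
    return [(1 - alpha) * h + alpha * p for h, p in zip(hard, prior_row)]


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over verb classes."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def hybrid_loss(probs, hard, soft, weight=0.5):
    """Hard-label loss plus a weighted soft-label loss (hypothetical weighting)."""
    return bce(probs, hard) + weight * bce(probs, soft)


# Toy usage: three verb classes, annotations given as (verb, object) pairs.
pairs = [(0, 0), (1, 0), (1, 0), (2, 1)]
prior = verb_prior_per_object(pairs, num_verbs=3)
hard = [1.0, 0.0, 0.0]                 # annotated verb for a pair with object 0
soft = soft_labels(hard, prior[0])     # shifts some mass to co-occurring verbs
loss = hybrid_loss([0.7, 0.2, 0.1], hard, soft)
```

In this toy setup, a verb that often co-occurs with the same object receives a small non-zero target even when it is not annotated for the pair, which is one plausible way implicit connections between interactions could supply extra supervision without additional data.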

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Weakly Supervised Learning
Machine Learning > Learning Types > Multi-Task Learning
Computer Vision > Analysis > Action Recognition
Computer Vision > Analysis > Human Analysis
Computer Vision > Analysis > Object Detection
Artificial Intelligence > Core AI > Computer Vision
Deep Learning > Learning Types > Multi-Task Learning

Keywords

representation learning, semantic segmentation, feature learning, pose estimation, object detection, human-object interaction, interaction detection, human-object interaction detection, hybrid learning, pose-aware feature refinement, interaction recognition, spatial feature encoding, soft label supervision
