What To Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions

A S M Iftekhar; Hao Chen; Kaustav Kundu; Xinyu Li; Joseph Tighe; Davide Modolo

CVPR 2022

Abstract

We propose the Semantic and Spatial Refined Transformer (SSRT), a novel one-stage Transformer-based model for Human-Object Interaction (HOI) detection, a task that requires localizing humans and objects and predicting their interactions. Unlike previous Transformer-based HOI approaches, which mostly focus on improving the design of the decoder outputs for the final detection, SSRT introduces two new modules that help select the most relevant object-action pairs within an image and refine the queries' representations using rich semantic and spatial features. These enhancements lead to state-of-the-art results on the two most popular HOI benchmarks, V-COCO and HICO-DET.
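To make the task definition concrete, the following is a minimal sketch of what an HOI detection looks like as data and how a prediction might be compared against a ground-truth annotation. It illustrates the task only, not SSRT or its two modules. The names `HOITriplet`, `iou`, and `matches` are hypothetical, and the 0.5 IoU threshold is an assumption based on common practice in HOI benchmarks rather than something stated in the abstract.

```python
from dataclasses import dataclass
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class HOITriplet:
    """Hypothetical container for one human-object interaction."""
    human_box: Box
    object_box: Box
    object_label: str
    action_label: str


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred: HOITriplet, gt: HOITriplet, iou_thresh: float = 0.5) -> bool:
    """A prediction matches when both labels agree and both boxes overlap enough.

    The threshold value is an assumed convention, not taken from the paper.
    """
    return (
        pred.action_label == gt.action_label
        and pred.object_label == gt.object_label
        and iou(pred.human_box, gt.human_box) >= iou_thresh
        and iou(pred.object_box, gt.object_box) >= iou_thresh
    )


gt = HOITriplet((0, 0, 10, 10), (5, 5, 15, 15), "bicycle", "ride")
pred = HOITriplet((0, 0, 10, 9), (5, 5, 15, 15), "bicycle", "ride")
is_match = matches(pred, gt)
```

Under these assumptions, a detection counts as correct only when the human box, the object box, and the interaction label are all right at once, which is why selecting plausible object-action pairs matters for this task.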

Authors

A S M Iftekhar, Hao Chen, Kaustav Kundu, Xinyu Li, Joseph Tighe, Davide Modolo

Topics

Deep Learning > Architectures > Transformers
Computer Vision > Analysis > Object Detection
Artificial Intelligence > Core AI > Computer Vision
Computer Vision > Core AI > Computer Vision

Keywords

object detection, human-object interaction, human-object interaction detection, semantic feature, one-stage detection, semantic refinement, spatial refinement
