Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

Zhiyang Chen; Yousong Zhu; Zhaowen Li; Fan Yang; Wei Li; Haixin Wang; Chaoyang Zhao; Liwei Wu; Rui Zhao; Jinqiao Wang; Ming Tang

NeurIPS 2022

Abstract

Visual tasks vary widely in their output formats and in the content they concern, which makes it hard to process them with a single shared structure. One main obstacle lies in the high-dimensional outputs of object-level visual tasks. In this paper, we propose an object-centric vision framework, Obj2Seq. Obj2Seq takes objects as basic units and regards most object-level visual tasks as sequence generation problems over objects. These visual tasks can therefore be decoupled into two steps: first recognize objects of the given categories, then generate a sequence for each of these objects. The definition of the output sequence varies across tasks, and the model is supervised by matching these sequences with ground-truth targets. Obj2Seq can flexibly determine its input categories to satisfy customized requirements and can be easily extended to different visual tasks. In experiments on MS COCO, Obj2Seq achieves 45.7% AP on object detection, 89.0% AP on multi-label classification, and 65.0% AP on human pose estimation. These results demonstrate its potential to be applied generally to different visual tasks. Code is available at: https://github.com/CASIA-IVA-Lab/Obj2Seq.
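The idea of task-specific output sequences supervised by matching can be illustrated with a small sketch. This is not the Obj2Seq implementation: the function names (`detection_sequence`, `pose_sequence`, `match_sequences`), the sequence layouts, the L1 cost, and the brute-force one-to-one assignment are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
from itertools import permutations


def detection_sequence(obj):
    """Hypothetical layout: a box as the 4-value sequence [x, y, w, h]."""
    return list(obj["box"])


def pose_sequence(obj):
    """Hypothetical layout: box values followed by flattened (x, y) keypoints."""
    seq = list(obj["box"])
    for x, y in obj["keypoints"]:
        seq.extend([x, y])
    return seq


def l1_cost(a, b):
    """Sum of absolute differences between two equal-length sequences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def match_sequences(preds, targets):
    """Assign each target to a distinct prediction, minimizing total L1 cost.

    Brute force over permutations, so only suitable for tiny examples.
    Assumes len(preds) >= len(targets).
    Returns (list of (pred_index, target_index) pairs, total cost).
    """
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(l1_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(p, t) for t, p in enumerate(best_perm)], best_cost


if __name__ == "__main__":
    targets = [
        detection_sequence({"box": (0.1, 0.1, 0.2, 0.2)}),
        detection_sequence({"box": (0.6, 0.6, 0.3, 0.3)}),
    ]
    preds = [
        [0.6, 0.6, 0.3, 0.3],
        [0.0, 0.0, 0.1, 0.1],
        [0.1, 0.1, 0.2, 0.2],
    ]
    pairs, cost = match_sequences(preds, targets)
    print(pairs, cost)
```

In this sketch, switching tasks only changes which sequence-building function is applied to each object; the matching step stays the same. A practical system would replace the brute-force search with a polynomial-time assignment solver.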

Topics

Machine Learning > Core Methods > Representation Learning
Computer Vision > Analysis > Object Detection

Keywords

sequence generation; object detection; human pose estimation; object-centric vision; class prompt
