Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild With Pose Annotations

Adel Ahmadyan; Liangkai Zhang; Artsiom Ablavatski; Jianing Wei; Matthias Grundmann

2021 CVPR CVPR 2021

Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild With Pose Annotations

Abstract

3D object detection has recently become popular due to many applications in robotics, augmented reality, autonomy, and image retrieval. We introduce the Objectron dataset to advance the state of the art in 3D object detection and foster new research and applications, such as 3D object tracking, view synthesis, and improved 3D shape representation. The dataset contains object-centric short videos with pose annotations for nine categories and includes 4 million annotated images in 14,819 annotated videos. We also propose a new evaluation metric, 3D Intersection over Union, for 3D object detection. We demonstrate the usefulness of our dataset in 3D object detection and novel view synthesis tasks by providing baseline models trained on this dataset. Our dataset and evaluation source code are available online at Github.com/google-research-datasets/Objectron.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision

🧭 Keyword Pioneer — object-centric video

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Adel Ahmadyan , Liangkai Zhang , Artsiom Ablavatski , Jianing Wei , Matthias Grundmann

Topics

Computer Vision > Analysis > 3D Vision Computer Vision > Analysis > Object Detection Computer Vision > Domain-Specific > Autonomous Driving Artificial Intelligence > Core AI > Computer Vision

Keywords

video understanding 3d object detection benchmark dataset novel view synthesis view synthesis pose annotation object-centric video

Download PDF

Related papers

Learning To Reconstruct High Speed and High Dynamic Range Videos From Events 2021

DeFLOCNet: Deep Image Editing via Flexible Low-Level Controls 2021

Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs 2021

Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization 2021

Pose-Guided Human Animation From a Single Image in the Wild 2021