R^2-Art: Category-Level Articulation Pose Estimation from Single RGB Image via Cascade Render Strategy

Li Zhang; Haonan Jiang; Yukang Huo; Yan Zhong; Jianan Wang; Xue Wang; Rujing Wang; Liu Liu

AAAI 2025

Abstract

Human life is filled with articulated objects. Previous work on category-level articulated object pose estimation relies on costly 3D point clouds or RGB-D images. In this paper, we aim to estimate category-level articulation poses from a single RGB image and propose R^2-Art, a novel category-level Articulation pose estimation framework that takes a single RGB image as input and uses a cascade Render strategy. Given an RGB image, R^2-Art estimates the per-part 6D pose of the articulated object. Specifically, we design parallel regression branches tailored to generate the camera-to-root translation and rotation. Using the predicted joint states, we transform and deform a point cloud (PC) prior with a joint-centric modeling approach. For further refinement, we propose a cascade render strategy that projects the deformed 3D prior onto the 2D mask. Extensive experiments on datasets ranging from synthetic to real-world scenarios validate R^2-Art and demonstrate its superior performance and robustness. We believe this work has potential applications in many fields, including robotics, embodied intelligence, and augmented reality.
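To make the geometry behind the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single revolute joint, a pinhole camera, and a simple point-splat projection standing in for the render step. All function names (`rotate_about_joint`, `project_to_mask`, `mask_iou`), the intrinsics, and the toy "lid" shape are hypothetical choices made for illustration.

```python
import numpy as np


def rotate_about_joint(points, pivot, axis, angle):
    """Rotate (N, 3) points by `angle` radians about the line through `pivot` along `axis`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    pivot = np.asarray(pivot, dtype=float)
    p = np.asarray(points, dtype=float) - pivot
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    # Rodrigues' rotation formula, applied to every row of p.
    rotated = p * cos_a + np.cross(k, p) * sin_a + np.outer(p @ k, k) * (1.0 - cos_a)
    return rotated + pivot


def project_to_mask(points_cam, fx, fy, cx, cy, height, width):
    """Splat camera-frame points onto a boolean (height, width) mask with a pinhole model."""
    pts = points_cam[points_cam[:, 2] > 0]  # keep points in front of the camera
    u = np.round(fx * pts[:, 0] / pts[:, 2] + cx).astype(int)
    v = np.round(fy * pts[:, 1] / pts[:, 2] + cy).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    mask = np.zeros((height, width), dtype=bool)
    mask[v[inside], u[inside]] = True
    return mask


def mask_iou(a, b):
    """Intersection-over-union of two boolean masks (0.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


# Toy example: a flat "lid" hinged along the x-axis, opened by 30 degrees.
lid = np.array([[x, y, 0.0]
                for x in np.linspace(-0.1, 0.1, 5)
                for y in np.linspace(0.0, 0.2, 5)])
opened = rotate_about_joint(lid, pivot=[0.0, 0.0, 0.0], axis=[1.0, 0.0, 0.0],
                            angle=np.deg2rad(30))
opened_cam = opened + np.array([0.0, 0.0, 1.0])  # assumed camera-to-root translation
mask = project_to_mask(opened_cam, fx=200, fy=200, cx=32, cy=24, height=48, width=64)
print(mask.sum(), mask_iou(mask, mask))
```

In a refinement loop of the kind the abstract describes, a score such as `mask_iou` between the projected prior and an observed part mask could indicate how well the current pose and joint state explain the image. How R^2-Art actually renders and compares is not specified in the abstract.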

Topics

Computer Vision > Analysis > 3D Vision; Computer Vision > Analysis > Object Detection; Robotics > Capabilities > Manipulation; Interdisciplinary > Cognitive Science > Perception; Artificial Intelligence > Core AI > Robotics; Computer Vision > Domain-Specific > Robotics

Keywords

pose estimation; point cloud; articulated object; pose regression; 6D pose estimation; category-level pose; 6D pose; RGB image; cascade render; render and compare; articulation pose estimation; render strategy
