Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning

Can Kucuksozen; Yucel Yemez

CVPR 2025

Abstract

We propose the Compact Clustering Attention (COCA) layer, an effective building block that introduces a hierarchical strategy for object-centric representation learning while solving the unsupervised object discovery task on single images. COCA is an attention-based clustering module that extracts object-centric representations from multi-object scenes when cascaded into a bottom-up hierarchical network architecture, referred to as COCA-Net. At its core, COCA uses a novel clustering algorithm that leverages the physical concept of compactness to highlight distinct object centroids in a scene, providing a spatial inductive bias. Thanks to this strategy, COCA-Net generates high-quality segmentation masks on both the decoder side and, notably, the encoder side of its pipeline. Additionally, COCA-Net is not bound to a predetermined number of object masks, and it handles the segmentation of background elements better than its competitors. We demonstrate COCA-Net's segmentation performance on six widely adopted datasets, achieving superior or competitive results against state-of-the-art models across nine different evaluation metrics.
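To make the idea of compactness as a spatial cue more concrete, the following is a minimal sketch of one generic way to score how compact a soft mask is. It is an assumption made purely for illustration and is not the compactness measure defined in the paper: the function name `soft_mask_compactness` and the formula (inverse of the mass-weighted mean squared distance to the centroid) are hypothetical choices.

```python
def soft_mask_compactness(mask):
    """Return ((row, col) centroid, compactness score) for a 2-D soft mask.

    Illustrative assumption, not the paper's definition: compactness is
    1 / (1 + spread), where spread is the mass-weighted mean squared
    distance of the mask values from their centroid.
    """
    total = sum(sum(row) for row in mask)
    if total <= 0:
        raise ValueError("mask must contain positive mass")

    # Mass-weighted centroid of the mask.
    cy = sum(r * v for r, row in enumerate(mask) for v in row) / total
    cx = sum(c * v for row in mask for c, v in enumerate(row)) / total

    # Mass-weighted mean squared distance from the centroid.
    spread = sum(
        v * ((r - cy) ** 2 + (c - cx) ** 2)
        for r, row in enumerate(mask)
        for c, v in enumerate(row)
    ) / total

    return (cy, cx), 1.0 / (1.0 + spread)


# A tight 2x2 blob in the middle of an 8x8 grid.
blob = [[1.0 if 3 <= r <= 4 and 3 <= c <= 4 else 0.0 for c in range(8)]
        for r in range(8)]

# The same total mass placed on the four corners of the grid.
scattered = [[1.0 if r in (0, 7) and c in (0, 7) else 0.0 for c in range(8)]
             for r in range(8)]

blob_centroid, blob_score = soft_mask_compactness(blob)
scattered_centroid, scattered_score = soft_mask_compactness(scattered)
```

Both masks share the same centroid and total mass, yet the blob scores much higher than the scattered mask. A score of this kind rewards spatially concentrated regions, which is the sort of spatial inductive bias the abstract attributes to compactness.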

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Unsupervised Learning
Deep Learning > Architectures > Transformers
Computer Vision > Processing > Image Segmentation
Computer Vision > Analysis > Object Segmentation
Deep Learning > Learning Types > Unsupervised Learning
Deep Learning > Techniques > Attention Mechanism

Keywords

unsupervised learning, transformer architecture, image segmentation, attention mechanism, instance segmentation, hierarchical network, clustering algorithm, object-centric learning, object-centric representation, spatial inductive bias, unsupervised object discovery, attention-based clustering
