2025 ICCV ICCV 2025

Learning Separable Fine-Grained Representation via Dendrogram Construction from Coarse Labels for Fine-grained Visual Recognition

Abstract

Learning fine-grained representations from coarse labels for fine-grained visual recognition (FGVR) is a challenging yet valuable task, as it alleviates the reliance on labor-intensive fine-grained annotations. Early approaches focused primarily on minimizing intra-fine-grained-class variation but overlooked inter-fine-grained-class separability, resulting in limited FGVR performance. Subsequent studies employed a top-down paradigm to enhance separability via deep clustering, yet these methods require predefining the number of fine-grained classes, which is often impractical to obtain. Here, we introduce a bottom-up learning paradigm that constructs a hierarchical dendrogram by iteratively merging similar instances/clusters, inferring higher-level semantics from lowest-level instances without predefining class numbers. Leveraging this, we propose BuCSFR, a novel method that integrates a Bottom-up Construction (BuC) module to build the dendrogram based on a minimal information loss criterion, and a Separable Fine-grained Representation (SFR) module that treats dendrogram nodes as pseudo-labels to ensure representation separability. The synergistic interaction between these modules enables iterative enhancement, grounded theoretically in the Expectation-Maximization (EM) framework. Extensive experiments on five benchmark datasets demonstrate the superiority of our approach, showcasing its effectiveness in learning separable representations for FGVR. The source code is available at: https://github.com/BeCarefulOfYournaoke/BuCSFR.

🌉 Interdisciplinary Bridge — Computer Vision and Machine Learning
🧭 Keyword Pioneer — dendrogram construction
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio