MambaML: Exploring State Space Models for Multi-Label Image Classification

Xuelin Zhu; Jian Liu; Jiuxin Cao; Bing Wang

ICCV 2025

Abstract

Mamba, a selective state-space model, has recently seen widespread application across various visual tasks due to its exceptional ability to capture long-range dependencies. While promising results have been demonstrated in image classification, its potential in multi-label image classification remains underexplored. To bridge this gap, we propose a novel Mamba-based decoder, which utilizes the intrinsic attention of Mamba to aggregate visual information from image features into label embeddings, yielding label-specific visual representations. Building upon this decoder, we develop the MambaML framework for multi-label image classification. It models the self-correlations of image features and label embeddings with bi-directional Mamba, and their cross-correlations with the Mamba-based decoder, so that visual spatial relationships, label semantic dependencies, and cross-modal associations are explored in a unified system. In this way, robust label-specific visual representations are acquired, facilitating the training of binary classifiers towards accurate label recognition. Experiments on public benchmarks suggest that MambaML achieves performance comparable to state-of-the-art methods in multi-label image classification, while requiring fewer parameters and less computational overhead.
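The abstract does not give the decoder's internals, so the following is only a toy sketch of the general idea, under stated assumptions: image tokens are placed before label tokens in one sequence, a simplified selective recurrence is scanned over it, and the outputs at the label positions are treated as label-specific features that feed one binary classifier per label. The function names, the shared scalar step size, and the random weights are all hypothetical and are not taken from the paper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softplus(z):
    return math.log1p(math.exp(z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def selective_scan(tokens, w_delta, w_b, w_c):
    """Toy selective recurrence: h_t = a_t * h_{t-1} + delta_t * b_t * x_t, y_t = c_t * h_t.

    a_t, b_t, c_t are computed from the current token, which is the
    "selective" part. Weights are hypothetical stand-ins.
    """
    h = [0.0] * len(tokens[0])
    outputs = []
    for x in tokens:
        delta = softplus(dot(w_delta, x))   # positive, input-dependent step size
        a = math.exp(-delta)                # decay factor between 0 and 1
        b = dot(w_b, x)                     # input-dependent write gate
        c = dot(w_c, x)                     # input-dependent read gate
        h = [a * hi + delta * b * xi for hi, xi in zip(h, x)]
        outputs.append([c * hi for hi in h])
    return outputs

def label_specific_features(image_tokens, label_embeddings, weights):
    # Assumption: image tokens come first, so the state already summarizes the
    # image when each label token is read. Later label tokens also see earlier ones.
    sequence = image_tokens + label_embeddings
    outputs = selective_scan(sequence, *weights)
    return outputs[len(image_tokens):]

def classify(label_features, classifiers):
    # One independent binary classifier (weight vector, bias) per label.
    return [sigmoid(dot(w, feat) + bias)
            for feat, (w, bias) in zip(label_features, classifiers)]

def binary_cross_entropy(probs, targets, eps=1e-12):
    terms = [t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
             for p, t in zip(probs, targets)]
    return -sum(terms) / len(terms)

random.seed(0)
dim, num_patches, num_labels = 4, 6, 3

def rand_vec():
    return [random.uniform(-1, 1) for _ in range(dim)]

image_tokens = [rand_vec() for _ in range(num_patches)]
label_embeddings = [rand_vec() for _ in range(num_labels)]
weights = (rand_vec(), rand_vec(), rand_vec())
classifiers = [(rand_vec(), 0.0) for _ in range(num_labels)]

features = label_specific_features(image_tokens, label_embeddings, weights)
probs = classify(features, classifiers)
loss = binary_cross_entropy(probs, [1, 0, 1])
```

Each entry of `probs` is an independent per-label probability, so several labels can be active for one image. A real implementation would replace the toy recurrence with a trained Mamba block and learn all weights end to end.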

Topics

Machine Learning > Core Methods > Classification
Deep Learning > Architectures > Transformers
Computer Vision > Analysis > Object Detection

Keywords

multi-label classification, state space model, visual feature aggregation, label-specific representation
