Free Lunch Enhancements for Multi-modal Crowd Counting

Haoliang Meng; Xiaopeng Hong; Zhengqin Lai; Miao Shang

CVPR 2025

Abstract

This paper addresses multi-modal crowd counting with a novel 'free lunch' training enhancement strategy that requires no additional data or parameters and does not increase inference complexity. First, we introduce a cross-modal alignment technique as a plug-in post-processing step for the pre-trained backbone network, enhancing the model's ability to capture shared information across modalities. Second, we incorporate a regional density supervision mechanism during the fine-tuning stage, which differentiates features in regions with varying crowd densities. Extensive experiments on three multi-modal crowd counting datasets validate our approach, which is the first to achieve an MAE below 10 on RGBT-CC. The code is available at https://github.com/HenryCilence/Free-Lunch-Multimodal-Counting.
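
For context on the reported metric: in crowd counting, MAE is conventionally the mean absolute difference between predicted and ground-truth per-image counts, where a predicted count is obtained by summing the predicted density map. The following is a minimal sketch of that computation, assuming density maps are given as nested lists of floats. The function names and toy values are illustrative assumptions and are not taken from the paper or its repository.

```python
from typing import Sequence


def count_from_density(density_map: Sequence[Sequence[float]]) -> float:
    """Estimate the number of people as the sum over a 2-D density map."""
    return float(sum(sum(row) for row in density_map))


def mean_absolute_error(pred_counts: Sequence[float],
                        true_counts: Sequence[float]) -> float:
    """Mean absolute difference between predicted and true per-image counts."""
    if len(pred_counts) != len(true_counts) or len(pred_counts) == 0:
        raise ValueError("count lists must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(pred_counts, true_counts))
    return total / len(pred_counts)


# Toy example (hypothetical values): two images, each with a predicted density map.
pred_maps = [
    [[0.5, 0.5], [1.0, 0.0]],   # sums to 2.0
    [[1.0, 1.0], [1.0, 2.0]],   # sums to 5.0
]
true_counts = [3.0, 4.0]
pred_counts = [count_from_density(m) for m in pred_maps]
mae = mean_absolute_error(pred_counts, true_counts)
```

Under this definition, an MAE below 10 means the predicted count differs from the true count by fewer than 10 people per image on average.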

Topics

Artificial Intelligence > Core AI > Multimodal Learning; Machine Learning > Core Methods > Regression; Machine Learning > Application Areas > Efficient Computing; Computer Vision > Analysis > Object Detection; Deep Learning > Learning Types > Multi-Modal Learning

Keywords

feature extraction; density estimation; multimodal learning; cross-modal alignment; crowd counting; rgbt dataset
