CVPR 2025

Towards Universal Soccer Video Understanding

Abstract

As a globally celebrated sport, soccer has attracted widespread interest from fans all over the world. This paper aims to develop a comprehensive multi-modal framework for soccer video understanding. Specifically, we make the following contributions: (i) we introduce **SoccerReplay-1988**, the largest multi-modal soccer dataset to date, featuring videos and detailed annotations from 1,988 complete matches, with an automated annotation pipeline; (ii) we present the first visual-language foundation model in the soccer domain, **MatchVision**, which leverages spatiotemporal information across soccer videos and excels in various downstream tasks; (iii) we conduct extensive experiments and ablation studies on action classification, commentary generation, and multi-view foul recognition, and achieve state-of-the-art performance on all of them, substantially outperforming existing models and demonstrating the superiority of our proposed data and model. We believe that this work will offer a standard paradigm for sports understanding research. The code and model will be publicly available for reproduction.
