Do We Need Large VLMs for Spotting Soccer Actions?

Ritabrata Chakraborty; Rajatsubhra Chakraborty; Avijit Dasgupta; Sandeep Chaurasia

AACL 2025

Abstract

Traditional video-based tasks like soccer action spotting rely heavily on visual inputs, often requiring complex and computationally expensive models to process dense video data. We propose a shift from this video-centric approach to a text-based task, making it lightweight and scalable by using Large Language Models (LLMs) instead of Vision-Language Models (VLMs). We posit that expert commentary, which provides rich descriptions and contextual cues, contains sufficient information to reliably spot key actions in a match. To demonstrate this, we employ a system of three LLMs acting as judges, specializing in outcome, excitement, and tactics, to spot actions in soccer matches. Our experiments show that this language-centric approach detects critical match events effectively, coming close to state-of-the-art video-based spotters while using zero video-processing compute and a similar amount of time to process the entire match.
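The three-judge idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the judges are prompted or how their outputs are combined, so the keyword-based stand-in judges, the `min_votes` majority rule, and all names (`CommentaryLine`, `keyword_judge`, `spot_actions`) are hypothetical. In a real system each judge would be an LLM call rather than a keyword match.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CommentaryLine:
    timestamp_s: int  # seconds from kickoff
    text: str


# A judge maps one commentary line to a vote (True = "an action happened here").
Judge = Callable[[str], bool]


def keyword_judge(keywords: List[str]) -> Judge:
    """Hypothetical stand-in for an LLM judge: votes True if any keyword appears."""
    lowered = [k.lower() for k in keywords]

    def judge(text: str) -> bool:
        t = text.lower()
        return any(k in t for k in lowered)

    return judge


# Three specialised judges, mirroring the outcome / excitement / tactics split.
# The keyword lists are illustrative assumptions, not taken from the paper.
outcome_judge = keyword_judge(["goal", "penalty", "red card", "yellow card"])
excitement_judge = keyword_judge(["!", "what a", "incredible"])
tactics_judge = keyword_judge(["substitution", "corner", "free kick", "penalty"])
JUDGES = [outcome_judge, excitement_judge, tactics_judge]


def spot_actions(lines: List[CommentaryLine],
                 judges: List[Judge],
                 min_votes: int = 2) -> List[int]:
    """Return timestamps where at least `min_votes` judges agree (assumed rule)."""
    spotted = []
    for line in lines:
        votes = sum(judge(line.text) for judge in judges)
        if votes >= min_votes:
            spotted.append(line.timestamp_s)
    return spotted


commentary = [
    CommentaryLine(312, "Patient build-up play in midfield."),
    CommentaryLine(845, "GOAL! What a strike from the edge of the box!"),
    CommentaryLine(1501, "The referee points to the spot, penalty!"),
]

print(spot_actions(commentary, JUDGES))  # [845, 1501]
```

Because the pipeline only reads timestamped text, no video frames are decoded at any point, which is the source of the compute savings the abstract describes.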

Topics

Artificial Intelligence > Core AI > Game AI
Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Trajectory Prediction
Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Generation > Language Modeling
Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Applications > Machine Reading Comprehension
Natural Language Processing > Applications > Question Answering

Keywords

multimodal learning, trajectory prediction, event detection, vision-language model, zero-shot classification, video processing, action detection, large language model, expert commentary, action spotting, text-based approach, soccer action spotting, soccer action, soccer match
