CAST: Evaluating Multi-Object Trackers with Context-Aware Switch and Transfer Scores
Abstract
Multi-object tracking (MOT) has been a subject of intensive research for decades. Multiple standard datasets and benchmarks have been set up, and several evaluation metrics, such as MOTA, IDF1 and HOTA, have been proposed. These metrics have become the de facto standard for comparing and ranking trackers on standardized datasets to measure progress. In this paper, we focus on MOTA and HOTA, and present a study of cases where these metrics' behaviors may not be desirable. In addition, we demonstrate how they might not be ideal when used as a tool to inspect a tracker's failure cases. We point out that these issues are related to the sizes of the context windows in which they measure association quality: MOTA is too nearsighted, while HOTA can be too holistic depending on the task settings. We then rethink the familiar notion of identity switches (IDSw) proposed in MOTA, and generalize it by introducing a context window when evaluating the ID assignment choice for each detection. We show that the proposed metric, CAST, mitigates the limitations of MOTA and HOTA, and demonstrate through examples its usefulness for diagnosing model failures. Our code and toolkit will be made available at https://github.com/bkkm78/cast.