TimeCAP: A Channel-Aware Pre-Training Framework for Multivariate Time Series Forecasting
Abstract
Amid recent advances in multivariate time series forecasting, self-supervised learning has emerged as a promising paradigm for deriving transferable knowledge from multi-domain data. Despite its effectiveness, existing approaches exhibit two critical limitations: (1) they underestimate the significance of multivariate dependencies in learning generalizable representations, and (2) they fail to reconcile the complementary strengths of the autoregressive and one-shot generative paradigms. In this work, we propose TimeCAP, a novel channel-aware pre-training framework that internalizes the latent causal relationships among variables inherent in multi-domain data and effectively transfers the acquired knowledge to downstream applications. Technically, we present a flexible channel-grouping learning approach, complemented by an adaptive meta-routing mechanism, which enables TimeCAP to recognize intra-group local patterns in parallel while maintaining global coherence. Intra- and inter-group multivariate dependencies are captured through self- and cross-attention with a channel-aware mask, which strictly confines interactions to time-aligned, fine-grained multivariate tokens. To seamlessly unify the two generative paradigms, we propose a novel dynamic dual-head decoding and optimization strategy that enables TimeCAP to exploit critical dependencies in the output series while avoiding error accumulation over time. In the few-shot setting, TimeCAP reduces average MSE and MAE by 11.8% and 6%, respectively, relative to leading baselines, and it also outperforms state-of-the-art models by large margins in the full-shot and zero-shot settings.
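The abstract does not specify how the channel-aware mask is built. The sketch below illustrates one plausible reading under stated assumptions: each channel is split into a sequence of time-step tokens, every channel is assigned to a group, and a query token may attend only to key tokens at the same time step. In the assumed "intra" mode, attention is further restricted to channels in the same group. In the assumed "inter" mode, a token may see all time-aligned channels. The function name `channel_aware_mask`, its `mode` argument, and the channel-major token ordering are all hypothetical illustrations, not TimeCAP's actual implementation.

```python
from typing import List, Sequence


def channel_aware_mask(
    channel_groups: Sequence[int], num_steps: int, mode: str = "intra"
) -> List[List[bool]]:
    """Hypothetical channel-aware attention mask over (channel, step) tokens.

    Assumption: tokens are flattened channel-major, so token index i encodes
    channel i // num_steps and time step i % num_steps.
    mask[q][k] is True when query token q is allowed to attend to key token k.
    """
    if mode not in ("intra", "inter"):
        raise ValueError("mode must be 'intra' or 'inter'")
    num_channels = len(channel_groups)
    n = num_channels * num_steps
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        cq, tq = divmod(q, num_steps)  # query's channel and time step
        for k in range(n):
            ck, tk = divmod(k, num_steps)  # key's channel and time step
            if tq != tk:
                continue  # only time-aligned tokens may interact
            same_group = channel_groups[cq] == channel_groups[ck]
            if mode == "intra" and not same_group:
                continue  # assumed intra-group attention stays inside the group
            mask[q][k] = True
    return mask


# Three channels (the first two in group 0, the third in group 1), two time steps.
intra = channel_aware_mask([0, 0, 1], num_steps=2, mode="intra")
inter = channel_aware_mask([0, 0, 1], num_steps=2, mode="inter")
print(sum(intra[0]), sum(inter[0]))  # token 0 sees 2 keys (intra) vs 3 keys (inter)
```

In practice, a boolean mask like this would be converted into an additive mask (0 where allowed, negative infinity elsewhere) before the attention softmax. How TimeCAP actually pairs queries and keys across groups in its cross-attention is not described in the abstract, so the "inter" mode here is only a simplifying assumption.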