CVPR 2025

How to Merge Your Multimodal Models Over Time?

Abstract

Model merging combines expert models, each finetuned from a shared foundation model on diverse tasks and domains, into a single, more capable base model. However, existing model merging approaches assume that all experts are available simultaneously. In reality, new tasks and domains emerge continuously, prompting the need for a dynamic process of integrating these experts over time, which we call temporal model merging. The temporal dimension introduces unique challenges not addressed in prior work: For each new task, should expert training start from the merged previous experts or from the original base model? Should all models be merged at every time step? Which merging techniques are best suited for temporal merging? Should different strategies be used for the training initialization and deployment phases? To tackle these questions, we propose a unified framework called TIME (Temporal Integration of Model Expertise) that defines temporal model merging across three axes: (1) Initialization Phase, (2) Deployment Phase, and (3) Merging Technique. Using TIME, we study temporal model merging across model sizes, tasks, and compute budgets on the large-scale FoMo-in-Flux benchmark for continual multimodal pretraining. Systematic experiments across TIME and FoMo-in-Flux yield several key insights into the current limits of, and best practices for, successful model merging over time.
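
To make the three axes concrete, the following is a minimal sketch of one possible temporal merging loop. It is an illustration under stated assumptions, not the paper's implementation: models are represented as plain dictionaries of floats, the merging technique is uniform weight averaging, the deployment phase merges all experts seen so far, and the names `average_merge`, `temporal_merge`, `toy_train`, and `init_from_merged` are hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: a "model" is a flat mapping from parameter name to a scalar weight.
Weights = Dict[str, float]


def average_merge(models: List[Weights]) -> Weights:
    """Merging technique (assumed here): uniform average of all given models."""
    keys = models[0].keys()
    return {k: sum(m[k] for m in models) / len(models) for k in keys}


def temporal_merge(
    base: Weights,
    train_expert: Callable[[Weights, int], Weights],
    num_tasks: int,
    init_from_merged: bool,
    merge: Callable[[List[Weights]], Weights] = average_merge,
) -> Weights:
    """Train one expert per task in sequence and merge after every task."""
    experts: List[Weights] = []
    deployed = dict(base)  # model available before any task has been seen
    for task in range(num_tasks):
        # Initialization phase: start from the latest merged model or from the base.
        start = deployed if init_from_merged else base
        experts.append(train_expert(start, task))
        # Deployment phase (assumed here): merge every expert trained so far.
        deployed = merge(experts)
    return deployed


def toy_train(start: Weights, task: int) -> Weights:
    """Hypothetical stand-in for finetuning: shifts every weight by task + 1."""
    return {k: v + (task + 1) for k, v in start.items()}


if __name__ == "__main__":
    base_model = {"w": 0.0}
    print(temporal_merge(base_model, toy_train, 3, init_from_merged=False))
    print(temporal_merge(base_model, toy_train, 3, init_from_merged=True))
```

Switching `init_from_merged` changes only the starting point of each expert, while passing a different `merge` callable changes only the merging technique, so the axes can be varied independently in this sketch.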
