How to Merge Your Multimodal Models Over Time?

Sebastian Dziadzio; Vishaal Udandarao; Karsten Roth; Ameya Prabhu; Zeynep Akata; Samuel Albanie; Matthias Bethge

CVPR 2025

Abstract

Model merging combines expert models, each finetuned from a shared foundation model on diverse tasks and domains, into a single, more capable base model. However, existing model merging approaches assume that all experts are available simultaneously. In reality, new tasks and domains emerge continuously, which calls for a dynamic process of integrating these experts over time. We call this process temporal model merging. The temporal dimension introduces unique challenges not addressed in prior work: At each task, should expert training start from merged previous experts or the original base model? Should all models be merged at every time step? Which merging techniques are best suited for temporal merging? Should different strategies be used for the training initialization and deployment phases? To tackle these questions, we propose a unified framework called TIME (Temporal Integration of Model Expertise) that defines temporal model merging across three axes: (1) Initialization Phase, (2) Deployment Phase, and (3) Merging Technique. Using TIME, we study temporal model merging across model sizes, tasks, and compute budgets on the large-scale FoMo-in-Flux benchmark for continual multimodal pretraining. Systematic experiments across TIME and FoMo-in-Flux yield several key insights for temporal model merging, clarifying current limits and best practices for merging models over time.
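The three axes named in the abstract can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the paper's implementation: models are stand-in dictionaries of floats, `average` (uniform weight averaging) is one simple choice of merging technique, the deployed model is assumed to merge the base model with every expert seen so far, and `temporal_merge`, `toy_train`, and `init_from_merged` are hypothetical names introduced here.

```python
from typing import Callable, Dict, List

Weights = Dict[str, float]  # toy stand-in for a model's parameter tensors


def average(models: List[Weights]) -> Weights:
    """Merging technique (assumed): uniform weight averaging."""
    return {k: sum(m[k] for m in models) / len(models) for k in models[0]}


def temporal_merge(
    base: Weights,
    train_expert: Callable[[Weights, int], Weights],
    num_tasks: int,
    init_from_merged: bool = True,
    merge: Callable[[List[Weights]], Weights] = average,
) -> List[Weights]:
    """Return the model deployed after each task (hypothetical helper)."""
    experts: List[Weights] = []
    deployed: List[Weights] = []
    start = base
    for task in range(num_tasks):
        # Initialization phase: train the new expert from `start`.
        experts.append(train_expert(start, task))
        # Deployment phase (assumed): merge the base with all experts so far.
        merged = merge([base] + experts)
        deployed.append(merged)
        # Next expert starts from the merged model or from the original base.
        start = merged if init_from_merged else base
    return deployed


def toy_train(start: Weights, task: int) -> Weights:
    """Hypothetical 'training': shift every weight by a fixed amount."""
    return {k: v + 1.0 for k, v in start.items()}


from_merged = temporal_merge({"w": 0.0}, toy_train, 2, init_from_merged=True)
from_base = temporal_merge({"w": 0.0}, toy_train, 2, init_from_merged=False)
```

Switching `init_from_merged` changes only where each expert starts training, while the `merge` argument swaps the merging technique independently, which mirrors how the abstract separates the initialization phase, the deployment phase, and the merging technique.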

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Learning Types > Continual Learning
Machine Learning > Application Areas > Model Merging

Keywords

continual learning, multimodal learning, foundation model, temporal model merging
