WACV 2026

SmoothDiffusion-VE: Real-time Generative Video Editing Using Adaptive Feature Cache

Abstract

Video editing with diffusion models presents significant challenges, especially under real-time constraints. Current methods either enhance temporal consistency at the cost of slow processing or rely on frame-by-frame editing, which leads to flickering and temporal artifacts. To address both challenges, we propose SmoothDiffusion-VE, a streaming-based editing approach that improves temporal consistency and processing speed through our proposed Adaptive Feature Cache (AFC) and motion-guided attention. The AFC dynamically adjusts its caching behavior based on the perceptual similarity (LPIPS) between frames: it shifts to a mini-cache mode for similar frames to reduce computational load, whereas significant frame changes trigger deeper caching to maintain robust temporal coherence. Our motion-guided attention uses optical flow to focus selectively on dynamic regions, reducing unnecessary computation in static areas and accelerating processing. SmoothDiffusion-VE runs at 28 FPS on a single RTX 4090 GPU, achieving a 1564x speedup over Plug-and-Play Diffusion (PNP) and a 1916x speedup over Diffusion Motion Transfer (DMT), delivering a powerful solution for fast and consistent video editing.
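The two mechanisms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold values, and the mode labels "mini" and "deep" are hypothetical, and a mean absolute pixel difference stands in for LPIPS so that the example needs no external libraries. The abstract does not specify how the similarity score is thresholded or how flow is converted into an attention mask, so both rules below are assumptions.

```python
from typing import Callable, List, Sequence

# A frame is modeled as a flat sequence of pixel intensities in [0, 1].
Frame = Sequence[float]


def mean_abs_distance(a: Frame, b: Frame) -> float:
    """Stand-in for a perceptual distance such as LPIPS (assumption)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_cache_mode(
    prev: Frame,
    curr: Frame,
    threshold: float = 0.1,  # hypothetical value, not from the paper
    distance: Callable[[Frame, Frame], float] = mean_abs_distance,
) -> str:
    """Return 'mini' for similar frames and 'deep' for large changes."""
    return "mini" if distance(prev, curr) < threshold else "deep"


def motion_mask(flow_magnitudes: Sequence[float], min_motion: float = 0.5) -> List[bool]:
    """Mark regions whose optical-flow magnitude reaches min_motion.

    Only regions marked True would receive attention in this sketch;
    static regions (False) would be skipped.
    """
    return [m >= min_motion for m in flow_magnitudes]


# Two nearly identical frames followed by an abrupt scene change.
frame_a = [0.20, 0.20, 0.20, 0.20]
frame_b = [0.21, 0.20, 0.19, 0.20]
frame_c = [0.90, 0.80, 0.10, 0.70]

modes = [select_cache_mode(frame_a, frame_b), select_cache_mode(frame_b, frame_c)]
mask = motion_mask([0.0, 0.1, 2.0, 0.7])
```

In this toy run the first frame pair falls below the threshold and selects the mini-cache mode, while the scene change selects deep caching; the mask keeps only the two regions with flow magnitude of at least 0.5. Swapping in a real LPIPS model would only require passing a different `distance` callable.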
