SwiftVideo: A Unified Framework for Few-Step Video Generation Through Trajectory-Distribution Alignment

Yanxiao Sun; Jiafu Wu; Yun Cao; Chengming Xu; Yabiao Wang; Weijian Cao; Donghao Luo; Chengjie Wang; Yanwei Fu

AAAI 2026

Abstract

Diffusion-based and flow-based models have achieved significant progress in video synthesis but require many iterative sampling steps, which incurs substantial computational overhead. Many distillation methods have been developed to accelerate video generation models, but those based solely on trajectory preservation or solely on distribution matching often suffer from performance breakdown or increased artifacts in few-step settings. To address these limitations, we propose SwiftVideo, a unified and stable distillation framework that combines the advantages of trajectory-preserving and distribution-matching strategies. Our approach first introduces continuous-time consistency distillation to ensure precise preservation of ODE trajectories. We then propose a dual-perspective alignment that comprises distribution alignment between synthetic and real data and trajectory alignment across different inference steps. Our method maintains high-quality video generation while substantially reducing the number of inference steps. Quantitative evaluations on the OpenVid-1M benchmark demonstrate that our method significantly outperforms existing approaches in few-step video generation.
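To make the cost of iterative sampling concrete, the following minimal sketch integrates a toy one-dimensional ODE with a fixed-step Euler solver and compares the result against the exact solution. It is not the SwiftVideo method: the `velocity` function, the `euler_sample` helper, and the choice of ODE (dx/dt = -x) are hypothetical stand-ins for a learned video model, chosen only because the exact answer is known in closed form.

```python
import math


def velocity(x: float, t: float) -> float:
    # Hypothetical toy velocity field (assumption); in a real generator
    # this would be one forward pass of a neural network.
    return -x


def euler_sample(x0: float, num_steps: int) -> float:
    # Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps.
    x = x0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x += dt * velocity(x, i * dt)
    return x


# Exact endpoint of the trajectory starting at x(0) = 1.
exact = math.exp(-1.0)

# Endpoint error for a few step budgets; each step costs one model call.
errors = {n: abs(euler_sample(1.0, n) - exact) for n in (1, 4, 50)}

for n, err in errors.items():
    print(f"steps={n:3d}  error={err:.4f}")
```

In this toy setting the endpoint error shrinks as the step count grows, so a plain solver has to spend more model evaluations to stay on the trajectory. Few-step distillation, as described in the abstract, aims to reach a good endpoint without paying that cost.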

Authors

Yanxiao Sun, Jiafu Wu, Yun Cao, Chengming Xu, Yabiao Wang, Weijian Cao, Donghao Luo, Chengjie Wang, Yanwei Fu

Topics

Machine Learning > Optimization & Theory > Optimization
Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Video Generation

Keywords

video generation, model distillation, distribution matching, diffusion model, few-step generation, trajectory alignment
