WaveEx: Accelerating Flow Matching-based Speech Generation via Wavelet-guided Extrapolation

Xiaoqian Liu; Xiyan Gui; Zhengkun Ge; Yuan Ge; Chang Zou; Jiacheng Liu; Zhikang Niu; Qixi Zheng; Chen Xu; Xie Chen; Tong Xiao; Jingbo Zhu; Linfeng Zhang

AAAI 2026

Abstract

Flow matching-based generative models offer a principled approach to modeling continuous-time dynamics in speech generation. However, inference is often computationally expensive because ODE solvers require repeated neural network evaluations. We propose WaveEx, a training-free, plug-in acceleration framework that replaces portions of ODE integration with wavelet-guided extrapolation. By leveraging the multi-scale structure of latent trajectories, WaveEx predicts future states directly in the frequency domain without additional model evaluations or architectural changes. WaveEx consistently accelerates inference across diverse speech generation tasks. The gains are especially pronounced in tasks such as speech synthesis (up to 5.73× speedup) and music generation (2.75×), where flow matching plays a central role in alignment modeling and dense ODE integration. Even in tasks with simpler input-output mappings, such as speech enhancement (4.55×) and voice conversion (2.75×), WaveEx still achieves notable acceleration, demonstrating the robustness and generalizability of the approach. These results highlight wavelet-guided extrapolation as a lightweight and broadly applicable alternative to full ODE solving for flow matching-based speech generation.
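The abstract does not give the extrapolation rule itself, so the following is only a minimal sketch of the general idea, under stated assumptions: an Euler solver in which some steps skip the model call and instead predict the velocity from two cached velocities in a one-level Haar wavelet domain. The band-wise rule (linear extrapolation of the low-frequency band, reuse of the high-frequency band), the skip schedule, and all names (`haar_split`, `haar_merge`, `extrapolate_velocity`, `sample`, `full_every`) are hypothetical illustrations, not the authors' method.

```python
import numpy as np

def haar_split(x):
    """One-level Haar transform along the last axis (length must be even)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_merge(low, high):
    """Inverse of haar_split."""
    out = np.empty(low.shape[:-1] + (2 * low.shape[-1],), dtype=low.dtype)
    out[..., 0::2] = (low + high) / np.sqrt(2.0)
    out[..., 1::2] = (low - high) / np.sqrt(2.0)
    return out

def extrapolate_velocity(v_prev, v_curr):
    """Predict the next velocity without a model call (assumed rule)."""
    low_prev, _ = haar_split(v_prev)
    low_curr, high_curr = haar_split(v_curr)
    # Assumption: the smooth band evolves roughly linearly between steps,
    # while the detail band is simply reused from the latest velocity.
    low_next = 2.0 * low_curr - low_prev
    return haar_merge(low_next, high_curr)

def sample(velocity_fn, x0, num_steps=8, full_every=2):
    """Euler integration from t=0 to t=1 with some model calls skipped.

    A real model call is made until two velocities are cached, and
    afterwards on every `full_every`-th step; other steps extrapolate.
    Returns the final state and the number of model calls.
    """
    x = x0.copy()
    dt = 1.0 / num_steps
    history, calls = [], 0
    for i in range(num_steps):
        if len(history) < 2 or i % full_every == 0:
            v = velocity_fn(x, i * dt)  # full model evaluation
            calls += 1
        else:
            v = extrapolate_velocity(history[-2], history[-1])
        history.append(v)
        x = x + dt * v
    return x, calls
```

In this sketch, `velocity_fn` stands in for the flow matching network. With `num_steps=8` and `full_every=2`, the loop makes five model calls instead of eight; a practical schedule would have to be tuned against output quality.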

Topics

Artificial Intelligence > Core AI > Procedural Generation; Machine Learning > Optimization & Theory > Optimization; Deep Learning > Models > Diffusion Models

Keywords

speech synthesis; flow matching; inference acceleration; ODE solver; speech generation; wavelet extrapolation
