BridgeVoC: Neural Vocoder with Schrödinger Bridge

Tong Lei; Zhiyu Zhang; Rilin Chen; Meng Yu; Jing LU; Chengshi Zheng; Dong Yu; Andong Li

IJCAI 2025

Abstract

Previous diffusion-based neural vocoders typically follow a noise-to-data generation pipeline and often neglect the linear-degradation prior of the mel-spectrogram, which limits generation quality. By revisiting the vocoding task and examining its connection with signal restoration, this paper proposes BridgeVoC, a time-frequency (T-F) domain neural vocoder built on the Schrödinger Bridge and the first to follow the data-to-data generation paradigm. Specifically, the mel-spectrogram can be projected into the target linear-scale domain and regarded as a degraded spectral representation with a deficient rank distribution. On this basis, the Schrödinger Bridge is used to connect the degraded and target data distributions. At inference, the target spectrum is gradually restored starting from the degraded representation rather than generated from Gaussian noise. Quantitative experiments on LJSpeech and LibriTTS show that BridgeVoC achieves faster inference and surpasses existing diffusion-based vocoder baselines, while also matching or exceeding non-diffusion state-of-the-art methods across evaluation metrics.
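The rank-deficiency claim in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code: the hand-built triangular filterbank, the pseudo-inverse projection, the random stand-in for a magnitude spectrogram, and the sizes (513 frequency bins, 80 mel bands, 22050 Hz) are all assumptions chosen for illustration. It only shows that a spectrogram compressed to mel bands and projected back to the linear scale cannot have rank above the number of mel bands.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumed convention, not taken from the paper)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_freq, sr):
    """Triangular mel filterbank of shape (n_mels, n_freq)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_freq)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fb = np.zeros((n_mels, n_freq))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

# Hypothetical sizes; a random matrix stands in for a real magnitude spectrogram.
rng = np.random.default_rng(0)
n_freq, n_mels, n_frames, sr = 513, 80, 200, 22050
target = np.abs(rng.standard_normal((n_freq, n_frames)))

A = mel_filterbank(n_mels, n_freq, sr)   # linear scale -> mel scale
mel = A @ target                         # mel-spectrogram, (n_mels, n_frames)
degraded = np.linalg.pinv(A) @ mel       # projected back to the linear scale

rank_target = np.linalg.matrix_rank(target)
rank_degraded = np.linalg.matrix_rank(degraded)
print(rank_target, rank_degraded)        # the degraded rank is at most n_mels
```

The degraded matrix has the same shape as the target, which is what allows a data-to-data model to treat it as a starting point for restoration instead of starting from noise.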

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Stochastic Processes
Deep Learning > Models > Diffusion Models
Speech & Audio > Synthesis > Text-to-Speech

Keywords

diffusion model, Schrödinger bridge, neural vocoder, signal restoration, audio generation
