Multi-frame Quantization of LSF Parameters Using a Deep Autoencoder and Pyramid Vector Quantizer

Yaxing Li; Eshete Derb Emiru; Shengwu Xiong; Anna Zhu; Pengfei Duan; Yichang Li

INTERSPEECH 2018

Abstract

This paper presents a multi-frame quantization scheme for line spectral frequency (LSF) parameters using a deep autoencoder (DAE) and a pyramid vector quantizer (PVQ). The objective is to provide sophisticated LSF quantization for ultra-low-bit-rate speech coders with moderate delay. To compress and de-correlate multiple LSF frames, a DAE whose coder layer consists of linear units with Gaussian noise is used. The DAE demonstrates a high degree of modelling flexibility for multiple LSF frames. A PVQ is used to quantize the coder-layer vector effectively. Compared with the discrete cosine model (DCM), the DAE-based compression models multi-frame LSF parameters more accurately and has the advantage that the coder-layer dimension can take any value. The coder-layer dimension of the DAE governs the trade-off between the modelling distortion and the coder-layer quantization distortion. The experimental results show that the proposed algorithm, with an optimally chosen coder-layer dimension, outperforms the DCM-based multi-frame LSF quantization approach in terms of spectral distortion (SD) performance and robustness across different speech segments.
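The abstract does not give the PVQ configuration, so the following is only a minimal sketch of a generic pyramid vector quantizer in Fischer's formulation, not the authors' implementation. It assumes the codebook is the set of integer vectors whose absolute values sum to a fixed pulse count `pulses`, and it uses a simple scale-and-round search. The function names `pvq_codebook_size` and `pvq_quantize` are hypothetical, and the gain (scale) of the coder-layer vector would have to be coded separately.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def pvq_codebook_size(dim, pulses):
    """Number of integer vectors of length `dim` with sum(|y_i|) == pulses.

    Uses the standard recursion N(L, K) = N(L-1, K) + N(L-1, K-1) + N(L, K-1).
    """
    if pulses == 0:
        return 1  # only the all-zero vector
    if dim == 0:
        return 0  # no components left to hold the remaining pulses
    return (pvq_codebook_size(dim - 1, pulses)
            + pvq_codebook_size(dim - 1, pulses - 1)
            + pvq_codebook_size(dim, pulses - 1))


def pvq_quantize(x, pulses):
    """Map a real vector to an integer point on the pyramid sum(|y_i|) == pulses.

    Illustrative search (an assumption, not taken from the paper): scale x onto
    the pyramid, floor the magnitudes, then hand the leftover pulses to the
    components with the largest fractional parts.
    """
    l1 = sum(abs(v) for v in x)
    if l1 == 0:
        # Degenerate input: put every pulse on the first component.
        return [pulses] + [0] * (len(x) - 1)
    scaled = [pulses * abs(v) / l1 for v in x]
    mags = [math.floor(s) for s in scaled]
    remaining = pulses - sum(mags)
    by_fraction = sorted(range(len(x)),
                         key=lambda i: scaled[i] - mags[i],
                         reverse=True)
    for i in by_fraction[:remaining]:
        mags[i] += 1
    # Restore the signs of the input components.
    return [m if v >= 0 else -m for m, v in zip(mags, x)]


if __name__ == "__main__":
    codeword = pvq_quantize([0.9, -0.1, 0.3], pulses=4)
    size = pvq_codebook_size(3, 4)
    print(codeword, size, math.ceil(math.log2(size)), "bits")
```

In this sketch the index cost is `ceil(log2(N(L, K)))` bits for a coder-layer vector of dimension `L` and `K` pulses, which illustrates the trade-off the abstract describes: a smaller coder-layer dimension leaves more bits per component for the quantizer but increases the modelling distortion of the autoencoder.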

Topics

Artificial Intelligence > Core AI > Model Compression
Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Optimization

Keywords

parameter quantization; deep autoencoder; speech coding; spectral distortion; pyramid vector quantizer
