INTERSPEECH 2018

A New Frequency Coverage Metric and a New Subband Encoding Model, with an Application in Pitch Estimation

Abstract

The auditory filterbank is a well-accepted and important tool for speech feature extraction. It decomposes the speech signal into subbands, usually on an equivalent rectangular bandwidth (ERB) frequency scale, before further subband analysis and processing such as auto-correlation and cross-correlation. However, the number of subbands and the subband center frequencies for a given frequency range have been chosen essentially empirically in the literature. Moreover, correlation of subband signals may not produce distinct peaks for feature extraction. This paper proposes a novel frequency coverage metric to calculate the required number of subbands. It also presents a new subband encoding model for correlation processing, inspired by psychoacoustic studies and statistical analysis. The proposed frequency coverage metric and subband encoding model are applied to a pitch estimation method as an example of their possible use in speech feature extraction. Evaluation results demonstrate the benefits of the proposed methods compared with state-of-the-art methods.
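For readers unfamiliar with the equivalent rectangular bandwidth (ERB) scale mentioned above, the following minimal sketch shows the conventional, empirical way of placing subband center frequencies: the number of subbands is picked by hand and the centers are spaced uniformly on the ERB-rate scale. It uses the widely cited Glasberg and Moore approximation and is not the frequency coverage metric proposed in the paper. The function names, the 80 Hz to 4000 Hz range, and the choice of 32 subbands are illustrative assumptions.

```python
import math


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (Glasberg and Moore approximation)."""
    return 21.4 * math.log10(4.37e-3 * f_hz + 1.0)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate: map an ERB-rate value back to Hz."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 4.37e-3


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz of an auditory filter centered at f_hz."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)


def erb_center_frequencies(f_low, f_high, num_bands):
    """Return num_bands center frequencies spaced uniformly on the ERB-rate scale.

    num_bands is chosen by hand here, which is the empirical practice
    the abstract describes; it is an assumption of this sketch.
    """
    if num_bands < 2:
        raise ValueError("num_bands must be at least 2")
    e_low = hz_to_erb_rate(f_low)
    e_high = hz_to_erb_rate(f_high)
    step = (e_high - e_low) / (num_bands - 1)
    return [erb_rate_to_hz(e_low + i * step) for i in range(num_bands)]


if __name__ == "__main__":
    # Illustrative range and subband count, not values from the paper.
    centers = erb_center_frequencies(80.0, 4000.0, 32)
    for fc in centers[:3]:
        print(f"center = {fc:8.1f} Hz, ERB = {erb_bandwidth(fc):6.1f} Hz")
```

Because the spacing is uniform in ERB-rate rather than in Hz, the center frequencies are denser at low frequencies, where auditory filters are narrower.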

Authors