Hierarchical spike coding of sound

Yan Karklin; Chaitanya Ekanadham; Eero P. Simoncelli

NIPS 2012 (NeurIPS 2012)

Abstract

We develop a probabilistic generative model for representing acoustic event structure at multiple scales via a two-stage hierarchy. The first stage consists of a spiking representation that encodes a sound with a sparse set of kernels at different frequencies, positioned precisely in time. The coarse time and frequency statistical structure of the first-stage spikes is encoded by a second-stage spiking representation, while fine-scale statistical regularities are encoded by recurrent interactions within the first stage. When fitted to speech data, the model encodes acoustic features such as harmonic stacks, sweeps, and frequency modulations that can be composed to represent complex acoustic events. The model can also synthesize sounds from the higher-level representation and provides significant improvement over wavelet thresholding techniques on a denoising task.
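To make the first-stage representation concrete, the sketch below rebuilds a waveform from a handful of spikes, each a (time, frequency, amplitude) triple, by summing amplitude-scaled kernels at the spike times. It is only an illustration of the idea stated in the abstract. The gammatone-shaped kernel, its parameters, and the names `gammatone_kernel` and `synthesize` are assumptions made here, and the sketch does not cover the model's inference procedure, the recurrent interactions, or the second stage.

```python
import math


def gammatone_kernel(freq_hz, sample_rate=16000, duration_s=0.02,
                     order=4, bandwidth_hz=100.0):
    """Illustrative gammatone-shaped kernel with unit energy.

    The shape and all default parameters are assumptions for this sketch.
    """
    n = int(duration_s * sample_rate)
    kernel = []
    for i in range(n):
        t = i / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * bandwidth_hz * t)
        kernel.append(envelope * math.cos(2.0 * math.pi * freq_hz * t))
    norm = math.sqrt(sum(v * v for v in kernel))
    return [v / norm for v in kernel]


def synthesize(spikes, num_samples, sample_rate=16000):
    """Sum amplitude-scaled kernels placed at each spike's time offset.

    spikes: iterable of (time_s, freq_hz, amplitude) triples.
    """
    signal = [0.0] * num_samples
    for time_s, freq_hz, amplitude in spikes:
        start = int(round(time_s * sample_rate))
        kernel = gammatone_kernel(freq_hz, sample_rate)
        for i, v in enumerate(kernel):
            # Drop kernel samples that would fall outside the signal.
            if 0 <= start + i < num_samples:
                signal[start + i] += amplitude * v
    return signal


# Two hypothetical spikes: 440 Hz at 10 ms and 880 Hz at 25 ms.
spikes = [(0.010, 440.0, 1.0), (0.025, 880.0, 0.5)]
signal = synthesize(spikes, num_samples=1600)
```

Because only the spike triples are stored, the code is sparse: here two events describe 1600 samples, and the waveform is silent wherever no kernel is active.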

Topics

Machine Learning > Core Methods > Representation Learning
Deep Learning > Models > Generative Models
Speech & Audio > Analysis > Prosody Analysis
Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling
Speech & Audio > Analysis > Speech Analysis
Speech & Audio > Synthesis > Speech Synthesis

Keywords

sparse coding, sparse representation, speech processing, probabilistic generative model, hierarchical model, spike coding, hierarchical audio coding, acoustic event modeling, hierarchical spike coding, acoustic event representation, sound synthesis, generative model, spiking neural network, speech denoising, audio denoising, acoustic event, wavelet thresholding
