MKPLS: Manifold Kernel Partial Least Squares for Lipreading and Speaker Identification

Amr Bakry; Ahmed Elgammal

CVPR 2013

Abstract

Visual speech recognition is a challenging problem, due to confusion between visual speech features. The speaker identification problem is usually coupled with speech recognition. Moreover, speaker identification is important to several applications, such as automatic access control, biometrics, authentication, and personal privacy issues. In this paper, we propose a novel approach for lipreading and speaker identification. It is based on a new manifold parameterization in a low-dimensional latent space, where each manifold is represented as a point in that space. We initially parameterize each instance manifold using a nonlinear mapping from a unified manifold representation. We then factorize the parameter space using Kernel Partial Least Squares (KPLS) to achieve a low-dimensional manifold latent space. We use two-way projections to achieve two manifold latent spaces, one for the speech content and one for the speaker. We apply our approach to two public databases: AVLetters and OuluVS. We show the results for three different settings of lipreading: speaker independent, speaker dependent, and speaker semi-dependent. Our approach outperforms the baseline in the speaker semi-dependent setting by at least 15%, and is competitive in the other two settings.
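The factorization step named in the abstract can be illustrated with a generic Kernel Partial Least Squares routine. The following is a minimal sketch, not the authors' implementation: it assumes each manifold has already been summarized as a fixed-length parameter vector (here replaced by synthetic toy data), it assumes an RBF kernel with a hand-picked `gamma`, and it uses a standard NIPALS-style KPLS iteration with deflation. The names `rbf_gram`, `kpls_scores`, `T_word`, and `T_speaker` are hypothetical. Running the routine twice, once against speech-content labels and once against speaker labels, loosely mirrors the idea of obtaining two latent spaces.

```python
import numpy as np

def rbf_gram(X, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def kpls_scores(K, Y, n_components, n_iter=500, tol=1e-10):
    """NIPALS-style kernel PLS; returns orthonormal latent scores T (n x p).

    n_components should not exceed the rank of the centered label matrix Y.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = H @ K @ H                      # center the Gram matrix in feature space
    Y = Y - Y.mean(axis=0)             # center the label matrix
    T = np.zeros((n, n_components))
    for a in range(n_components):
        # start from the label column with the largest remaining norm
        u = Y[:, [np.argmax(np.linalg.norm(Y, axis=0))]].copy()
        t_old = np.zeros((n, 1))
        for _ in range(n_iter):
            t = K @ u
            t /= np.linalg.norm(t)     # input-side score, unit length
            u = Y @ (Y.T @ t)
            u /= np.linalg.norm(u)     # label-side score, unit length
            if np.linalg.norm(t - t_old) < tol:
                break
            t_old = t
        T[:, [a]] = t
        P = np.eye(n) - t @ t.T        # deflate: remove the extracted direction
        K = P @ K @ P
        Y = Y - t @ (t.T @ Y)
    return T

# Synthetic stand-in for per-manifold parameter vectors (assumption: toy data only).
rng = np.random.default_rng(0)
n_words, n_speakers, n_reps, dim = 3, 5, 2, 10
word_effect = rng.normal(size=(n_words, dim))
speaker_effect = rng.normal(size=(n_speakers, dim))
words = np.repeat(np.arange(n_words), n_speakers * n_reps)
speakers = np.tile(np.repeat(np.arange(n_speakers), n_reps), n_words)
X = word_effect[words] + speaker_effect[speakers] \
    + 0.1 * rng.normal(size=(words.size, dim))

K = rbf_gram(X, gamma=1.0 / dim)
# One projection supervised by speech content, one by speaker identity.
T_word = kpls_scores(K, np.eye(n_words)[words], n_components=2)
T_speaker = kpls_scores(K, np.eye(n_speakers)[speakers], n_components=2)
print(T_word.shape, T_speaker.shape)
```

In this sketch each row of `T_word` or `T_speaker` is the latent-space point for one input vector; a simple classifier such as nearest neighbor could then be applied in either space, though the choice of classifier here is an assumption and not taken from the abstract.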

Authors

Amr Bakry, Ahmed Elgammal

Topics

Machine Learning > Core Methods > Classification

Machine Learning > Core Methods > Representation Learning

Speech & Audio > Recognition > Speech Recognition

Artificial Intelligence > Core AI > Multi-Modal Learning

Keywords

manifold learning, visual speech recognition, latent space, kernel partial least squares, speaker identification
