2016 INTERSPEECH INTERSPEECH 2016

Convolutional Neural Networks with Data Augmentation for Classifying Speakers’ Native Language

Abstract

We use a feedforward Convolutional Neural Network to classify speakers’ native language for the INTERSPEECH 2016 Computational Paralinguistic Challenge Native Language Sub-Challenge, using no specialized features for computational paralinguistics tasks, but only MFCCs with their first and second order deltas. In addition, we augment the training data by replacing the original examples with shorter overlapping samples extracted from them, thus multiplying the number of training examples by almost 40. With the augmented training dataset and enhancements to neural network models such as Batch Normalization, Dropout, and Maxout activation function, we managed to improve upon the challenge baseline by a large margin, both for the development and the test set.

🚀 Conference Pioneer — INTERSPEECH 2016
🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning
🧭 Keyword Pioneer — batch normalization
🐣 Hot Topic Early Bird — data augmentation
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio