2017
INTERSPEECH
INTERSPEECH 2017
Backstitch: Counteracting Finite-Sample Bias via Negative Steps
Abstract
In this paper we describe a modification to Stochastic Gradient Descent (SGD) that improves generalization to unseen data. It consists of doing two steps for each minibatch: a backward step with a small negative learning rate, followed by a forward step with a larger learning rate. The idea was initially inspired by ideas from adversarial training, but we show that it can be viewed as a crude way of canceling out certain systematic biases that come from training on finite data sets. The method gives ~ 10% relative improvement over our best acoustic models based on lattice-free MMI, across multiple datasets with 100–300 hours of data.
🧭
Keyword Pioneer
— finite sample bia
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio