SIGUA: Forgetting May Make Learning with Noisy Labels More Robust

Bo Han; Gang Niu; Xingrui Yu; Quanming Yao; Miao Xu; Ivor Tsang; Masashi Sugiyama

ICML 2020

Abstract

Given data with noisy labels, over-parameterized deep networks can gradually memorize the data and fit everything in the end. Although equipped with corrections for noisy labels, many learning methods in this area still suffer from overfitting due to undesired memorization. In this paper, to relieve this issue, we propose stochastic integrated gradient underweighted ascent (SIGUA): in a mini-batch, we adopt gradient descent on good data as usual, and learning-rate-reduced gradient ascent on bad data. The proposal is a versatile approach in which data goodness or badness is defined with respect to desired or undesired memorization given a base learning method. Technically, SIGUA pulls optimization back for the sake of generalization when the two goals conflict; philosophically, SIGUA shows that forgetting undesired memorization can reinforce desired memorization. Experiments demonstrate that SIGUA successfully robustifies two typical base learning methods, so that their performance is often significantly improved.
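The update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy model `y_hat = w * x`, the squared loss, the function name `sigua_step`, the parameters `gamma` and `num_good`, and the small-loss rule used to decide which examples count as "good" are all assumptions made for illustration. The abstract only states that goodness is defined by the base learning method.

```python
def sigua_step(w, batch, lr=0.3, gamma=0.1, num_good=2):
    """One SIGUA-style update for the toy model y_hat = w * x (hypothetical sketch)."""
    # Per-example squared loss and its gradient with respect to w.
    losses = [0.5 * (w * x - y) ** 2 for x, y in batch]
    grads = [(w * x - y) * x for x, y in batch]

    # Assumed goodness rule: the num_good smallest-loss examples are "good",
    # the rest are "bad". A real base method would supply its own rule.
    order = sorted(range(len(batch)), key=lambda i: losses[i])
    good = set(order[:num_good])

    # Integrate the gradients: descent direction for good data,
    # ascent scaled down by gamma (0 <= gamma <= 1) for bad data.
    integrated = sum(
        g if i in good else -gamma * g for i, g in enumerate(grads)
    ) / len(batch)

    # A single ordinary descent step on the integrated gradient.
    return w - lr * integrated


# Two consistent examples (y = 2x) and one example with a corrupted label.
batch = [(1.0, 2.0), (1.0, 2.0), (1.0, -10.0)]
w_sigua = sigua_step(0.0, batch, gamma=0.1)
w_drop_bad = sigua_step(0.0, batch, gamma=0.0)  # bad data simply ignored
```

With `gamma=0.0` the step reduces to plain gradient descent on the good examples only; a positive `gamma` additionally pushes the parameter away from fitting the example flagged as bad, which is the "forgetting" effect the abstract refers to.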

Topics

Machine Learning > Learning Types > Weakly Supervised Learning; Machine Learning > Optimization & Theory > Optimization; Machine Learning > Application Areas > Risk Management; Machine Learning > Learning Types > Supervised Learning; Machine Learning > Learning Types > Deep Learning

Keywords

gradient descent; robust learning; noisy label; gradient ascent
