The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes

Peter Kocsis; Peter Súkeník; Guillem Braso; Matthias Niessner; Laura Leal-Taixé; Ismail Elezi

2022 NIPS NeurIPS 2022

The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes

Abstract

Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformers of MLP-based architectures have started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for their use in low-data regimes. In this work, we propose a simple yet effective framework to improve generalization from small amounts of data. We augment modern CNNs with fully-connected (FC) layers and show the massive impact this architectural change has in low-data regimes. We further present an online joint knowledge-distillation method to utilize the extra FC layers at train time but avoid them during test time. This allows us to improve the generalization of a CNN-based model without any increase in the number of weights at test time. We perform classification experiments for a large range of network backbones and several standard datasets on supervised learning and active learning. Our experiments significantly outperform the networks without fully-connected layers, reaching a relative improvement of up to $16\%$ validation accuracy in the supervised setting without adding any extra parameters during inference.

🧭 Keyword Pioneer — fully-connected layer

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Peter Kocsis , Peter Súkeník , Guillem Braso , Matthias Niessner , Laura Leal-Taixé , Ismail Elezi

Topics

Machine Learning > Core Methods > Classification Machine Learning > Learning Types > Active Learning Machine Learning > Application Areas > Knowledge Distillation

Keywords

active learning knowledge distillation convolutional neural network low-data regime fully-connected layer

Download PDF

Related papers

Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching 2022

A Theoretical View on Sparsely Activated Networks 2022

Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks 2022

Matryoshka Representation Learning 2022

Off-Policy Evaluation with Deficient Support Using Side Information 2022