EMNLP 2023
SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks
Abstract
We introduce SHARCS, an adaptive inference method that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks of varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or complements existing per-sample adaptive inference methods across various classification tasks in terms of accuracy vs. FLOPs; (2) SHARCS generalizes across different architectures and can even be applied to compressed and efficient transformer encoders to further improve their efficiency; (3) SHARCS can provide a 2× inference speedup with an insignificant drop in accuracy.
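To make the idea of routing samples to sub-networks of varying widths concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the width options, the argmax router, and the choice to keep the first k hidden units are all assumptions made for illustration, and every function name is hypothetical. The abstract does not say how the router is trained or how hardness is estimated, so the sketch covers only the inference-time routing step on a toy feed-forward block.

```python
import math

# Hypothetical width options; the abstract does not specify the actual choices.
WIDTH_FRACTIONS = [0.25, 0.5, 1.0]


def matvec(weights, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def route(router_weights, x):
    """Pick a width index for one sample: argmax over router logits (assumed)."""
    logits = matvec(router_weights, x)
    return max(range(len(logits)), key=lambda i: logits[i])


def ffn_at_width(w_in, w_out, x, fraction):
    """Run a two-layer feed-forward block using only the first k hidden units."""
    k = max(1, math.ceil(fraction * len(w_in)))
    hidden = [max(0.0, h) for h in matvec(w_in[:k], x)]  # ReLU on k units
    # Slice the output projection's columns to match the k active units.
    out = matvec([row[:k] for row in w_out], hidden)
    # Rough cost estimate: each multiply-add counted as 2 FLOPs.
    flops = 2 * k * len(x) + 2 * len(w_out) * k
    return out, flops


def adaptive_forward(router_weights, w_in, w_out, x):
    """Route one sample, then run it through the sub-network of that width."""
    fraction = WIDTH_FRACTIONS[route(router_weights, x)]
    out, flops = ffn_at_width(w_in, w_out, x, fraction)
    return out, fraction, flops
```

In this toy setup, a sample routed to the narrowest width touches a quarter of the hidden units and costs proportionally fewer FLOPs than one routed to the full width, which is the kind of accuracy vs. FLOPs trade-off the abstract refers to.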
Topics
Machine Learning > Application Areas > Efficient Computing
Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Model Architecture
Machine Learning > Application Areas > Model Compression
Artificial Intelligence > Core AI > Efficient Computing
Deep Learning > Optimization & Theory > Efficient Computing