Mitigating Transformer Overconfidence via Lipschitz Regularization

Wenqian Ye; Yunsheng Ma; Xu Cao; Kun Tang

2023 UAI UAI 2023

Mitigating Transformer Overconfidence via Lipschitz Regularization

Abstract

Though Transformers have achieved promising results in many computer vision tasks, they tend to be over-confident in predictions, as the standard Dot Product Self-Attention (DPSA) can barely preserve distance for the unbounded input domain. In this work, we fill this gap by proposing a novel Lipschitz Regularized Transformer (LRFormer). Specifically, we present a new similarity function with the distance within Banach Space to ensure the Lipschitzness and also regularize the term by a contractive Lipschitz Bound. The proposed method is analyzed with a theoretical guarantee, providing a rigorous basis for its effectiveness and reliability. Extensive experiments conducted on standard vision benchmarks demonstrate that our method outperforms the state-of-the-art single forward pass approaches in prediction, calibration, and uncertainty estimation.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy

Authors

Wenqian Ye , Yunsheng Ma , Xu Cao , Kun Tang

Topics

Artificial Intelligence > Core AI > Agent Systems Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Theory

Keywords

optimistic exploration generalized linear model no-regret algorithm contextual reinforcement learning contextual mdp

Download PDF

Related papers

Memory Mechanism for Unsupervised Anomaly Detection 2023

Semi-supervised learning of partial differential operators and dynamical flows 2023

Composing Efficient, Robust Tests for Policy Selection 2023

Inference for mark-censored temporal point processes 2023

Increasing effect sizes of pairwise conditional independence tests between random vectors 2023