Improving Lipschitz-Constrained Neural Networks by Learning Activation Functions

Stanislas Ducotterd; Alexis Goujon; Pakshal Bohra; Dimitris Perdios; Sebastian Neumayer; Michael Unser

JMLR 2024

Abstract

Lipschitz-constrained neural networks have several advantages over unconstrained ones and can be applied to a variety of problems, making them a topic of attention in the deep learning community. Unfortunately, it has been shown both theoretically and empirically that they perform poorly when equipped with ReLU activation functions. By contrast, neural networks with learnable 1-Lipschitz linear splines are known to be more expressive. In this paper, we show that such networks correspond to global optima of a constrained functional optimization problem that consists of the training of a neural network composed of 1-Lipschitz linear layers and 1-Lipschitz freeform activation functions with second-order total-variation regularization. Further, we propose an efficient method to train these neural networks. Our numerical experiments show that our trained networks compare favorably with existing 1-Lipschitz neural architectures.
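To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch of a linear-spline activation on uniform knots. It is an illustration under stated assumptions, not the training method proposed in the paper: the function names (`clip_slopes`, `spline_eval`, `tv2`) are hypothetical, the 1-Lipschitz constraint is enforced here by plain slope clipping (the paper's own projection may differ), and the spline is held constant outside the knot range. For a linear spline, the second-order total variation reduces to the sum of absolute slope changes, which the sketch computes at the interior knots.

```python
import numpy as np

def clip_slopes(values, step):
    """Return knot values of a spline whose slopes are clipped to [-1, 1].

    Assumption: simple clipping, chosen for illustration only.
    """
    slopes = np.clip(np.diff(values) / step, -1.0, 1.0)
    # Rebuild the knot values from the clipped slopes, keeping the first value.
    return np.concatenate(([values[0]], values[0] + step * np.cumsum(slopes)))

def spline_eval(x, knots, values):
    """Evaluate the linear spline; np.interp holds the end values constant
    outside the knot range, which keeps the function 1-Lipschitz."""
    return np.interp(x, knots, values)

def tv2(values, step):
    """Sum of absolute slope changes at the interior knots."""
    slopes = np.diff(values) / step
    return np.abs(np.diff(slopes)).sum()

# Hypothetical example: 5 uniform knots on [-2, 2] with spacing 1.
knots = np.linspace(-2.0, 2.0, 5)
step = 1.0
raw = np.array([0.0, 3.0, 3.5, 3.5, 1.0])   # slopes 3, 0.5, 0, -2.5
clipped = clip_slopes(raw, step)             # slopes 1, 0.5, 0, -1
penalty = tv2(clipped, step)                 # |0.5-1| + |0-0.5| + |-1-0|
```

In a training loop, a penalty such as `tv2` would typically be added to the loss with a weight, which favors activations with few slope changes; ReLU sampled on the same knots has a single slope change of size 1.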

Authors

Stanislas Ducotterd, Alexis Goujon, Pakshal Bohra, Dimitris Perdios, Sebastian Neumayer, Michael Unser

Topics

Deep Learning > Architectures > Neural Networks

Keywords

neural network architecture, ReLU activation, activation function, total variation regularization, Lipschitz constraint
