Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures

Zhucheng Tu; Mengping Li; Jimmy Lin

NAACL 2018

Abstract

We demonstrate the serverless deployment of neural networks for model inference in NLP applications, using Amazon's Lambda service for feedforward evaluation and DynamoDB for storing word embeddings. Our architecture realizes a pay-per-request pricing model, requiring zero ongoing costs for maintaining server instances. All virtual machine management is handled behind the scenes by the cloud provider without any direct developer intervention. We describe a number of techniques that allow efficient use of serverless resources, and evaluations confirm that our design is both scalable and inexpensive.
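The request flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `EMBEDDINGS` dictionary is a hypothetical in-memory stand-in for the DynamoDB table of word embeddings, the weights and the averaging classifier are invented for illustration, and `handler` only follows the general `(event, context)` shape of a Python function on Lambda.

```python
import json
import math

# Hypothetical stand-in for the DynamoDB table of word embeddings.
# In a real deployment these lookups would be remote reads.
EMBEDDINGS = {
    "good": [1.0, 0.5],
    "bad": [-1.0, -0.5],
    "movie": [0.0, 0.1],
}

# Hypothetical parameters of a tiny one-layer model.
WEIGHTS = [2.0, 1.0]
BIAS = 0.0


def fetch_embeddings(tokens, store=EMBEDDINGS):
    """Return vectors for the tokens found in the store; skip unknown tokens."""
    return [store[t] for t in tokens if t in store]


def predict(vectors):
    """Average the word vectors, then apply a linear layer and a sigmoid."""
    if not vectors:
        return 0.5  # no known tokens: return an uninformative score
    dim = len(vectors[0])
    avg = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    logit = sum(w * x for w, x in zip(WEIGHTS, avg)) + BIAS
    return 1.0 / (1.0 + math.exp(-logit))


def handler(event, context=None):
    """Per-request entry point: tokenize, look up embeddings, run inference."""
    tokens = event["text"].lower().split()
    score = predict(fetch_embeddings(tokens))
    return {"statusCode": 200, "body": json.dumps({"score": score})}
```

Calling `handler({"text": "good movie"})` returns a JSON body whose `score` field is the model output for that request. Because the function keeps no state between calls, compute is consumed only while a request is being served, which is the property that pay-per-request pricing relies on.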

Topics

Machine Learning > Application Areas > Efficient Computing

Keywords

serverless computing; scalable architecture; neural network deployment
