Best Practices for Data-Efficient Modeling in NLG:How to Train Production-Ready Neural Models with Less Data

Ankit Arun; Soumya Batra; Vikas Bhardwaj; Ashwini Challa; Pinar Donmez; Peyman Heidari; Hakan Inan; Shashank Jain; Anuj Kumar; Shawn Mei; Karthik Mohan; Michael White

2020 COLING COLING 2020

Best Practices for Data-Efficient Modeling in NLG:How to Train Production-Ready Neural Models with Less Data

Abstract

AbstractNatural language generation (NLG) is a critical component in conversational systems, owing to its role of formulating a correct and natural text response. Traditionally, NLG components have been deployed using template-based solutions. Although neural network solutions recently developed in the research community have been shown to provide several benefits, deployment of such model-based solutions has been challenging due to high latency, correctness issues, and high data needs. In this paper, we present approaches that have helped us deploy data-efficient neural solutions for NLG in conversational systems to production. We describe a family of sampling and modeling techniques to attain production quality with light-weight neural network models using only a fraction of the data that would be necessary otherwise, and show a thorough comparison between each. Our results show that domain complexity dictates the appropriate approach to achieve high data efficiency. Finally, we distill the lessons from our experimental findings into a list of best practices for production-level NLG model development, and present them in a brief runbook. Importantly, the end products of all of the techniques are small sequence-to-sequence models (~2Mb) that we can reliably deploy in production. These models achieve the same quality as large pretrained models (~1Gb) as judged by human raters.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🧭 Keyword Pioneer — production deployment

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Ankit Arun , Soumya Batra , Vikas Bhardwaj , Ashwini Challa , Pinar Donmez , Peyman Heidari , Hakan Inan , Shashank Jain , Anuj Kumar , Shawn Mei , Karthik Mohan , Michael White

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Application Areas > Efficient Computing Deep Learning > Models > Generative Models

Keywords

model compression knowledge distillation natural language generation data efficiency sequence-to-sequence model production deployment

Download PDF

Related papers

Persuasiveness of News Editorials depending on Ideology and Personality 2020

A Graph Representation of Semi-structured Data for Web Question Answering 2020

Span-based Joint Entity and Relation Extraction with Attention-based Span-specific and Contextual Semantic Representations 2020

Hierarchical Chinese Legal event extraction via Pedal Attention Mechanism 2020

End-to-End Emotion-Cause Pair Extraction with Graph Convolutional Network 2020