Feedback Attribution for Counterfactual Bandit Learning in Multi-Domain Spoken Language Understanding

Tobias Falke; Patrick Lehnen

EMNLP 2021

Abstract

With counterfactual bandit learning, models can be trained based on positive and negative feedback received for historical predictions, with no labeled data needed. Such feedback is often available in real-world dialog systems; however, the modularized architecture commonly used in large-scale systems prevents the direct application of such algorithms. In this paper, we study the feedback attribution problem that arises when using counterfactual bandit learning for multi-domain spoken language understanding. We introduce an experimental setup to simulate the problem on small-scale public datasets, propose attribution methods inspired by multi-agent reinforcement learning, and evaluate them against multiple baselines. We find that while directly using overall feedback leads to disastrous performance, our proposed attribution methods can allow training competitive models from user feedback.
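To make the idea of learning from logged feedback concrete, the following is a minimal sketch of inverse propensity scoring (IPS), a standard off-policy estimator in counterfactual bandit learning. The abstract does not state which estimator or attribution method the paper uses, so this is an illustration under assumptions rather than the authors' method; the names `LoggedInteraction`, `ips_estimate`, and `new_policy_prob` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedInteraction:
    """One historical prediction with its feedback (hypothetical record layout)."""
    context: str       # e.g. the user utterance
    action: str        # the prediction made by the logging policy
    propensity: float  # probability the logging policy assigned to that action
    reward: float      # feedback signal, e.g. 1.0 for positive, 0.0 for negative


def ips_estimate(
    logs: Sequence[LoggedInteraction],
    new_policy_prob: Callable[[str, str], float],
) -> float:
    """Estimate the expected reward of a new policy from logged feedback.

    Each logged reward is reweighted by how much more (or less) likely the
    new policy is to choose the logged action than the logging policy was.
    """
    if not logs:
        raise ValueError("at least one logged interaction is required")
    total = 0.0
    for item in logs:
        if item.propensity <= 0.0:
            raise ValueError("propensities must be positive")
        weight = new_policy_prob(item.context, item.action) / item.propensity
        total += weight * item.reward
    return total / len(logs)
```

In a modular system, the reward recorded here would be feedback on the overall system response, which is where the attribution problem studied in the paper arises: a single overall signal has to be assigned to the individual components before an estimator of this kind can be applied to each of them.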

Topics

Artificial Intelligence > Core AI > Agent Systems
Reinforcement Learning > Methods > Multi-Agent Systems
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Multi-Agent Systems
Machine Learning > Learning Types > Multi-Armed Bandits
Natural Language Processing > Applications > Spoken Language Understanding

Keywords

multi-agent reinforcement learning, reinforcement learning, policy learning, spoken language understanding, multi-domain learning, dialog system, multi-agent system, counterfactual bandit learning, feedback attribution, counterfactual bandit
