2019 ICML ICML 2019

Stochastic Beams and Where To Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement

Abstract

The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. We show how to implicitly apply this โ€™Gumbel-Top-$k$โ€™ trick on a factorized distribution over sequences, allowing to draw exact samples without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows only linear in $k$ and the maximum sampled sequence length. The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct low-variance estimators for expected sentence-level BLEU score and model entropy.

๐ŸŒ‰ Interdisciplinary Bridge โ€” Machine Learning and Mathematics & Optimization
๐Ÿงญ Keyword Pioneer โ€” sequence sampling
๐Ÿ Cross-Pollinator โ€” Artificial Intelligence, Deep Learning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning