ACL 2021

On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers