IJCNLP 2021

On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers