CountSketches, Feature Hashing and the Median of Three

Kasper Green Larsen; Rasmus Pagh; Jakub Tětek

2021 ICML ICML 2021

CountSketches, Feature Hashing and the Median of Three

Abstract

In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector $v$ to a vector of dimension $(2t-1) s$, where $t, s > 0$ are integer parameters. It is known that a CountSketch allows estimating coordinates of $v$ with variance bounded by $\|v\|_2^2/s$. For $t > 1$, the estimator takes the median of $2t-1$ independent estimates, and the probability that the estimate is off by more than $2 \|v\|_2/\sqrt{s}$ is exponentially small in $t$. This suggests choosing $t$ to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant $t$. Previous work only predicts a constant factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to $O(\min\{\|v\|_1^2/s^2,\|v\|_2^2/s\})$ when $t > 1$. That is, the variance decreases proportionally to $s^{-2}$, asymptotically for large enough $s$.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Security & Privacy

Authors

Kasper Green Larsen , Rasmus Pagh , Jakub Tětek

Topics

Machine Learning > Application Areas > Efficient Computing Mathematics & Optimization > Mathematics > Linear Algebra

Keywords

feature hashing sparse projection variance analysis count sketch

Download PDF

Related papers

GRAND: Graph Neural Diffusion 2021

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits 2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution 2021

Dataset Dynamics via Gradient Flows in Probability Space 2021