2023 COLT COLT 2023

Testing of Index-Invariant Properties in the Huge Object Model

Abstract

Distribution testing is a central part of property testing, with applications to various research areas, such as computational and statistical learning, information theory, and probabilistic program checking. The original distribution testing model relies on samples drawn independently from the distribution to be tested. However, when the distribution in question is over the $n$-dimensional Hamming cube $\left\{0,1\right\}^{n}$ for a large $n$, even reading a few samples is infeasible. To address this, Goldreich and Ron [ITCS 2022] have defined a model called the \emph{huge object model}, in which the samples may only be queried in a few places.For any sample/query model, the following three questions are considered fundamental: {\bf (i)} understand what classes of objects can be “learned \emph{easily}", {\bf (ii)} characterize {\em testable properties}, that is, properties that can be tested in the given sample/query model using a constant number of samples/queries, and {\bf (iii)} understand the {\em gap} between {\em adaptive} and {\em non-adaptive} query/sample complexities.In this work, we study these questions for the huge object model for distribution testing. To do so, we initiate a study of a general class of distribution properties that are invariant under a permutation of the indices of the vectors in $\left\{0,1\right\}^{n}$, while still not being necessarily fully symmetric as per the definition used in traditional distribution testing.We prove that every distribution over $\left\{0,1\right\}^{n}$ whose support has a bounded VC-dimension can be efficiently learned up to a permutation. The number of queries made by the algorithm depends only on the VC-dimension of the support of the distribution and is independent of $n$. This gives efficient testers for index-invariant distribution properties that admit a global VC-dimension bound. To complement this result, we argue that satisfying only index-invariance or only a VC-dimension bound is insufficient to g

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning
🧭 Keyword Pioneer — huge object model
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy