## Annals of Useless Statistics Theorems

When I learn of statistics theorems I often try to plug some numbers in to see roughly what kind of error rate is attained or how large of a sample size is necessary to get good results. I think this is a good practice for understanding the limitations of the theory. The title of this post may seem inflammatory, but I should clarify that I don’t think “useless” things have no value. A theorem which is practically useless may still be interesting for the mathematical techniques involved. It may show the way to refinements that become increasingly useful. Or it may suggest that a procedure is not unreasonable, and further empirical study can analyze the finite sample performance of said procedure.

All that being said, I also think that because mathematics is more technically difficult than empirical study it gets a bit too much prestige and respect. So, I intend to highlight some examples of theoretical results which are considered high impact, but which are severely limited in their practical usefulness.

My first example comes from a series of recent papers appearing in what is considered the most prestigious journal of statistical theory, the Annals of Statistics. These papers (referenced below) apply Stein’s method type arguments to obtain explicit bounds on the approximation error caused by assuming non-Gaussian data is actually Gaussian. The methods would be practically useful if the error bounds were small enough…

The papers are so technical that the forms of the error bounds are very complicated, so I’m referencing a version in a tutorial by Larry Wasserman instead of the more complicated version in the original papers. If there are $n$ (i.i.d.) data points in $d$-dimensional space, and we approximate the distance of their sample mean from the true mean (and take the maximum over all coordinates) by assuming the corresponding probability can be computed from a multivariate Gaussian, then the absolute error between these two probabilities $P_1$ and $P_2$ is upper bounded by

$|P_1 - P_2| \leq \left( \log(dn)^7 / n \right)^{1/8}$

Here’s where it gets useless. Suppose we want the error to be less or equal to 1/10, and we are only in $d = 2$ dimensions. How large does $n$ have to be? Over $4.2 \times 10^{19}$.

The power of these theorems is that they give asymptotic results for when $d$ is increasing with $n$. So for example, suppose $d = n^{1/10}$. Then the error bound is less or equal to 1/10 if $n \geq 4.2 \times 10^{19}$, and the corresponding value of $d \approx 91$.

Again, the theory in these papers is beautiful and probably setting a good direction for further inquiry. But we desperately need to supplement impressive looking math with more empirical study, and perhaps make it the norm for such empirical study to be included in the theoretical papers that make it into the top journals.

References

Larry Wasserman, Stein’s Method and The Bootstrap in Low and High Dimensions: A Tutorial, http://www.cs.cmu.edu/~aarti/SMLRG/Stein.pdf

Victor Chernozhukov, Denis Chetverikov, Kengo Kato (2012). Central Limit Theorems and Multiplier Bootstrap when p is much larger than n. http://arxiv.org/abs/1212.6906

Victor Chernozhukov, Denis Chetverikov, Kengo Kato (2013). Comparison and anticoncentration bounds for maxima of Gaussian random vectors. http://arxiv.org/abs/1301.4807.