Kolmogorov-Smirnov statistic

Kolmogorov-Smirnov statistic

Consider any distribution DD on R\mathbb{R}, and CDF F(t)=PxD(xt)F(t) = \mathbb{P}_{x \sim D}(x \leq t).

Let X=(xj)i[n]X = (x_j)_{i \in [n]} be nn samples drawn from DD.

Def: The empirical CDF (eCDF) of XX is defined as F^n(t)=1nj=1n1xjt\hat{F}_n(t) = \frac{1}{n} \sum_{j=1}^n \mathbb{1}_{x_j \leq t}.

Def: The Kolmogorov-Smirnov statistic is defined as Dn=suptR{F^n(t)F(t)}D_n = \sup_{t \in \mathbb{R}} \{|\hat{F}_n(t) - F(t)|\}.

Theorem [DKW, 1956]: P(Dn>ϵ)ce2nϵ2\mathbb{P}(D_n > \epsilon) \leq c e^{-2n\epsilon^2}, where cc is a constant.

Theorem [Massart, 1990]: P(Dn>ϵ)2e2nϵ2\mathbb{P}(D_n > \epsilon) \leq 2 e^{-2n\epsilon^2}.

Theorem [Harvey, 2020]: P(Dn>ϵ)4ϵe12nϵ2\mathbb{P}(D_n > \epsilon) \leq \frac{4}{\epsilon} e^{-\frac{1}{2}n\epsilon^2}.

  • Copyright: Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.
  • Copyrights © 2021-2026 Xue Yu
  • Visitors: | Views: