## Concentration Inequalities

Let's recall Chebyshev's inequality.

**Chebyshev's Inequality:** Let \(X\) be a random variable with expectation
\(\mu=\mathbb{E}[X]\) and variance
\(\sigma^2 = \textrm{Var}[X]\). Then
for any \(k > 0\),

\[
\Pr(|X - \mu| > k \sigma) \leq \frac{1}{k^2}.
\]
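As a quick sanity check, we can verify the inequality numerically. The sketch below uses \(X \sim \text{Binomial}(100, 1/2)\), so \(\mu = 50\) and \(\sigma = 5\); the choice of distribution, trial count, and seed are ours for illustration.

```python
import random

# Empirically check Chebyshev's inequality for X ~ Binomial(100, 1/2),
# which has mu = 50 and sigma = 5.
random.seed(0)
trials = 20_000
mu, sigma, k = 50.0, 5.0, 2.0

exceed = 0
for _ in range(trials):
    x = sum(random.randint(0, 1) for _ in range(100))  # one Binomial(100, 1/2) sample
    if abs(x - mu) > k * sigma:
        exceed += 1

empirical = exceed / trials
print(f"empirical Pr(|X - mu| > {k} sigma) = {empirical:.4f}")
print(f"Chebyshev bound 1/k^2 = {1 / k**2:.4f}")
```

The empirical tail probability comes out well under the \(1/k^2 = 0.25\) bound, which previews the theme of this lecture: Chebyshev's inequality is valid but often far from tight.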

Last time, we applied Chebyshev's inequality to the load balancing
problem. In particular, we showed that if we assign \(n\) requests to \(n\) servers, the server with the maximum
load has \(O(\sqrt{n})\) requests with
high probability. We proved this result by applying Chebyshev's
inequality to a particular server and then applying the union bound to get
a bound on the maximum load across all servers. Recall that we proved both
Chebyshev's inequality and the union bound with Markov's inequality. So,
if you squint, we just used Markov's inequality twice.
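We can simulate the load balancing setup to see how the maximum load actually behaves. The sketch below assumes each request is assigned independently and uniformly at random; the helper name `max_load` and the values of \(n\) are our own choices.

```python
import math
import random

# Assign n requests independently and uniformly at random to n servers,
# then report the most loaded server.
random.seed(1)

def max_load(n: int) -> int:
    loads = [0] * n
    for _ in range(n):
        loads[random.randrange(n)] += 1
    return max(loads)

results = {n: max_load(n) for n in (100, 10_000)}
for n, m in results.items():
    print(f"n = {n:>6}: max load = {m}, sqrt(n) ~ {math.sqrt(n):.0f}")
```

In simulation the maximum load sits far below \(\sqrt{n}\), hinting that the \(O(\sqrt{n})\) bound from Chebyshev's inequality is quite loose.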

Today, we'll prove a stronger result: the server with the maximum
load has \(O(\log n)\) requests with
high probability. For this result, we'll need a stronger concentration
inequality than Chebyshev's.

### Improving Chebyshev's Inequality

We'll see that Chebyshev's inequality is accurate for *some*
random variables. But, for many other random variables, the inequality
is loose.

One random variable for which Chebyshev's inequality is loose is the
normal distribution.

**Gaussian Tail Bound:** Consider a random variable
\(X\) drawn from the normal
distribution \(\mathcal{N}(\mu,
\sigma^2)\) with mean \(\mu\)
and standard deviation \(\sigma\). Then
for any \(k > 0\), \[
\Pr \left( | X - \mu | \geq k \sigma \right)
\leq 2 e^{-k^2/2}.
\]
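We can check this bound empirically for a standard normal (\(\mu = 0\), \(\sigma = 1\)); the sample size and seed below are arbitrary choices of ours.

```python
import math
import random

# Compare the empirical two-sided tail of a standard normal against
# the stated bound 2 * exp(-k^2 / 2) for several values of k.
random.seed(2)
samples = [random.gauss(0.0, 1.0) for _ in range(200_000)]

for k in (1.0, 2.0, 3.0):
    tail = sum(abs(x) >= k for x in samples) / len(samples)
    bound = 2 * math.exp(-k * k / 2)
    print(f"k = {k}: empirical tail = {tail:.5f} <= bound = {bound:.5f}")
```

The empirical tails fall below the bound at every \(k\), and the gap shrinks as \(k\) grows, consistent with the exponential decay in the bound.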

Comparing the Gaussian tail bound to Chebyshev's inequality, we see
that the Gaussian tail bound is *exponentially* better. Let's see
the difference graphically in the figure below. (Notice that the
vertical axis is on a logarithmic scale.) By \(10\) standard deviations above the mean,
the Gaussian tail bound gives a bound that is roughly 19 orders of magnitude
smaller than the bound from Chebyshev's inequality!
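The comparison at \(k = 10\) is a one-liner to verify:

```python
import math

# Evaluate both bounds at k = 10 standard deviations and compare them
# on a log scale.
k = 10
chebyshev = 1 / k**2                 # Chebyshev: 1/k^2
gaussian = 2 * math.exp(-k**2 / 2)   # Gaussian tail bound: 2e^{-k^2/2}
print(f"Chebyshev: {chebyshev:.1e}")  # 1.0e-02
print(f"Gaussian:  {gaussian:.1e}")   # 3.9e-22
print(f"orders of magnitude apart: {math.log10(chebyshev / gaussian):.1f}")  # 19.4
```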