Let’s recall Chebyshev’s inequality.
Chebyshev’s Inequality: Let $X$ be a random variable with expectation $\mathbb{E}[X] = \mu$ and variance $\mathrm{Var}[X] = \sigma^2$. Then, for any $k > 0$, $\Pr(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$.
Last time, we applied Chebyshev’s inequality to the load balancing problem. In particular, we showed that when jobs are assigned to servers uniformly at random, each server’s load is close to its expected load with good probability.
Today, we’ll prove a stronger result bounding the number of jobs on the most heavily loaded server.
We’ll see that Chebyshev’s inequality is accurate for some random variables. But, for many other random variables, the inequality is loose.
One distribution for which Chebyshev’s inequality is loose is the normal (Gaussian) distribution.
Gaussian Tail Bound: Consider a random variable $X \sim \mathcal{N}(\mu, \sigma^2)$. Then, for any $k \geq 0$, $\Pr(|X - \mu| \geq k\sigma) \leq 2e^{-k^2/2}$.
Comparing the Gaussian tail bound to Chebyshev’s inequality, we see that the Gaussian tail bound is exponentially better. Let’s see the difference graphically in the figure below. (Notice that the vertical axis is on a logarithmic scale.) By the time $k$ is even moderately large, the two bounds differ by many orders of magnitude.
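To see the gap numerically as well, here is a minimal sketch (not from the original notes) that evaluates both bounds at a few values of $k$, using the forms stated above:

```python
import math

# Compare Chebyshev's inequality with the Gaussian tail bound for a
# deviation of k standard deviations from the mean.
for k in [1, 2, 3, 4, 5, 6]:
    chebyshev = 1 / k**2                 # Pr(|X - mu| >= k*sigma) <= 1/k^2
    gaussian = 2 * math.exp(-k**2 / 2)   # Pr(|X - mu| >= k*sigma) <= 2e^{-k^2/2}
    print(f"k = {k}:  Chebyshev <= {chebyshev:.2e}   Gaussian <= {gaussian:.2e}")
```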
Based on how loosely Chebyshev’s inequality bounds the Gaussian distribution, we might suspect that Chebyshev’s inequality is loose in general. But there are examples of random variables where Chebyshev’s inequality gives exactly the right probabilities.
Example Random Variable: Fix a particular value $k > 1$ and consider the random variable $X$ that takes value $-k$ with probability $\frac{1}{2k^2}$, value $k$ with probability $\frac{1}{2k^2}$, and value $0$ otherwise.
We should first check that $\mathbb{E}[X] = 0$ and $\mathrm{Var}[X] = \mathbb{E}[X^2] = \frac{1}{2k^2} \cdot k^2 + \frac{1}{2k^2} \cdot k^2 = 1$.
Then Chebyshev’s inequality tells us that $\Pr(|X| \geq k) \leq \frac{1}{k^2}$. But $\Pr(|X| \geq k) = \frac{1}{2k^2} + \frac{1}{2k^2} = \frac{1}{k^2}$ exactly, so the inequality is tight for this random variable.
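As a sanity check, here is a short simulation (illustrative, with $k = 3$ chosen arbitrarily) of the three-point distribution above; the empirical tail probability lands right at the Chebyshev bound $1/k^2$.

```python
import random

# Sample from the distribution: -k and +k each with probability 1/(2k^2), else 0.
k = 3
trials = 1_000_000
hits = 0
for _ in range(trials):
    u = random.random()
    x = -k if u < 1 / (2 * k**2) else (k if u < 1 / k**2 else 0)
    hits += abs(x) >= k
print("empirical Pr(|X| >= k):", hits / trials)   # close to 1/k^2
print("Chebyshev bound 1/k^2: ", 1 / k**2)
```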
While Chebyshev’s inequality is tight for some random variables, it is loose for many other random variables. We may therefore suspect that if we make stronger assumptions on the random variables, we can get better concentration inequalities. The central limit theorem gives us a hint about what type of random variables we should consider.
Central Limit Theorem: Any sum of mutually independent and identically distributed random variables $X_1, \ldots, X_n$, each with finite mean $\mu$ and variance $\sigma^2$, converges to a Gaussian distribution with mean $n\mu$ and variance $n\sigma^2$ as $n$ grows.
By linearity of expectation and the additivity of variance for independent random variables, we know what the mean and variance of the sum should be. The interesting part of the central limit theorem is that the sum converges to a Gaussian distribution.
We stated the central limit theorem for random variables that are identically distributed so that we could cleanly describe the expectation and variance of the sum. But the central limit theorem also holds for random variables that are not identically distributed.
For the central limit theorem to hold, we assumed that the random variables are mutually independent. Recall that $X_1, \ldots, X_n$ are mutually independent if $\Pr(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} \Pr(X_i = x_i)$ for all values $x_1, \ldots, x_n$; this is a stronger requirement than pairwise independence.
Let’s consider the central limit theorem in the context of the coin flip example. The figure below shows how closely the sum of independent fair coin flips matches a Gaussian distribution with the same mean and variance.
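A quick simulation (illustrative, with $n = 100$ flips chosen arbitrarily) shows the same agreement numerically:

```python
import math
import random

# Compare the distribution of the number of heads in n fair coin flips
# with a Gaussian of the same mean (n/2) and variance (n/4).
n, trials = 100, 100_000
counts = [0] * (n + 1)
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n))
    counts[heads] += 1

mu, sigma = n / 2, math.sqrt(n / 4)
for h in range(40, 61, 5):
    empirical = counts[h] / trials
    gaussian = math.exp(-(h - mu) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
    print(f"heads = {h}: empirical {empirical:.4f}   Gaussian density {gaussian:.4f}")
```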
Using Chebyshev’s inequality, we showed that if we flip a fair coin 100 times, the probability we see fewer than 30 or more than 70 heads is at most $\frac{25}{20^2} = \frac{1}{16}$. The true probability is far smaller.
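We can compute the true probability exactly from the binomial distribution; the quick sketch below shows how far it sits below the Chebyshev bound.

```python
import math

# Exact probability of seeing fewer than 30 or more than 70 heads
# in 100 fair coin flips, versus the Chebyshev bound of 1/16.
n = 100
exact = (sum(math.comb(n, h) for h in range(0, 30))
         + sum(math.comb(n, h) for h in range(71, n + 1))) / 2**n
print("exact tail probability:", exact)        # about 3e-5
print("Chebyshev bound:       ", 25 / 20**2)   # 1/16 = 0.0625
```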
Luckily, there are concentration inequalities that allow us to formally get exponentially small probability bounds.
There are many exponential concentration inequalities. Each one makes a different set of assumptions on the random variable and, depending on how stringent the assumptions are, gives a different bound. Sometimes the bounds are stated with multiplicative error and sometimes the bounds are stated with additive error. The trick is determining which bound is appropriate for the application at hand. We’ll state three exponential concentration inequalities today but there are many more. If you’re having trouble finding an appropriate bound for your case, Wikipedia is your friend.
Chernoff Bound: Let $X_1, \ldots, X_n$ be mutually independent random variables taking values in $\{0, 1\}$. Let $S = \sum_{i=1}^{n} X_i$ and $\mu = \mathbb{E}[S]$. Then, for any $\delta \geq 0$,
$$\Pr(S \geq (1+\delta)\mu) \leq e^{-\frac{\delta^2 \mu}{2+\delta}} \quad \text{and} \quad \Pr(S \leq (1-\delta)\mu) \leq e^{-\frac{\delta^2 \mu}{2}}.$$
Notice that the first inequality gives an upper bound on the probability that $S$ is much larger than its mean, while the second gives an upper bound on the probability that $S$ is much smaller than its mean.
Using a union bound, we can combine them into a single two-sided inequality: $\Pr(|S - \mu| \geq \delta\mu) \leq 2e^{-\frac{\delta^2 \mu}{2+\delta}}$.
Looking back, we may realize that the last expression looks similar to the Gaussian tail bound. This is no coincidence: by the central limit theorem, a sum of many independent binary random variables is approximately Gaussian, so we should expect Gaussian-like tails.
The Chernoff bound may seem overly restrictive because we require that each variable is binary. The Bernstein inequality relaxes the assumption so that we can consider random variables defined on the interval from $-1$ to $1$.
Bernstein Inequality: Let $X_1, \ldots, X_n$ be mutually independent random variables, each satisfying $|X_i| \leq 1$. Let $S = \sum_{i=1}^{n} X_i$ and $\sigma^2 = \mathrm{Var}[S]$. Then, for any $t \geq 0$, $\Pr(|S - \mathbb{E}[S]| \geq t) \leq 2e^{-\frac{t^2}{2\sigma^2 + \frac{4}{3}t}}$.
Of course, while weaker than the Chernoff bound’s assumption, the assumption in the Bernstein inequality is still not very general. Hoeffding’s inequality relaxes it further.
Hoeffding’s Inequality: Let $X_1, \ldots, X_n$ be mutually independent random variables where each $X_i$ takes values in the interval $[a_i, b_i]$. Let $S = \sum_{i=1}^{n} X_i$. Then, for any $t \geq 0$, $\Pr(|S - \mathbb{E}[S]| \geq t) \leq 2e^{-\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}}$.
Notice that Hoeffding’s inequality depends only on the ranges $b_i - a_i$ and not on the variances, so when the variances are much smaller than the ranges, the Bernstein inequality gives a tighter bound.
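To see how the three inequalities compare, here is a small sketch that evaluates each bound for a sum of independent $\{0,1\}$ variables (the parameters $n = 1000$, $p = 0.1$, $t = 50$ are illustrative choices, not from the notes). Because the variance $np(1-p)$ is much smaller than the worst-case range, Chernoff and Bernstein come out far stronger than Hoeffding here.

```python
import math

# Evaluate the three tail bounds stated above for a sum S of n independent
# {0,1} variables with Pr(X_i = 1) = p, at deviation t from the mean.
def chernoff(n, p, t):      # two-sided multiplicative form with delta = t / mu
    mu = n * p
    delta = t / mu
    return 2 * math.exp(-delta**2 * mu / (2 + delta))

def bernstein(n, p, t):     # variance sigma^2 = n*p*(1-p); each X_i lies in [-1, 1]
    var = n * p * (1 - p)
    return 2 * math.exp(-t**2 / (2 * var + (4 / 3) * t))

def hoeffding(n, p, t):     # each X_i lies in [0, 1], so the sum of squared ranges is n
    return 2 * math.exp(-2 * t**2 / n)

n, p, t = 1000, 0.1, 50
for name, bound in [("Chernoff", chernoff), ("Bernstein", bernstein), ("Hoeffding", hoeffding)]:
    print(f"{name:10s} Pr(|S - E[S]| >= {t}) <= {bound(n, p, t):.2e}")
```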
We won’t see the proofs of these concentration inequalities. The general technique is to apply Markov’s inequality to a cleverly chosen random variable. Recall that we proved Chebyshev’s inequality by applying Markov’s inequality to the random variable $(X - \mathbb{E}[X])^2$; the exponential inequalities instead apply Markov’s inequality to $e^{\lambda(S - \mathbb{E}[S])}$ for a carefully chosen parameter $\lambda$.
Now that we have rigorous exponential concentration inequalities, let’s apply them to the coin flip problem.
Biased Coin Bound: Let the probability that a coin lands heads be $p$. If we flip the coin $n$ times, then the probability that the number of heads deviates from its expectation $pn$ by more than a small multiplicative factor is exponentially small in $n$.
Proof: We use the Chernoff bound because the number of heads is a sum of binary random variables. From the Chernoff bound, we know that for any $\delta \geq 0$, the number of heads $H$ satisfies $\Pr(|H - pn| \geq \delta pn) \leq 2e^{-\frac{\delta^2 pn}{2+\delta}}$, which is exponentially small once $\delta^2 pn$ is large.
Notice that we have a very gentle dependence on the failure probability: since the bound decays exponentially in $n$, the number of flips we need grows only logarithmically in one over the desired failure probability.
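To make the logarithmic dependence concrete, the sketch below solves the Chernoff bound for the number of flips needed to keep the count of heads within 10% of its mean, across a range of failure probabilities (the values of $p$, $\delta$, and the failure probabilities are illustrative choices).

```python
import math

# How many flips n suffice so that the number of heads is within a delta = 10%
# multiplicative error of its mean p*n, except with failure probability gamma?
# Solving 2*exp(-delta^2 * p * n / (2 + delta)) <= gamma for n gives the formula below.
p, delta = 0.5, 0.1
for gamma in [1e-1, 1e-3, 1e-6, 1e-9, 1e-12]:
    n = (2 + delta) / (delta**2 * p) * math.log(2 / gamma)
    print(f"failure probability {gamma:.0e}: about {math.ceil(n):6d} flips suffice")
```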
Let’s apply our new tool to the load balancing problem. Recall we randomly assigned each job, independently, to a server chosen uniformly at random.
We defined the load on a server to be the number of jobs assigned to it.
Using Chebyshev’s inequality, we got only a relatively weak bound on the maximum load across servers.
Consider a single server. Its load is a sum of independent binary random variables, one indicator for each job, so the Chernoff bound shows that the load exceeds its expectation by a large amount with only exponentially small probability. A union bound over all servers then controls every server simultaneously.
So the server with the maximum load has, with high probability, a load that is not far above its expectation, a much stronger guarantee than the one Chebyshev’s inequality gave us.
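Here is a quick simulation of the setting (assuming, purely for illustration, $n$ jobs assigned to $n$ servers) showing that the maximum load stays small.

```python
import random
from collections import Counter

# Assign n jobs to n servers uniformly at random (illustrative setting)
# and report the heaviest load.
n = 10_000
loads = Counter(random.randrange(n) for _ in range(n))
print("expected load per server:", n / n)                 # = 1
print("maximum load:            ", max(loads.values()))   # typically around log(n)/log(log(n))
```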
In practice, there’s another strategy for assigning jobs to servers that works surprisingly well. The idea is called the power of two choices. Instead of assigning a job to a single random server, we choose two random servers and assign the job to the less loaded of the two. With high probability, this small change improves the maximum load dramatically: roughly speaking, from logarithmic to doubly logarithmic in the number of jobs.
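A simulation (same illustrative setting of $n$ jobs and $n$ servers) makes the improvement easy to see.

```python
import random

# Compare the maximum load under one-choice vs. two-choice assignment
# (illustrative setting: n jobs, n servers).
n = 10_000
one, two = [0] * n, [0] * n
for _ in range(n):
    one[random.randrange(n)] += 1
    i, j = random.randrange(n), random.randrange(n)
    # power of two choices: place the job on the less loaded of the two servers
    if two[i] <= two[j]:
        two[i] += 1
    else:
        two[j] += 1
print("one choice  max load:", max(one))   # grows like log n / log log n
print("two choices max load:", max(two))   # grows like log log n
```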
In the rest of the class, we’ll discuss a randomized algorithm called fingerprinting. But first, we’ll revisit universal hash functions and explore prime numbers.
Universal Hash Functions: Consider a random hash function $h$ mapping keys from a large universe into $m$ buckets. We say $h$ is universal if, for every pair of distinct keys $x \neq y$, $\Pr(h(x) = h(y)) \leq \frac{1}{m}$.
Recall that we can efficiently construct a universal hash function with the following approach. Let $p$ be a prime number larger than the size of the key universe. Choose $a$ uniformly at random from $\{1, \ldots, p-1\}$ and $b$ uniformly at random from $\{0, \ldots, p-1\}$, and define $h(x) = ((ax + b) \bmod p) \bmod m$.
We won’t prove that this construction is universal, but the proof is a short argument in modular arithmetic.
Notice that once we have a prime number, we only have to store the three numbers $a$, $b$, and $p$, rather than a table of independent random values for every key.
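A minimal sketch of this construction, assuming 32-bit integer keys and an illustrative table size of $m = 1024$ (the specific prime $2^{61} - 1$ is just a convenient choice larger than the key universe):

```python
import random

# Universal hashing via h(x) = ((a*x + b) mod p) mod m.
p = 2**61 - 1            # a Mersenne prime, comfortably larger than 32-bit keys
m = 1024                 # number of hash buckets
a = random.randrange(1, p)
b = random.randrange(0, p)

def h(x: int) -> int:
    return ((a * x + b) % p) % m

print(h(42), h(43), h(10**6))
```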
Finding a prime number seems pretty tough, so let’s consider the simpler problem of checking whether a number is prime. Given a number $x$, we want to decide whether $x$ is prime.
Suppose we have an integer represented as a length-$n$ string of bits, so the integer can be as large as $2^n$. The naive algorithm, trial division by every candidate factor up to the square root, takes time exponential in $n$.
Fortunately, there is a much faster algorithm. In papers published in 1976 and 1980, Miller and Rabin presented a randomized algorithm that runs in time polynomial in the number of bits $n$ and correctly determines whether the input is prime with high probability.
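A sketch of the Miller–Rabin test is below. Each round either certifies that $x$ is composite or provides evidence that it is prime; a composite number slips past all rounds with probability at most $4^{-\text{rounds}}$. (The small-prime pre-check and the choice of 40 rounds are implementation details added here, not part of the notes.)

```python
import random

def is_probably_prime(x: int, rounds: int = 40) -> bool:
    """Miller-Rabin primality test: always correct on primes; mistakes a
    composite for a prime with probability at most 4^(-rounds)."""
    if x < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if x % small == 0:
            return x == small
    # write x - 1 = 2^r * d with d odd
    r, d = 0, x - 1
    while d % 2 == 0:
        r, d = r + 1, d // 2
    for _ in range(rounds):
        a = random.randrange(2, x - 1)
        y = pow(a, d, x)
        if y in (1, x - 1):
            continue
        for _ in range(r - 1):
            y = pow(y, 2, x)
            if y == x - 1:
                break
        else:
            return False        # a witnesses that x is composite
    return True

print(is_probably_prime(2**61 - 1))   # True: a Mersenne prime
print(is_probably_prime(2**61 + 1))   # False: divisible by 3
```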
Why was it such a big deal to get an efficient algorithm for checking if a number is prime? Well, one big reason is that prime numbers form the basis for modern public key cryptography. In cryptography, we imagine there are two parties, Alice and Bob, that communicate with each other. Bob wants to send Alice a message so that only Alice can read it. If someone else intercepts it, the message should be unreadable.
The obvious way for Alice and Bob to communicate securely is to share a secret key in advance. This is the approach that persisted for centuries. However, physically meeting to share a secret key is impractical if there are many senders and receivers.
A more clever way for Alice and Bob to communicate securely is to use a lock box. The lock box has the property that anyone can deliver mail but only Alice can read the mail.
The way we implement the lock box in practice is with RSA encryption. The idea is to have a private key and a public key. The private key consists of two very large prime numbers $p$ and $q$, while the public key includes their product $N = pq$. Anyone can encrypt a message using the public key, but decrypting is only feasible for someone who knows the prime factors, and factoring $N$ is believed to be computationally intractable.
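A toy illustration of this key structure (with deliberately tiny, insecure primes; real RSA uses primes hundreds of digits long):

```python
# Toy RSA: the private key is the pair of primes, the public key is their product.
p, q = 61, 53                 # private key: two primes (tiny here, huge in practice)
N = p * q                     # public key: their product, plus the exponent e
e = 17
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)           # decryption exponent, computable only if p and q are known

message = 65
ciphertext = pow(message, e, N)     # anyone can encrypt with (N, e)
recovered = pow(ciphertext, d, N)   # only the holder of the private key can decrypt
print(ciphertext, recovered)        # recovered == 65
```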
The challenge of RSA encryption is to find two large primes. This is the same problem we faced when we wanted to construct a hash function!
Here’s a naive algorithm for finding primes: pick a random large number, test whether it is prime (say, with the Miller–Rabin test), and repeat until the test succeeds. For this to be efficient, primes must be reasonably common among large numbers.
Prime Number Theorem: Let $\pi(x)$ denote the number of primes less than or equal to $x$. Then $\pi(x) \approx \frac{x}{\ln(x)}$; more precisely, $\frac{\pi(x)}{x/\ln(x)} \to 1$ as $x \to \infty$.
This is somewhat surprising because as numbers get larger, there are more smaller numbers that could be their factors.
Let’s plot the number of primes and the bound in the prime number theorem. (Note the bound only makes sense for sufficiently large $x$.)
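A small sketch that tabulates the same comparison numerically, counting primes with a sieve (the cutoffs are arbitrary illustrative values):

```python
import math

def count_primes(limit: int) -> int:
    """Count primes <= limit with a simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return sum(sieve)

for x in [10**3, 10**4, 10**5, 10**6]:
    print(f"x = {x:>8}: pi(x) = {count_primes(x):>6},  x/ln(x) = {x / math.log(x):10.1f}")
```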
The prime number theorem tells us that if we select a random 128-bit number, the probability that it is prime is roughly $\frac{1}{\ln(2^{128})} \approx \frac{1}{89}$.
After a few hundred tries, we will almost definitely find a prime number. In general, we need about $\ln(2^n) \approx 0.69\,n$ tries in expectation to find a random $n$-bit prime.
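Putting the pieces together, the sketch below samples random odd 128-bit numbers until the primality test accepts one (the test is the same Miller–Rabin sketch as above, repeated here so the snippet runs on its own; restricting to odd candidates roughly halves the expected number of tries).

```python
import math
import random

def is_probably_prime(x: int, rounds: int = 40) -> bool:
    """Miller-Rabin test, same idea as the earlier sketch."""
    if x < 4:
        return x in (2, 3)
    if x % 2 == 0:
        return False
    r, d = 0, x - 1
    while d % 2 == 0:
        r, d = r + 1, d // 2
    for _ in range(rounds):
        a = random.randrange(2, x - 1)
        y = pow(a, d, x)
        if y in (1, x - 1):
            continue
        for _ in range(r - 1):
            y = pow(y, 2, x)
            if y == x - 1:
                break
        else:
            return False
    return True

# Sample random odd 128-bit numbers until one passes the primality test.
tries, bits = 0, 128
while True:
    tries += 1
    candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1   # force top bit and oddness
    if is_probably_prime(candidate):
        break

expected = bits * math.log(2) / 2   # ~44: prime density 1/ln(2^128), doubled since we only try odd numbers
print(f"found a 128-bit prime after {tries} tries (expected about {expected:.0f})")
```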
In the remainder of the class, we’ll discuss a simple but important application of prime numbers to hashing.
Our goal is to construct a compact “fingerprint” of a file: a short summary of its contents.
The fingerprints of two identical files are always identical.
If the contents of two files differ, their fingerprints should differ with high probability.
Fingerprinting is useful for quickly checking if two versions of the same file are identical. This is quite helpful for version control on systems like Git. The advantage is that we do not need to communicate the entire file between the server and local computer to perform the check; we only need to communicate the small fingerprint.
Another application is to check if two images are identical. This is useful in contexts where we want to remove duplicate pictures. However, if the images are changed at all (e.g. compressed or converted to a different format), the fingerprints will be different. In a later class, we’ll see a method which is robust to these changes.
The approach we’ll learn about today is called a Rabin fingerprint. Let the file be a string of $n$ bits, which we interpret as an integer $x$ between $0$ and $2^n - 1$.
We’ll construct the fingerprint function by choosing a random prime $p$ and defining the fingerprint of $x$ to be $x \bmod p$.
Let’s analyze this fingerprint function.
Claim: If two files $x$ and $y$ have different contents, then, over the random choice of the prime $p$, the probability that $x \bmod p = y \bmod p$ is small.
Since our fingerprint only takes values between $0$ and $p - 1$, it can be stored in $O(\log p)$ bits, far fewer than the $n$ bits of the original file.
Observe that if $x \bmod p = y \bmod p$, then $p$ divides the difference $x - y$.
We’ll analyze the chance that a randomly chosen prime $p$ divides $x - y$.
The first step is to upper bound the number of distinct prime factors of $x - y$. Since $|x - y| < 2^n$ and every prime is at least $2$, the difference has at most $n$ distinct prime factors.
The second step is to lower bound the number of primes in the range from which we draw $p$. By the prime number theorem, we can choose the range so that it contains many more than $n$ primes, in which case the chance that our random prime is one of the at most $n$ prime factors of $x - y$ is small.
Let’s see how much space we need for the fingerprint in practice. Set the range of candidate primes large enough that it contains vastly more primes than the file has bits; even for a file of billions of bits, drawing $p$ from the primes below $2^{64}$ suffices, so the fingerprint fits in just 64 bits.
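A minimal sketch of the scheme (the fixed prime $2^{61} - 1$ here stands in for a prime drawn at random from a large range, which is what the analysis above actually requires):

```python
# Rabin-style fingerprint: interpret a file as a big integer and reduce it modulo a prime.
p = 2**61 - 1   # a known Mersenne prime; in practice p is chosen at random from a large range

def fingerprint(data: bytes) -> int:
    return int.from_bytes(data, "big") % p   # about 61 bits, however large the file is

doc1 = b"the quick brown fox jumps over the lazy dog"
doc2 = doc1.replace(b"lazy", b"eager")
print(fingerprint(doc1) == fingerprint(doc1))   # identical contents -> identical fingerprints
print(fingerprint(doc1) == fingerprint(doc2))   # different contents -> almost surely different
```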