# Talagrand's concentration inequality

In this post we go over a proof of Talagrand's concentration inequality, following material from Béla Bollobás' Part III course on probabilistic combinatorics. Things can get rather messy, so hopefully writing this will force me to properly understand it.

As is usual for concentration inequalities, our setting is a product probability space $ (\Omega,\mathcal{F},\mathop{\rm \mathbf{Pr}}\nolimits) = \prod_{i=1}^{n}(\Omega_i,\mathcal{F}_i, \mathop{\rm \mathbf{Pr}}\nolimits_{i})$. We want to say something of the form "if we take a set of substantial probability and blow it up by a tiny bit, we get almost all the mass". So we need some notion of distance on $ \Omega$; we're going to take one that seems rather weird at first (and yet turns out to exploit certain properties of many natural examples better than other concentration inequalities do), so bear with me!

We will base the distance only on the pattern in which two points' coordinates are equal or different. Given $ x\in\Omega$, we define a map $ \delta_x:\Omega\to \{0,1\}^n$ which measures how our points differ from $ x$: \begin{align*} (\delta_x(y))_i = \begin{cases} 1, &\text{ if }y_i\neq x_i\\ 0, &\text{ otherwise} \end{cases} \end{align*} Next, we define the set $ \delta_x(A)\subseteq\{0,1\}^n$ in the obvious way as $ \left\{ \delta_x(y) \ \big| \ y\in A\right\} $. Now let $ U_x(A)$ be the up-set of $ \delta_x(A)$ viewed as a subset of the unit cube via $ \{0,1\}^n\hookrightarrow [0,1]^n$, i.e. the set \begin{align*} U_x(A) = \left\{ (y_1,\ldots,y_n)\in [0,1]^n \ \big| \ \exists (a_1,\ldots,a_n)\in\delta_x(A) \text{ s.t. } a_1\leq y_1,\ldots,a_n\leq y_n\right\} \end{align*} And next we define \begin{align*} T_x(A) = \text{the convex hull of $ U_x(A)$} \end{align*} and let the **Talagrand distance** from $ x$ to $ A$ be \begin{align*} d_T(x,A) = d(0,T_x(A)) = \displaystyle\min_{v\in T_x(A)} \left\| v \right\|, \end{align*} i.e. the normal Euclidean distance from $ 0$ (which is $ \delta_x(x)$) to the set $ T_x(A)$.
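
To make this concrete, here is a quick numerical sketch (a throwaway Python/numpy illustration, not from the course notes). It uses the observation that minimizing $ \left\| v \right\|$ over $ T_x(A)$ is the same as minimizing over the convex hull of $ \delta_x(A)$ itself: every point of the up-set dominates some pattern coordinatewise, and the Euclidean norm is monotone on the nonnegative orthant. The brute-force grid over convex weights only handles up to three points and is purely for illustration.

```python
import numpy as np

def talagrand_distance(x, A, steps=200):
    """Brute-force d_T(x, A) for |A| <= 3.

    Minimising ||v|| over T_x(A) equals minimising over the convex hull
    of the patterns delta_x(A), so we just grid over the weight simplex.
    """
    P = [np.array([0.0 if yi == xi else 1.0 for yi, xi in zip(y, x)]) for y in A]
    while len(P) < 3:                     # pad so one loop handles |A| = 1, 2, 3
        P.append(P[0])
    best = np.inf
    for i in range(steps + 1):            # (i/steps, j/steps, rest): convex weights
        for j in range(steps + 1 - i):
            w0, w1 = i / steps, j / steps
            v = w0 * P[0] + w1 * P[1] + (1.0 - w0 - w1) * P[2]
            best = min(best, float(np.linalg.norm(v)))
    return best

x = (0, 0, 0)
A = [(1, 1, 0), (0, 1, 1)]                # both points at Hamming distance 2 from x
print(talagrand_distance(x, A))           # sqrt(3/2) ~ 1.2247, less than sqrt(2)
```

So even though every point of $ A$ is at Hamming distance $ 2$ from $ x$ (a single such point alone would give distance $ \sqrt{2}$), mixing the two patterns brings the Talagrand distance down to $ \sqrt{3/2}$; this is exactly the convex-hull effect in the definition.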

As with the tons of other concentration inequalities we've seen (haha I wish), we're after something of the form \begin{align*} \mathop{\rm \mathbf{E}}\nolimits_{x\in\Omega}\left[\exp\left( f(d_T(x,A)) \right) \right] = \int_{\Omega}^{} \exp\left( f(d_T(x,A)) \right) \, dx\leq \frac{1}{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right]} \end{align*} for some increasing function $ f$. We want to do it by induction, by writing the integral as an integral of something over $ \Omega_1\times\ldots\times\Omega_{n-1}$, so the question arises of how that crazy distance we defined behaves with respect to restrictions.

So let's fix $ x$ for now. The idea we're going to use is to bound $ d_T(x,A)$ from above by replacing $ T_x(A)$ from the definition of the Talagrand distance with a certain subset of $ T_x(A)$ which is essentially determined by what's going on in $ \Omega_1\times\ldots\times\Omega_{n-1}$; it will turn out to be a set which is the convex hull of two $ (n-1)$-dimensional subsets of the cube.

So let's cut off that last coordinate and see what happens. Let $ x=(x',\omega)$ where $ \omega\in\Omega_n$, and let $ A|_{\omega}$ denote the slice of $ A$ at $ \omega$, i.e. $ \left\{ y \ \big| \ (y,\omega)\in A\right\}$.

- The first thing to observe is that when the last coordinates match, we get a set in $ T_x(A)$ near the bottom of the cube. Indeed, \begin{align*} \left(\delta_{x'}(A|_\omega),0\right) = \delta_x\left(A|_\omega\times\{\omega\}\right)\subseteq \delta_x(A) \end{align*} where the equality holds precisely because $ x$ and anything in $ A|_\omega\times\{\omega\}$ have the same last coordinate, and the inclusion is by definition. Since taking upsets distributes over the coordinates, we in particular get $ \left(U_{x'}(A|_\omega),0\right)\subseteq U_x(A)$, and then taking convex hulls we end up with \begin{align*} (T_{x'}(A|_\omega),0)\subseteq T_x(A) \end{align*}
- The second thing to observe is that when the last coordinates don't match, we at least have some subset of $ T_x(A)$ at the top of the cube. Indeed, let $ \pi(A)= \cup_{\omega\in\Omega_n}^{} A|_\omega$ be the projection of $ A$ that forgets the last coordinate. Then certainly \begin{align*} \left(\delta_{x'}(\pi(A)),1\right)\subseteq U_x(A) \end{align*} because for every $ y\in\pi(A)$, there is some $ \omega'$ such that $ (y,\omega')\in A$, and we have $ \delta_x(y,\omega')=(\delta_{x'}(y),\mathbf{1}[\omega'\neq\omega])$; the last coordinate here is at most $ 1$, so in the upset we have $ (\delta_{x'}(y),1)\in U_x(A)$. Again taking upsets and convex hulls we get \begin{align*} \left(T_{x'}(\pi(A)), 1\right)\subseteq T_x(A) \end{align*}
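
Here is a throwaway numerical sanity check of these two inclusions (Python/numpy, with a brute-force `min_norm` helper and a made-up five-coordinate set $ A$, none of which is from the course notes). The inclusions translate into the distance bounds $ d_T(x,A)\leq d_T(x',A|_\omega)$ and $ d_T(x,A)\leq\sqrt{d_T(x',\pi(A))^2+1}$.

```python
import numpy as np

def min_norm(points, steps=200):
    """Minimum Euclidean norm over the convex hull of up to three points
    (brute-force grid over the weight simplex; illustration only)."""
    P = [np.asarray(p, dtype=float) for p in points]
    while len(P) < 3:
        P.append(P[0])
    best = np.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w0, w1 = i / steps, j / steps
            v = w0 * P[0] + w1 * P[1] + (1.0 - w0 - w1) * P[2]
            best = min(best, float(np.linalg.norm(v)))
    return best

# Made-up example: x = (x', omega) = (0,0,0,0 | 0) and
# A = {(1,1,0,0 | 0), (0,1,1,0 | 1), (1,0,0,1 | 1)}, so A|_omega = {(1,1,0,0)}
# and pi(A) = {(1,1,0,0), (0,1,1,0), (1,0,0,1)}.  Since x is all zeros, the
# difference patterns are just the points themselves.
d_full  = min_norm([(1, 1, 0, 0, 0), (0, 1, 1, 0, 1), (1, 0, 0, 1, 1)])
d_slice = min_norm([(1, 1, 0, 0)])
d_proj  = min_norm([(1, 1, 0, 0), (0, 1, 1, 0), (1, 0, 0, 1)])

print(d_full, d_slice, d_proj)  # ~1.2247 = sqrt(3/2), ~1.4142 = sqrt(2), ~1.0
# Inclusion 1 forces d_full <= d_slice, and inclusion 2 forces
# d_full <= sqrt(d_proj^2 + 1); both indeed hold here.
```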

So we just proved the following:

**Lemma 1.** With $ x=(x',\omega)$ as above, \begin{align*} (T_{x'}(A|_\omega),0)\subseteq T_x(A) \quad\text{ and }\quad (T_{x'}(\pi(A)),1)\subseteq T_x(A). \end{align*}

Since $ T_x(A)$ is convex, it follows that anything in the convex hull of these two subsets is also in $ T_x(A)$, and this gives us a knob $ \lambda$ to play with, which we can choose later to optimize some quantity (as often happens when proving concentration inequalities):

**Lemma 2.** For any $ \lambda\in[0,1]$, \begin{align*} d_T^2(x,A)\leq \lambda^2 + (1-\lambda)\,d_T^2(x',A|_\omega) + \lambda\, d_T^2(x',\pi(A)). \end{align*}

Let $ u,v$ be such that $ \left\| u \right\| = d_T(x',A|_\omega)$ and $ \left\| v \right\|=d_T(x',\pi(A))$, and consider the points $ (u,0)\in(T_{x'}(A|_\omega),0)$ and $ (v,1)\in (T_{x'}(\pi(A)),1)$. From the previous lemma and convexity it follows that $ (1-\lambda)(u,0) + \lambda (v,1)\in T_x(A)$, and thus \begin{align*} d_T^2(x,A)&\leq \left\| (1-\lambda)(u,0)+\lambda (v,1) \right\|^2 = \lambda^2 + \left\| (1-\lambda)u+\lambda v \right\|^2 \\ &\leq \lambda^2 + \left((1-\lambda)\left\| u \right\|+\lambda \left\| v \right\|\right)^2\leq \lambda^2 + (1-\lambda)\left\| u \right\|^2 + \lambda \left\| v \right\|^2 \end{align*} (the last inequality is convexity of $ t\mapsto t^2$), which is exactly what we wanted.
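
The only slightly nonobvious step in this chain is the last inequality, $ ((1-\lambda)a+\lambda b)^2\leq(1-\lambda)a^2+\lambda b^2$, which is just convexity of $ t\mapsto t^2$; here is a throwaway grid check of that step:

```python
import numpy as np

# Check ((1-l)a + l b)^2 <= (1-l)a^2 + l b^2 for l in [0,1] and a, b >= 0.
lam = np.linspace(0.0, 1.0, 101)[:, None, None]
a = np.linspace(0.0, 3.0, 61)[None, :, None]
b = np.linspace(0.0, 3.0, 61)[None, None, :]
lhs = ((1 - lam) * a + lam * b) ** 2
rhs = (1 - lam) * a ** 2 + lam * b ** 2
print(bool(np.all(lhs <= rhs + 1e-12)))  # True
```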

This looks sufficiently nice for us to jump into the induction, so perhaps it's not super surprising that $ f(x)=x^2/4$ works, and we have

**Theorem (Talagrand).** For any measurable $ A\subseteq\Omega$ with $ \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right]>0$, \begin{align*} \int_{\Omega}^{} \exp\left( \frac{1}{4}d_T^2(x,A) \right) \, dx\leq \frac{1}{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right]}. \end{align*}

We proceed by induction; for $ n=1$, it's clear (there $ d_T(x,A)$ is $ 0$ on $ A$ and $ 1$ off it, and $ a+(1-a)e^{1/4}\leq 1/a$ for $ a= \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right]$). For the induction step, we write $ x=(x',\omega)$ and $ \Omega'=\Omega_1\times\ldots\times \Omega_{n-1}$. Notice that our inequality for $ d_T^2(x,A)$ above works for all $ \omega$ and all $ \lambda$; so in order to use all the flexibility we have, we'll pick $ \lambda$ based on $ \omega$, which we expect to give us a better bound. So fixing $ \omega$ for now, we have \begin{align*} &\int_{\Omega'}^{} \exp\left( \frac{1}{4}d_T^2(x,A) \right) \, dx' \leq \int_{\Omega'}^{} \exp\left( \frac{1}{4}\left((1-\lambda)d_T^2(x',A|_\omega) +\lambda d_T^2(x',\pi(A)) + \lambda^2\right) \right) \, dx' \\ &=e^{\lambda^2/4} \int_{\Omega'}^{} \left(\exp\left( \frac{1}{4}d_T^2(x',A|_\omega) \right)\right)^{1-\lambda} \left(\exp\left( \frac{1}{4}d_T^2(x',\pi(A)) \right)\right)^{\lambda} \, dx' \\ &\leq e^{\lambda^2/4} \left(\int_{\Omega'}^{} \exp\left( \frac{1}{4}d_T^2(x',A|_\omega) \right) \, dx'\right)^{1-\lambda} \left(\int_{\Omega'}^{} \exp\left( \frac{1}{4}d_T^2(x',\pi(A)) \right) \, dx'\right)^\lambda \\ &\leq e^{\lambda^2/4} \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A|_\omega\right]^{\lambda-1}\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[\pi(A)\right]^{-\lambda} \end{align*} where the second inequality is Hölder's inequality (with exponents $ \frac{1}{1-\lambda}$ and $ \frac{1}{\lambda}$) and the last one is the induction hypothesis applied on $ \Omega'$ to the sets $ A|_\omega$ and $ \pi(A)$. Now we want to pick $ \lambda=\lambda(\omega)$ to bound this from above by some small linear function of $ \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A|_\omega\right]$, because we know that $ \int_{\Omega_n}^{} \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A|_\omega\right] \, d\omega= \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right]$.
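
To double-check the Hölder step, here's a quick randomized sanity check of $ \mathop{\rm \mathbf{E}}\nolimits\left[f^{1-\lambda}g^{\lambda}\right]\leq \mathop{\rm \mathbf{E}}\nolimits\left[f\right]^{1-\lambda}\mathop{\rm \mathbf{E}}\nolimits\left[g\right]^{\lambda}$ for nonnegative $ f,g$ (throwaway code, uniform measure on finitely many points):

```python
import numpy as np

# Hoelder on a uniform probability space: E[f^(1-l) g^l] <= (E f)^(1-l) (E g)^l
# for nonnegative f, g and l in [0,1].  Randomized throwaway check.
rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    f, g = rng.random(50), rng.random(50)     # nonnegative samples
    l = rng.random()
    lhs = np.mean(f ** (1 - l) * g ** l)
    rhs = np.mean(f) ** (1 - l) * np.mean(g) ** l
    ok = ok and bool(lhs <= rhs + 1e-12)
print(ok)  # True
```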

As it turns out, we have the following fact from analysis: for any $ r\in[0,1]$, \begin{align*} \min_{0\leq\lambda\leq1} e^{\lambda^2/4}r^{\lambda-1}\leq 2-r \end{align*} Using this with $ r= \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A|_\omega\right]/\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[\pi(A)\right]$ (which is indeed in $ [0,1]$, since $ A|_\omega\subseteq\pi(A)$), we get \begin{align*} \int_{\Omega'}^{} \exp\left( \frac{1}{4}d_T^2(x,A) \right) \, dx' \leq \left(2-\frac{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A|_\omega\right]}{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[\pi(A)\right]}\right)\frac{1}{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[\pi(A)\right]} \end{align*} and hence \begin{align*} \int_{\Omega}^{} \exp\left( \frac{1}{4}d_T^2(x,A) \right) \, dx&\leq \int_{\Omega_n}^{} \left(2-\frac{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A|_\omega\right]}{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[\pi(A)\right]}\right)\frac{1}{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[\pi(A)\right]}\, d\omega \\ &= \left(2-\frac{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right]}{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[\pi(A)\right]}\right)\frac{1}{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[\pi(A)\right]}\leq \frac{1}{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right]} \end{align*} where the last inequality is just $ (2-u)u\leq 1$, i.e. $ (1-u)^2\geq 0$, with $ u= \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right]/\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[\pi(A)\right]$; so we're done!
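
This fact is easy to check numerically (calculus gives the minimizer $ \lambda=\min(1,\,2\ln(1/r))$, but for a throwaway check a grid over $ \lambda$ is enough):

```python
import numpy as np

lam = np.linspace(0.0, 1.0, 2001)[None, :]     # lambda grid
r = np.linspace(0.01, 1.0, 100)[:, None]       # r grid
vals = np.exp(lam ** 2 / 4) * r ** (lam - 1)   # e^{lam^2/4} r^{lam-1}
excess = vals.min(axis=1) - (2 - r.ravel())    # min over lambda, minus (2 - r)
print(float(excess.max()))  # <= 0: the bound holds on the grid, tight at r = 1
```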

As an immediate corollary, we get the following "blowup" statement:

**Corollary.** For any $ B\subseteq\Omega$ and $ t\geq 0$, the set $ A=\left\{ x\in\Omega \ \big| \ d_T(x,B)\geq t\right\}$ satisfies \begin{align*} \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right]\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[B\right]\leq e^{-t^2/4}. \end{align*}

We have \begin{align*} \frac{1}{\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[B\right]}&\geq\int_{\Omega}^{} \exp\left( \frac{1}{4}d_T^2(x,B) \right) \, dx\geq \int_{A}^{} \exp\left( \frac{1}{4}d_T^2(x,B) \right) \, dx \\ &\geq \int_{A}^{} e^{t^2/4} \, dx = e^{t^2/4}\mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right] \end{align*}

From this it follows directly that if a set $ A$ has, say, at least half the measure, then the set of points at Talagrand distance at least $ t$ from $ A$ has measure at most $ 2e^{-t^2/4}$; so blowing $ A$ up by a tiny bit in this distance captures almost all the measure.
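
Concretely, if $ \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[A\right]\geq \frac{1}{2}$, the corollary gives $ \mathop{\rm \mathbf{Pr}}\nolimits_{}\left[\left\{ x \ \big| \ d_T(x,A)\geq t\right\}\right]\leq 2e^{-t^2/4}$; tabulating a few values of $ t$ (trivial arithmetic):

```python
import math

# If Pr[A] >= 1/2, then Pr[d_T(x, A) >= t] <= exp(-t^2/4)/Pr[A] <= 2 exp(-t^2/4).
tail = {t: 2 * math.exp(-t * t / 4) for t in (2, 4, 6)}
for t, bound in tail.items():
    print(t, bound)   # t=2: ~0.736, t=4: ~0.0366, t=6: ~0.000247
```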

Now, dealing with the Talagrand distance is kind of awkward, so in practice we'd also like some quick and dirty way to get a reasonable lower bound on it. Namely, suppose that there is some hyperplane given by $ \left<\varphi_x,y\right>=t$ such that all points of $ \delta_x(B)$ are on the far side of that plane, i.e. every $ y\in\delta_x(B)$ satisfies $ \left<\varphi_x,y\right>\geq t$. Moreover suppose that $ \varphi_x$ has all nonnegative coordinates. Then clearly this remains true after taking the upset and the convex hull, so that in fact all of $ T_x(B)$ is on the far side of the hyperplane. Hence, $ d_T(x,B)\geq t/ \left\| \varphi_x \right\|$.
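
Here's a tiny check of this trick on the earlier kind of example (throwaway numpy code; the grid search is just a crude way to get $ d_T$). With patterns $ (1,1,0)$ and $ (0,1,1)$ and the nonnegative normal $ \varphi=(1,2,1)$, every pattern satisfies $ \left<\varphi,y\right>\geq 3$, and the resulting bound $ 3/\left\|\varphi\right\|=\sqrt{3/2}$ happens to be exactly the Talagrand distance:

```python
import numpy as np

p1, p2 = np.array([1.0, 1.0, 0.0]), np.array([0.0, 1.0, 1.0])  # delta_x(B)
phi = np.array([1.0, 2.0, 1.0])   # nonnegative normal; <phi, p> = 3 for both
t = min(float(phi @ p1), float(phi @ p2))
bound = t / float(np.linalg.norm(phi))        # the lower bound t / ||phi||

lam = np.linspace(0.0, 1.0, 2001)[:, None]
pts = lam * p1 + (1 - lam) * p2               # convex hull of the two patterns
d_T = float(np.linalg.norm(pts, axis=1).min())
print(bound, d_T)  # both ~1.2247 = sqrt(3/2): the hyperplane bound is tight here
```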

This observation in fact extends to all $ \varphi_x$ (not just the ones with nonnegative coordinates), using the fact that $ \delta_x(B)$ is a finite set, so the distance $ d_T(x,B)$ is realized as the distance to some hyperplane; but let's not worry about the technical details here. What we get in the end is the following: