# Strong duality for linear programs

In this post, we prove the strong duality theorem for linear programs, which says that if either the primal or dual is feasible, then their optimal values are equal.

I will take it for granted that lots of natural problems can be written as linear programs. The maximum flow problem is one example of historic significance: it dates back to 1954 when it was used to model Soviet railway traffic.

Assuming you've covered some basic maximum flow stuff, you will have noticed the nice thing about that optimization problem: we can easily certify the optimality of a maximum flow by exhibiting a flow which saturates some cut. Moreover, the value of any flow is $ \leq$ the value of any cut, and these two sets touch at the max-flow and min-cut. So the maximization problem for flows and the minimization problem for cuts are coupled in a special way. That's very neat when you start poking around trying to prove things; for example, you can easily prove some non-trivial combinatorial theorems, such as Hall's and Menger's theorem, if you know just that fact.

It turns out that this complementarity between flows and cuts is an instance of a broader phenomenon, known as duality. Maximum flow is a special case of a more general class of problems: linear programs. One of the standard ways to write a linear program $ P$ is as the following problem: \begin{align*} \text{find } z^*=\min \left<c,x\right> \quad \text{ subject to }x\geq 0 \text{ and }Ax=b \end{align*} where $ A\in\mathbb{R}^{m\times n},c\in\mathbb{R}^n, b\in \mathbb{R}^m$. Suppose $ A$ has rows $ r_1,\ldots,r_m$ and we treat $ Ax=b$ as a linear system \begin{align*} \left<r_1,x\right> &= b_1 \\ &\vdots \\ \left<r_m,x\right>&=b_m \end{align*} Then, to reason about how small $ \left<c,x\right>$ can get for $ x\geq 0$ a solution to that system, we can try refuting possible values of $ \left<c,x\right>$ by considering linear combinations of the equations to get some sort of contradiction: if we multiply the $ i$-th equation by $ y_i$, this corresponds to getting the new equation \begin{align*} \sum_{i=1}^{m} y_i \left<r_i,x\right> = \sum_{i=1}^{m} y_ib_i \end{align*} or in other words, simply \begin{align*} \left<y,Ax\right> = \left<y,b\right> \end{align*} The twist is that this is equivalent to \begin{align*} \left<A^Ty,x\right>=\left<y,b\right> \end{align*} and so if we find some $ y$ such that $ A^Ty\leq c$, by $ x\geq 0$ we will have a witness that \begin{align*} \left<c,x\right> \geq \left<A^Ty,x\right> = \left<y,b\right> \end{align*} or in other words $ z^* \geq \left<y,b\right>$.

The content of strong duality for linear programs is that, under some reasonable assumptions, this approach will in fact be successful, in the sense that we will be able to refute all the impossible values of the objective function $ \left<c,x\right>$ by finding contradictions of the above form. Formally, if we define the **dual linear program** $ D$ by \begin{align*} \text{find } w^* = \max \left<b,y\right> \quad \text{ subject to }A^Ty\leq c \end{align*} then $ z^* = w^*$. In this context, $ P$ is called the **primal** linear program. Thus, not only is every feasible solution to the dual a bound on the optimum of the primal and vice-versa, but we can *always* certify optimality in various convenient ways:

- by exhibiting a feasible solution of the dual with value $ \geq z^*$
- by exhibiting a feasible solution of the primal with value $ \leq w^*$
- by exhibiting feasible solutions of the primal and dual with the same value.

In fact, the intuition above already gives us some basic duality observations:

Taking all the feasibility constraints into account, we have \begin{align*} \left<c,x\right> - \left<b,y\right> = \left<c,x\right>-\left<Ax,y\right> = \left<c,x\right>-\left<x,A^Ty\right>=\left<x,c-A^Ty\right>\geq 0 \end{align*} So if the values of $ \left<c,x\right>$ and $ \left<b,y\right>$ meet feasibly, it's necessarily at optimums for both $ P$ and $ D$.

This has some other nice consequences:

- if $ P$ is feasible, then $ D$ cannot be unbounded;
- conversely, if $ P$ is unbounded, $ D$ is infeasible.
- if $ D$ is feasible, then $ P$ cannot be unbounded;
- conversely, if $ D$ is unbounded, $ P$ is infeasible.

The quantity $ \left<c,x\right>-\left<b,y\right>$ is called the **duality gap**, and it turns out to be a useful guide for how close we are to the solution, as we will see in the posts on the interior point method.

The theorem we will be focusing on in the rest of this post is the following:

Suppose we want to refute solutions for a simpler problem: just the linear system $ Ax=b$. What would a refutation look like? Let's follow the above ideas but think more geometrically.

The set of vectors $ \left\{ Ax \ \big| \ x\in\mathbb{R}^n\right\} $ is a subspace of $ \mathbb{R}^m$. How do we certify $ b$ is *not* in that subspace? If it's not, it is at some nonzero distance from that subspace. In other words, if we write $ b=Ax_0+b^\perp$ for some $ b^\perp$ in the orthogonal complement $ \left\{ Ax \ \big| \ x\in\mathbb{R}^n\right\}^\perp$, we have $ \left\| b^\perp \right\|\neq 0$. Now observe that \begin{align*} \left\| b^\perp \right\| = \left<b^\perp,b^\perp\right> = \left<b^\perp,Ax_0+b^\perp\right>=\left<b,b^\perp\right> \end{align*} Hence, if $ Ax=b$ has no solution, we will have some $ y\in \left\{ Ax \ \big| \ x\in\mathbb{R}^n\right\}^\perp$ with $ \left<b,y\right>\neq 0$. Recalling that the orthogonal complement of $ \left\{ Ax \ \big| \ x\in\mathbb{R}^n\right\}$ is the kernel of $ A^T$, this gives us the following baby duality theorem:

Suppose both have a solution - then taking those solutions, we have \begin{align*} 0\neq \left<b,y\right> = \left<Ax,y\right> = \left<x,A^Ty\right>=0 \end{align*} Conversely, if $ Ax=b$ has no solution, then as we saw above there is a $ y=b^\perp$ which is a solution to the other problem.

To warm up, we will verify that `dual' is a good name in the sense that by taking the dual of the dual we retrieve the primal; this symmetry will also be useful later on in our proofs.

To do that, we first need to write the dual in the standard form. This will naturally lead us to a extremely handy bag of tricks for `syntactic' manipulation of the form of linear programs. For starters, we can easily cast the dual as a minimization problem by negating things (notice that the value of this problem will be $ -w^*$): \begin{align*} \text{find } \min \left<-b,y\right> \quad \text{ subject to }A^Ty\leq c \end{align*} Now we need to deal with $ A^Ty\leq c$. It's equivalent to the constraints $ A^Ty+s=c, s\geq 0$ - this is the trick of using a **slack variable** $ s$ to pass from an inequality to an equality constraint. But we're still not there. A key trick in such a situation is to write \begin{align*} y = y^+ - y^- \text{ where $ y^+\geq0$ and $ y^-\geq0$} \end{align*} to enforce the nonnegativity constraints on the variable. So we're reduced to \begin{align*} \text{find } \min \left<-b,y\right> \quad \text{ subject to }A^Ty^+-A^Ty^-+s = c, y^+,y^-,s\geq0 \end{align*} The final trick is to group the variables into a single vector. Let $ y'=(y^+,y^-,s)^T$ and let $ A'= \begin{pmatrix} A^T & -A^T & I \end{pmatrix}$ for an identity matrix $ I$ such that $ A'$ acts on $ y'$. Also, let $ b'=(-b,b,0)^T$. Then we're reduced to \begin{align*} \text{find } \min \left<b',y'\right> \quad \text{ subject to }A'y'=c, y'\geq0 \end{align*}

Now taking the dual of *that*, we end up with the problem \begin{align*} \text{find }\max \left<c,x'\right> \quad \text{ subject to }(A')^Tx'\leq b' \end{align*}

But now the constraint $ (A')^Tx\leq b'$ gives $ Ax\leq -b, -Ax\leq b, x\leq 0$, which gives $ Ax=-b$ and $ x\leq 0$. Taking $ x=-x'$ as our variable, we recover the primal.

The reduction from the dual form to the primal form is common, so let's single it out as a formulaic statement:

The linear program \begin{align*} \text{find } \max \left<b,y\right> \quad \text{ subject to }A^Ty\leq c \end{align*} is equivalent to the linear program \begin{align*} \text{find } \min \left<b',y'\right> \quad \text{ subject to }A'y'=c, y'\geq0 \end{align*} where $ y'=(y^+,y^-,s),A'=\begin{pmatrix} A^T & -A^T & I \end{pmatrix},b'=(-b,b,0)$ with appropriately compatible dimensions.

It's worth noting that the techniques we used in this section to convert a linear program from one form to another apply much more generally (as we'll even see in this post).

It turns out that there's a generally useful (beyond linear programming) fact with a simple geometric interpretation from which duality follows (it is in fact equivalent to duality), known as Farkas' lemma. It is in turn an extension of the simple idea of refuting solutions to a linear system $ Ax=b$ by exhibiting a contradiction $ y$ such that $ A^Ty=0$ but $ \left<y,b\right>\neq 0$, which we saw above.

Showing that both can't have a solution is easy; so suppose $ Ax=b,x\geq 0$ doesn't have a solution. Here's some intuition for why the lemma should be true.

Let's think of the set $ P=\left\{ Ax \ \big| \ x\geq 0\right\} $ geometrically: it's what's called a **convex cone**, in the sense that if $ x,y\in P$ then $ \alpha x + \beta y\in P$ for all $ \alpha,\beta> 0$. More specifically, $ P$ is the set of linear combinations of the columns $ c_1,\ldots,c_n$ of $ A$ with nonnegative coefficients: pictorially, it is the `solid angle' bounded by the vectors $ c_1,\ldots,c_n$. Of course, some of the columns may lie inside the solid angle defined by some other columns: that would mean that such columns are a **conical linear combination** of some other columns.

We are assuming $ b\notin P$, so that $ b$ is somewhere outside of that solid angle. Given that, we want to find $ y$ such that $ A^Ty\geq 0$ and $ \left<y,b\right><0$. Geometrically, this amounts to $ \left<c_1,y\right>\geq 0, \ldots, \left<c_n,y\right>\geq 0$, i.e., $ y$ is not at an obtuse angle with any of the columns of $ A$, yet $ \left<y,b\right><0$, i.e. $ y$ is at an obtuse angle with $ b$.

But this means that the points $ c_1,\ldots,c_n$ are on one side of the plane normal to $ y$ at the origin, and $ b$ is on the other side of that plane. So it suffices to find a plane through the origin with this separation property! From this geometric interpretation, it's fairly intuitive this should exist. Here are the technical details:

First, if both have a solution, then \begin{align*} 0 \leq \left<A^Ty,x\right>=\left<y,Ax\right>= \left<y,b\right><0 \end{align*} so we get a contradiction. Now suppose $ b\notin P$ has no solution. It's easy to see that $ P$ is closed. Informed by the above intuition, we apply a hyperplane separation theorem to $ P$ and the point $ b$. We get a direction $ y$ and a real number $ \alpha$ such that $ \left<p,y\right>\geq \alpha$ for $ p\in P$ and $ \left<b,y\right><\alpha$. Since $ 0\in P$, $ \alpha\leq 0$, and if $ \alpha<0$, we can just shift the plane so that $ \alpha$ becomes zero.

If you don't want to rely on hyperplane separation theorems, here's a direct argument: let $ p$ be the projection of $ b$ on $ P$ - then intuitively $ p-b$ will be the direction we need, since you can convince yourself by pictures that it should make acute angles with the $ c_i$ and an obtuse angle with $ b$.

Indeed, the key technical fact is that for all $ z\in P$ we have $ \left<z-p,b-p\right>\leq0$ since otherwise there's a point in $ P$ closer to $ b$ than $ p$. Now if we write $ p=Aw$ and choose $ y=b-p$, we easily get the properties we need from this.

This theorem opens the doors to many other results where it might be harder to think about the geometry directly, but we can reduce to the above setting with the tricks we used to convert between different forms of linear programs. Relevant to us is the following:

Intuitively, if we're trying to certify that $ Ax=d<b$ has no solution, then we can try looking for a vector $ y$ such that $ A^Ty=0$ but $ \left<d,y\right>\neq0$; now fixing the directions, we see that if $ \left<b,y\right><0$ and $ y\geq 0$ that would break all $ d<b$.

To see why both can't happen, assuming that we get \begin{align*} 0>\left<b,y\right> \geq \left<Ax,y\right>= \left<x,A^Ty\right>=0 \end{align*} Now we want to reduce to something of the form of Farkas' lemma. We're now experienced with those tricks: suppose there's no $ x$ such that $ Ax\leq b$, then there's no $ x' = (x^+,x^-,s)$ such that \begin{align*} A'x'=b \quad \text{ subject to }x'\geq 0 \end{align*} where $ A'=\begin{pmatrix} A & -A & I \end{pmatrix}$ and thus by Farkas' lemma we get some $ y$ such that \begin{align*} A'^Ty\geq 0 \quad \text{ and }\left<b,y\right><0 \end{align*} which amounts to exactly what we want.

With all these observations, proving the strong duality theorem is now easy:

By the symmetry between the primal and dual, it suffices to prove this when the primal is feasible. If the primal is unbounded, then by the weak duality theorem it follows that the dual is infeasible, and in this case $ z^*=w^*=-\infty$.

So, the primal has a finite optimum. We will now show that a solution of the dual with value $ \geq z^*$ must exist, which by weak duality will mean that $ z^*=w^*$.

For sake of contradiction, suppose such a solution doesn't exist, i.e. there is no $ y$ such that $ A^Ty\leq c$ and $ \left<b,y\right>\geq z^*$. This amounts to $ \begin{pmatrix} A & -b \end{pmatrix}^Ty\leq (c, -z^*)$ not having a solution. By corollary 1, this means there exists some $ (x,\alpha)\geq 0$ with $ Ax- \alpha b = 0$ and $ \left<c,x\right> - z^*\alpha<0$.

There are two cases:

- If $ \alpha=0$, then $ Ax=0$ and $ \left<c,x\right><0$. But then if $ x'$ is a feasible solution of $ P$ (one exists), $ x'+kx$ is feasible and can drive the objective function arbitrarily low; so $ P$ is unbounded, contradicting the case we're in.
- If $ \alpha>0$, rescaling $ x$ by $ 1/\alpha$ we get a feasible solution of $ P$ which beats $ z^*$, contradiction again.

We will see that duality, and the duality gap in particular, gives us a useful handle on how close we are to a solution of the linear program. This will be a key part of the analysis of a polynomial-time algorithm to solve linear programs approximately, the interior point method.

As another application, the duality theorem has many combinatorial avatars, such as the famously equivalent theorems of Hall, Koenig and Menger from graph theory, to name some. We'll (hopefully) see some of these applications too in a subsequent post.