proof of Birkhoff-von Neumann theorem


First, we prove the following lemma:
Lemma:
A convex combinationMathworldPlanetmath of doubly stochastic matrices is doubly stochastic. Proof:
Let {Ai}i=1m be a collection of n×n doubly-stochastic matrices, and suppose {λi}i=1m is a collection of scalars satisfying i=1mλi=1 and λi0 for each i=1,,m. We claim that A=i=1mλiAi is doubly stochastic.
Take any i{1,,m}. Since Ai is doubly stochastic, each of its rows and columns sum to 1. Thus each of the rows and columns of λiAi sum to λi.
By the definition of elementwise summation, given matrices N=M1+M2, the sum of the entries in the ith column of N is clearly the sum of the sums of entries of the ith columns of M1 and M2 respectively. A similar result holds for the jth row.
Hence the sum of the entries in the ith column of A is the sum of the sums of entries of the ith columns of λkAk for each i, that is, k=1mλk=1. The sum of entries of the jth row of A is the same. Hence A is doubly stochastic.
Observe that since a permutation matrixMathworldPlanetmath has a single nonzero entry, equal to 1, in each row and column, so the sum of entries in any row or column must be 1. So a permutation matrix is doubly stochastic, and on applying the lemma, we see that a convex combination of permutation matrices is doubly stochastic. This provides one direction of our proof; we now prove the more difficult direction: suppose B is doubly-stochastic. Define a weighted graph G=(V,E) with vertex set V={r1,,rn,c1,,cn}, edge set E, where eij=(ri,ci)E if Bij0, and edge weight ω, where ω(eij)=Bij. Clearly G is a bipartite graphMathworldPlanetmath, with partitionsMathworldPlanetmath R={r1,,rn} and C={c1,,cn}, since the only edges in E are between ri and cj for some i,j{1,,n}. Furthermore, since Bij0, then ω(e)>0 for every eE. For any AV define N(A), the neighbourhood of A, to be the set of vertices uV such that there is some vA such that (u,v)E. We claim that, for any vV, uN({v})ω(u,v)=1. Take any vV; either vR or vC. Since G is bipartite, vR implies N({v})C, and vC implies N({v})R. Now,

v=riuN(ri)ω(ri,u)=eijEj=1,nω(ri,cj)=Bij0j=1,nBij=j=1nBij=1v=cjuN(cj)ω(u,cj)=eijEi=1,nω(ri,cj)=Bij0i=1,nBij=i=1nBij=1

since B is doubly stochastic. Now, take any AR. We have

wN(A)vAω(v,w)=vAwN({v})ω(v,w)=vA1=|A|

Let B=N(A). But then clearly AN(B), by definition of neighbourhood. So

|N(A)|=|B|=wN(B)vBω(v,w)wAvBω(v,w)=vN(A)wAω(v,w)=|A|

So |N(A)||A|. We may therefore apply the graph-theoretic version of Hall’s marriage theoremMathworldPlanetmath to G to conclude that G has a perfect matching. So let ME be a perfect matching for G. Define an n×n matrix P by

Pij={1if eijM0otherwise

Note that Bij=0 implies Pij=0: if Bij=0, then (ri,cj)E, so (ri,cj)M, which implies Pij=0. Further, we claim that P is a permutation matrix:
Let i be any row of P. Since M is a perfect matching of G, there exists e0M such that ri is an end of e0. Let the other end be cj for some i; then Pij=1.
Suppose ii,i2{1,,n} with i1i2 and Pi1,j=Pi2,j=1 for some j. This implies (ri1,cj),(ri2,cj)M, but this implies the vertex i is the end of two distinct edges, which contradicts the fact that M is a matching.
Hence, for each row and column of P, there is exactly one nonzero entry, whose value is 1. So P is a permutation matrix. Define λ=mini,j{1,,n}{BijPij0}. We see that λ>0 since Bij0, and Pij0Bij0. Further, λ=Bpq for some p,q. Let D=B-λP. If D=0, then λ=1 and B is a permutation matrix, so we are done. Otherwise, note that D is nonnegative; this is clear since λPijλBij for any Bij0. Notice that Dpq=Bpq-λPpq=λ-λ1=0. Note that since every row and column of B and P sums to 1, that every row and column of D=B-λP sums to 1-λ. Define B=11-λD. Then every row and column of B sums to 1, so B is doubly stochastic. Rearranging, we have B=λP+(1-λ)B. Clearly Bij=0 implies that Pij=0 which implies that Bij=0, so the zero entries of B are a superset of those of B. But notice that Bpq=11-λDpq=0, so the zero entries of B are a strict superset of those of B. We have decomposed B into a convex combination of a permutation matrix and another doubly stochastic matrix with strictly more zero entries than B. Thus we may apply this procedure repeatedly on the doubly stochastic matrix obtained from the previous step, and the number of zero entries will increase with each step. Since B has at most n2 nonzero entries, we will obtain a convex combination of permutation matrices in at most n2 steps. Thus B is indeed expressible as a convex combination of permutation matrices.

References

G. Birkhoff, “Tres observaciones sobre el algebra lineal,” Univ. Nac. Tucumán Rev, Ser. A, no. 5, pp147–151. (1946)

Title proof of Birkhoff-von Neumann theorem
Canonical name ProofOfBirkhoffvonNeumannTheorem
Date of creation 2013-03-22 13:10:03
Last modified on 2013-03-22 13:10:03
Owner Andrea Ambrosio (7332)
Last modified by Andrea Ambrosio (7332)
Numerical id 14
Author Andrea Ambrosio (7332)
Entry type Proof
Classification msc 15A51