A Coin Flip by Any Other Name…

Consider the following graph. If we remove each edge with independent probability $1/2$, what is the probability that there is still a path from the top to the bottom vertex?

Since the removal of each subset of edges is equally likely, this kind of question is equivalent to counting the subsets of edges that we can remove without disconnecting two vertices. How easily we crunch through this enumeration depends on how we manage to decompose the problem. However, suppose we learn that the probability of a path remaining in this particular graph is exactly $1/2$. Forget combinatorics—something strange is happening!

In fact, it turns out that our special case can be solved with a "duality" trick, illustrated below. (Click the graph to generate a new random configuration.) There will be a path between the top and bottom nodes exactly when there is no path between the left and the right nodes. Furthermore, the top-to-bottom and left-to-right graphs are the same and, once identified, have the same distribution of missing edges. In other words, we have found a rearrangement of the outcomes of our random process—the subsets of missing edges—that swaps the event that a path exists with the event that no path exists. Since this rearrangement preserves probability, it follows that each event must have probability $1/2$. (I learned this beautiful idea from this mathpages article.)

Let's take a look at another suspicious problem. Suppose we choose four points on the unit circle. What is the probability that the origin lies in their convex hull? (Again, click for random configurations.)

As it turns out, the answer is again exactly $1/2$. If you're skeptical, you can press the "play" icon above to resample the experiment at high speed. (The black lines under the bar graph delineate the $\pm 3 \sigma$ interval that the empirical proportion should stay within about $99\%$ of the time for large $n$.)
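If you would rather check this without the widget, here is a minimal Monte Carlo sketch (my own illustration, assuming `numpy`; it is not code from the post). It relies on the fact that points on the unit circle surround the origin exactly when they do not all fit in an open half-plane, i.e. when their largest angular gap is smaller than $\pi$.

```python
# Monte Carlo estimate of P(origin in the convex hull of 4 uniform points on
# the unit circle) -- a sketch for illustration, not code from the post.
import numpy as np

rng = np.random.default_rng(0)

def origin_in_hull(angles):
    # Points (cos t, sin t) surround the origin iff no angular gap exceeds pi.
    t = np.sort(angles)
    gaps = np.diff(t, append=t[0] + 2 * np.pi)
    return gaps.max() < np.pi

trials = 200_000
hits = sum(origin_in_hull(rng.uniform(0, 2 * np.pi, size=4)) for _ in range(trials))
print(hits / trials)  # comes out close to 1/2
```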
In my experience, things don't happen with even odds unless they have a very good reason to, and that reason is usually a symmetry. For example, a coin will land heads as often as it lands tails because it is approximately symmetric, and there is no reason for the laws of physics to privilege one side of a symmetric coin so long as we have given it enough energy to make its initial condition approximately irrelevant. Is there any appropriate symmetry in the problem of the four points?

In such questions, it is important to have a degree of faith. Of course there is a symmetry; symmetry is everywhere! Unfortunately, symmetry does occasionally work in mysterious ways, and here it is not exactly obvious.

Given a pair $(x, y)$ of unit vectors, define the function $\text{normal}(x, y)$ to give the unique pair $(x', y')$ of unit vectors that satisfy the relations

$$\begin{align*} & \langle x', x \rangle = 0, & \langle y', y \rangle = 0, \\ & \langle x', y \rangle \le 0, & \langle y', x \rangle \le 0. \end{align*}$$

Now, consider the operation taking the four points $((a, b), (c, d))$ to $(\text{normal}(a, b), \text{normal}(-c, -d))$. We claim that this is measure-preserving for the distribution under which each point is independent and uniformly distributed. Furthermore, ignoring an event of probability $0$, we claim that the original four points generate a convex hull that contains the origin iff our four new points do not. So, our polygon contains the origin with the same probability that it does not. This observation solves the four-points problem in the same way that we solved the graph problem.

We can visualize this construction, although the picture is admittedly a little messy. In the diagram below, vectors with the same color are in the same quadruplet. (Note that which quadruplet is which does not matter, since the operation we have defined is its own inverse.) Meanwhile, the vectors drawn with solid lines are the pairs $(a, b)$ and $\text{normal}(a, b)$, while the vectors with dashed lines are the pairs $(c, d)$ and $\text{normal}(-c, -d)$.

(After writing this post, I learned another geometric argument for this problem from Ian Agol on mathstodon.)

The remainder of this post is divided into three parts. First, we explain how this kind of problem can be solved in other dimensions and for other numbers of points using a combinatorial argument. Second, we investigate some particularly memorable consequences of this general solution. Finally, we return to the problem of finding a "duality-like" rearrangement to explain the special fact that $2n$ points in $\R^n$ have the origin in their convex hull with probability $1/2$. We also make note of some remaining mathematical mysteries.

We begin with an apparently unrelated combinatorial problem: how many cells do $k$ linear hyperplanes cut $\R^n$ into? This is related to the classic lazy caterer problem, which asks how many pieces we can cut out of a pizza pie with a certain number of cuts. The lazy caterer problem is solved by observing that each cut increases the number of pieces of pizza by exactly $1 + c$, where $c$ is the number of existing cuts that our new cut intersects. In fact, any number of cuts can be made in such a way that each cut intersects all the ones made before, so the maximum number of pieces we can make with $n$ cuts is

$$1 + (1 + 2 + \dots + n) = 1 + \binom{n + 1}{2}.$$

The case of $k$ linear hyperplanes cutting $\R^n$ can be analyzed similarly. It is only important to note that, while in the lazy caterer problem a new cut may or may not intersect a given existing cut, two non-equal hyperplanes always intersect, and so it will happen that $k$ generic hyperplanes always make the same number of cells in $\R^n$. Let $F^n_k$ be this number.

The increase in cells cut out of $\R^n$ caused by the addition of one new hyperplane $H$ equals the number of cells that $H$ intersects. This is the same as the number of cells that the existing $k$ hyperplanes cut $H$ into. Since $H$ is $(n - 1)$-dimensional, we have the recurrence

$$F^n_{k + 1} = F^n_k + F^{n - 1}_k.$$

Especially in view of this recurrence, it makes sense to define $F^n_0 = 1$ for all $n$, and $F^0_k = 0$ for $k \ge 1$. (It's slightly odd to imagine a "hyperplane" cutting $\R^0$ into $0$ cells, but it's what's demanded by the recurrence relation we've observed.)
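Before turning to the generating function, here is a small sketch (an illustration of mine, not code from the post) that tabulates $F^n_k$ directly from the recurrence and base cases. The $n = 2$ row reproduces the familiar fact that $k \ge 1$ generic lines through the origin cut the plane into $2k$ sectors.

```python
# Tabulating F^n_k from the recurrence F^n_{k+1} = F^n_k + F^{n-1}_k with the
# base cases F^n_0 = 1 and F^0_k = 0 for k >= 1 (a sketch, not from the post).
from functools import lru_cache

@lru_cache(maxsize=None)
def F(n, k):
    if k == 0:
        return 1  # no hyperplanes leave a single cell
    if n == 0:
        return 0  # the convention forced by the recurrence
    return F(n, k - 1) + F(n - 1, k - 1)

for n in range(1, 5):
    print(n, [F(n, k) for k in range(7)])
# The n = 2 row reads 1, 2, 4, 6, 8, 10, 12: k >= 1 lines through the origin
# cut the plane into 2k sectors.
```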
We can now compute $F^n_k$ in general using a generating function. Let's put

$$F_k(x) = \sum_n F^n_k x^n.$$

Then the recurrence above means that, as far as the non-constant terms of $F_k(x)$ are concerned,

$$F_{k + 1}(x) = F_k(x) + x F_k(x) = (1 + x) F_k(x).$$

Keeping in mind that $F^0_k = 0$ for $k \ge 1$, we find that

$$F_0(x) = \frac{1}{1-x}, \quad F_1(x) = (1 + x) F_0(x) - 1 = \frac{2x}{1 - x}, \quad \dots$$

(the $-1$ corrects the constant term, which must vanish) and, in general, for $k \ge 1$,

$$F_k(x) = \frac{2x(1 + x)^{k - 1}}{1 - x}.$$

We arrive at the following.

Proposition: Generically, $k > 0$ hyperplanes cut $\R^n$ into

$$\begin{align*} F^n_k & = [x^n] F_k(x) = [x^n] \frac{2 x (1 + x)^{k - 1}}{1 - x} \\ & = 2 \sum_{j < n} \binom{k - 1}{j} \end{align*}$$

cells.

Now, let's return to our probabilistic problem. Let $M$ be a random $n \times k$ matrix whose columns are our $k$ random points in $\R^n$. We are interested in the probability of the condition

$$\exists x \in \R^k, x > 0 \land M x = 0, \tag{1}$$

which is equivalent to the origin belonging to the convex hull generated by the columns of $M$. Let's take a moment to look at this problem in a few different ways. Ignoring the singular event that the origin is on the boundary of our convex hull, Farkas' lemma tells us that the unsatisfiability of $(1)$ is equivalent to the condition

$$\exists y \in \R^n, - M^T y > 0. \tag{2}$$

Geometrically, this would mean that all the columns of $M$ lie strictly within some half-space. We can also read this as the event that the column space of $-M^T$ contains a strictly positive element, or as the event that a system of $k$ strict linear inequalities over $\R^n$ admits a solution.
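As a concrete check of this Farkas-type alternative, the sketch below (my own; it assumes `numpy` and uses `scipy.optimize.linprog` for feasibility testing) verifies on random Gaussian matrices that exactly one of conditions $(1)$ and $(2)$ holds.

```python
# A numerical check (sketch, not from the post) that conditions (1) and (2)
# are complementary for a random Gaussian matrix M:
#   (1)  some x > 0 satisfies M x = 0     (origin in the hull of the columns)
#   (2)  some y satisfies M^T y < 0       (columns fit in an open half-space)
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

def cond1(M):
    n, k = M.shape
    # By rescaling x, "x > 0" can be replaced by "x >= 1" componentwise.
    res = linprog(np.zeros(k), A_eq=M, b_eq=np.zeros(n), bounds=[(1, None)] * k)
    return res.status == 0  # status 0 means a feasible point was found

def cond2(M):
    n, k = M.shape
    # By rescaling y, "M^T y < 0" can be replaced by "M^T y <= -1".
    res = linprog(np.zeros(n), A_ub=M.T, b_ub=-np.ones(k), bounds=[(None, None)] * n)
    return res.status == 0

for _ in range(200):
    M = rng.standard_normal((3, 7))
    assert cond1(M) != cond2(M)  # exactly one of the two holds (generically)
print("ok")
```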
As it turns out, it is most straightforward to compute the probability of this last event—the existence of a solution for a random system of strict inequalities. In fact, we need to know remarkably little about the distribution of $M$. We will assume merely that the distribution of its columns $\lambda_i$ is invariant under negation of individual columns; that is, under maps of the form

$$(\lambda_1, \dots, \lambda_k) \mapsto (\lambda_1, \dots, -\lambda_i, \dots, \lambda_k).$$

Consider the $k$ hyperplanes defined by the equalities $\langle \lambda_i, x \rangle = 0$. As we explained above, any set of $k$ generic hyperplanes cuts $\R^n$ into a certain number $F^n_k$ of cells. Let's condition on the value of the vectors $\lambda_i$ modulo their sign, which we'll call the variable $L$. The $F^n_k$ cells of $\R^n$ are now fixed, and our event space is discrete; its elements correspond to choices of directions for each inequality. We will prove that the probability of the event $A$ that our system admits a solution is constant when conditioned on $L$. In particular, this proves that the overall probability equals this constant, since $P(A) = E(P(A \mid L))$.

Given choices of directions, we can assign each cell to a boolean vector in $\{0, 1\}^k$ that records which inequalities hold over it. We'll write $1$ to mean an inequality holds and $0$ to mean it doesn't. We're interested in the probability that the assignment $S$ of cells to binary vectors assigns some cell to the all-ones vector $\mathbf 1 = (1, 1, \dots, 1)$.

Consider the group $\Z^k_2$ that acts by flipping directions of our inequalities. An element $g \in \Z^k_2$ will act on $S$ by flipping bits in the binary vectors it assigns to each cell. The set of all possible assignments is clearly the orbit of any particular assignment $S_0$ under this action, and by assumption of invariance of our distribution, each of the assignments $g S_0$ is equally likely. Now, notice that $g S_0$ contains $\mathbf 1$ iff $S_0$ contains $g \mathbf 1$. So, we're looking at the probability that the random binary vector $g \mathbf 1$ belongs to the fixed set of $F^n_k$ binary vectors that $S_0$ assigns to the cells, which is simply $F^n_k / 2^k$. Combined with our computation of $F^n_k$ from above, we have proven the following.

Proposition: Suppose $M$ is an $n \times k$ random matrix with distribution invariant under negation of any individual column. Then there exists a vector $y \in \R^n$ verifying $M^T y < 0$ with probability

$$\frac{F^n_k}{2^k} = 2^{-(k - 1)} \sum_{j < n} \binom{k - 1}{j}.$$

Let's check this against the problem of the four points. There, the event that the origin lies in the convex hull of our points is the condition

$$\exists x \in \R^k, x > 0 \land M x = 0 \tag{1}$$

for a matrix $M$ with $k = 4$ columns and $n = 2$ rows. According to our Farkas' lemma argument, this is equivalent (with probability $1$) to the non-existence of a solution to the system $-M^T y > 0$ of $4$ inequalities in $\R^2$. (This corresponds to the fact that the polygon spanned by our vectors does not contain the origin exactly when there exists some half-plane containing all of them.) Since $4$ lines cut $\R^2$ into $F^2_4 = 8$ pieces and there are $2^4 = 16$ possible binary vectors with length $4$, our invariance argument tells us that our polygon contains the origin with probability $1/2$.

In general, a distribution of our four points that is invariant under $\Z^4_2$ is a mixture of distributions like the one depicted below in the circle on the left. We've also depicted the dual problem—the system of inequalities—on the right by drawing the half-space normal to each vector. Farkas' lemma says that our polygon on the left will contain the origin exactly when one of the cells on the right is not colored by any half-space.

It's interesting to keep in mind that the method we used to compute this probability is very different from the duality-type bijection we mentioned at the beginning of this post. Rather than transforming the event space as a whole, here we are fixing a particular way that the vectors $\lambda_i$ decompose $\R^n$ into cells and using an invariance that acts on the relationship between those particular cells and inequalities.
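Here is a quick Monte Carlo check of the proposition (again a sketch assuming `numpy` and `scipy`, not code from the post): we estimate the probability that a Gaussian $M$ admits $y$ with $M^T y < 0$ and compare it against $2^{-(k-1)} \sum_{j < n} \binom{k-1}{j}$.

```python
# Monte Carlo check of the proposition (a sketch, not from the post): estimate
# the probability that a Gaussian n-by-k matrix M admits y with M^T y < 0 and
# compare it with 2^-(k-1) * sum_{j<n} C(k-1, j).
import numpy as np
from math import comb
from scipy.optimize import linprog

rng = np.random.default_rng(2)

def halfspace_exists(M):
    n, k = M.shape
    # Feasibility of M^T y <= -1 is equivalent to that of M^T y < 0 by rescaling y.
    res = linprog(np.zeros(n), A_ub=M.T, b_ub=-np.ones(k), bounds=[(None, None)] * n)
    return res.status == 0

n, k, trials = 3, 5, 5_000
hits = sum(halfspace_exists(rng.standard_normal((n, k))) for _ in range(trials))
predicted = 2.0 ** -(k - 1) * sum(comb(k - 1, j) for j in range(n))
print(hits / trials, predicted)  # the two numbers should be close (about 0.6875 here)
```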
Now, let's take a look at the probabilities we have computed for general $k$ and $n$. To mix things up, let's also look at condition $(1)$ another way: it is the event that $k$ randomly chosen hemispheres cover the entire unit sphere in $\R^n$. How many hemispheres do we need to paint at random until the whole sphere is painted? Let $X$ be the number of hemispheres required. Using the artificial variables $S_m$, where $S_m$ denotes a sum of $m$ independent fair Bernoulli variables, we have

$$P(X - 1 > k) = P(S_k < n).$$

In other words, $X - 1$ is distributed as the number of fair coin flips we need in order to see $n$ heads, so on average we need $E[X] = 2n + 1$ hemispheres to paint the whole sphere.
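For what it's worth, the $n = 2$ case is easy to simulate (a sketch assuming `numpy`, not from the post): half-circles $\{x : \langle \lambda, x \rangle \ge 0\}$ cover the circle exactly when the directions $\lambda$ do not all fit in an open half-plane.

```python
# Simulating the hemisphere count X for n = 2 (a sketch, not from the post).
# Half-circles {x : <lam, x> >= 0} cover the circle exactly when the
# directions lam do not all fit in an open half-plane, i.e. when their
# largest angular gap is smaller than pi.
import numpy as np

rng = np.random.default_rng(3)

def covered(angles):
    t = np.sort(angles)
    gaps = np.diff(t, append=t[0] + 2 * np.pi)
    return gaps.max() < np.pi

def draw_X():
    angles = [rng.uniform(0, 2 * np.pi)]
    while not covered(np.array(angles)):
        angles.append(rng.uniform(0, 2 * np.pi))
    return len(angles)

samples = [draw_X() for _ in range(20_000)]
print(np.mean(samples))  # should be close to 2 * 2 + 1 = 5
```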
Next, let's consider the probability of the condition

$$\exists y \in \R^n, - M^T y > 0, \tag{2}$$

which we will view as the probability that $n$ random vectors in $\R^k$ admit a strictly positive linear combination. What can we say about how many random vectors we need to draw from $\R^k$ until we can get a positive linear combination? Again, we use the variables $S_m$. We proved above that condition $(2)$ is satisfied with probability $P(S_{k - 1} < n)$. By the symmetry of the binomial distribution, this probability is at least $1/2$ exactly when $n > k/2 - 1/2$. When $k$ is even, $n = k/2$ will have exactly even odds. On the other hand, applying the law of large numbers to the variable $S_{k - 1}/(k - 1)$, we find that a subspace of dimension $n > \left( \frac{1}{2} + \epsilon \right) k$ will contain a positive vector with overwhelming probability for large $k$. Informally, a large random matrix with dimensions $n \times k$ will have a positive element in its column space with probability close to $0$ if $k < n/2$ and with probability close to $1$ if $k > n/2$.

Finally, let's return to the original mystery of the $4$ points. Our combinatorial proof above shows that, indeed, $2k$ points in $\R^k$ with distribution invariant under the action of $\Z^{2k}_2$ will contain the origin in their convex hull with probability $1/2$. However, we wonder if there might be another way of proving this fact. Specifically, we would like to prove it in the same manner as the bridge puzzle—by finding a natural way to rearrange our event space that swaps our event with its converse.

One idea is to try looking at duality. Throughout our analysis above, we used the fact that the conditions

$$\exists x \in \R^k, x > 0 \land M x = 0, \tag{1}$$

and

$$\exists y \in \R^n, - M^T y > 0 \tag{2}$$

are mutually exclusive with probability $1$. Geometrically, the satisfiability of one gives a convex certificate for the unsatisfiability of the other. Unfortunately for us, the two conditions here have an inherently different structure; unlike the graph of our earlier puzzle, our condition is not "self-dual." On the other hand, consider the similar-looking problem

$$\exists x \in \R^k, x > 0 \land M x > 0. \tag{3}$$

Excluding a probability $0$ event as usual, the convex certificate for infeasibility of this problem would be

$$\exists y \in \R^n, y > 0 \land - M^T y > 0. \tag{4}$$

If $M$ is a square matrix so that $k = n$, this problem is just the condition above with $M$ replaced by $-M^T$. So, for distributions on $M$ invariant under this transformation, we have proved that the probability of condition $(3)$ is $1/2$.

Can we fit our problem into this format? Given a matrix $M$ with dimensions $n \times 2n$, let's divide it into blocks $M = \begin{bmatrix} M_0 & M_1 \end{bmatrix}$. We're interested in knowing when there exists a positive vector $x = (x_0 \; x_1)$ so that

$$M x = M_0 x_0 + M_1 x_1 = 0. \tag{5}$$

Since $M_0$ is invertible with probability $1$, we can multiply this condition on the left by $M_0^{-1}$ to reach the equivalent equation

$$x_0 = -M_0^{-1} M_1 x_1.$$

We will have positive vectors $(x_0, x_1)$ solving this equation exactly when $-M_0^{-1} M_1$ satisfies condition $(3)$—that is, its columns generate a positive vector in conic combination. Finally, we need $M_0^{-1} M_1$ to coincide in distribution with $-(M_0^{-1} M_1)^T$. Unfortunately, this is a bit of a tricky step! I see that it is true when $M_0$ and $M_1$ are independently distributed over the Haar measure on orthogonal matrices, but not when they are, say, Gaussian random matrices. (Write me an email if you know whether this is true!)

Fortunately, there is another way ahead. From another point of view, the existence of $x > 0$ satisfying Equation $(5)$ means that the interior of the cone $C_0$ generated by the columns of $M_0$ intersects the interior of the cone $C_1$ generated by the columns of $-M_1$. (Keep in mind that an element is in the interior of a cone generated by $n$ vectors in $\R^n$ exactly when it can be written as a positive conic combination of them.) Meanwhile, the convex certificate for the non-existence of such an intersection would be a hyperplane strictly separating $C_0$ from $C_1$, which is the same as an element simultaneously in the interiors of $C_0^\perp$ and $-C_1^\perp$, where

$$C^\perp = \{ x \in \R^n : \forall y \in C, \langle x, y \rangle \le 0 \}$$

denotes the dual cone.

Now, with probability $1$, the dual of a cone generated by $n$ vectors in $\R^n$ is also generated by $n$ vectors. So, for each block $M_i$, let $M_i^\perp$ be a matrix whose columns generate the dual of the cone generated by the columns of $M_i$. (Note that $-C_1^\perp$ is the dual of the cone generated by the columns of $M_1$, so it is generated by the columns of $M_1^\perp$.) Altogether, we have argued that the columns of the matrix $M = \begin{bmatrix} M_0 & M_1 \end{bmatrix}$ contain the origin in their convex hull exactly when the columns of $M' = \begin{bmatrix} M_0^\perp & -M_1^\perp \end{bmatrix}$ do not. (We described exactly this construction for $n = 2$ at the beginning of this post.) To finish the argument, we would just need to argue that the operation $(-)^\perp$ can be defined in a way that leaves a certain distribution over sets of vectors invariant. For example, this is true (and not too hard to prove) for sets of vectors selected independently at random from the unit sphere. (My colleague Reynaldo Gil Pons from across the hall came up with this argument to solve a problem I had taped to my office door.)
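Here is a sketch (my own, assuming `numpy`) of the $n = 2$ version of this swap, using the $\text{normal}$ construction from the beginning of the post: for random points on the circle, exactly one of the two quadruplets should contain the origin in its convex hull.

```python
# A check of the n = 2 duality swap (a sketch, not the author's code): the
# hull of (a, b, c, d) contains the origin exactly when the hull of
# (normal(a, b), normal(-c, -d)) does not.
import numpy as np

rng = np.random.default_rng(4)

def perp(v):
    return np.array([-v[1], v[0]])  # rotate by 90 degrees

def normal(x, y):
    # The pair (x', y') with x' orthogonal to x, y' orthogonal to y,
    # <x', y> <= 0 and <y', x> <= 0.
    xp = perp(x) if perp(x) @ y <= 0 else -perp(x)
    yp = perp(y) if perp(y) @ x <= 0 else -perp(y)
    return xp, yp

def origin_in_hull(points):
    # Unit vectors surround the origin iff no angular gap exceeds pi.
    t = np.sort([np.arctan2(p[1], p[0]) for p in points])
    gaps = np.diff(t, append=t[0] + 2 * np.pi)
    return gaps.max() < np.pi

def unit(angle):
    return np.array([np.cos(angle), np.sin(angle)])

for _ in range(10_000):
    a, b, c, d = (unit(rng.uniform(0, 2 * np.pi)) for _ in range(4))
    swapped = [*normal(a, b), *normal(-c, -d)]
    assert origin_in_hull([a, b, c, d]) != origin_in_hull(swapped)
print("ok")
```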
Over all the topics we've dealt with in this post, two unresolved issues stand out to me. First, I wonder if it is possible to interpret the "artificial" binomial variable $S_{k - 1}$ in a sensible way. For example, is there a "reasonable" way to map ordered bases of $\R^n$ to sequences of $(n - 1)$ bits in such a way that each bit ends up being independently and uniformly distributed and so that, where $c$ is the number of non-zero bits produced, $c + 1$ is the number of vectors we need to take from the sequence in order to get a positive linear combination? Because the probabilities agree, there are at least "strange and unreasonable" ways of making this assignment. Are there, say, any ways that we can compute in practice?