Sparse Cholesky Elimination Tree

Here I derive the elimination tree for the (right-looking) sparse Cholesky algorithm for computing A = LL^T for lower triangular L and sparse matrices A . This tree forms the foundation for most sparse factorization software, even when the underlying assumptions of Cholesky (symmetric and definite) do not apply. Ultimately this tree tells us two things: 1. where nonzeros appear in the matrix L even if not present in the original A (i.e. “fill-in”) and 2. the task dependency graph of our resulting factorization. Traditionally this concept is usually presented in the context sparse triangular solves which is then used as a building-block to a Cholesky factorization. I wanted to instead work directly from a Cholesky factorization, which is what I do below.

Motivating the Elimination Tree

Suppose at first we start with the plain dense right-looking Cholesky algorithm in pseudocode below:

// Input: symmetric positive definite A, assigned to an output matrix L // Output: lower triangular part of L is such that A = LL^T for ( int k = 0 ; k < n ; ++ k ) { // 1. Factor the pivot L [ k ][ k ] = sqrt ( A [ k ][ k ]); // 2. Scale the column below the pivot for ( int i = k + 1 ; i < n ; ++ i ) { L [ i ][ k ] /= L [ k ][ k ]; } // 3. Right-looking rank-1 update of trailing matrix // Note that it is only this step which will create fill-in. for ( int j = k + 1 ; j < n ; ++ j ) { for ( int i = j ; i < n ; ++ i ) { // lower triangle only L [ i ][ j ] -= L [ i ][ k ] * L [ j ][ k ]; } } }

The implied order and loop structure results in a task DAG. In the event that A is sparse it turns out that the DAG as well as the implied fill-in from step (3) can be completely determined by the initial nonzero pattern of A and a simple tree data structure, which is known as the column elimination tree. I will illustrate with a simple 5x5 matrix (only showing the lower triangular part to make it easier to read)

Full Cholesky Task Graph

Taking the above sparse matrix we get

As expected, because A is sparse there are some operations here that we never need to do. If we prune these we get a much smaller graph.

Note the column grouping here doesn’t serve a practical purpose yet, but we may interpret a column group as meaning “when we have completed all operations in a column group i , no future operation references column i ”.

... continue reading