Did you know that there is a mathematical formula that simplifies finding the derivative of a determinant? You can compute the derivative of a determinant of an n x n matrix by using the sum of n other determinants. The n determinants are for matrices that are equal to the original matrix except that you modify the i_th row in the i_th matrix by taking its derivative.

This method is especially useful when you want to evaluate the derivative at a point. Without the formula, you need to compute a derivative in symbolic form, compute the derivative (which will require many applications of the product rule!), and then evaluate the derivative of the determinant at the point. By using the formula, you can compute the derivative of individual elements, evaluate n matrices at a point, and then compute the sum of the n numerical determinants.

This article shows the formula and applies it to computing the derivative or structured covariance matrices that arise in statistical models. However, the formula is applicable to any square matrix.

What is the derivative of a determinant?

If A is a square matrix of numbers, the determinant of A is a scalar number. Geometrically, the determinant tells you how the matrix expands or contracts a cube of volume in the domain when it maps the cube into the range. If det(A) > 1, the linear transformation expands volume; if det(A) < 1, the linear transformation contracts volume. Algebraically, the determinant tells you whether the transformation is invertible (det(A) ≠ 0) or is singular (det(A) = 0).

When A is a constant matrix, det(A) is a number. But if some cells in the matrix depend on a parameter, then the determinant is a function of that parameter. A familiar example from statistics is a structured covariance matrix such as the autoregressive AR(1; ρ) correlation matrix. A 4 x 4 correlation matrix with an AR(1) structure is shown to the right. The value of an off-diagonal element is given as a function of the parameter ρ

The AR(1) correlation structure is used in statistics to model correlated errors. If Σ is AR(1) correlation matrix, then its elements are constant along diagonals. The (i,j)th element of an AR(1) correlation matrix has the form Σ[i,j] = ρ|i – j|, where ρ is a constant that determines the geometric rate at which correlations decay between successive time intervals.

The determinant of an AR(1) matrix

Because the AR(1) matrix has a structure, you can compute the determinant in symbolic form as a function of the parameter, ρ. You will get a polynomial in ρ, and it is easy to take the derivative of a polynomial. The computations are long but not complicated. If A is an n x n AR(1) matrix, the following list gives the analytical derivatives of the determinant of A for a few small matrices:

• n=2: d/dρ( det(A) ) = -2 ρ
• n=3: d/dρ( det(A) ) = +4 ρ (ρ2 – 1)
• n=4: d/dρ( det(A) ) = -6 ρ (ρ2 – 1)2
• n=5: d/dρ( det(A) ) = +8 ρ (ρ2 – 1)3

A rowwise method to find the derivative of a determinant

I learned the derivative formula from a YouTube video by Maksym Zubkov ("Derivative of Determinant (for nxn Matrix)"). To find the derivative of the determinant of A, you can compute the determinant of n auxiliary matrices, Di. The matrix Di is equal to A except for the i_th row. For the i_th row, you replace the elements of A with their derivatives. Because auxiliary matrices are formed by taking the derivatives of rows, this method is sometimes called the rowwise-method of differentiating the determinant.

For example, let's look at n=2. Typesetting matrices in this blog software is tedious, so I will use the SAS/IML notation where elements on the same row are separated by spaces and commas are used to indicate each row. Thus, the 2 x 2 AR(1) matrix is denoted as
A={1 ρ, ρ 1}.
To compute the derivative of the determinant of A, you form the following auxiliary matrices:

• D1 = {0 1, ρ 1}. The first row of D1 contains the derivatives of the first row of A. The determinant of D1 is det(D1) = -ρ
• D2 = {1 ρ, 1 0}. The second row of D2 contains the derivatives of the second row of A. The determinant is det(D2) = -ρ
• The derivative of the determinant of A is the sum of the determinants of the auxiliary matrices, which is -2 ρ. This matches the analytical derivative from the previous section.

Flushed with victory, let's try n=3, which is
A = {1 ρ ρ2, ρ 1 ρ, ρ2 ρ 1}.

• D1 = {0 1 ρ, ρ 1 ρ, ρ2 ρ 1}. The first row of D1 contains the derivatives of the first row of A. The determinant is det(D1) = ρ(ρ2 – 1).
• D2 = {1 ρ ρ2, 1 0 1, ρ2 ρ 1}. The determinant is det(D2) = 2 ρ(ρ2 – 1).
• D3 = {1 ρ ρ2, ρ 1 ρ, 2ρ 1 0}. The determinant of D3 is ρ(ρ2 – 1).
• The derivative of the determinant of A is the sum of the determinants of the auxiliary matrices, which is +4 ρ (ρ2 – 1). Again, this matches the analytical derivative from the previous section.

The following figure shows the mathematical formulas for the derivative of the determinant of a 3 x 3 AR(1) matrix:

The same method works for any other matrices. To find the derivative of det(A), find the sum of the determinants of the auxiliary matrices, where the i_th auxiliary matrix (Di) is obtained by taking the derivative of the i_th row of A.

Numerical computation of derivatives of determinants

Let's graph the determinant function for a 3 x 3 AR(1) matrix. You can do this by evaluating the AR(1) matrices for values of ρ in the interval [0, 1] and graphing the determinants as a function of ρ:

proc iml; /* Construct AR(1; rho): See https://blogs.sas.com/content/iml/2018/10/03/ar1-cholesky-root-simulation.html */ /* return dim x dim matrix whose (i,j)th element is rho^|i - j| */ start AR1Corr(rho, dim); u = cuprod(j(1,dim-1,rho)); /* cumulative product */ return( toeplitz(1 || u) ); finish;   /* compute the DET at each value of rho and graph the determinant function */ rho = do(0, 1, 0.01); det = j(1, ncol(rho), .); do i = 1 to ncol(rho); det[i] = det(AR1Corr(rho[i],3)); end; title "Det(A(rho))"; title2 "A is a 3x3 AR(1) Correlation Matrix"; call series(rho, det) grid={x y} other="refline 0;";

This is the graph of the determinant as a function of the parameter, ρ. From the graph, you can tell that the derivative of this function is 0 at ρ=0 and ρ=1. Furthermore, the derivative is negative on the interval (0, 1) and is most negative near ρ=0.6.

Let's use the derivative formula to compute and graph the derivative of the determinant function:

/* Use the rowwise formula for the derivative of a 3x3 AR(1) matrix: The i_th row of D_i is the derivative of the i_th row of A */ start dAR_3D(rho, i); /* A=matrix; i = row number */ A = AR1Corr(rho, 3); if i=1 then A[1,] = {0 1} || 2*rho; /* derivative of row 1 */ else if i=2 then A[2,] = {1 0 1}; /* derivative of row 2 */ else if i=3 then A[3,] = 2*rho || {1 0}; /* derivative of row 3 */ return( A ); finish;   /* compute derivative of DET at each value of rho and graph the derivative function */ rho = do(0, 1, 0.01); dDet = j(1, ncol(rho), .); do i = 1 to ncol(rho); det = 0; do j = 1 to 3; det = det + det(dAR_3D(rho[i],j)); /* sum of the determinants of the auxiliary matrices */ end; dDet[i] = det; end;   title "Derivative of Det(A(rho))"; title2 "Sum of Determinants of Auxiliary Matrices"; call series(rho, Ddet) grid={x y} other="refline 0;";

This is the graph of the DERIVATIVE of the determinant as a function of the parameter, ρ. As expected, the derivative is 0 at ρ=0 and ρ=1 and is negative on the interval (0, 1). The derivative reaches a minimum value near ρ=0.6.

Summary

The determinant of a square matrix provides useful information about the linear transformation that the matrix represents. The derivative of the determinant tells you how the determinant changes as a parameter changes. This article shows how to use a rowwise method to compute the derivative of the determinant as the sum of auxiliary matrices. The i_th auxiliary matrix is obtained from the original matrix by differentiating the i_th row. An AR(1) correlation matrix is used as an example to demonstrate the rowwise method.

The post The derivative of the determinant of a matrix appeared first on The DO Loop.

Some matrices are so special that they have names. The identity matrix is the most famous, but many are named after a researcher who studied them such as the Hadamard, Hilbert, Sylvester, Toeplitz, and Vandermonde matrices. This article is about the Pascal matrix, which is formed by using elements from Pascal's triangle.

As I've previously discussed, Pascal's triangle has many interesting properties. If you put Pascal's triangle into the elements of a matrix, there are two ways you can do it. The first is to use a lower triangular matrix, L: put Pascal's triangle in the lower triangular elements and zeros in the upper triangular elements. I call this triangular matrix a Pascal matrix of the first kind. The second is to use a full matrix, P, for which the i_th antidiagonal is the i_th row of Pascal's triangle. I call this full matrix a Pascal matrix of the second kind. These two definitions are shown below for Pascal matrices that have N=8 rows:

Interestingly, these two matrices are intimately connected. I have previously discussed that the full matrix (which is symmetric) has a Cholesky factorization, and that factorization is precisely P = L Lt. That's pretty cool!

Why Cholesky factorization is important

I've written about the geometry of the Cholesky transformation and how it correlates and uncorrelates variables in statistical applications. But from the perspective of numerical linear algebra, matrix factorizations are useful because they enable us to solve certain problems efficiently. For example, if you have a symmetric matrix, P, then the Cholesky root enables you to solve the linear system P*v = b by solving two linear systems:

• Solve L w = b for the vector w.
• Solve Lt v = w for the vector v, which is also the solution to P v = b.

In the SAS IML language, you can use the TRISOLV function to solve linear systems. You use various flags to tell the TRISOLV function whether to solve a lower- or upper-triangular system, as follows:

b = {-4, 24, 150, 528, 1452, 3432, 7293, 14300}; v1 = solve(P, b); /* general solution: solve P*v = b */   /* If you have Cholesky root (P=L*L), you can use TRISOLV */ w = trisolv(4, L, b); /* solve L *w = b for w */ v = trisolv(3, L, w); /* solve L*v = w for v */ print v1 v;

As the output shows, the two different ways to solve the problem give similar solutions. In fact, when you call the SOLVE function, it internally performs a decomposition and solves two triangular systems, so the two perform similar computations, although the SOLVE method does not exploit the symmetry of the Pascal matrix.

An explicit inverse for the Pascal matrix

Some matrices have names because they have amazing mathematical properties. We've already seen that Pascal matrices (of the second kind) have Cholesky roots that are also Pascal matrices (of the first kind). But did you know that you can explicitly write down the inverse matrix for the Pascal matrices of the first kind? The following figure shows the inverse of the 8 x8 8 Pascal matrices of the first kind:

The figure shows that the inverse of L looks similar to L itself except that the signs of certain elements are negative. If G is the inverse matrix, then G[i,j] = L[i,j] (-1)i + j where i=1,2,3,... is the row index and j=1,2,3,... is the column index. There are several ways to form the inverse matrix from L. The following statements create a "sign matrix," H, defined by H[i,j] = (-1)i + j. The inverse matrix for L is the elementwise multiplication of L and H, as follows:

/* There is an EXACT inverse for L */ i = row(P); /* n x n matrix where i_th row equals i */ j = col(P); /* n x n matrix where j_th col equals j */ Sign = (-1)##(i+j); /* n x n sign matrix +/-1 */ Linv = Sign#L; print Linv[F=5. r=rname c=cname L="Inverse of L"];

The matrix was shown earlier. You can directly solve the system P v = b by using the inverse of the Cholesky roots and two well-known results from linear algebra:

• The inverse of the product equals the (reversed) product of the inverses: inv(A*B) = inv(B)*inv(A)
• The inverse of the transpose equals the transpose of the inverse: inv(A) = inv(A)
Putting these results together gives a direct method to solve a linear system that involves a Pascal matrix, as follows:

/* The inverse of the product equals the (reversed) product of the inverses. The inverse of the transpose equals the transpose of the inverse. To solve P*v = b, use v = inv(L*L)*b = inv(L)*inv(L)*b = (inv(L))*inv(L)*b */ v = Linv*Linv*b; print v;

The solution vector is identical to the solutions from other methods. However, having an explicit formula for the inverse means that you can solve a linear system that involves a Pascal matrix without performing any numerical linear algebra. For a Pascal matrix (of the second kind) You can construct the Cholesky roots without doing any linear algebra because the roots are the Pascal matrix of the first kind. Furthermore, you can explicitly construct the inverse of the Cholesky roots without doing any linear algebra. Put these facts together and you can solve linear systems that involve Pascal matrices by using only matrix multiplication.

Summary

You might never need to solve a system of linear equations that involves a Pascal matrix, but there are some general principles that statistical programmers can learn from this article:

• If you have a matrix that has a special structure, exploit that structure. Examples include symmetric, banded, tridiagonal, and special "named" matrices. Often you can solve linear systems more efficiently if you use a specific solver rather than a general solver.
• If your matrix is a "named" matrix, read the literature to find out if there are special formulas that enable you to directly write down the form of a root or inverse matrix. Doing so can lead to an efficient way to solve linear systems.
• An interactive matrix language such as SAS/IML enables you to implement mathematical formulas in a few lines of code.

For more information about Pascal matrices, see Edelman, A. and Strang, G. (2004) "Pascal Matrices," The American Mathematical Monthly.

The post Pascal matrices and inverses appeared first on The DO Loop.

Some matrices are so special that they have names. The identity matrix is the most famous, but many are named after a researcher who studied them such as the Hadamard, Hilbert, Sylvester, Toeplitz, and Vandermonde matrices. This article is about the Pascal matrix, which is formed by using elements from Pascal's triangle.

As I've previously discussed, Pascal's triangle has many interesting properties. If you put Pascal's triangle into the elements of a matrix, there are two ways you can do it. The first is to use a lower triangular matrix, L: put Pascal's triangle in the lower triangular elements and zeros in the upper triangular elements. I call this triangular matrix a Pascal matrix of the first kind. The second is to use a full matrix, P, for which the i_th antidiagonal is the i_th row of Pascal's triangle. I call this full matrix a Pascal matrix of the second kind. These two definitions are shown below for Pascal matrices that have N=8 rows:

Interestingly, these two matrices are intimately connected. I have previously discussed that the full matrix (which is symmetric) has a Cholesky factorization, and that factorization is precisely P = L Lt. That's pretty cool!

Why Cholesky factorization is important

I've written about the geometry of the Cholesky transformation and how it correlates and uncorrelates variables in statistical applications. But from the perspective of numerical linear algebra, matrix factorizations are useful because they enable us to solve certain problems efficiently. For example, if you have a symmetric matrix, P, then the Cholesky root enables you to solve the linear system P*v = b by solving two linear systems:

• Solve L w = b for the vector w.
• Solve Lt v = w for the vector v, which is also the solution to P v = b.

In the SAS IML language, you can use the TRISOLV function to solve linear systems. You use various flags to tell the TRISOLV function whether to solve a lower- or upper-triangular system, as follows:

b = {-4, 24, 150, 528, 1452, 3432, 7293, 14300}; v1 = solve(P, b); /* general solution: solve P*v = b */   /* If you have Cholesky root (P=L*L), you can use TRISOLV */ w = trisolv(4, L, b); /* solve L *w = b for w */ v = trisolv(3, L, w); /* solve L*v = w for v */ print v1 v;

As the output shows, the two different ways to solve the problem give similar solutions. In fact, when you call the SOLVE function, it internally performs a decomposition and solves two triangular systems, so the two perform similar computations, although the SOLVE method does not exploit the symmetry of the Pascal matrix.

An explicit inverse for the Pascal matrix

Some matrices have names because they have amazing mathematical properties. We've already seen that Pascal matrices (of the second kind) have Cholesky roots that are also Pascal matrices (of the first kind). But did you know that you can explicitly write down the inverse matrix for the Pascal matrices of the first kind? The following figure shows the inverse of the 8 x8 8 Pascal matrices of the first kind:

The figure shows that the inverse of L looks similar to L itself except that the signs of certain elements are negative. If G is the inverse matrix, then G[i,j] = L[i,j] (-1)i + j where i=1,2,3,... is the row index and j=1,2,3,... is the column index. There are several ways to form the inverse matrix from L. The following statements create a "sign matrix," H, defined by H[i,j] = (-1)i + j. The inverse matrix for L is the elementwise multiplication of L and H, as follows:

/* There is an EXACT inverse for L */ i = row(P); /* n x n matrix where i_th row equals i */ j = col(P); /* n x n matrix where j_th col equals j */ Sign = (-1)##(i+j); /* n x n sign matrix +/-1 */ Linv = Sign#L; print Linv[F=5. r=rname c=cname L="Inverse of L"];

The matrix was shown earlier. You can directly solve the system P v = b by using the inverse of the Cholesky roots and two well-known results from linear algebra:

• The inverse of the product equals the (reversed) product of the inverses: inv(A*B) = inv(B)*inv(A)
• The inverse of the transpose equals the transpose of the inverse: inv(A) = inv(A)
Putting these results together gives a direct method to solve a linear system that involves a Pascal matrix, as follows:

/* The inverse of the product equals the (reversed) product of the inverses. The inverse of the transpose equals the transpose of the inverse. To solve P*v = b, use v = inv(L*L)*b = inv(L)*inv(L)*b = (inv(L))*inv(L)*b */ v = Linv*Linv*b; print v;

The solution vector is identical to the solutions from other methods. However, having an explicit formula for the inverse means that you can solve a linear system that involves a Pascal matrix without performing any numerical linear algebra. For a Pascal matrix (of the second kind) You can construct the Cholesky roots without doing any linear algebra because the roots are the Pascal matrix of the first kind. Furthermore, you can explicitly construct the inverse of the Cholesky roots without doing any linear algebra. Put these facts together and you can solve linear systems that involve Pascal matrices by using only matrix multiplication.

Summary

You might never need to solve a system of linear equations that involves a Pascal matrix, but there are some general principles that statistical programmers can learn from this article:

• If you have a matrix that has a special structure, exploit that structure. Examples include symmetric, banded, tridiagonal, and special "named" matrices. Often you can solve linear systems more efficiently if you use a specific solver rather than a general solver.
• If your matrix is a "named" matrix, read the literature to find out if there are special formulas that enable you to directly write down the form of a root or inverse matrix. Doing so can lead to an efficient way to solve linear systems.
• An interactive matrix language such as SAS/IML enables you to implement mathematical formulas in a few lines of code.

For more information about Pascal matrices, see Edelman, A. and Strang, G. (2004) "Pascal Matrices," The American Mathematical Monthly.

The post Pascal matrices and inverses appeared first on The DO Loop.

Some matrices are so special that they have names. The identity matrix is the most famous, but many are named after a researcher who studied them such as the Hadamard, Hilbert, Sylvester, Toeplitz, and Vandermonde matrices. This article is about the Pascal matrix, which is formed by using elements from Pascal's triangle.

As I've previously discussed, Pascal's triangle has many interesting properties. If you put Pascal's triangle into the elements of a matrix, there are two ways you can do it. The first is to use a lower triangular matrix, L: put Pascal's triangle in the lower triangular elements and zeros in the upper triangular elements. I call this triangular matrix a Pascal matrix of the first kind. The second is to use a full matrix, P, for which the i_th antidiagonal is the i_th row of Pascal's triangle. I call this full matrix a Pascal matrix of the second kind. These two definitions are shown below for Pascal matrices that have N=8 rows:

Interestingly, these two matrices are intimately connected. I have previously discussed that the full matrix (which is symmetric) has a Cholesky factorization, and that factorization is precisely P = L Lt. That's pretty cool!

Why Cholesky factorization is important

I've written about the geometry of the Cholesky transformation and how it correlates and uncorrelates variables in statistical applications. But from the perspective of numerical linear algebra, matrix factorizations are useful because they enable us to solve certain problems efficiently. For example, if you have a symmetric matrix, P, then the Cholesky root enables you to solve the linear system P*v = b by solving two linear systems:

• Solve L w = b for the vector w.
• Solve Lt v = w for the vector v, which is also the solution to P v = b.

In the SAS IML language, you can use the TRISOLV function to solve linear systems. You use various flags to tell the TRISOLV function whether to solve a lower- or upper-triangular system, as follows:

b = {-4, 24, 150, 528, 1452, 3432, 7293, 14300}; v1 = solve(P, b); /* general solution: solve P*v = b */   /* If you have Cholesky root (P=L*L), you can use TRISOLV */ w = trisolv(4, L, b); /* solve L *w = b for w */ v = trisolv(3, L, w); /* solve L*v = w for v */ print v1 v;

As the output shows, the two different ways to solve the problem give similar solutions. In fact, when you call the SOLVE function, it internally performs a decomposition and solves two triangular systems, so the two perform similar computations, although the SOLVE method does not exploit the symmetry of the Pascal matrix.

An explicit inverse for the Pascal matrix

Some matrices have names because they have amazing mathematical properties. We've already seen that Pascal matrices (of the second kind) have Cholesky roots that are also Pascal matrices (of the first kind). But did you know that you can explicitly write down the inverse matrix for the Pascal matrices of the first kind? The following figure shows the inverse of the 8 x8 8 Pascal matrices of the first kind:

The figure shows that the inverse of L looks similar to L itself except that the signs of certain elements are negative. If G is the inverse matrix, then G[i,j] = L[i,j] (-1)i + j where i=1,2,3,... is the row index and j=1,2,3,... is the column index. There are several ways to form the inverse matrix from L. The following statements create a "sign matrix," H, defined by H[i,j] = (-1)i + j. The inverse matrix for L is the elementwise multiplication of L and H, as follows:

/* There is an EXACT inverse for L */ i = row(P); /* n x n matrix where i_th row equals i */ j = col(P); /* n x n matrix where j_th col equals j */ Sign = (-1)##(i+j); /* n x n sign matrix +/-1 */ Linv = Sign#L; print Linv[F=5. r=rname c=cname L="Inverse of L"];

The matrix was shown earlier. You can directly solve the system P v = b by using the inverse of the Cholesky roots and two well-known results from linear algebra:

• The inverse of the product equals the (reversed) product of the inverses: inv(A*B) = inv(B)*inv(A)
• The inverse of the transpose equals the transpose of the inverse: inv(A) = inv(A)
Putting these results together gives a direct method to solve a linear system that involves a Pascal matrix, as follows:

/* The inverse of the product equals the (reversed) product of the inverses. The inverse of the transpose equals the transpose of the inverse. To solve P*v = b, use v = inv(L*L)*b = inv(L)*inv(L)*b = (inv(L))*inv(L)*b */ v = Linv*Linv*b; print v;

The solution vector is identical to the solutions from other methods. However, having an explicit formula for the inverse means that you can solve a linear system that involves a Pascal matrix without performing any numerical linear algebra. For a Pascal matrix (of the second kind) You can construct the Cholesky roots without doing any linear algebra because the roots are the Pascal matrix of the first kind. Furthermore, you can explicitly construct the inverse of the Cholesky roots without doing any linear algebra. Put these facts together and you can solve linear systems that involve Pascal matrices by using only matrix multiplication.

Summary

You might never need to solve a system of linear equations that involves a Pascal matrix, but there are some general principles that statistical programmers can learn from this article:

• If you have a matrix that has a special structure, exploit that structure. Examples include symmetric, banded, tridiagonal, and special "named" matrices. Often you can solve linear systems more efficiently if you use a specific solver rather than a general solver.
• If your matrix is a "named" matrix, read the literature to find out if there are special formulas that enable you to directly write down the form of a root or inverse matrix. Doing so can lead to an efficient way to solve linear systems.
• An interactive matrix language such as SAS/IML enables you to implement mathematical formulas in a few lines of code.

For more information about Pascal matrices, see Edelman, A. and Strang, G. (2004) "Pascal Matrices," The American Mathematical Monthly.

The post Pascal matrices and inverses appeared first on The DO Loop.

You can use the Cholesky decomposition of a covariance matrix to simulate data from a correlated multivariate normal distribution. This method is encapsulated in the RANDNORMAL function in SAS/IML software, but you can also perform the computations manually by calling the ROOT function to get the Cholesky root and then use matrix multiplication to correlate multivariate norm (MVN) data, as follows:

%let d = 12; /* number of variables */ %let N = 500; /* number of observations */ proc iml; d = &d; /* want to generate d variables that are MVN */ N = &N; /* sample size */   /* easy to generate uncorrelated MVN variables */ call randseed(1234); z = j(d, N); call randgen(z, "Normal"); /* z is MVN(0,I(d)) */   /* use Cholesky root to transform to correlated variables */ Sigma = toeplitz(d:1); /* example of covariance matrix */ G = root(Sigma); /* the Cholesky root of the full covariance matrix */ x = G*z; /* x is MVN(0, Sigma) */ title "Two Components of MVN(0,Sigma) Data"; call scatter(x[1,], x[2,]) label={x1 x2};

The simulation creates d=12 correlated variables and 500 observations. The first two variables are plotted against each other so that you can see that they are correlated. The other pairs of variables are also correlated according to the entries in the Σ covariance matrix.

Generating a huge number of MVN variables

In the comments to one of my previous articles, a SAS programmer asked whether it is possible to generate MVN data even if the covariance matrix is so large that it cannot be factored by the ROOT function. For example, suppose you want to generate d=20,000 MVN variables. A d x d covariance matrix of that size requires 3 GB of RAM, and on some operating system you cannot form the Cholesky root of a matrix that large. However, I have developed an algorithm that enables you to deal with the covariance matrix (Σ) as a block matrix.

To introduce the algorithm, represent the Σ matrix as a 2 x 2 block matrix. (See the last section for a more general representation.) For a 2 x 2 block algorithm, you only need to form the Cholesky roots of matrices of size (d/2), which are 10000 x 10000 matrices for this example. Matrices of that size require only 0.75 GB and can be quickly factored. Of course, there is no such thing as a free lunch: The new algorithm is more complicated and requires additional computations, albeit on smaller matrices.

Let's look at how to simulate MVN data by treating Σ as a block matrix of size d/2. For ease of exposition, assume d is even and each block is of size d/2. However, the algorithm easily handles the case where d is odd. When d is odd, choose the upper blocks to have floor(d/2) rows and choose the lower clocks to have ceil(d/2) rows. The SAS code (below) handles the general case. (In fact, the algorithm doesn't care about the block size. For any p in (0, 1), one block can be size pd and the other size (1-p)d.

In block form, the covariance matrix is
$\Sigma = \begin{bmatrix}A & B\\B^\prime & C\end{bmatrix}$
There is a matrix decomposition called the LDLT decomposition that enables you to write Σ as the product of three block matrices:

$\begin{bmatrix}A & B\\B^\prime & C\end{bmatrix} = \begin{bmatrix}I & 0\\B^\prime A^{-1} & I\end{bmatrix} \begin{bmatrix}A & 0\\0 & S\end{bmatrix} \begin{bmatrix}I & A^{-1}B\\0 & I\end{bmatrix}$
where $S = C - B^\prime A^{-1} B$ is the Schur complement of the block matrix C. There is a theorem that says that if Σ is symmetric positive definite (SPD), then so is every principal submatrix, so A is SPD. Thus, we can use the Cholesky decomposition and write $A = G^\prime_A G_A$ for an upper triangular matrix, $G_A$. By reordering the variables, C is SPD as well. I don't have a proof that S is PD, but it seems to be true in the examples I've tried, so let's list that as an assumption and assume that $S = G^\prime_S G_S$ for an upper triangular matrix $G_S$.

Using these definitions, the Cholesky decomposition of Σ can be written in block form as
$\Sigma = \begin{bmatrix}G^\prime_A & 0\\B^\prime G^{-1}_A & G^\prime_S\end{bmatrix} \begin{bmatrix}G_A & (G^\prime_A)^{-1} B \\ 0 & G_S\end{bmatrix}$

To simulate MVN data, we let z be uncorrelated MVN(0, I) data and define
$x = \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix}G^\prime_A & 0\\B^\prime G^{-1}_A & G^\prime_S\end{bmatrix} \begin{bmatrix}z_1 \\ z_2\end{bmatrix} = \begin{bmatrix}G^\prime_A z_1 \\ B^\prime G^{-1}_A z_1 + G^\prime_S z_2\end{bmatrix}$

The previous equation indicates that you can generate correlated MVN(0, Σ) data if you know the Cholesky decomposition of the upper-left block of Σ (which is GA) and the Cholesky decomposition of the Schur complement (which is GS). Each of the matrices in these equations is of size d/2, which means that the computations are performed on matrices that are half the size of Σ. However, notice that generating the x2 components requires some extra work: the Schur complement involves the inverse of A and the formula for x2 also involves the inverse of GA. These issues are not insurmountable, but it means that the block algorithm is more complicated than the original algorithm, which merely multiplies z by the Cholesky root of Σ.

A SAS program for block simulation of MVN data

The SAS/IML program in a previous section shows how to generate correlated MVN(0, Σ) data (x) from uncorrelated MVN(0, I) data (z) by using the Cholesky root (G) of Σ. This section shows how to get exactly the same x values by performing block operations. For the block operations, each matrix is size d/2, so has 1/4 of the elements of Σ.

The first step is to represent Σ and z in block form.

/* The Main Idea: get the same MVN data by performing block operations on matrices that are size d/2 */ /* 1. Represent Sigma={A B, B C} and z={z1, z2} in block form */ d1 = ceil(d/2); /* half the variables */ d2 = d - d1; /* the other half of the vars */ dp1 = d1 + 1;   A = Sigma[1:d1, 1:d1]; /* break up the symmetric (d x d) covariance */ B = Sigma[1:d1, dp1:d]; /* matrix into blocks of size d/2 */ C = Sigma[dp1:d, dp1:d]; z1 = z[1:d1, ]; /* extract the first d1 obs for MVN(0,I) data */ z2 = z[dp1:d, ]; /* extract the last d2 obs for MVN(0,I) data */

It is easy to generate x1, which contains the first d/2 components of the MVN(0, Σ) simulated data. You simply use the Cholesky decomposition of A, which is the upper-left block of Σ:

/* 2. Compute Cholesky root of A and compute x1 z1 */ G_A = root(A); /* Cholesky of upper left block */ x1 = G_A*z1; /* generate first half of variables */

It is not as easy to generate x2, which contains the last d/2 components. In block form, $x_2 = v + u$, where $v = B^\prime G_A^{-1} z_1$ and $u = G_S^\prime z_2$, and where $S = G_S^\prime G_S$ is the Cholesky decomposition of the Shur complement. This is shown in the following statements:

/* 3. Generate the second half of variables */ S = C - B*inv(A)*B; /* S is the Schur complement */ G_S = root(S); /* Cholesky root of Schur complement */ v = B*inv(G_A)*z1; u = G_S*z2; x2 = v + u;   /* verify that the computation gives exactly the same simulated data */ print (max(abs(x[1:d1,] - x1)))[L="Diff(x - x1)"]; print (max(abs(x[dp1:d,] - x2)))[L="Diff(x - x2)"];

The output shows that the block computation, which uses multiple matrices of size d/2, gives exactly the same answer as the original computation, which uses matrices of size d.

Analysis of the block algorithm

Now that the algorithm runs on a small example, you can increase the dimensions of the matrices. For example, try rerunning the algorithm with the following parameters:

%let d = 5000; /* number of variables */ %let N = 1000; /* number of observations */

When I run this larger program on my PC, it takes about 2.3 seconds to simulate the MVN data by using the full Cholesky decomposition. It takes about 17 seconds to simulate the data by using the block method. Most of the time for the block method is spent on the computation v = B*inv(G_A)*z1, which takes about 15 seconds. So if you want to improve the performance, you should focus on improving that computation.

You can improve the performance of the block algorithm in several ways:
• You don't need to find the inverse of A. You have the Cholesky root of A, which is triangular, so you can define H = inv(GA)*B and compute the Schur complement as C – H*H.
• You don't need to explicitly form any inverses. Ultimately, all these matrices are going to operate on the vector z2, which means that you can simulate the data by using only matrix multiplication and solving linear equations.

I tried several techniques, but I could not make the block algorithm competitive with the performance of the original Cholesky algorithm. The issue is that the block algorithm computes two Cholesky decompositions, solves some linear equations, and performs matrix multiplication. Even though each computation is on a matrix that is half the size of the original matrix, the additional computations result in a longer runtime.

The algorithm generalizes to other block sizes. I will outline the case for a 3 x 3 block matrix. The general case of k x k blocks follows by induction.

The figure to the right shows the Σ covariance matrix as a 3 x 3 block matrix. By using 2 x 2 block algorithm, you can obtain multivariate normal vectors x1 and x2, which correspond to the covariances in the four blocks in the upper-left corner of the matrix. You can then view the upper four blocks as a single (larger) block, thus obtaining a 2 x 2 block matrix where the upper-left block has size 2d/3 and the lower-right block (Σ33) has size d/3.

There are many details to work through, but the main idea is that you can repeat the computation on this new block matrix where the block sizes are unequal. To proceed, you need the Cholesky decomposition of the large upper-left block, but we have already shown how to compute that in terms of the Cholesky root of Σ11, its inverse, and the Cholesky root of the Schur complement of Σ22. You also need to form the Schur complement of Σ33, but that is feasible because we have the Cholesky decomposition of the large upper-left block.

I have not implemented the details because it is not clear to me whether three or more blocks provides a practical advantage. Even if I work out the details for k=3, the algebra for k > 3 blocks seems to be daunting.

Summary

It is well known that you can use the Cholesky decomposition of a covariance matrix to simulate data from a correlated multivariate normal distribution. This article shows how to break up the task by using a block Cholesky method. The method is implemented for k=2 blocks. The block method requires more matrix operations, but each operation is performed on a smaller matrix. Although the block method takes longer to run, it can be used to generate a large number of variables by generating k variables at a time, where k is small enough that the k x k matrices can be easily handled.

The block method generalizes to k > 2 blocks, but it is not clear whether more blocks provide any practical benefits.

The post A block-Cholesky method to simulate multivariate normal data appeared first on The DO Loop.

In a previous article, I discussed various ways to solve a least-square linear regression model. I discussed the SWEEP operator (used by many SAS regression routines), the LU-based methods (SOLVE and INV in SAS/IML), and the QR decomposition (CALL QR in SAS/IML). Each method computes the estimates for the regression coefficients, b, by using the normal equations (XX) b = Xy, where X is a design matrix for the data.

This article describes a QR-based method that does not use the normal equations but works directly with the overdetermined system X b = y. It then compares the performance of the direct QR method to the computational methods that use the normal equations.

The QR solution of an overdetermined system

As shown in the previous article, you can use the QR algorithm to solve the normal equations. However, if you search the internet for "QR algorithm and least squares," you find many articles that show how you can use the QR decomposition to directly solve the overdetermined system X b = y. How does the direct QR method compare to the methods that use the normal equations?

Recall that X is an n x m design matrix, where n > m and X is assumed to be full rank of m. For simplicity, I will ignore column pivoting. If you decompose X = QRL, the orthogonal matrix Q is n x n, but the matrix RL is not square. ("L" stands for "long.") However, RL is the vertical concatenation of a square triangular matrix and a rectangular matrix of zeros:
${\bf R_L} = \begin{bmatrix} {\bf R} \\ {\bf 0} \end{bmatrix}$
If you let Q1 be the first m columns of Q and let Q2 be the remaining (n-m) columns, you get a partitioned matrix equation:
$\begin{bmatrix} {\bf Q_1} & {\bf Q_2} \end{bmatrix} \begin{bmatrix} {\bf R} \\ {\bf 0} \end{bmatrix} {\bf b} = {\bf y}$
If you multiply both sides by Q (the inverse of the orthogonal matrix, Q), you find out that the important matrix equation to solve is ${\bf R b} = {\bf Q_1^{\prime} y}$. The vector ${\bf Q_1^{\prime} y}$ is the first m rows of the vector ${\bf Q^{\prime} y}$. The QR call in SAS/IML enables you to obtain the triangular R matrix and the vector Q y directly from the data matrix and the observed vector. The following program uses the same design matrix as for my previous article. Assuming X has rank m, the call to the QR subroutine returns the m x m triangular matrix, R, and the vector Q y. You can then extract the first m rows of that vector and solve the triangular system, as follows:

/* Use PROC GLMSELECT to write a design matrix */ proc glmselect data=Sashelp.Class outdesign=DesignMat; class Sex; model Weight = Height Sex Height*Sex/ selection=none; run;   proc iml; use DesignMat; read all var {'Intercept' 'Height' 'Sex_F' 'Height_Sex_F'} into X; read all var {'Weight'} into Y; close;   /* The QR algorithm can work directly with the design matrix and the observed responses. */ call QR(Qty, R, piv, lindep, X, , y); /* return Q*y and R (and piv) */ m = ncol(X); c = QTy[1:m]; /* we only need the first m rows of Q*y */ b = trisolv(1, R, c, piv); /* solve triangular system */ print b[L="Direct QR" F=D10.4];

This is the same least-squares solution that was found by using the normal equations in my previous article.

Compare the performance of least-squares solutions

How does this direct method compare with the methods that use the normal equations? You can download a program that creates simulated data and runs each algorithm to estimate the least-squares regression coefficients. The simulated data has 100,000 observations; the number of variables is chosen to be m={10, 25, 50, 75, 100, 250, 500}. The program uses SAS/IML 15.1 on a desktop PC to time the algorithms. The results are shown below:

The most obvious feature of the graph is that the "Direct QR" method that is described in this article is not as fast as the methods that use the normal equations. For 100 variables and 100,000 observations, the "Direct QR" call takes more than 12 seconds on my PC. (It's faster on a Linux server). The graph shows that the direct method shown in this article is not competitive with the normal-equation-based algorithms when using the linear algebra routines in SAS/IML 15.1.

The graph shows that the algorithms that use the normal equations are relatively faster. For the SAS/IML calls on my PC, you can compute the regression estimates for 500 variables in about 2.6 seconds. The graph has a separate line for the time required to form the normal equations (which you can think of as forming the XX matrix). Most of the time is spent computing the normal equations; only a fraction of the time is spent actually solving the normal equations. The following table shows computations on my PC for the case of 500 variables:

The table shows that it takes about 2.6 seconds to compute the XX matrix and the vector Xy. After you form the normal equations, solving them is very fast. For this example, the SOLVE and INV methods take only a few milliseconds to solve a 500 x 500. The QR algorithms take 0.1–0.2 seconds longer. So, for this example, forming the normal equations accounts for more than 90% of the total time.

SAS regression procedures

These results are not the best that SAS can do. SAS/IML is a general-purpose tool. SAS regression procedures like PROC REG are optimized to compute regression estimates even faster. They also use the SWEEP operator, which is faster than the SOLVE function. For more than 20 years, SAS regression procedures have used multithreaded computations to optimize the performance of regression computations (Cohen, 2002). More recently, SAS Viya added the capability for parallel processing, which can speed up the computations even more. And, of course, they compute much more than only the coefficient estimates! They also compute standard errors, p-values, related statistics (MSE, R square,....), diagnostic plots, and more.

Summary

This article compares several methods for obtaining least-squares regression estimates. It uses simulated data where the number of observations is much greater than the number of variables. It shows that methods that use the normal equations are faster than a "Direct QR" method, which does not use the normal equations. When you use the normal equations, most of the time is spent actually forming the normal equations. After you have done that, the time required to solve the system is relatively fast.

The post Compare computational methods for least squares regression appeared first on The DO Loop.

In computational statistics, there are often several ways to solve the same problem. For example, there are many ways to solve for the least-squares solution of a linear regression model. A SAS programmer recently mentioned that some open-source software uses the QR algorithm to solve least-squares regression problems and asked how that compares with SAS. This article shows how to estimate a least-squares regression model by using the QR method in SAS. It shows how to use the QR method in an efficient way.

Solving the least-squares problem

Before discussing the QR method, let's briefly review other ways to construct a least-squares solution to a regression problem. In a regression problem, you have an n x m data matrix, X, and an n x 1 observed vector of responses, y. When n > m, this is an overdetermined system and typically there is no exact solution. Consequently, the goal is to find the m x 1 vector of regression coefficients, b, such that the predicted values (X b) are as close as possible to the observed values. If b is a least-squares solution, then b minimizes the vector norm || X b - y ||2, where ||v||2 is the sum of the squares of the components of a vector.

Using calculus, you can show that the solution vector, b, satisfies the normal equations (XX) b = Xy. The normal equations have a unique solution when the crossproduct matrix XX is nonsingular. Most numerical algorithms for least-squares regression start with the normal equations, which have nice numerical properties that can be exploited.

Creating a design matrix

The first step of solving a regression problem is to create the design matrix. For continuous explanatory variables, this is easy: You merely append a column of ones (the intercept column) to the matrix of the explanatory variables. For more complicated regression models that contain manufactured effects such as classification effects, interaction effects, or spline effects, you need to create a design matrix. The GLMSELECT procedure is the best way to create a design matrix for fixed effects in SAS. The following call to PROC GLMSELECT writes the design matrix to the DesignMat data set. The call to PROC REG estimates the regression coefficients:

proc glmselect data=Sashelp.Class outdesign=DesignMat; /* write design matrix */ class Sex; model Weight = Height Sex Height*Sex/ selection=none; run; /* test: use the design matrix to estimate the regression coefficients */ proc reg data=DesignMat plots=none; model Weight = Height Sex_F Height_Sex_F; run;

The goal of the rest of this article is to reproduce the regression estimates by using various other linear algebra operations. For each method, we want to produce the same estimates: {-141.10, 3.91, 23.73, -0.49}.

Linear regression: A linear algebraic approach

I've previously discussed the fact that most SAS regression procedures use the sweep operator to construct least-squares solutions. Another alternative is to use the SOLVE function in SAS/IML:

proc iml; use DesignMat; read all var {'Intercept' 'Height' 'Sex_F' 'Height_Sex_F'} into X; read all var {'Weight'} into Y; close;   /* form normal equations */ XpX = X*X; Xpy = X*y;   /* an efficient numerical solution: solve a particular RHS */ b = solve(XpX, Xpy); print b[L="SOLVE" F=D10.4];

The SOLVE function is very efficient and gives the same parameter estimates as the SWEEP operator (which was used by PROC REG). Using the SOLVE function on the system A*b=z is mathematically equivalent to using the matrix inverse to find b=A-1z. However, the INV function explicitly forms the inverse matrix, whereas the SOLVE function does not. Consequently, the SOLVE function is faster and more efficient than using the following SAS/IML statement:

/* a less efficient numerical solution */ Ainv = inv(XpX); /* explicitly form the (m x m) inverse matrix */ b = Ainv*Xpy; /* apply the inverse matrix to RHS */ print b[L="INV" F=D10.4];

For a performance comparison of the SOLVE and INV functions, see the article, "Solving linear systems: Which technique is fastest?"

An introduction to the QR method for solving linear equations

There are many ways to solve the normal equations. The SOLVE (and INV) functions use the LU decomposition. An alternative is the QR algorithm, which is slower but can be more accurate for ill-conditioned systems. You can apply the QR decomposition to the normal equations or to the original design matrix. This section discusses the QR decomposition of the design matrix. A subsequent article discusses decomposing the data matrix directly.

Let's see how the QR algorithm solves the normal equations. You can decompose the crossproduct matrix as the product of an orthogonal matrix, Q, and an upper-triangular matrix, R. If A = XX, then A = QR. (I am ignoring column pivoting, which is briefly discussed below.) The beauty of an orthogonal matrix is that the transpose equals the inverse. This means that you can reduce the system A b = (X y) to a triangular system R b = Q (X y), which is easily solved by using the TRISOLV function in SAS/IML.

The SAS/IML version of the QR algorithm is a version known as "QR with column pivoting." Using column pivoting improves solving rank-deficient systems and provides better numerical accuracy. However, when the algorithm uses column pivoting, you are actually solving the system AP = QR, where P is a permutation matrix. This potentially adds some complexity to dealing with the QR algorithm. Fortunately, the TRISOLVE function supports column pivoting. If you use the TRISOLV function, you do not need to worry about whether pivoting occurred or not.

Using the QR method in SAS/IML

Recall from the earlier example that it is more efficient to use the SOLVE function than the INV function. The reason is that the INV function explicitly constructs an m x m matrix, which then is multiplied with the right-hand side (RHS) vector to obtain the answer. In contrast, the SOLVE function never forms the inverse matrix. Instead, it directly applies transformations to the RHS vector.

The QR call in SAS/IML works similarly. If you do not supply a RHS vector, v, then the QR call returns the full m x m matrix, Q. You then must multiply Q v yourself. If you do supply the RHS vector, then the QR call returns Q v without ever forming Q. This second method is more efficient,

See the documentation of the QR call for the complete syntax. The following call uses the inefficient method in which the Q matrix is explicitly constructed:

/* It is inefficient to factor A=QR and explicitly solve the equation R*b = Q*c */ call QR(Q, R, piv, lindep, XpX); /* explicitly form the (m x m) matrix Q */ c = Q*Xpy; /* apply the inverse Q matrix to RHS */ b = trisolv(1, R, c, piv); /* equivalent to b = inv(R)*Q*c */ print b[L="General QR" F=D10.4];

In contrast, the next call is more efficient because it never explicitly forms the Q matrix:

/* It is more efficient to solve for a specific RHS */ call QR(c, R, piv, lindep, XpX, ,Xpy); /* returns c = Q*Xpy */ b = trisolv(1, R, c, piv); /* equivalent to b = inv(R)*Q*c */ print b[L="Specific QR" F=D10.4];

The output is not shown, but both calls produce estimates for the regression coefficients that are exactly the same as for the earlier examples. Notice that the TRISOLV function takes the pivot vector (which represents a permutation matrix) as the fourth argument. That means you do not need to worry about whether column pivoting occurred during the QR algorithm.

QR applied to the design matrix

As mentioned earlier, you can also apply the QR algorithm to the design matrix, X, and the QR algorithm will return the least-square solution without ever forming the normal equations. This is shown in a subsequent article, which also compares the speed of the various methods for solving the least-squares problem.

Summary

This article discusses three ways to solve a least-squares regression problem. All start by constructing the normal equations: (XX) b = X y. The solution of the normal equations (b) is the vector that minimizes the squared differences between the predicted values, X b, and the observed responses, y. This article discusses

• the SWEEP operator, which is used by many SAS regression procedures
• the SOLVE and INV function, which use the LU factorization
• the QR call, which implements the QR algorithm with column pivoting

A follow-up article compares the performance of these methods and of another QR algorithm, which does not use the normal equations.

The post The QR algorithm for least-squares regression appeared first on The DO Loop.

Look at the following matrices. Do you notice anything that these matrices have in common?

If you noticed that the rows of each matrix are arithmetic progressions, good for you. For each row, there is a constant difference (also called the "increment") between adjacent elements. For these examples:

• In the first matrix, each row is an arithmetic progression with a difference of 1. That is, the value of the (j+1)th column is one more than the value of the j_th column.
• In the second matrix, each row is an arithmetic progression with a difference of 2. That is, the value of the (j+1)th column is two more than the value of the j_th column.
• In the third matrix, each row is an arithmetic progression, but the increments for the rows are not identical. The increment for the first row is -3. The increment for the second row is +1. The increments for the remaining rows are -2 and +3, respectively.

However, there is another characteristic that these matrices have in common: they are all singular matrices. It turns out this is not a coincidence. An n x n matrix (n>2) is singular if each row is an arithmetic progression.

Why are these matrices singular?

Recall that a matrix will be singular if one of the columns can be written as a linear combination of other columns. For these three matrices, you can use mental arithmetic to verify that the third column is a linear combination of the first two columns. Specifically, you can verify that the third column can be written as
X3 = 2 X2 – X1
where X1, X2, X3, ... are the columns of a matrix. Thus, these matrices are singular.

It turns out that this relationship holds in general for any matrix whose rows are arithmetic progressions. Suppose the elements of the i_th row satisfy the progression with increment di so that the elements of the i_th row satisfy X[i, j+1] = X[i, j] + d[i]  for j=1,2,...,n. Let d be the vector of differences. Then
X2 = X1 + d
which you can solve for d to obtain
d = X2 – X1.
By substitution,
X3 = X2 + d = X2 + (X2 – X1), or
X3 = 2*X2 – X1.
Therefore, the third column is always a linear combination of the first two columns, and the coefficients are independent of the differences in the arithmetic progressions of the rows. It doesn't matter what increments are used in the arithmetic progressions of the rows.

In fact, even more is true. You can use similar arguments to show that the j_th column can be written as a linear combination of the first two columns for any j > 2. Thus, regardless of the matrix size, matrices whose rows are arithmetic progressions are rank-2 matrices: they have only two linearly independent columns.

1. It's mathematically interesting, and I never noticed it before.
2. When I write a program, I often need to quickly create an example matrix to test the program. I often use a matrix such as {1 2 3, 4 5 6, 7 8 9} or SHAPE(1:25,5,5). This article shows that both matrices are singular, which is important to know if the program requires a nonsingular matrix.
3. Lastly, I wanted to emphasize the difference between a "random matrix" and an "arbitrary matrix." In a previous article, I discussed the probability that a random matrix is singular and concluded that a random matrix is almost always nonsingular. Here, "random matrix" means that the matrix elements are drawn randomly from a continuous probability distribution such as the uniform or normal distribution.

However, the other day I received a comment on that blog post. Essentially, the comment said, "I wrote down this matrix at random, but it is singular. How do you explain that?" The matrix was similar to the matrices at the top of this article. The answer is that the elements of the matrix were not generated randomly. The rows of the matrix form an arithmetic progression (very NON-random!), which, as we have seen, implies that the matrix is singular.

The post A matrix is singular if its rows are arithmetic sequences appeared first on The DO Loop.

A data analyst recently asked a question about restricted least square regression in SAS. Recall that a restricted regression puts linear constraints on the coefficients in the model. Examples include forcing a coefficient to be 1 or forcing two coefficients to equal each other. Each of these problems can be solved by using PROC REG in SAS. The analyst's question was, "Why can I use PROC REG to solve a restricted regression problem when the restriction involves EQUALITY constraints, but PROC REG can't solve a regression problem that involves an INEQUALITY constraint?"

This article shows how to use the RESTRICT statement in PROC REG to solve a restricted regression problem that has linear constraints on the parameters. I also use geometry and linear algebra to show that solving an equality-constrained regression problem is mathematically similar to solving an unrestricted least squares system of equations. This explains why PROC REG can solve equality-constrained regression problems. In a future article, I will show how to solve regression problems that involve inequality constraints on the parameters.

The geometry of restricted regression

The linear algebra for restricted least squares regression gets messy, but the geometry is easy to picture. A schematic depiction of restricted regression is shown to the right.

Geometrically, ordinary least-squares (OLS) regression is the orthogonal projection of the observed response (Y) onto the column space of the design matrix. (For continuous regressors, this is the span of the X variables, plus an "intercept column.") If you introduce equality constraints among the parameters, you are restricting the solution to a linear subspace of the span (shown in green). But the geometry doesn't change. The restricted least squares (RLS) solution is still a projection of the observed response, but this time onto the restricted subspace.

In terms of linear algebra, the challenge is to write down the projection matrix for the restricted problem. As shown in the diagram, you can obtain the RLS solution in two ways:

• Starting from Y: You can directly project the Y vector onto the restricted subspace. The linear algebra for this method is straightforward and is often formulated in terms of Lagrange multipliers.
• Starting from the OLS solution: If you already know the OLS solution, you can project that solution vector onto the restricted subspace. Writing down the projection matrix is complicated, so I refer you to the online textbook Practical Econometrics and Data Science by A. Buteikis.

Restricted models and the TEST statement in PROC REG

Let's start with an example, which is based on an example in the online textbook by A. Buteikis. The following SAS DATA step simulates data from a regression model
Y = B0 + B1*X1 + B2*X2 + B3*X3 + B4*X4 + ε
where B3 = B1 and B4 = –2*B2 and ε ~ N(0, 3) is a random error term.

/* Model: Y = B0 + B1*X1 + B2*X2 + B3*X3 + B4*X4 where B3 = B1 and B4 = -2*B2 */ data RegSim; array beta[0:4] (-5, 2, 3, 2, -6); /* parameter values for regression coefficients */ N = 100; /* sample size */ call streaminit(123); do i = 1 to N; x1 = rand("Normal", 5, 2); /* X1 ~ N(5,2) */ x2 = rand("Integer", 1, 50); /* X2 ~ random integer 1:50 */ x3 = (i-1)*10/(N-1); /* X3 = sequence: 0 to 10 by 1/N */ x4 = rand("Normal", 10, 3); /* X4 ~ N(10, 3) */ eps = rand("Normal", 0, 3); /* RMSE = 3 */ Y = beta[0] + beta[1]*x1 + beta[2]*x2 + beta[3]*x3 + beta[4]*x4 + eps; output; end; run;

Although there are five parameters in the model (B0-B4), there are only three free parameters because the values of B2 and B4 are determined by the values of B1 and B4, respectively.

If you suspect that a parameter in your model is a known constant or is equal to another parameter, you can use the RESTRICT statement in PROC REG to incorporate that restriction into the model. Before you do, however, it is usually a good idea to perform a statistical test to see if the data supports the model. For this example, you can use the TEST statement in PROC REG to hypothesize that B3 = B1 and B4 = –2*B2. Recall that the syntax for the TEST statement uses the variable names (X1-X4) to represent the coefficients of the variable. The following call to PROC REG carries out this analysis:

proc reg data=RegSim plots=none; model Y = x1 - x4; test x3 = x1, x4 = -2*x2; run;

The output shows that the parameter estimates for the simulated data are close to the parameter values used to generate the data. Specifically, the root MSE is close to 3, which is the standard deviation for the error term. The Intercept estimate is close to -5. The coefficients for the X1 and X3 term are each approximately 2. The coefficient for X4 is approximately –2 times the coefficient of X2. The F test for the test of hypotheses has a large p-value, so you should not reject the hypotheses that B3 = B1 and B4 = –2*B2.

The RESTRICT statement in PROC REG

If the hypothesis on the TEST statement fails to reject, then you can use the RESTRICT statement to recompute the parameter estimates subject to the constraints. You could rerun the entire analysis or, if you are running SAS Studio in interactive mode, you can simply submit a RESTRICT statement and PROC REG will compute the new parameter estimates without rereading the data:

 restrict x3 = x1, x4 = -2*x2; /* restricted least squares (RLS) */ print; quit;

The new parameter estimates enforce the restrictions among the regression coefficients. In the new model, the coefficients for X1 and X3 are exactly equal, and the coefficient for X4 is exactly –2 times the coefficient for X2.

Notice that the ParameterEstimates table is augmented by two rows labeled "RESTRICT". These rows have negative degrees of freedom because they represent constraints. The "Parameter Estimate" column shows the values of the Lagrange multipliers that are used to enforce equality constraints while solving the least squares system. Usually, a data analyst does not care about the actual values in these rows, although the Wikipedia article on Lagrange multipliers discusses ways to interpret the values.

The linear algebra of restricted regression

This section shows the linear algebra behind the restricted least squares solution by using SAS/IML. Recall that the usual way to compute the unrestricted OLS solution is the solve the "normal equations" (X*X)*b = X*Y for the parameter estimates, b. Although textbooks often solve this equation by using the inverse of the XX matrix, it is more efficient to use the SOLVE function in SAS/IML to solve the equation, as follows:

proc iml; varNames = {x1 x2 x3 x4}; use RegSim; /* read the data */ read all var varNames into X; read all var "Y"; close; X = j(nrow(X), 1, 1) || X; /* add intercept column to form design matrix */ XpX = X*X;   /* Solve the unrestricted OLS system */ b_ols = solve(XpX, X*Y); /* normal equations X*X * b = X*Y */ ParamNames = "Intercept" || varNames; print b_ols[r=ParamNames F=10.5];

The OLS solution is equal to the first (unrestricted) solution from PROC REG.

If you want to require that the solution satisfies B3 = B1 and B4 = –2*B2, then you can augment the XX matrix to enforce these constraints. If L*λ = c is the matrix equation for the linear constraints, then the augmented system is
$\begin{pmatrix} X^{\prime}X & L^{\prime} \\ L & 0 \end{pmatrix} \begin{pmatrix} b_{\small\mathrm{RLS}} \\ \lambda \end{pmatrix} = \begin{pmatrix} X^{\prime}Y \\ c \end{pmatrix}$

The following statements solve the augmented system:

/* Direct method: Find b_RLS by solving augmented system */ /* Int X1 X2 X3 X4 */ L = {0 1 0 -1 0, /* X1 - X3 = 0 */ 0 0 -2 0 -1}; /* -2*X2 - X4 = 0 */ c = {0, 0};   zero = j(nrow(L), nrow(L), 0); A = (XpX || L ) // (L || zero ); z = X*Y // c; b_rls = solve(A, z); rNames = ParamNames||{'Lambda1' 'Lambda2'}; print b_rls[r=rNames F=10.5];

The parameter estimates are the same as are produced by using PROC REG. You can usually ignore the last two rows, which are the values of the Lagrange multipliers that enforce the constraints.

By slogging through some complicated linear algebra, you can also obtain the restricted least squares solution from the OLS solution and the constraint matrix (L). The following equations use the formulas in the online textbook by A. Buteikis to compute the restricted solution:

/* Adjust the OLS solution to obtain the RLS solution. b_RLS = b_OLS - b_adjustment Follows http://web.vu.lt/mif/a.buteikis/wp-content/uploads/PE_Book/4-4-Multiple-RLS.html */ RA_1 = solve(XpX, L); RA_3 = solve(L*RA_1, L*b_ols - c); b_adj = RA_1 * RA_3; b_rls = b_ols - b_adj; print b_rls[r=ParamNames F=10.5];

This answer agrees with the previous answer but does not compute the Lagrange multipliers. Instead, it computes the RLS solution as the projection of the OLS solution onto the restricted linear subspace.

Summary

This article shows how to use the RESTRICT statement in PROC REG to solve a restricted regression problem that involves linear constraints on the parameters. Solving an equality-constrained regression problem is very similar to solving an unrestricted least squares system of equations. Geometrically, they are both projections onto a linear subspace. Algebraically, you can solve the restricted problem directly or as the projection of the OLS solution.

Although changing one of the equality constraints into an INEQUALITY seems like a minor change, it means that the restricted region is no longer a linear subspace. Ordinary least squares can no longer solve the problem because the solution might not be an orthogonal projection. The next article discusses ways to solve problems where the regression coefficients are related by linear inequalities.

The post Restricted least squares regression in SAS appeared first on The DO Loop.

Matrix balancing is an interesting problem that has a long history. Matrix balancing refers to adjusting the cells of a frequency table to match known values of the row and column sums. One of the early algorithms for matrix balancing is known as the RAS algorithm, but it is also called the raking algorithm in some fields. The presentation in this article is inspired by a paper by Carol Alderman (1992).

The problem: Only marginal totals are known

Alderman shows a typical example (p. 83), but let me give a smaller example. Suppose a troop of six Girl Scouts sells seven different types of cookies. At the end of the campaign, each girl reports how many boxes she sold. Also, the troop leader knows how many boxes of each cookie type were sold. However, no one kept track of how many boxes of each type were sold by each girl.

The situation is summarized by the following 7 x 6 table. We know the row totals, which are the numbers of boxes that were sold for each cookie type. We also know the column totals, which are the numbers of boxes that were sold by each girl. But we do not know the values of the individual cells, which are the type-by-girl totals.

Guessing the cells of a matrix

As stated, the problem usually has infinitely many solutions. For an r x c table, there are r*c cells, but the marginal totals only provide r + c linear constraints. Except for 2 x 2 tables, there are more unknowns than equations, which leads to an underdetermined linear system that (typically) has infinitely many solutions.

Fortunately, often obtain a unique solution if we provide an informed guess for the cells in the table. Perhaps we can ask the girls to provide estimates of the number of boxes of each type that they sold. Or perhaps we have values for the cells from the previous year. In either case, if we can estimate the cell values, we can use a matrix balancing algorithm to adjust the cells so that they match the observed marginal totals.

Let's suppose that the troop leader asks the girls to estimate (from memory) how many boxes of each type they sold. The estimates are shown in the following SAS/IML program. The row and column sums of the estimates do not match the known marginal sums, but that's okay. The program uses subscript reduction operators to compute the marginal sums of the girls' estimates. These are displayed next to the actual totals, along with the differences.

proc iml; /* Known marginal totals: */ u = {260, 214, 178, 148, 75, 67, 59}; /* total boxes sold, by type */ v = {272 180 152 163 134 100}; /* total boxes sold, by girl */   /* We don't know the cell values that produce these marginal totals, but we can ask each girl to estimate how many boxes of each type she sold */ /* G1 G2 G3 G4 G5 G6 */ A = {75 45 40 40 40 30 , /* C1 */ 40 35 45 35 30 30 , /* C2 */ 40 25 30 40 30 20 , /* C3 */ 40 25 25 20 20 20 , /* C4 */ 30 25 0 10 10 0 , /* C5 */ 20 10 10 10 10 0 , /* C6 */ 20 10 0 10 0 0 }; /* C7 */   /* notice that the row/col totals for A do not match the truth: */ rowTotEst = A[ ,+]; /* for each row, sum across all cols */ colTotEst = A[+, ]; /* for each col, sum down all rows */ Items = 'Cookie1':'Cookie7'; /* row labels */ Person = 'Girl1':'Girl6'; /* column labels */   /* print known totals, estimated totals, and difference */ rowSums = (u || rowTotEst || (u-rowTotEst)); print rowSums[r=Items c={'rowTot (u)' 'rowTotEst' 'Diff'}]; colSums = (v // colTotEst // (v-colTotEst)); print colSums[r={'colTot (v)' 'colTotEst' 'Diff'} c=Person];

You can see that the marginal totals do not match the known row and column totals. However, we can use the guesses as an initial estimate for the RAS algorithm. The RAS algorithm will adjust the estimates to obtain a new matrix that DOES satisfy the marginal totals.

Adjusting a matrix to match marginal totals

The RAS algorithm can iteratively adjust the girls' estimates until the row sums and column sums of the matrix match the known row sums and column sums. The resulting matrix will probably not be an integer matrix. It will, however, reflect the structure of the girls' estimates in three ways:

1. It will approximately preserve each girl's recollection of the relative proportions of cookie types that she sold. If a girl claims that she sold many of one type of cookie and few of another, that relationship will be reflected in the balanced matrix.
2. It will approximately preserve the relative proportions of cookie types among girls. If one girl claims that she sold many more of one type of cookie than another girl, that relationship will be reflected in the balanced matrix.
3. It will preserve zeros. If a girl claims that she did not sell any of one type of cookie, the balanced matrix will also contain a zero for that cell.

The RAS algorithm is explained in Alderman (1992). The matrix starts with a nonnegative r x c matrix. Let u be the observed r x 1 vector of row sums. Let v be the observed 1 x c vector of column sums. The main steps of the RAS algorithm are:

1. Initialize X = A.
2. Scale the rows. Let R = u / RowSum(X), where 'RowSum' is a function that returns the vector of row sums. Update X ← R # X. Here R#X indicates elementwise multiplication of each column of X by the corresponding element of R.
3. Scale the columns. Let S = v / ColSum(X), where 'ColSum' is a function that returns the vector of column sums. Update X ← X # S. Here X#S indicates elementwise multiplication of each row of X by the corresponding element of S.
4. Check for convergence. If RowSum(X) is close to u and ColSum(X) is close to v, then the algorithm has converged, and X is the balanced matrix. Otherwise, return to Step 2.

The RAS algorithm is implemented in the following SAS/IML program:

/* Use the RAS matrix scaling algorithm to find a matrix X that approximately satisfies X[ ,+] = u (marginals constraints on rows) X[+, ] = v (marginals constraints on cols) and X is obtained by scaling rows and columns of A */ start RAS(A, u, v); X = A; /* Step 0: Initialize X */ converged = 0; do k = 1 to 100 while(^Converged); /* Step 1: Row scaling: X = u # (X / X[ ,+]); */ R = u / X[ ,+]; X = R # X;   /* Step 2: Column scaling: X = (X / X[+, ]) # v; */ S = v / X[+, ]; X = X # S;   /* check for convergence: u=rowTot and v=colTot */ rowTot = X[ ,+]; /* for each row, sum across all cols */ colTot = X[+, ]; /* for each col, sum down all rows */ maxDiff = max( abs(u-rowTot), abs(v-colTot) ); Converged = (maxDiff < 1e-8); end; return( X ); finish;   X = RAS(A, u, v); print X[r=Items c=Person format=7.1];

The matrix X has row and column sums that are close to the specified values. The matrix also preserves relationships among individual columns and rows, based on the girls' recollections of how many boxes they sold. The algorithm preserves zeros for the girls who claim they sold zero boxes of a certain type. It does not introduce any new zeros for cells.

You can't determine how close the X matrix is to "the truth," because we don't know the true number of boxes that each girl sold. The X matrix depends on the girls' recollections and estimates. If the recollections are close to the truth, then X will be close to the truth as well.

Summary

This article looks at the RAS algorithm, which is one way to perform matrix balancing on a table that has nonnegative cells. "Balancing" refers to adjusting the interior cells of a matrix to match observed marginal totals. The RAS algorithm starts with an estimate, A. The estimate does not need to satisfy the marginal totals, but it should reflect relationships between the rows and columns that you want to preserve, such as relative proportions and zero cells. The RAS algorithm iteratively scales the rows and columns of A to create a new matrix that matches the observed row and column sums.

It is worth mentioning the row scaling could be accumulated into a diagonal matrix, R, and the column scaling into a diagonal matrix, S. Thus, the balanced matrix is the product of these diagonal matrices and the original estimate, A. As a matrix equation, you can write X = RAS. It is this matrix factorization that gives the RAS algorithm its name.

The RAS is not the only algorithm for matrix balancing, but it is the simplest. Another algorithm, called iterative proportional fitting (IPF), will be discussed in a second article.

The post Matrix balancing: Update matrix cells to match row and column sums appeared first on The DO Loop.