Matrix Computations

February 8, 2021
 

Look at the following matrices. Do you notice anything that these matrices have in common?

If you noticed that the rows of each matrix are arithmetic progressions, good for you. For each row, there is a constant difference (also called the "increment") between adjacent elements. For these examples:

  • In the first matrix, each row is an arithmetic progression with a difference of 1. That is, the value in the (j+1)th column is one more than the value in the jth column.
  • In the second matrix, each row is an arithmetic progression with a difference of 2. That is, the value in the (j+1)th column is two more than the value in the jth column.
  • In the third matrix, each row is an arithmetic progression, but the increments for the rows are not identical. The increment for the first row is -3. The increment for the second row is +1. The increments for the remaining rows are -2 and +3, respectively.

However, there is another characteristic that these matrices have in common: they are all singular matrices. It turns out this is not a coincidence. An n x n matrix (n>2) is singular if each row is an arithmetic progression.

Why are these matrices singular?

Recall that a matrix will be singular if one of the columns can be written as a linear combination of other columns. For these three matrices, you can use mental arithmetic to verify that the third column is a linear combination of the first two columns. Specifically, you can verify that the third column can be written as
X3 = 2 X2 – X1
where X1, X2, X3, ... are the columns of a matrix. Thus, these matrices are singular.

It turns out that this relationship holds in general for any matrix whose rows are arithmetic progressions. Suppose the ith row is an arithmetic progression with increment d[i], so that the elements satisfy X[i, j+1] = X[i, j] + d[i] for j=1,2,...,n-1. Let d be the vector of increments. Then
X2 = X1 + d
which you can solve for d to obtain
d = X2 – X1.
By substitution,
X3 = X2 + d = X2 + (X2 – X1), or
X3 = 2*X2 – X1.
Therefore, the third column is always a linear combination of the first two columns, and the coefficients are independent of the increments of the rows' arithmetic progressions.

In fact, even more is true. You can use similar arguments to show that the jth column can be written as a linear combination of the first two columns for any j > 2. Thus, regardless of the matrix size, matrices whose rows are arithmetic progressions are rank-2 matrices: they have only two linearly independent columns, as the following sketch demonstrates.
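The following SAS/IML sketch verifies the claim for a small example. The 4 x 4 matrix is hypothetical (each row is an arithmetic progression, with increments 1, 2, -2, and 3); the determinant is zero, and the singular value decomposition confirms that the rank is 2:

proc iml;
/* each row is an arithmetic progression */
A = {1  2  3  4,
     5  7  9 11,
     4  2  0 -2,
     3  6  9 12};
detA = det(A);                   /* 0 (up to rounding) because A is singular */
call svd(U, D, V, A);            /* D contains the singular values */
rankA = sum(D > 1e-12*max(D));   /* count the nonzero singular values: rank = 2 */
print detA rankA;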

Why should you care about this?

I decided to write about this topic for three reasons:

  1. It's mathematically interesting, and I never noticed it before.
  2. When I write a program, I often need to quickly create an example matrix to test the program. I often use a matrix such as {1 2 3, 4 5 6, 7 8 9} or SHAPE(1:25,5,5). This article shows that both matrices are singular, which is important to know if the program requires a nonsingular matrix.
  3. Lastly, I wanted to emphasize the difference between a "random matrix" and an "arbitrary matrix." In a previous article, I discussed the probability that a random matrix is singular and concluded that a random matrix is almost always nonsingular. Here, "random matrix" means that the matrix elements are drawn randomly from a continuous probability distribution such as the uniform or normal distribution.

    However, the other day I received a comment on that blog post. Essentially, the comment said, "I wrote down this matrix at random, but it is singular. How do you explain that?" The matrix was similar to the matrices at the top of this article. The answer is that the elements of the matrix were not generated randomly. The rows of the matrix form an arithmetic progression (very NON-random!), which, as we have seen, implies that the matrix is singular.

The post A matrix is singular if its rows are arithmetic sequences appeared first on The DO Loop.

September 16, 2020
 

A data analyst recently asked a question about restricted least square regression in SAS. Recall that a restricted regression puts linear constraints on the coefficients in the model. Examples include forcing a coefficient to be 1 or forcing two coefficients to equal each other. Each of these problems can be solved by using PROC REG in SAS. The analyst's question was, "Why can I use PROC REG to solve a restricted regression problem when the restriction involves EQUALITY constraints, but PROC REG can't solve a regression problem that involves an INEQUALITY constraint?"

This article shows how to use the RESTRICT statement in PROC REG to solve a restricted regression problem that has linear constraints on the parameters. I also use geometry and linear algebra to show that solving an equality-constrained regression problem is mathematically similar to solving an unrestricted least squares system of equations. This explains why PROC REG can solve equality-constrained regression problems. In a future article, I will show how to solve regression problems that involve inequality constraints on the parameters.

The geometry of restricted regression

The linear algebra for restricted least squares regression gets messy, but the geometry is easy to picture. A schematic depiction of restricted regression is shown to the right.

Geometrically, ordinary least-squares (OLS) regression is the orthogonal projection of the observed response (Y) onto the column space of the design matrix. (For continuous regressors, this is the span of the X variables, plus an "intercept column.") If you introduce equality constraints among the parameters, you are restricting the solution to a linear subspace of the span (shown in green). But the geometry doesn't change. The restricted least squares (RLS) solution is still a projection of the observed response, but this time onto the restricted subspace.

In terms of linear algebra, the challenge is to write down the projection matrix for the restricted problem. As shown in the diagram, you can obtain the RLS solution in two ways:

  • Starting from Y: You can directly project the Y vector onto the restricted subspace. The linear algebra for this method is straightforward and is often formulated in terms of Lagrange multipliers.
  • Starting from the OLS solution: If you already know the OLS solution, you can project that solution vector onto the restricted subspace. Writing down the projection matrix is complicated, so I refer you to the online textbook Practical Econometrics and Data Science by A. Buteikis.

Restricted models and the TEST statement in PROC REG

Let's start with an example, which is based on an example in the online textbook by A. Buteikis. The following SAS DATA step simulates data from a regression model
      Y = B0 + B1*X1 + B2*X2 + B3*X3 + B4*X4 + ε
where B3 = B1 and B4 = –2*B2 and ε ~ N(0, 3) is a random error term.

/* Model: Y = B0 + B1*X1 + B2*X2 + B3*X3 + B4*X4
   where B3 = B1 and B4 = -2*B2
*/
data RegSim;
array beta[0:4] (-5, 2, 3, 2, -6);  /* parameter values for regression coefficients */
N = 100;                            /* sample size */
call streaminit(123);
do i = 1 to N;
   x1 = rand("Normal", 5, 2);       /* X1 ~ N(5,2) */
   x2 = rand("Integer", 1, 50);     /* X2 ~ random integer 1:50 */
   x3 = (i-1)*10/(N-1);             /* X3 = evenly spaced: 0 to 10 */
   x4 = rand("Normal", 10, 3);      /* X4 ~ N(10, 3) */
   eps = rand("Normal", 0, 3);      /* RMSE = 3 */
   Y = beta[0] + beta[1]*x1 + beta[2]*x2 + beta[3]*x3 + beta[4]*x4 + eps;
   output;
end;
run;

Although there are five parameters in the model (B0-B4), there are only three free parameters because the values of B3 and B4 are determined by the values of B1 and B2, respectively.

If you suspect that a parameter in your model is a known constant or is equal to another parameter, you can use the RESTRICT statement in PROC REG to incorporate that restriction into the model. Before you do, however, it is usually a good idea to perform a statistical test to see whether the data support the model. For this example, you can use the TEST statement in PROC REG to test the hypotheses that B3 = B1 and B4 = –2*B2. Recall that the syntax for the TEST statement uses the variable names (X1-X4) to represent the coefficients of the variables. The following call to PROC REG carries out this analysis:

proc reg data=RegSim plots=none;
   model Y = x1 - x4;
   test x3 = x1, x4 = -2*x2;
run;

The output shows that the parameter estimates for the simulated data are close to the parameter values used to generate the data. Specifically, the root MSE is close to 3, which is the standard deviation of the error term. The Intercept estimate is close to -5. The coefficients for the X1 and X3 terms are each approximately 2. The coefficient for X4 is approximately –2 times the coefficient of X2. The F test has a large p-value, so you should not reject the hypotheses that B3 = B1 and B4 = –2*B2.

The RESTRICT statement in PROC REG

If the test fails to reject the hypotheses, you can use the RESTRICT statement to recompute the parameter estimates subject to the constraints. You could rerun the entire analysis or, if you are running SAS Studio in interactive mode, you can simply submit a RESTRICT statement and PROC REG will compute the new parameter estimates without rereading the data:

   restrict x3 = x1, x4 = -2*x2; /* restricted least squares (RLS) */
   print;
quit;

The new parameter estimates enforce the restrictions among the regression coefficients. In the new model, the coefficients for X1 and X3 are exactly equal, and the coefficient for X4 is exactly –2 times the coefficient for X2.

Notice that the ParameterEstimates table is augmented by two rows labeled "RESTRICT". These rows have negative degrees of freedom because they represent constraints. The "Parameter Estimate" column shows the values of the Lagrange multipliers that are used to enforce equality constraints while solving the least squares system. Usually, a data analyst does not care about the actual values in these rows, although the Wikipedia article on Lagrange multipliers discusses ways to interpret the values.

The linear algebra of restricted regression

This section shows the linear algebra behind the restricted least squares solution by using SAS/IML. Recall that the usual way to compute the unrestricted OLS solution is to solve the "normal equations" (X`*X)*b = X`*Y for the parameter estimates, b. Although textbooks often solve this equation by using the inverse of the X`X matrix, it is more efficient to use the SOLVE function in SAS/IML, as follows:

proc iml;
varNames = {x1 x2 x3 x4};
use RegSim;                  /* read the data */
   read all var varNames into X;
   read all var "Y";
close;
X = j(nrow(X), 1, 1) || X;   /* add intercept column to form design matrix */
XpX = X`*X;
 
/* Solve the unrestricted OLS system */
b_ols = solve(XpX, X`*Y);    /* normal equations X`*X * b = X`*Y */
ParamNames = "Intercept" || varNames;
print b_ols[r=ParamNames F=10.5];

The OLS solution is equal to the first (unrestricted) solution from PROC REG.

If you want to require that the solution satisfies B3 = B1 and B4 = –2*B2, then you can augment the X`X matrix to enforce these constraints. If L*b = c is the matrix equation for the linear constraints, then the augmented system is
\( \begin{pmatrix} X^{\prime}X & L^{\prime} \\ L & 0 \end{pmatrix} \begin{pmatrix} b_{\small\mathrm{RLS}} \\ \lambda \end{pmatrix} = \begin{pmatrix} X^{\prime}Y \\ c \end{pmatrix} \)

The following statements solve the augmented system:

/* Direct method: Find b_RLS by solving augmented system */
/* Int X1 X2 X3 X4 */
L = {0  1  0 -1  0,     /*    X1 - X3 = 0 */
     0  0 -2  0 -1};    /* -2*X2 - X4 = 0 */
c = {0, 0};
 
zero = j(nrow(L), nrow(L), 0);
A = (XpX || L`   ) //
    (L   || zero );
z = X`*Y // c;
b_rls = solve(A, z);
rNames = ParamNames||{'Lambda1' 'Lambda2'};
print b_rls[r=rNames F=10.5];

The parameter estimates are the same as are produced by using PROC REG. You can usually ignore the last two rows, which are the values of the Lagrange multipliers that enforce the constraints.
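As a quick sanity check (a sketch that uses the matrices defined above), you can verify that the restricted estimates satisfy the constraint equations L*b = c:

bHat = b_rls[1:5];      /* parameter estimates, without the Lagrange multipliers */
resid = L*bHat - c;     /* should be the zero vector, up to rounding */
print resid;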

By slogging through some complicated linear algebra, you can also obtain the restricted least squares solution from the OLS solution and the constraint matrix (L). The following equations use the formulas in the online textbook by A. Buteikis to compute the restricted solution:

/* Adjust the OLS solution to obtain the RLS solution.
   b_RLS = b_OLS - b_adjustment
   Follows
   http://web.vu.lt/mif/a.buteikis/wp-content/uploads/PE_Book/4-4-Multiple-RLS.html
*/
RA_1 = solve(XpX, L`);               /* (X`X)^{-1}*L` */
RA_3 = solve(L*RA_1, L*b_ols - c);   /* (L*(X`X)^{-1}*L`)^{-1}*(L*b_ols - c) */
b_adj = RA_1 * RA_3;                 /* adjustment to the OLS estimates */
b_rls = b_ols - b_adj;
print b_rls[r=ParamNames F=10.5];

This answer agrees with the previous answer but does not compute the Lagrange multipliers. Instead, it computes the RLS solution as the projection of the OLS solution onto the restricted linear subspace.

Summary

This article shows how to use the RESTRICT statement in PROC REG to solve a restricted regression problem that involves linear constraints on the parameters. Solving an equality-constrained regression problem is very similar to solving an unrestricted least squares system of equations. Geometrically, they are both projections onto a linear subspace. Algebraically, you can solve the restricted problem directly or as the projection of the OLS solution.

Although changing one of the equality constraints into an INEQUALITY seems like a minor change, it means that the restricted region is no longer a linear subspace. Ordinary least squares can no longer solve the problem because the solution might not be an orthogonal projection. The next article discusses ways to solve problems where the regression coefficients are related by linear inequalities.

The post Restricted least squares regression in SAS appeared first on The DO Loop.

September 8, 2020
 

Matrix balancing is an interesting problem that has a long history. Matrix balancing refers to adjusting the cells of a frequency table to match known values of the row and column sums. One of the early algorithms for matrix balancing is known as the RAS algorithm, but it is also called the raking algorithm in some fields. The presentation in this article is inspired by a paper by Carol Alderman (1992).

The problem: Only marginal totals are known

Alderman shows a typical example (p. 83), but let me give a smaller example. Suppose a troop of six Girl Scouts sells seven different types of cookies. At the end of the campaign, each girl reports how many boxes she sold. Also, the troop leader knows how many boxes of each cookie type were sold. However, no one kept track of how many boxes of each type were sold by each girl.

The situation is summarized by the following 7 x 6 table. We know the row totals, which are the numbers of boxes that were sold for each cookie type. We also know the column totals, which are the numbers of boxes that were sold by each girl. But we do not know the values of the individual cells, which are the type-by-girl totals.

Guessing the cells of a matrix

As stated, the problem usually has infinitely many solutions. For an r x c table, there are r*c cells, but the marginal totals only provide r + c linear constraints. Except for 2 x 2 tables, there are more unknowns than equations, which leads to an underdetermined linear system that (typically) has infinitely many solutions.

Fortunately, we can often obtain a unique solution if we provide an informed guess for the cells in the table. Perhaps we can ask the girls to provide estimates of the number of boxes of each type that they sold. Or perhaps we have values for the cells from the previous year. In either case, if we can estimate the cell values, we can use a matrix balancing algorithm to adjust the cells so that they match the observed marginal totals.

Let's suppose that the troop leader asks the girls to estimate (from memory) how many boxes of each type they sold. The estimates are shown in the following SAS/IML program. The row and column sums of the estimates do not match the known marginal sums, but that's okay. The program uses subscript reduction operators to compute the marginal sums of the girls' estimates. These are displayed next to the actual totals, along with the differences.

proc iml;
/* Known marginal totals: */
u = {260, 214, 178, 148, 75, 67, 59}; /* total boxes sold, by type */
v = {272 180 152 163 134 100};        /* total boxes sold, by girl */
 
/* We don't know the cell values that produce these marginal totals,
   but we can ask each girl to estimate how many boxes of each type she sold */
/*   G1 G2 G3 G4 G5 G6          */
A = {75 45 40 40 40 30 ,  /* C1 */
     40 35 45 35 30 30 ,  /* C2 */
     40 25 30 40 30 20 ,  /* C3 */
     40 25 25 20 20 20 ,  /* C4 */
     30 25  0 10 10  0 ,  /* C5 */
     20 10 10 10 10  0 ,  /* C6 */
     20 10  0 10  0  0 }; /* C7 */
 
/* notice that the row/col totals for A do not match the truth: */
rowTotEst = A[ ,+];        /* for each row, sum across all cols */
colTotEst = A[+, ];        /* for each col, sum down all rows   */
Items  = 'Cookie1':'Cookie7';                  /* row labels    */
Person = 'Girl1':'Girl6';                      /* column labels */
 
/* print known totals, estimated totals, and difference */
rowSums = (u || rowTotEst || (u-rowTotEst));
print rowSums[r=Items c={'rowTot (u)' 'rowTotEst' 'Diff'}];
colSums = (v // colTotEst // (v-colTotEst));
print colSums[r={'colTot (v)' 'colTotEst' 'Diff'} c=Person];

You can see that the marginal totals do not match the known row and column totals. However, we can use the guesses as an initial estimate for the RAS algorithm. The RAS algorithm will adjust the estimates to obtain a new matrix that DOES satisfy the marginal totals.

Adjusting a matrix to match marginal totals

The RAS algorithm can iteratively adjust the girls' estimates until the row sums and column sums of the matrix match the known row sums and column sums. The resulting matrix will probably not be an integer matrix. It will, however, reflect the structure of the girls' estimates in three ways:

  1. It will approximately preserve each girl's recollection of the relative proportions of cookie types that she sold. If a girl claims that she sold many of one type of cookie and few of another, that relationship will be reflected in the balanced matrix.
  2. It will approximately preserve the relative proportions of cookie types among girls. If one girl claims that she sold many more of one type of cookie than another girl, that relationship will be reflected in the balanced matrix.
  3. It will preserve zeros. If a girl claims that she did not sell any of one type of cookie, the balanced matrix will also contain a zero for that cell.

The RAS algorithm is explained in Alderman (1992). The algorithm starts with a nonnegative r x c matrix, A. Let u be the observed r x 1 vector of row sums. Let v be the observed 1 x c vector of column sums. The main steps of the RAS algorithm are:

  1. Initialize X = A.
  2. Scale the rows. Let R = u / RowSum(X), where 'RowSum' is a function that returns the vector of row sums. Update X ← R # X. Here R#X indicates elementwise multiplication of each column of X by the corresponding element of R.
  3. Scale the columns. Let S = v / ColSum(X), where 'ColSum' is a function that returns the vector of column sums. Update X ← X # S. Here X#S indicates elementwise multiplication of each row of X by the corresponding element of S.
  4. Check for convergence. If RowSum(X) is close to u and ColSum(X) is close to v, then the algorithm has converged, and X is the balanced matrix. Otherwise, return to Step 2.

The RAS algorithm is implemented in the following SAS/IML program:

/* Use the RAS matrix scaling algorithm to find a matrix X that 
   approximately satisfies 
   X[ ,+] = u  (marginals constraints on rows)
   X[+, ] = v  (marginals constraints on cols)
   and X is obtained by scaling rows and columns of A */
start RAS(A, u, v);
   X = A;       /* Step 0: Initialize X */
   converged = 0;
   do k = 1 to 100 while(^Converged);
      /* Step 1: Row scaling: X = u # (X / X[ ,+]); */
      R = u / X[ ,+];
      X = R # X;
 
      /* Step 2: Column scaling: X = (X / X[+, ]) # v; */
      S = v / X[+, ];
      X = X # S;
 
      /* check for convergence: u=rowTot and v=colTot */
      rowTot = X[ ,+];   /* for each row, sum across all cols */
      colTot = X[+, ];   /* for each col, sum down all rows */
      maxDiff = max( abs(u-rowTot), abs(v-colTot) );
      Converged = (maxDiff < 1e-8);
   end;
   return( X );
finish;
 
X = RAS(A, u, v);
print X[r=Items c=Person format=7.1];

The matrix X has row and column sums that are close to the specified values. The matrix also preserves relationships among individual columns and rows, based on the girls' recollections of how many boxes they sold. The algorithm preserves zeros for the girls who claim they sold zero boxes of a certain type. It does not introduce any new zeros for cells.

You can't determine how close the X matrix is to "the truth," because we don't know the true number of boxes that each girl sold. The X matrix depends on the girls' recollections and estimates. If the recollections are close to the truth, then X will be close to the truth as well.

Summary

This article looks at the RAS algorithm, which is one way to perform matrix balancing on a table that has nonnegative cells. "Balancing" refers to adjusting the interior cells of a matrix to match observed marginal totals. The RAS algorithm starts with an estimate, A. The estimate does not need to satisfy the marginal totals, but it should reflect relationships between the rows and columns that you want to preserve, such as relative proportions and zero cells. The RAS algorithm iteratively scales the rows and columns of A to create a new matrix that matches the observed row and column sums.

It is worth mentioning that the row scalings can be accumulated into a diagonal matrix, R, and the column scalings into a diagonal matrix, S. Thus, the balanced matrix is the product of these diagonal matrices and the original estimate, A. As a matrix equation, you can write X = RAS. It is this matrix factorization that gives the RAS algorithm its name. A sketch that accumulates the factors appears below.
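If you want the factors themselves, you can accumulate the scalings inside the iteration. The following sketch is a hypothetical modification of the earlier RAS function (the name RASFactors and the output arguments r and s are mine, not from the original article); it also verifies numerically that scaling the rows of A by r and the columns by s reproduces the balanced matrix:

/* sketch: RAS iteration that also returns the accumulated scalings */
start RASFactors(r, s, A, u, v);
   X = A;
   r = j(nrow(A), 1, 1);        /* accumulated row scalings */
   s = j(1, ncol(A), 1);        /* accumulated column scalings */
   do k = 1 to 100 until( max(abs(u-X[ ,+]), abs(v-X[+, ])) < 1e-8 );
      R = u / X[ ,+];   X = R # X;   r = r # R;   /* row step */
      S = v / X[+, ];   X = X # S;   s = s # S;   /* column step */
   end;
   return( X );
finish;

X2 = RASFactors(r, s, A, u, v);
maxErr = max(abs( X2 - r # A # s ));   /* X2 equals diag(r)*A*diag(s) */
print maxErr;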

The RAS algorithm is not the only algorithm for matrix balancing, but it is the simplest. Another algorithm, called iterative proportional fitting (IPF), will be discussed in a second article.

The post Matrix balancing: Update matrix cells to match row and column sums appeared first on The DO Loop.

July 27, 2020
 

The Kronecker product (also called the direct product) is a binary operation that combines two matrices to form a new matrix. The Kronecker product appears in textbooks about the design of experiments and multivariate statistics. The Kronecker product seems intimidating at first, but often one of the matrices in the product has a special form, such as being a matrix of all 1s. In these special cases, the Kronecker product becomes much easier to understand. This article demonstrates some of the most common uses of the Kronecker product, including using it to concatenate matrices and to form block matrices.

What is the Kronecker product?

The Kronecker product is best understood by thinking about block matrices. If A is an n x p matrix and B is an m x q matrix, then the Kronecker product A ⊗ B is the (n*m) x (p*q) block matrix whose (i,j)th block is the element A[i,j] times the matrix B.

The SAS/IML language uses the "at sign" (@) to represent the binary operator. Thus, the SAS/IML expression A @ B forms the Kronecker product A ⊗ B.

The Kronecker product for vectors of 1s and special matrices

If A or B has a special form, the Kronecker product simplifies. First, consider only the case where A is a vector of all 1s or a special matrix. The following statements follow directly from the definition of the Kronecker product.

  • If A is a row vector of all 1s, then A ⊗ B is the horizontal concatenation of p copies of B.
  • If A is a column vector of all 1s, then A ⊗ B is the vertical concatenation of n copies of B.
  • If A is a matrix of all 1s, then A ⊗ B is a block matrix that contains blocks of the matrix B.
  • If A is the identity matrix, then A ⊗ B is a block diagonal matrix that contains copies of the matrix B along the diagonal.

Next, let A be an arbitrary matrix but consider only special forms for B:

  • If B is a row vector of all 1s, then A ⊗ B contains q copies of the first column of A, followed by q copies of the second column of A, and so forth. I call this sequential replication of columns of A.
  • If B is a column vector of all 1s, then A ⊗ B contains m copies of the first row of A, followed by m copies of the second row of A, and so forth. This is sequential replication of rows of A.
  • If B is a matrix of all 1s, then A ⊗ B is a block matrix where the (i,j)th block is the constant matrix where all elements are A[i,j].
  • If B is an identity matrix, then A ⊗ B is a block matrix where the (i,j)th block is the matrix A[i,j]*I.

The next sections show concrete examples of these eight ways to use the Kronecker product operator.

Use the Kronecker product for horizontal or vertical concatenation

You can use the Kronecker product to perform horizontal or vertical concatenation. For example, the following SAS/IML program defines two vectors that contain only 1s. The vector w is a row vector and the vector h is a column vector. The program computes w ⊗ B and h ⊗ B for various choices of B. The ODS LAYOUT GRIDDED statement arranges the output in a grid.

proc iml;
/* specify RHS vectors */
w  = {1  1};       /* w for 'wide' (row vector) */
h  = {1, 1};       /* h for 'high' (col vector) */
 
/* specify data matrices */
r = {1  2  3};      /* row vector    */
c = {1, 2, 3};      /* column vector */
x = {1 2 3,         /* 2 x 3 matrix  */
     4 5 6};
 
/******************************/
/* vector @ B : Concatenation */
/******************************/
wr = w @ r;         /* horiz concat. Same as repeat(r, 1, ncol(w)) */
wx = w @ x;         /* horiz concat. Same as repeat(x, 1, ncol(w)) */
hc = h @ c;         /* vert  concat. Same as repeat(c, nrow(h), 1) */
hx = h @ x;         /* vert  concat. Same as repeat(x, nrow(h), 1) */
 
ods layout gridded columns=2 advance=table;
print wr[L='w@r'], wx[L='w@x'], hc[L='h@c'], hx[L='h@x'];
ods layout end;

The output indicates that the Kronecker product with w on the left results in horizontal concatenation. Using h on the left results in vertical concatenation.

Use the Kronecker product for sequential replication

The previous section showed how to take a sequence of numbers such as {1,2,3} and repeat the entire sequence to form a new sequence such as {1,2,3,1,2,3}. But sometimes you want to repeat each element before repeating the next element. In other words, you might be interested in forming a sequence such as {1,1,2,2,3,3}. You can perform this kind of "sequential replication" by putting the data on the left and the vectors of 1s on the right, as follows:

/***************************************/
/* A @ vector : Sequential replication */
/***************************************/
rw = r @ w;         /* repeat cols sequentially */
xw = x @ w;
ch = c @ h;         /* repeat rows sequentially */
xh = x @ h;
 
ods layout gridded columns=2 advance=table;
print rw[L='r@w'], xw[L='x@w'], ch[L='c@h'], xh[L='x@h'];
ods layout end;

It is worth mentioning that the second row of output is the transpose of the first row. That is because the transpose operation "distributes" over the Kronecker product operator. That is, for any matrices A and B, it is true that (A ⊗ B)` = A` ⊗ B`. Since r`=c and w`=h, then (r ⊗ w)` = (r` ⊗ w`) = c ⊗ h.
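As a quick numerical check of this identity, you can compare the two sides for the vectors defined earlier (a one-line sketch; any pair of matrices would work):

maxDiff = max(abs( (r @ w)` - (r` @ w`) ));   /* 0 if (A@B)` = A`@B` */
print maxDiff;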

Use the Kronecker product to construct block matrices

The Kronecker product is essentially an operation that forms block matrices. When one of the components is a vector of all 1s, then "forming a block matrix" is the same as concatenation. But if A is a binary matrix, then A ⊗ B is a block matrix that has the same structure as A, but each nonzero block is a copy of B.

Probably the most important subcase is that the matrix I(n) ⊗ B is a block diagonal matrix, where each diagonal block is a copy of B. The following list gives some other examples that use the identity matrix or a constant matrix. Let I(n) be the n x n identity matrix and let J(n) be the n x n matrix of all 1s. Then

  • I(n) ⊗ B is a block diagonal matrix with copies of B on the diagonal.
  • J(n) ⊗ B is a block matrix with a copy of B in each block.
  • A ⊗ I(n) is a block matrix where each block is a diagonal matrix.
  • A ⊗ J(n) is a block matrix where each block is a constant matrix.

The following statements demonstrate these four cases:

Ix = I(2) @ x;      /* block diagonal matrix */
Jx = J(2) @ x;      /* 2 x 2 block matrix    */
xI = x @ I(2);      /* blocks of diagonal matrices */     
xJ = x @ J(2);      /* blocks of constant matrices */
 
ods layout gridded columns=2 advance=table;
print Ix[L='I@x'], Jx[L='J@x'], xI[L='x@I'], xJ[L='x@J'];
ods layout end;

How is the Kronecker product used in statistics?

Recall that (mathematically) you can only add matrices that have the same dimensions. For example, you can't add a matrix and a vector. However, a common operation in multivariate statistics is to center a data matrix by subtracting the mean of each column from that column. If X is the data matrix, "mean(X)" is a row vector whose ith element is the mean of the ith column. The centering operation is sometimes informally written as "X – mean(X)," even though that expression does not make mathematical sense. Formally, you need to vertically concatenate the mean vector n times, where n is the number of rows in the data matrix. As we have seen, if h is a column vector of length n that contains all 1s, then mean(X) ⊗ h is a matrix and the expression X – mean(X) ⊗ h is a proper mathematical expression that indicates the centering operation.

As described in a previous article, in the early days of SAS, statistical programmers had to use the Kronecker product operator to ensure that all matrix operations were mathematically correct. But that changed in SAS version 8 (circa 1999) when the IML language introduced a shorthand notation that makes it easier to perform row and column operations. Back in the "old days," you saw lots of programs that used the Kronecker operator like this:

/* read data matrix into X */
use Sashelp.Class; read all var _num_ into X;  close;
mean = x[:,];            /* row vector of means for each column */
h = j(nrow(X), 1, 1);    /* column vector */
CenterX = X - mean @ h;  /* subtract matrices of same dimension */

The modern SAS/IML programmer will typically use the shorthand notation to avoid the Kronecker product:

CenterX = X - mean;      /* modern code: Subtract the row vector from each row of X */

Although you no longer need to use the Kronecker product for basic centering operations, the Kronecker operator is still useful for other tasks, such as building block matrices and concatenation. Block diagonal matrices are used in mixed models of longitudinal data (Chapter 12 of Wicklin, 2013).

Summary

This article shows eight ways that you can use the Kronecker product. The Kronecker operator enables you to perform horizontal and vertical concatenation, perform sequential repetition of rows and columns, and build block matrices with special structure. You can use this article as a "cheat sheet" to remind you about some common uses of the Kronecker operator.

Do you have a favorite way to use the Kronecker product? Leave a comment.

The post 8 ways to use the Kronecker product appeared first on The DO Loop.

June 24, 2020
 

If you have ever run a Kolmogorov-Smirnov test for normality, you have encountered the Kolmogorov D statistic. The Kolmogorov D statistic is used to assess whether a random sample was drawn from a specified distribution. Although it is frequently used to test for normality, the statistic is "distribution free" in the sense that it compares the empirical distribution to any specified distribution. I have previously written about how to interpret the Kolmogorov D statistic and how to compute it in SAS.

If you can compute a statistic, you can use simulation to approximate its sampling distribution and its critical values. Although I am a fan of simulation, in this article I show how to compute the exact distribution and critical values for the Kolmogorov D statistic. The technique used in this article is from Facchinetti (2009), "A Procedure to Find Exact Critical Values of Kolmogorov-Smirnov Test."

An overview of the Facchinetti method

Suppose you have a sample of size n and you perform a Kolmogorov-Smirnov test that compares the empirical distribution to a reference distribution. You obtain the test statistic D. To know whether you should reject the null hypothesis (that the sample came from the reference distribution), you need to be able to compute the probability CDF(D) = Pr(Dn ≤ D). Facchinetti shows that you can compute the probability for any D by solving a system of linear equations. If a random sample contains n observations, you can form a certain (2n+2) x (2n+2) matrix whose entries are composed of binomial probabilities. You can use a discrete heat map to display the structure of the matrix that is used in the Facchinetti method. One example is shown below. For n=20, the matrix is 42 x 42. As Facchinetti shows in Figure 6 of her paper, the matrix has a block structure, with triangular matrices in four blocks.

The right-hand side of the linear system is also composed of binomial probabilities. The system is singular, but you can use a generalized inverse to obtain a solution. From the solution, you can compute the probability. For the details, see Facchinetti (2009).

Facchinetti includes a MATLAB program in the appendix of her paper. I converted the MATLAB program into SAS/IML and improved the efficiency by vectorizing some of the computations. You can download the SAS/IML program that computes the CDF of the Kolmogorov D statistic and all the graphs in this article. The name of the SAS/IML function is KolmogorovCDF, which requires two arguments, n (sample size) and D (value of the test statistic).

Examples of using the CDF of the Kolmogorov distribution

Suppose you have a sample of size 20. You can use a statistical table to look up the critical value of the Kolmogorov D statistic. For α = 0.05, the (two-sided) critical value for n=20 is D = 0.294. If you load and call the KolmogorovCDF function, you can confirm that the CDF for the Kolmogorov D distribution is very close to 0.95 when D=0.294:

proc iml;
load  module=KolmogorovCDF;      /* load the function that implements the Facchinetti method */
/* simple example of calling the function */
n = 20;
D = 0.29408;
cdf = KolmogorovCDF(n, D);
print n D cdf;

The interpretation of this result is that if you draw a random sample of size 20 from the reference distribution, there is a 5% chance that the Kolmogorov D value will exceed 0.294. A test statistic that exceeds 0.294 implies that you should reject the null hypothesis at the 0.05 significance level.

Visualize the Kolmogorov D distribution

If you can compute the CDF for one value of D, you can compute it for a sequence of D values. In this way, you can visualize the CDF curve. The following SAS/IML statements compute the CDF for the Kolmogorov D distribution when n=20 by running the computation on a sequence of D values in the open interval (0, 1):

D = do(0.01, 0.99, 0.01);             /* sequence of D values */
CDF = j(1, ncol(D), .);
do i = 1 to ncol(D);
   CDF[i] = KolmogorovCDF(n, D[i]);   /* CDF for each D */ 
end;
title "Kolmogorov CDF, n=20";
call series(D, CDF) grid={x y};       /* create a graph of the CDF */

The graph enables you to read probabilities and quantiles for the D20 distribution. The graph shows that almost all random samples of size 20 (drawn from the reference distribution) will have a D statistic of less than 0.4. Geometrically, the D statistic represents the largest gap between the empirical distribution and the reference distribution. A large gap indicates that the sample was probably not drawn from the reference distribution.

Visualize a family of Kolmogorov D distributions

The previous graph was for n=20, but you can repeat the computation for other values of n to obtain a family of CDF curves for the Kolmogorov D statistic. (Facchinetti writes Dn to emphasize that the statistic depends on the sample size.) The computation is available in the program that accompanies this article. The results are shown below:

This graph enables you to estimate probabilities and quantiles for several Dn distributions.

Visualize a family of density curves

Because the PDF is the derivative of the CDF, you can use a forward difference approximation of the derivative to also plot the density of the distribution. The following density curves are derived from the previous CDF curves:
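The downloadable program performs that computation; the following sketch shows one way to do it for the n=20 case by reusing the D and CDF row vectors computed above:

/* approximate the PDF by forward differences of the CDF */
k    = ncol(D);
PDF  = (CDF[ ,2:k] - CDF[ ,1:k-1]) / (D[ ,2] - D[ ,1]);  /* divided differences */
midD = (D[ ,1:k-1] + D[ ,2:k]) / 2;                      /* midpoints of the grid */
title "Approximate Kolmogorov Density, n=20";
call series(midD, PDF) grid={x y};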

A table of critical values

If you can compute the CDF, you can compute quantiles by using the inverse CDF. You can use a root-finding technique to obtain a quantile.

In the SAS/IML language, the FROOT function can find univariate roots. The critical values of the Dn statistic are usually tabulated by listing the sample size down the rows and the significance level (α) or confidence level (1-α) across the columns. Facchinetti (p. 352) includes a table of critical values in her paper. The following SAS/IML statements show how to recreate the table. For simplicity, I show only a subset of the table for n=20, 25, 30, and 35, and for four common values of α.

/* A critical value is the root of the function
   f(D) = KolmogorovCDF(n, D) - (1-alpha)
   for any specified n and alpha value.
*/
start KolmogorovQuantile(D) global (n, alpha);
   return  KolmogorovCDF(n, D) - (1-alpha);
finish;
 
/* create a table of critical values for the following vector of n and alpha values */
nVals = do(20, 35, 5);
alphaVals = {0.001 0.01 0.05 0.10};
CritVal = j(ncol(nVals), ncol(alphaVals), .);
do i = 1 to ncol(nVals);
   n = nVals[i];
   do j = 1 to ncol(alphaVals);
      alpha = alphaVals[j];
      CritVal[i,j] = froot("KolmogorovQuantile", {0.01, 0.99});
   end;
end;
print CritVal[r=nVals c=alphaVals format=7.5 L="Critical Values (n,alpha)"];

For each sample size, the corresponding row of the table shows the critical values of D for which CDF(D) = 1-α. Of course, a printed table is unnecessary when you have a programmatic way to compute the quantiles.

Summary

Facchinetti (2009) shows how to compute the CDF for the Kolmogorov D distribution for any value of n and D. The Kolmogorov D statistic is often used for goodness-of-fit tests. Facchinetti's method consists of forming and solving a system of linear equations. To make her work available to the SAS statistical programmer, I translated her MATLAB program into a SAS/IML program that computes the CDF of the Kolmogorov D distribution. This article demonstrates how to use the KolmogorovCDF in SAS/IML to compute and visualize the CDF and density of the Kolmogorov D statistic. It also shows how to use the computation in conjunction with a root-finding method to obtain quantiles and exact critical values of the Kolmogorov D distribution.

The post The Kolmogorov D distribution and exact critical values appeared first on The DO Loop.

June 22, 2020
 

Sometimes in matrix computations, it is important to display the nonzero elements of a matrix. This can be useful for visualizing the structure of a sparse matrix (one that has many zeros) and it is also useful when describing a matrix algorithm (such as Gaussian elimination) that introduces zeros at each step of the algorithm.

Let A be the matrix you want to visualize. You can visualize the nonzero elements of the matrix A in a graph or in a table:

  • Use a custom SAS format that prints an asterisk (*) for nonzero values and a blank for zero values.
  • Create a discrete heat map of the binary matrix A^=0, which replaces the nonzero cells of A with the value 1.

A text display method

For small matrices, you can use a custom SAS format to print the matrix. The format is defined by using PROC FORMAT. This format displays a star (*) for nonzero values and displays a blank for zero values:

proc format;
value SparseFmt   
      0 = " "      /* display blank instead of 0 */
      other = "*"; /* display '*' instead of number */
run;
 
proc iml;
A = {1.0  0.2  0.0  0.0  0.0  0.0 1e-6,
     0.2  1.0  0.2  0.0  0.0  0.0 0.0,
     0.0  0.2  1.0  0.2  0.0  0.0 0.0,
     0.0  0.0  0.2  1.0  0.2  0.0 0.0,
     1e-6 0.0  0.0  0.2  1.0  0.2 0.0,
     2e-6 1e-6 0.0  0.0  0.2  1.0 0.2,
     3e-6 2e-6 1e-6 0.0  0.0  0.2 1.0 };
 
/* use a custom format to print * or blank in each cell */
print A[format=SparseFmt.];
Visualize a sparse matrix by using a custom format that prints a blank or an asterisk

You can see the banded structure of the matrix and also see that elements near the corners are nonzero. This textual visualization is concise and is used in textbooks about numerical linear algebra.

A graphical method

For larger matrices, a graphical method is probably better. You can form the binary logical matrix B=(A^=0) and then plot B by using a two-color heat map. In the SAS/IML language, you can use the HEATMAPDISC call to create a discrete heat map of a matrix. I like to use white for the zeroes. The following SAS/IML statements define a 7 x 7 matrix and use a discrete heat map to display the nonzero values:

/* discrete heat map of binary matrix A^=0 */
ods graphics / width=400px height=400px;
call heatmapdisc(A^=0) colorramp={white steelblue}
         title="Discrete heat map of binary matrix";
Visualize a sparse matrix by using a heat map of a binary matrix where each cell is zero (0) or nonzero (1)

The information in the heat map is identical to the printed display. However, for large matrices, the heat map is easier to view. As shown, you can also use the ODS GRAPHICS statement to control the dimensions of the graph. So, for example, if you are visualizing a square matrix, you can make the graph square.

The post Visualize the structure of a sparse matrix appeared first on The DO Loop.

December 16, 2019
 
Rockin' around the Christmas tree
At the Christmas party hop.
     – Brenda Lee
Christmas tree image with added noise

Last Christmas, I saw a fun blog post that used optimization methods to de-noise an image of a Christmas tree. Although there are specialized algorithms that remove random noise from an image, I am not going to use a specialized de-noising algorithm. Instead, I will use the SVD algorithm, which is a general-purpose matrix algorithm that has many applications. One application is to construct a low-rank approximation of a matrix. This article applies the SVD algorithm to a noisy image of a Christmas tree. The result is recognizable as a Christmas tree, but much of the noise has been eliminated.

A noisy Christmas tree image

The image to the right is a heat map of a binary (0/1) matrix that has 101 rows and 100 columns. First, I put the value 1 in certain cells so that the heat map resembles a Christmas tree. About 41% of the cells have the value 1. We will call these cells "the tree".

To add random noise to the image, I randomly switched 10% of the cells. The result is an image that is recognizable as a Christmas tree but is very noisy.
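For reference, here is a sketch of one way to corrupt 10% of the cells (the complete program is available for download; the matrix name M is an assumption):

/* flip a random 10% of the cells of a binary (0/1) matrix M */
call randseed(1234);
nCells = nrow(M)*ncol(M);
k   = round(0.1*nCells);                  /* number of cells to corrupt */
idx = sample(1:nCells, k, "NoReplace");   /* random cell locations */
M[idx] = 1 - M[idx];                      /* switch 0 <-> 1 */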

Apply the SVD to a matrix

As mentioned earlier, I will attempt to de-noise the matrix (image) by using the general-purpose SVD algorithm, which is available in SAS/IML software. If M is the 101 x 100 binary matrix, then the following SAS/IML statements factor the matrix by using a singular-value decomposition. A series plot displays the first 20 singular values (ordered by magnitude) of the noisy Christmas tree matrix:

call svd(U, D, V, M);                             /* M = U*diag(D)*V` */
call series(1:20, D[1:20]) grid={x y} xvalues=1:20 label={"Component" "Singular Value"};

The graph (which is similar to a scree plot in principal component analysis) indicates that the first four singular values are somewhat larger than the remainder. The following statements approximate M by using a rank-4 matrix. The rank-4 matrix is not a binary matrix, but you can visualize the rank-4 approximation matrix by using a continuous heat map. For convenience, the matrix elements are shifted and scaled into the interval [0,1].

keep = 4;        /* how many components to keep? */
idx = 1:keep;
Ak = U[ ,idx] * diag(D[idx]) * V[ ,idx]`;
Ak = (Ak - min(Ak)) / range(Ak); /* for plotting, standardize into [0,1] */
s = "Rank " + strip(char(keep)) + " Approximation of Noisy Christmas Tree Image";
/* 'ramp' is a color ramp (for example, white-to-green) defined in the complete downloadable program */
call heatmapcont(Ak) colorramp=ramp displayoutlines=0 showlegend=0 title=s range={0, 1};

Apply a threshold cutoff to classify pixels

The continuous heat map shows a dark image of the Christmas tree surrounded by a light-green "fog". The dark-green pixels represent cells that have values near 1. The light-green pixels have smaller values. You can use a histogram to show the distribution of values in the rank-4 matrix:

ods graphics / width=400px height=300px;
call histogram(Ak);

You can think of the cell values as being a "score" that predicts whether each cell belongs to the Christmas tree (green) or not (white). The histogram indicates that the values in the matrix appear to be a mixture of two distributions. The high values in the histogram correspond to dark-green pixels (cells); the low values correspond to light-green pixels. To remove the light-green "fog" in the rank-4 image, you can use a "threshold value" to convert the continuous values to binary (0/1) values. Essentially, this operation predicts which cells are part of the tree and which are not.

For every choice of the threshold parameter, there will be correct and incorrect predictions:

  • A "true positive" is a cell that belongs to the tree and is colored green.
  • A "false positive" is a cell that does not belong to the tree but is colored green.
  • A "true negative" is a cell that does not belong to the tree and is colored white.
  • A "false negative" is a cell that belongs to the tree but is colored white.

By looking at the histogram, you might guess that a threshold value of 0.5 will do a good job of separating the low and high values. The following statements use 0.5 to convert the rank-4 matrix into a binary matrix, which you can think of as the predicted values of the de-noising process:

t = 0.5;  /* threshold parameter */
s = "Discrete Filter: Threshold = " + strip(char(t,4,2)) ;
call HeatmapDisc(Ak > t) colorramp=ramp displayoutlines=0 showlegend=0 title=s;

Considering that 10% of the original image was corrupted, the heat map of the reconstructed matrix is impressive. It looks like a Christmas tree, but has much less noise. The reconstructed matrix agrees with the original (pre-noise) matrix for 98% of the cells. In addition:

  • There are only a few false positives (green dots) that are far from the tree.
  • There are some false negatives in the center of the tree, but many fewer than were in the original noisy image.
  • The locations where the image is most ragged are along the edge and trunk of the Christmas tree. In that region, the de-noising process could not tell the difference between tree and non-tree.

The effect of the threshold parameter

You might wonder what the reconstruction would look like for different choices of the cutoff threshold. The following statements create two other heat maps, one for the threshold value 0.4 and the other for the threshold value 0.6. As you might have guessed, smaller threshold values result in more false positives (green pixels away from the tree), whereas higher threshold values result in more false negatives (white pixels inside the tree).

do t = 0.4 to 0.6 by 0.2;
   s = "Discrete Filter: Threshold = " + strip(char(t,4,2)) ;
   call HeatmapDisc(Ak > t) colorramp=ramp displayoutlines=0 showlegend=0 title=s;
end;

Summary

Although I have presented this experiment in terms of an image of a Christmas tree, it is really a result about matrices and low-rank approximations. I started with a binary 0/1 matrix, and I added noise to 10% of the cells. The SVD factorization enables you to approximate the matrix by using a rank-4 approximation. By using a cutoff threshold, you can approximate the original matrix before the random noise was added. Although the SVD algorithm is a general-purpose algorithm that was not designed for de-noising images, it does a good job eliminating the noise and estimating the original matrix.

If you are interested in exploring these ideas for yourself, you can download the complete SAS program that creates the graphs in this article.

The post Math-ing around the Christmas tree: Can the SVD de-noise an image? appeared first on The DO Loop.

December 11, 2019
 

Binary matrices are used for many purposes. I have previously written about how to use binary matrices to visualize missing values in a data matrix. They are also used to indicate the co-occurrence of two events. In ecology, binary matrices are used to indicate which species of an animal are present in which ecological site. For example, if you remember your study of Darwin's finches in high school biology class, you might recall that certain finches (species) are present or absent on various Galapagos islands (sites). You can use a binary matrix to indicate which finches are present on which islands.

Recently I was involved in a project that required performing a permutation test on rows of a binary matrix. As part of the project, I had to solve three smaller problems involving rows of a binary matrix:

  1. Given two rows, find the locations at which the rows differ.
  2. Given two binary matrices, determine how similar the matrices are by computing the proportion of elements that are the same.
  3. Given two rows, swap some of the elements that differ.

This article shows how to solve each problem by using the SAS/IML matrix language. A future article will discuss permutation tests for binary matrices. For clarity, I introduce the following macro that uses a temporary variable to swap two sets of values:

/* swap values of A and B. You can use this macro in the DATA step or in the SAS/IML language */
%macro SWAP(A, B, TMP=_tmp);
   &TMP = &A; &A = &B; &B = &TMP;
%mend;
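For example, the following DATA step sketch swaps two scalar values and prints a=2 b=1:

data _null_;
   a = 1;  b = 2;
   %SWAP(a, b);     /* expands to: _tmp = a; a = b; b = _tmp; */
   put a= b=;
run;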

Where do binary matrices differ?

The SAS/IML matrix language enables you to treat matrices as high-level objects. You often can answer questions about matrices without writing loops that iterate over the elements of the matrices. For example, if you have two matrices of the same dimensions, you can determine the cells at which the matrices are unequal by using the "not equal" (^=) operator. The following SAS/IML statements define two 2 x 10 matrices and use the "not equal" operator to find the cells that are different:

proc iml;
A = {0 0 1 0 0 1 0 1 0 0 ,
     1 0 0 0 0 0 1 1 0 1 };
B = {0 0 1 0 0 0 0 1 0 1 ,
     1 0 0 0 0 1 1 1 0 0 };
colLabel = "Sp1":("Sp"+strip(char(ncol(A))));
rowLabel = "A1":("A"+strip(char(nrow(A))));
 
/* 1. Find locations where binary matrices differ */
Diff = (A^=B);
print Diff[c=colLabel r=rowLabel];

The matrices A and B are similar. They both have three 1s in the first row and four 1s in the second row. However, the output shows that the matrices are different for the four elements in the sixth and tenth columns. Although I used entire matrices for this example, the same operations work on row vectors.

The proportion of elements in common

You can use various statistics to measure the similarity between the A and B matrices. A simple statistic is the proportion of elements that are in common. These matrices have 20 elements, and 16/20 = 0.8 are common to both matrices. You can compute the proportion in common by using the expression (A=B)[:], or you can use the following statements if you have previously computed the Diff matrix:

/* 2. the proportion of elements in B that are the same as in A */
propSame = 1 - Diff[:];
print propSame;

As a reminder, the mean subscript reduction operator ([:]) computes the mean value of the elements of the matrix. For a binary matrix, the mean value is the proportion of ones.

Swap elements of rows

The first two tasks were easy. A more complicated task is swapping values that differ between rows. The swapping operation is not difficult, but it requires finding the k locations where the rows differ and then swapping all or some of those values. In a permutation test, the number of elements that you swap is a random integer between 1 and k, but for simplicity, this example uses the SWAP macro to swap two cells that differ. For clarity, the following example uses temporary variables such as x1, x2, d1, and d2 to swap elements in the matrix A:

/* specify the rows whose value you want to swap */
i1 = 1;                         /* index of first row to compare and swap */
i2 = 2;                         /* index of second row to compare and swap */
 
/* For clarity, use temp vars x1 & x2 instead of A[i1, ] and A[i2, ] */
x1 = A[ i1, ];                  /* get one row */
x2 = A[ i2, ];                  /* get another row */
idx = loc( x1^=x2 );            /* find the locations where rows differ */
if ncol(idx) > 0 then do;       /* do the rows differ? */
   d1 = x1[ ,idx];              /* values at the locations that differ */
   d2 = x2[ ,idx];
   print (d1//d2)[r={'r1' 'r2'} L='Original'];
 
   /* For a permutation test, choose a random number of locations to swap */
   /* numSwaps = randfun(1, "Integer", 1, ncol(idx));
      idxSwap = sample(1:ncol(idx), numSwaps, "NoReplace");  */
   idxSwap = {2 4};              /* for now, hard-code locations to swap */
   %SWAP(d1[idxSwap], d2[idxSwap]);
   print (d1//d2)[r={'d1' 'd2'} L='New'];
   /* update matrix */
   A[ i1, idx] = d1;
   A[ i2, idx] = d2;
end;
print A[c=colLabel r=rowLabel];

The vectors x1 and x2 are the rows of A to compare. The vectors d1 and d2 are the subvectors that contain only the elements of x1 and x2 that differ. The example swaps the second and fourth columns of d1 and d2. The new values are then inserted back into the matrix A. You can compare the final matrix to the original to see that the process swapped two elements in each of two rows.

Although the examples are for binary matrices, these techniques work for any numerical matrices.

The post Swap elements in binary matrices appeared first on The DO Loop.

November 27, 2019
 
Visualization of a quadratic function and a linear subspace

This article discusses how to restrict a multivariate function to a linear subspace. This is a useful technique in many situations, including visualizing an objective function that is constrained by linear equalities. For example, the graph to the right is from a previous article about how to evaluate quadratic polynomials. The graph shows a heat map for a quadratic polynomial of two variables. The diagonal line represents a linear constraint between the X and Y variables. If you restrict the polynomial to the diagonal line, you obtain a one-dimensional function that you can easily graph and visualize. By repeating this process for other lines, you can construct a "stack" of lower-dimensional slices that enable you to understand the graph of the original bivariate function.

This example generalizes. For a function of many variables, you can restrict the function to a lower-dimensional subspace. By visualizing the restricted function, you can understand the high-dimensional function better. I previously demonstrated this technique for multidimensional regression models by using the SLICEFIT option in the EFFECTPLOT statement in SAS. This article shows how to use ideas from vector calculus to restrict a function to a parameterized linear subspace.

Visualize high-dimensional data and functions

There are basically two techniques for visualizing high-dimensional objects: projection and slicing. Many methods in multivariate statistics compute a linear subspace such that the projection of the data onto the subspace has a desirable property. One example is principal component analysis, which endeavors to find a low dimensional subspace that captures most of the variation in the data. Another example is linear discriminant analysis, which aims to find a linear subspace that maximally separates groups in the data. In both cases, the data are projected onto the computed subspaces to reveal characteristics of the data. There are also statistical methods that attempt to find nonlinear subspaces.

Whereas projection is used for data, slicing is often used to visualize the graphs of functions. In regression, you model a response variable as a function of the explanatory variables. Slicing the function along a linear subspace of the domain can reveal important characteristics about the function. That is the method used by the SLICEFIT option in the EFFECTPLOT statement, which enables you to visualize multidimensional regression models. The most common "slice" for a regression model is to specify constant values for all but one continuous variable in the model.

When visualizing an objective function that is constrained by a set of linear equalities, the idea is similar. The main difference is that the "slice" is usually determined by a general linear combination of variables.

Restrict a function to a one-dimensional subspace

A common exercise in multivariate calculus is to restrict a bivariate function to a line. (This is equivalent to a "slice" of the bivariate function.) Suppose that you choose a point, x0, and a unit vector, u, which represents the "direction". Then a parametric line through x0 in the direction of u is given by the vector expression v(t) = x0 + t u, where t is any real number. If you restrict a multivariate function to the image of v, you obtain a one-dimensional function.

For example, consider the quadratic polynomial in two variables:
f(x,y) = 9x^2 + xy + 4y^2 – 12x – 4y + 6
The function is visualized by the heat map at the top of this article. Suppose x0 = {0, -1} and choose u to be the unit vector in the direction of the vector {3, 1}. (This choice corresponds to a linear constraint of the form x – 3y = 3.)

The following SAS/IML program constructs the parameterized line v(t) for a uniform set of t values. The quadratic function is evaluated at this set of values and the resulting one-dimensional function is graphed:

proc iml;
/* Evaluate the quadratic function at each column of X and return a row vector. */
start Func(XY);
   x = XY[1,]; y = XY[2,];
   return (9*x##2  + x#y + 4*y##2) - 12*x - 4*y + 6;
finish;
 
x0 = {0, -1};          /* evaluate polynomial at this point */
d  = {3, 1};           /* vector that determines direction */
u  = d / norm(d);      /* unit vector (direction) */
 
t = do(-2, 2, 0.1);    /* parameter values */
v = x0 + t @ u;        /* evenly spaced points along the linear subspace */
f = Func(v);           /* evaluate the 2-D function along the 1-D subspace */
title "Quadratic Form Evaluated on Linear Subspace";
call series(t, f) grid={x y};
Quadratic function restricted to linear subspace

The graph shows the restriction of the 2-D function to the 1-D subspace. According to the graph, the restricted function reaches its minimum value near t ≈ 0.92. The corresponding (x, y) location and function value are shown below:

tMin = 0.92;           /* t* = parameter near the minimum */
vMin = x0 + tMin * u;  /* corresponding (x(t*), y(t*)) */
fMin = Func(vMin);     /* f(x(t*), y(t*)) */
print tMin vMin fMin;

If you look back at the heat map at the top of this article, you will see that the point (x, y) = (0.87, -0.71) is the location at which the function achieves its minimum value when restricted to the linear subspace. The function value at that point is approximately 6.6.

The SAS/IML program uses the Kronecker direct-product operator (@) to create points along the line. The operation is v = x0 + t @ u. The symbols x0 and u are both column vectors with two elements. The Kronecker product creates a 2 x k matrix, where k is the number of elements in the row vector t. The Kronecker product enables you to vectorize the program instead of looping over the elements of t and forming the individual vectors x0 + t[i]*u.
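As a check on the graphical estimate, you can compute the exact location of the minimum by calculus. Write the quadratic as f(x) = 0.5*x`*Q*x + L`*x + c, as in the companion article about evaluating quadratic polynomials. Then g(t) = f(x0 + t*u) is a quadratic function of t, and g'(t) = 0 when t = -u`*(Q*x0 + L) / (u`*Q*u). The following statements are a sketch (not part of the original program) that runs in the same PROC IML session as the program above; the values of Q and L below are assumptions taken from the polynomial's definition:

Q = {18 1,                            /* Hessian of the quadratic */
      1 8};
L = {-12, -4};                        /* linear coefficients */
tStar = -u`*(Q*x0 + L) / (u`*Q*u);    /* zero of g'(t) for g(t) = f(x0 + t*u) */
vStar = x0 + tStar * u;               /* exact (x, y) location of the minimum */
print tStar vStar;                    /* tStar ~ 0.916; vStar ~ (0.87, -0.71) */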

You can generalize this example to evaluate a multivariate function on a two-dimensional subspace. If u1 and u2 are two orthogonal, p-dimensional, unit vectors, the expression x0 + s*u1 + t*u2 parameterizes a two-dimensional (affine) subspace of R^p. You can use the ExpandGrid function in SAS/IML to generate ordered pairs (s, t) on a two-dimensional grid, as the following sketch shows.
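The following standalone sketch shows one way to implement the two-dimensional case. The function g, the base point, and the direction vectors are hypothetical placeholders; the pattern of combining the ExpandGrid function with the parameterization x0 + s*u1 + t*u2 is the point of the example:

proc iml;
/* A sketch: evaluate a (hypothetical) function of three variables on a
   two-dimensional subspace of R^3. */
start g(X);                        /* example: sum of squares of each column */
   return (X##2)[+, ];
finish;

x0 = {1, 0, -1};                   /* base point */
u1 = {1, 1, 0} / sqrt(2);          /* orthogonal unit vectors span the plane */
u2 = {0, 0, 1};

st = expandgrid( do(-1, 1, 1), do(-1, 1, 1) );  /* 9 x 2 matrix of (s,t) pairs */
V  = x0 + u1*st[,1]` + u2*st[,2]`; /* each column is x0 + s*u1 + t*u2 */
f  = g(V);                         /* evaluate g along the 2-D slice */
print (st` // f)[r={'s' 't' 'g'}];
quit;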

Summary

In summary, you can slice a multivariate function along a linear subspace. Usually, 1-D or 2-D subspaces are used and the resulting (restricted) function is graphed. This helps you to understand the function. You can choose a series of slices (often parallel to each other) and "stack" the slices in order to visualize the multivariate function. Typically, this technique works best for functions that have three or four continuous variables.

You can download the SAS program that creates the graphs in this article.

The post Evaluate a function on a linear subspace appeared first on The DO Loop.

November 25, 2019
 

What is an efficient way to evaluate a multivariate quadratic polynomial in p variables? The answer is to use matrix computations! A multivariate quadratic polynomial can be written as the sum of a purely quadratic term (degree 2), a purely linear term (degree 1), and a constant term (degree 0). The purely quadratic term is called a quadratic form. This article shows how to use matrix computations to efficiently evaluate a multivariate quadratic polynomial.

Quadratic polynomials and matrix expressions

Visualization of a quadratic polynomial in SAS

As I wrote in a previous article about optimizing a quadratic function, the matrix of second derivatives and the gradient of first derivatives appear in the matrix representation of a quadratic polynomial. I will use the same example as in my previous article. Namely, the following quadratic polynomial in two variables:
f(x,y) = 9x^2 + xy + 4y^2 – 12x – 4y + 6
       = (1/2) x` Q x + L` x + 6
where Q = {18 1, 1 8} is a 2 x 2 symmetric matrix, L = {-12, -4} is a column vector, and x = {x, y} is a column vector that represents the coordinates at which to evaluate the quadratic function.

The graph at the right visualizes this quadratic function by using a heat map. Small values of the function are shown in white. Larger values of the function are shown in blues and greens. The largest values are shown in reds and blacks. The global minimum of this function is approximately (x, y) = (0.64, 0.42), and that point is indicated by a star.

This example generalizes. Every quadratic function in p variables can be written as the sum of a quadratic form (1/2 x` Q x), a linear term (L` x) and a constant. Here Q is a p x p symmetric matrix of second derivatives (the Hessian matrix of f) and L and x are p-dimensional column vectors.

Evaluate quadratic polynomial in SAS

Because the computation involves vectors and matrices, the SAS/IML language is the natural place to use matrices to evaluate a quadratic function. The following SAS/IML statements implement a simple function to evaluate a quadratic polynomial (given in terms of Q, L, and a constant) at an arbitrary two-dimensional vector, x:

proc iml;
/* Evaluate  f(x) = 0.5 * x` * Q * x + L`*x + const, where
   Q is a p x p symmetric matrix, and
   L and x are column vectors with p elements.
   This version evaluates ONE vector x and returns a scalar value. */
start EvalQuad(x, Q, L, const=0);
   return 0.5 * x`*Q*x + L`*x + const;
finish;
 
/* compute Q and L for f(x,y) = 9*x##2 + x#y + 4*y##2 - 12*x - 4*y + 6 */
Q = {18 1,             /* matrix of second derivatives */
      1 8};
L = { -12, -4};        /* use column vectors for L and x */
const = 6;
x0 = {0, -1};          /* evaluate polynomial at this point */
f = EvalQuad(x0, Q, L, const);
print f[L="f(0,-1)"];

Evaluate a quadratic polynomial at multiple points

When I implement a function in the SAS/IML language, I try to "vectorize" it so that it can evaluate multiple points in a single call. Often you can use matrix operations to vectorize a function evaluation, but I don't see how to make the math work for this problem. The natural way to evaluate a quadratic polynomial at k vectors X1, X2, ..., Xk is to pack those vectors into a p x k matrix X such that each column of X is a point at which to evaluate the polynomial. Unfortunately, the matrix computation of the quadratic form M = 0.5 * X`*Q*X results in a k x k matrix. Only the k diagonal elements are needed to evaluate the polynomial at the k input vectors, so although it is possible to compute M, doing so would be very inefficient.
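To make the cost concrete, the following sketch (not part of the original program) implements the all-at-once computation and uses the VECDIAG function to extract the diagonal. It returns the same values as the loop that follows, but it computes k^2 inner products and discards all but k of them:

/* A sketch of the wasteful "all-at-once" evaluation: it forms a k x k
   matrix even though only the k diagonal elements are used. */
start EvalQuadDiag(X, Q, L, const=0);
   M = 0.5 * X`*Q*X;                        /* k x k matrix */
   return ( vecdiag(M)` + L`*X + const );   /* keep only the diagonal */
finish;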

In this case, it seems more efficient to loop over the columns of X. The following statements define a SAS/IML module that evaluates a quadratic polynomial at every column of X and returns a row vector of the results. The module is demonstrated by calling it on a matrix that has five columns.

/* Evaluate the quadratic function at each column of X and return a row vector. */
start EvalQuadVec(X, Q, L, const=0);
   f = j(1, ncol(X), .);
   do i = 1 to ncol(X);
      v = X[,i];
      f[i] = 0.5 * v`*Q*v + L`*v + const;
   end;
   return f;
finish;
 
/*    X1  X2  X3 X4  X5  */
vx = {-1 -0.5 0  0.5 1 ,
      -3 -2  -1  0   1 };
f = EvalQuadVec(vx, Q, L, const);
print (vx // f)[r={'x' 'y' 'f(x,y)'} c=('X1':'X5')];

Evaluate a quadratic polynomial on a uniform grid of points

You can use the EvalQuadVec function to evaluate a quadratic polynomial on any set of multiple points. In particular, you can use the ExpandGrid function to construct a regular 2-D grid of points. By evaluating the function at each point on the grid, you can visualize the function. The following statements create a heat map of the function on a regular grid. The heat map is shown at the top of this article.

x = do(-1, 1, 0.1); 
y = do(-3, 1.5, 0.1);
xy = expandgrid(x, y);             /* 966 x 2 matrix */
f = EvalQuadVec(xy`, Q, L, const); /* evaluate polynomial at all points */
 
/* write results to a SAS data set and visualize the function by using a heat map */
M = xy || f`;
create Heatmap from M[c={'x' 'y' 'f'}];  append from M;  close;
QUIT;
 
data optimal;
   xx=0.64; yy=0.42;  /* optional: add the optimal (x,y) value */
run;
data All;  set Heatmap optimal; run;
 
title "Heat Map of Quadratic Function";
proc sgplot data=All;
   heatmapparm x=x y=y colorresponse=f / colormodel= (WHITE CYAN YELLOW RED BLACK);
   scatter x=xx y=yy / markerattrs=(symbol=StarFilled);
   xaxis offsetmin=0 offsetmax=0;
   yaxis offsetmin=0 offsetmax=0;
run;

Quadratic approximations

Perhaps you do not often use quadratic polynomials. This technique is useful even for general nonlinear functions because it enables you to find the best quadratic approximation to a multivariate function at any point x0. The multivariate Taylor series at the point x0, truncated at second order, is
f(x) ≈ f(x0) + L` · (xx0) + (1/2) (xx0)` · Q · (xx0)
where L = ∇f(x0) is the gradient of f evaluated at x0 and Q = D^2 f(x0) is the symmetric Hessian matrix of second derivatives of f evaluated at x0.
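As an illustration, the following standalone sketch (not from the original article) builds the quadratic approximation to the hypothetical function f(x,y) = x*exp(y) at the point x0 = (1, 0). The gradient and Hessian are computed by hand for this simple function, and the EvalQuad module from earlier in this article evaluates the Taylor polynomial:

proc iml;
/* repeated from earlier in this article */
start EvalQuad(x, Q, L, const=0);
   return 0.5 * x`*Q*x + L`*x + const;
finish;
/* hypothetical nonlinear function f(x,y) = x*exp(y) */
start F(xy);
   return xy[1] # exp(xy[2]);
finish;

x0 = {1, 0};                     /* expansion point */
f0 = F(x0);                      /* f(x0) = 1 */
L  = {1, 1};                     /* gradient of f at x0, computed by hand */
Q  = {0 1,                       /* Hessian of f at x0, computed by hand */
      1 1};

x  = {1.1, 0.2};                 /* a point near x0 */
h  = x - x0;                     /* displacement from x0 */
fApprox = EvalQuad(h, Q, L, f0); /* f(x0) + L`*h + 0.5*h`*Q*h */
fExact  = F(x);
print fExact fApprox;            /* 1.3435 versus 1.34 */
quit;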

Summary

In summary, you can use matrix computations to evaluate a multivariate quadratic polynomial at multiple points. For a polynomial of two variables, you can evaluate the polynomial on a grid of points and visualize the function as a heat map.

The post Evaluate a quadratic polynomial in SAS appeared first on The DO Loop.