Matrix Computations

9月 162020

A data analyst recently asked a question about restricted least square regression in SAS. Recall that a restricted regression puts linear constraints on the coefficients in the model. Examples include forcing a coefficient to be 1 or forcing two coefficients to equal each other. Each of these problems can be solved by using PROC REG in SAS. The analyst's question was, "Why can I use PROC REG to solve a restricted regression problem when the restriction involves EQUALITY constraints, but PROC REG can't solve a regression problem that involves an INEQUALITY constraint?"

This article shows how to use the RESTRICT statement in PROC REG to solve a restricted regression problem that has linear constraints on the parameters. I also use geometry and linear algebra to show that solving an equality-constrained regression problem is mathematically similar to solving an unrestricted least squares system of equations. This explains why PROC REG can solve equality-constrained regression problems. In a future article, I will show how to solve regression problems that involve inequality constraints on the parameters.

The geometry of restricted regression

The linear algebra for restricted least squares regression gets messy, but the geometry is easy to picture. A schematic depiction of restricted regression is shown to the right.

Geometrically, ordinary least-squares (OLS) regression is the orthogonal projection of the observed response (Y) onto the column space of the design matrix. (For continuous regressors, this is the span of the X variables, plus an "intercept column.") If you introduce equality constraints among the parameters, you are restricting the solution to a linear subspace of the span (shown in green). But the geometry doesn't change. The restricted least squares (RLS) solution is still a projection of the observed response, but this time onto the restricted subspace.

In terms of linear algebra, the challenge is to write down the projection matrix for the restricted problem. As shown in the diagram, you can obtain the RLS solution in two ways:

  • Starting from Y: You can directly project the Y vector onto the restricted subspace. The linear algebra for this method is straightforward and is often formulated in terms of Lagrange multipliers.
  • Starting from the OLS solution: If you already know the OLS solution, you can project that solution vector onto the restricted subspace. Writing down the projection matrix is complicated, so I refer you to the online textbook Practical Econometrics and Data Science by A. Buteikis.

Restricted models and the TEST statement in PROC REG

Let's start with an example, which is based on an example in the online textbook by A. Buteikis. The following SAS DATA step simulates data from a regression model
      Y = B0 + B1*X1 + B2*X2 + B3*X3 + B4*X4 + ε
where B3 = B1 and B4 = –2*B2 and ε ~ N(0, 3) is a random error term.

/* Model: Y = B0 + B1*X1 + B2*X2 + B3*X3 + B4*X4
   where B3 = B1 and B4 = -2*B2
data RegSim;
array beta[0:4] (-5, 2, 3, 2, -6);  /* parameter values for regression coefficients */
N = 100;                            /* sample size */
call streaminit(123);
do i = 1 to N;
   x1 = rand("Normal", 5, 2);       /* X1 ~ N(5,2) */
   x2 = rand("Integer", 1, 50);     /* X2 ~ random integer 1:50 */
   x3 = (i-1)*10/(N-1);             /* X3 = sequence: 0 to 10 by 1/N */
   x4 = rand("Normal", 10, 3);      /* X4 ~ N(10, 3) */
   eps = rand("Normal", 0, 3);      /* RMSE = 3 */
   Y = beta[0] + beta[1]*x1 + beta[2]*x2 + beta[3]*x3 + beta[4]*x4 + eps;

Although there are five parameters in the model (B0-B4), there are only three free parameters because the values of B2 and B4 are determined by the values of B1 and B4, respectively.

If you suspect that a parameter in your model is a known constant or is equal to another parameter, you can use the RESTRICT statement in PROC REG to incorporate that restriction into the model. Before you do, however, it is usually a good idea to perform a statistical test to see if the data supports the model. For this example, you can use the TEST statement in PROC REG to hypothesize that B3 = B1 and B4 = –2*B2. Recall that the syntax for the TEST statement uses the variable names (X1-X4) to represent the coefficients of the variable. The following call to PROC REG carries out this analysis:

proc reg data=RegSim plots=none;
   model Y = x1 - x4;
   test x3 = x1, x4 = -2*x2;

The output shows that the parameter estimates for the simulated data are close to the parameter values used to generate the data. Specifically, the root MSE is close to 3, which is the standard deviation for the error term. The Intercept estimate is close to -5. The coefficients for the X1 and X3 term are each approximately 2. The coefficient for X4 is approximately –2 times the coefficient of X2. The F test for the test of hypotheses has a large p-value, so you should not reject the hypotheses that B3 = B1 and B4 = –2*B2.

The RESTRICT statement in PROC REG

If the hypothesis on the TEST statement fails to reject, then you can use the RESTRICT statement to recompute the parameter estimates subject to the constraints. You could rerun the entire analysis or, if you are running SAS Studio in interactive mode, you can simply submit a RESTRICT statement and PROC REG will compute the new parameter estimates without rereading the data:

   restrict x3 = x1, x4 = -2*x2; /* restricted least squares (RLS) */

The new parameter estimates enforce the restrictions among the regression coefficients. In the new model, the coefficients for X1 and X3 are exactly equal, and the coefficient for X4 is exactly –2 times the coefficient for X2.

Notice that the ParameterEstimates table is augmented by two rows labeled "RESTRICT". These rows have negative degrees of freedom because they represent constraints. The "Parameter Estimate" column shows the values of the Lagrange multipliers that are used to enforce equality constraints while solving the least squares system. Usually, a data analyst does not care about the actual values in these rows, although the Wikipedia article on Lagrange multipliers discusses ways to interpret the values.

The linear algebra of restricted regression

This section shows the linear algebra behind the restricted least squares solution by using SAS/IML. Recall that the usual way to compute the unrestricted OLS solution is the solve the "normal equations" (X`*X)*b = X`*Y for the parameter estimates, b. Although textbooks often solve this equation by using the inverse of the X`X matrix, it is more efficient to use the SOLVE function in SAS/IML to solve the equation, as follows:

proc iml;
varNames = {x1 x2 x3 x4};
use RegSim;                  /* read the data */
   read all var varNames into X;
   read all var "Y";
X = j(nrow(X), 1, 1) || X;   /* add intercept column to form design matrix */
XpX = X`*X;
/* Solve the unrestricted OLS system */
b_ols = solve(XpX, X`*Y);    /* normal equations X`*X * b = X`*Y */
ParamNames = "Intercept" || varNames;
print b_ols[r=ParamNames F=10.5];

The OLS solution is equal to the first (unrestricted) solution from PROC REG.

If you want to require that the solution satisfies B3 = B1 and B4 = –2*B2, then you can augment the X`X matrix to enforce these constraints. If L*λ = c is the matrix equation for the linear constraints, then the augmented system is
\( \begin{pmatrix} X^{\prime}X & L^{\prime} \\ L & 0 \end{pmatrix} \begin{pmatrix} b_{\small\mathrm{RLS}} \\ \lambda \end{pmatrix} = \begin{pmatrix} X^{\prime}Y \\ c \end{pmatrix} \)

The following statements solve the augmented system:

/* Direct method: Find b_RLS by solving augmented system */
/* Int X1 X2 X3 X4 */
L = {0  1  0 -1  0,     /*    X1 - X3 = 0 */
     0  0 -2  0 -1};    /* -2*X2 - X4 = 0 */
c = {0, 0};
zero = j(nrow(L), nrow(L), 0);
A = (XpX || L`   ) //
    (L   || zero );
z = X`*Y // c;
b_rls = solve(A, z);
rNames = ParamNames||{'Lambda1' 'Lambda2'};
print b_rls[r=rNames F=10.5];

The parameter estimates are the same as are produced by using PROC REG. You can usually ignore the last two rows, which are the values of the Lagrange multipliers that enforce the constraints.

By slogging through some complicated linear algebra, you can also obtain the restricted least squares solution from the OLS solution and the constraint matrix (L). The following equations use the formulas in the online textbook by A. Buteikis to compute the restricted solution:

/* Adjust the OLS solution to obtain the RLS solution.
   b_RLS = b_OLS - b_adjustment
RA_1 = solve(XpX, L`);
RA_3 = solve(L*RA_1, L*b_ols - c);
b_adj = RA_1 * RA_3;
b_rls = b_ols - b_adj;
print b_rls[r=ParamNames F=10.5];

This answer agrees with the previous answer but does not compute the Lagrange multipliers. Instead, it computes the RLS solution as the projection of the OLS solution onto the restricted linear subspace.


This article shows how to use the RESTRICT statement in PROC REG to solve a restricted regression problem that involves linear constraints on the parameters. Solving an equality-constrained regression problem is very similar to solving an unrestricted least squares system of equations. Geometrically, they are both projections onto a linear subspace. Algebraically, you can solve the restricted problem directly or as the projection of the OLS solution.

Although changing one of the equality constraints into an INEQUALITY seems like a minor change, it means that the restricted region is no longer a linear subspace. Ordinary least squares can no longer solve the problem because the solution might not be an orthogonal projection. The next article discusses ways to solve problems where the regression coefficients are related by linear inequalities.

The post Restricted least squares regression in SAS appeared first on The DO Loop.

9月 082020

Matrix balancing is an interesting problem that has a long history. Matrix balancing refers to adjusting the cells of a frequency table to match known values of the row and column sums. One of the early algorithms for matrix balancing is known as the RAS algorithm, but it is also called the raking algorithm in some fields. The presentation in this article is inspired by a paper by Carol Alderman (1992).

The problem: Only marginal totals are known

Alderman shows a typical example (p. 83), but let me give a smaller example. Suppose a troop of six Girl Scouts sells seven different types of cookies. At the end of the campaign, each girl reports how many boxes she sold. Also, the troop leader knows how many boxes of each cookie type were sold. However, no one kept track of how many boxes of each type were sold by each girl.

The situation is summarized by the following 7 x 6 table. We know the row totals, which are the numbers of boxes that were sold for each cookie type. We also know the column totals, which are the numbers of boxes that were sold by each girl. But we do not know the values of the individual cells, which are the type-by-girl totals.

Guessing the cells of a matrix

As stated, the problem usually has infinitely many solutions. For an r x c table, there are r*c cells, but the marginal totals only provide r + c linear constraints. Except for 2 x 2 tables, there are more unknowns than equations, which leads to an underdetermined linear system that (typically) has infinitely many solutions.

Fortunately, often obtain a unique solution if we provide an informed guess for the cells in the table. Perhaps we can ask the girls to provide estimates of the number of boxes of each type that they sold. Or perhaps we have values for the cells from the previous year. In either case, if we can estimate the cell values, we can use a matrix balancing algorithm to adjust the cells so that they match the observed marginal totals.

Let's suppose that the troop leader asks the girls to estimate (from memory) how many boxes of each type they sold. The estimates are shown in the following SAS/IML program. The row and column sums of the estimates do not match the known marginal sums, but that's okay. The program uses subscript reduction operators to compute the marginal sums of the girls' estimates. These are displayed next to the actual totals, along with the differences.

proc iml;
/* Known marginal totals: */
u = {260, 214, 178, 148, 75, 67, 59}; /* total boxes sold, by type */
v = {272 180 152 163 134 100};        /* total boxes sold, by girl */
/* We don't know the cell values that produce these marginal totals,
   but we can ask each girl to estimate how many boxes of each type she sold */
/*   G1 G2 G3 G4 G5 G6          */
A = {75 45 40 40 40 30 ,  /* C1 */
     40 35 45 35 30 30 ,  /* C2 */
     40 25 30 40 30 20 ,  /* C3 */
     40 25 25 20 20 20 ,  /* C4 */
     30 25  0 10 10  0 ,  /* C5 */
     20 10 10 10 10  0 ,  /* C6 */
     20 10  0 10  0  0 }; /* C7 */
/* notice that the row/col totals for A do not match the truth: */
rowTotEst = A[ ,+];        /* for each row, sum across all cols */
colTotEst = A[+, ];        /* for each col, sum down all rows   */
Items  = 'Cookie1':'Cookie7';                  /* row labels    */
Person = 'Girl1':'Girl6';                      /* column labels */
/* print known totals, estimated totals, and difference */
rowSums = (u || rowTotEst || (u-rowTotEst));
print rowSums[r=Items c={'rowTot (u)' 'rowTotEst' 'Diff'}];
colSums = (v // colTotEst // (v-colTotEst));
print colSums[r={'colTot (v)' 'colTotEst' 'Diff'} c=Person];

You can see that the marginal totals do not match the known row and column totals. However, we can use the guesses as an initial estimate for the RAS algorithm. The RAS algorithm will adjust the estimates to obtain a new matrix that DOES satisfy the marginal totals.

Adjusting a matrix to match marginal totals

The RAS algorithm can iteratively adjust the girls' estimates until the row sums and column sums of the matrix match the known row sums and column sums. The resulting matrix will probably not be an integer matrix. It will, however, reflect the structure of the girls' estimates in three ways:

  1. It will approximately preserve each girl's recollection of the relative proportions of cookie types that she sold. If a girl claims that she sold many of one type of cookie and few of another, that relationship will be reflected in the balanced matrix.
  2. It will approximately preserve the relative proportions of cookie types among girls. If one girl claims that she sold many more of one type of cookie than another girl, that relationship will be reflected in the balanced matrix.
  3. It will preserve zeros. If a girl claims that she did not sell any of one type of cookie, the balanced matrix will also contain a zero for that cell.

The RAS algorithm is explained in Alderman (1992). The matrix starts with a nonnegative r x c matrix. Let u be the observed r x 1 vector of row sums. Let v be the observed 1 x c vector of column sums. The main steps of the RAS algorithm are:

  1. Initialize X = A.
  2. Scale the rows. Let R = u / RowSum(X), where 'RowSum' is a function that returns the vector of row sums. Update X ← R # X. Here R#X indicates elementwise multiplication of each column of X by the corresponding element of R.
  3. Scale the columns. Let S = v / ColSum(X), where 'ColSum' is a function that returns the vector of column sums. Update X ← X # S. Here X#S indicates elementwise multiplication of each row of X by the corresponding element of S.
  4. Check for convergence. If RowSum(X) is close to u and ColSum(X) is close to v, then the algorithm has converged, and X is the balanced matrix. Otherwise, return to Step 2.

The RAS algorithm is implemented in the following SAS/IML program:

/* Use the RAS matrix scaling algorithm to find a matrix X that 
   approximately satisfies 
   X[ ,+] = u  (marginals constraints on rows)
   X[+, ] = v  (marginals constraints on cols)
   and X is obtained by scaling rows and columns of A */
start RAS(A, u, v);
   X = A;       /* Step 0: Initialize X */
   converged = 0;
   do k = 1 to 100 while(^Converged);
      /* Step 1: Row scaling: X = u # (X / X[ ,+]); */
      R = u / X[ ,+];
      X = R # X;
      /* Step 2: Column scaling: X = (X / X[+, ]) # v; */
      S = v / X[+, ];
      X = X # S;
      /* check for convergence: u=rowTot and v=colTot */
      rowTot = X[ ,+];   /* for each row, sum across all cols */
      colTot = X[+, ];   /* for each col, sum down all rows */
      maxDiff = max( abs(u-rowTot), abs(v-colTot) );
      Converged = (maxDiff < 1e-8);
   return( X );
X = RAS(A, u, v);
print X[r=Items c=Person format=7.1];

The matrix X has row and column sums that are close to the specified values. The matrix also preserves relationships among individual columns and rows, based on the girls' recollections of how many boxes they sold. The algorithm preserves zeros for the girls who claim they sold zero boxes of a certain type. It does not introduce any new zeros for cells.

You can't determine how close the X matrix is to "the truth," because we don't know the true number of boxes that each girl sold. The X matrix depends on the girls' recollections and estimates. If the recollections are close to the truth, then X will be close to the truth as well.


This article looks at the RAS algorithm, which is one way to perform matrix balancing on a table that has nonnegative cells. "Balancing" refers to adjusting the interior cells of a matrix to match observed marginal totals. The RAS algorithm starts with an estimate, A. The estimate does not need to satisfy the marginal totals, but it should reflect relationships between the rows and columns that you want to preserve, such as relative proportions and zero cells. The RAS algorithm iteratively scales the rows and columns of A to create a new matrix that matches the observed row and column sums.

It is worth mentioning the row scaling could be accumulated into a diagonal matrix, R, and the column scaling into a diagonal matrix, S. Thus, the balanced matrix is the product of these diagonal matrices and the original estimate, A. As a matrix equation, you can write X = RAS. It is this matrix factorization that gives the RAS algorithm its name.

The RAS is not the only algorithm for matrix balancing, but it is the simplest. Another algorithm, called iterative proportional fitting (IPF), will be discussed in a second article.

The post Matrix balancing: Update matrix cells to match row and column sums appeared first on The DO Loop.

7月 272020

The Kronecker product (also called the direct product) is a binary operation that combines two matrices to form a new matrix. The Kronecker product appears in textbooks about the design of experiments and multivariate statistics. The Kronecker product seems intimidating at first, but often one of the matrices in the product has a special form, such as being a matrix of all 1s. In these special cases, the Kronecker product becomes much easier to understand. This article demonstrates some of the most common uses of the Kronecker product, including using it to concatenate matrices and to form block matrices.

What is the Kronecker product?

The Kronecker product is best understood by thinking about block matrices. If A is a n x p matrix and B is a m x q matrix, then the Kronecker product A ⊗ B is a block matrix that is formed from blocks of B, where each block is multiplied by an element of A:

The SAS/IML language uses the "at sign" (@) to represent the binary operator. Thus, the SAS/IML expression A @ B forms the Kronecker product A ⊗ B.

The Kronecker product for vectors of 1s and special matrices

If A or B has a special form, the Kronecker product simplifies. First, consider only the case where A is a vector of all 1s or a special matrix. The following statements follow directly from the definition of the Kronecker product.

  • If A is a row vector of all 1s, then A ⊗ B is the horizontal concatenation of p copies of B.
  • If A is a column vector of all 1s, then A ⊗ B is the vertical concatenation of n copies of B.
  • If A is a matrix of all 1s, then A ⊗ B is a block matrix that contains blocks of the matrix B.
  • If A is the identity matrix, then A ⊗ B is a block diagonal matrix that contains copies of the matrix B along the diagonal.

Next, let A be an arbitrary matrix but consider only special forms for B:

  • If B is a row vector of all 1s, then A ⊗ B contains q copies of the first column of A, followed by q copies of the second column of A, and so forth. I call this sequential replication of columns of A.
  • If B is a column vector of all 1s, then A ⊗ B contains m copies of the first row of A, followed by m copies of the second row of A, and so forth. This is sequential replication of rows of A.
  • If B is a matrix of all 1s, then A ⊗ B is a block matrix where the (i,j)th block is the constant matrix where all elements are A[i,j].
  • If B is an identity matrix, then A ⊗ B is a block matrix where the (i,j)th block is the matrix A[i,j]*I.

The next sections show concrete examples of these eight ways to use the Kronecker product operator.

Use the Kronecker product for horizonal or vertical concatenation

You can use the Kronecker product to perform horizontal or vertical concatenation. For example, the following SAS/IML program defines two vectors that contain only 1s. The vector w is a row vector and the vector h is a column vector. The program computes w ⊗ B and h ⊗ B for various choices of B. The ODS LAYOUT GRIDDED statement arranges the output in a grid.

proc iml;
/* specify RHS vectors */
w  = {1  1};       /* w for 'wide' (row vector) */
h  = {1, 1};       /* h for 'high' (col vector) */
/* specify data matrices */
r = {1  2  3};      /* row vector    */
c = {1, 2, 3};      /* column vector */
x = {1 2 3,         /* 2 x 3 matrix  */
     4 5 6};
/* vector @ B : Concatenation */
wr = w @ r;         /* horiz concat. Same as repeat(r, 1, ncol(w)) */
wx = w @ x;         /* horiz concat. Same as repeat(x, 1, ncol(w)) */
hc = h @ c;         /* vert  concat. Same as repeat(c, nrow(h), 1) */
hx = h @ x;         /* vert  concat. Same as repeat(x, nrow(h), 1) */
ods layout gridded columns=2 advance=table;
print wr[L='w@r'], wx[L='w@x'], hc[L='h@c'], hx[L='h@x'];
ods layout end;

The output indicates that the Kronecker product with w on the left results in horizontal concatenation. Using h on the left results in vertical concatenation.

Use the Kronecker product for sequential replication

The previous section showed how to take a sequence of numbers such as {1,2,3} and repeat the entire sequence to form a new sequence such as {1,2,3,1,2,3}. But sometimes you want to repeat each element before repeating the next element. In other words, you might be interested in forming a sequence such as {1,1,2,2,3,3}. You can perform this kind of "sequential replication" by putting the data on the left and the vectors of 1s on the right, as follows:

/* A @ vector : Sequential replication */
rw = r @ w;         /* repeat cols sequentially */
xw = x @ w;
ch = c @ h;         /* repeat rows sequentially */
xh = x @ h;
ods layout gridded columns=2 advance=table;
print rw[L='r@w'], xw[L='x@w'], ch[L='c@h'], xh[L='x@h'];
ods layout end;

It is worth mentioning that the second row of output is the transpose of the first row. That is because the transpose operation "distributes" over the Kronecker product operator. That is, for any matrices A and B, it is true that (A ⊗ B)` = A` ⊗ B`. Since r`=c and w`=h, then (r ⊗ w)` = (r` ⊗ w`) = c ⊗ h.

Use the Kronecker product to construct block matrices

The Kronecker product is essentially an operation that forms block matrices. When one of the components is a vector of all 1s, then "forming a block matrix" is the same as concatenation. But if A is a binary matrix, then A ⊗ B is a block matrix that has the same structure as A, but each nonzero block is a copy of B.

Probably the most important subcase is that the matrix I(n) ⊗ B is a block diagonal matrix, where each diagonal block is a copy of B. The following list gives some other examples that use the identity or a constant matrix. Let I(n) be the n x k identity matrix and let J(n) be the n x n matrix of all 1s. Then

  • I(n) ⊗ B is a block diagonal matrix with copies of B on the diagonal.
  • J(n) ⊗ B is a block matrix with a copy of B in each block.
  • A ⊗ I(n) is a block matrix where each block is a diagonal matrix.
  • A ⊗ J(n) is a block matrix where each block is a constant matrix.
Ix = I(2) @ x;      /* diagonal block matrix */    
Jx = J(2) @ x;      /* 2 x 2 block matrix    */
xI = x @ I(2);      /* blocks of diagonal matrices */     
xJ = x @ J(2);      /* blocks of constant matrices */
ods layout gridded columns=2 advance=table;
print Ix[L='I@x'], Jx[L='J@x'], xI[L='x@I'], xJ[L='x@J'];
ods layout end;

How is the Kronecker product used in statistics?

Recall that (mathematically) you can only add matrices that have the same dimensions. For example, you can't add a matrix and a vector. However, a common operation in multivariate statistics is to center a data matrix by subtracting the mean of each column from that column. If X is the data matrix, "mean(X)" is a row vector where the i_th element of the vector is the mean of the i_th column. The centering operation is sometimes informally written as "X – mean(X)," even though that expression does not make mathematical sense. Formally, you need to vertically concatenate the mean vector n times, where n is the number of rows in the data matrix. As we have seen, if h is a column vector of length n that contains all 1s, then mean(X) ⊗ h is a matrix and the expression X – mean(X) ⊗ h is a proper mathematical expression that indicates the centering operation.

As described in a previous article, in the early days of SAS, statistical programmers had to use Kronecker product operator to ensure that all matrix operations were mathematically correct. But that changed in SAS version 8 (circa 1999) when the IML language introduced a shorthand notation that makes it easier to perform row and column operations. Back in the "old days," you saw lots of programs that used the Kronecker operator like this:

/* read data matrix into X */
use Sashelp.Class; read all var _num_ into X;  close;
mean = x[:,];            /* row vector of means for each column */
h = j(nrow(X), 1, 1);    /* column vector */
CenterX = X - mean @ h;  /* subtract matrices of same dimension */

The modern SAS/IML programmer will typically use the shorthand notation to avoid the Kronecker product:

CenterX = X - mean;      /* modern code: Subtract the row vector from each row of X */

Although you no longer need to use the Kronecker product for basic centering operations, the Kronecker operator is still useful for other tasks, such as building block matrices and concatenation. Block diagonal matrices are used in mixed models of longitudinal data (Chapter 12 of Wicklin, 2013).


This article shows eight ways that you can use the Kronecker product. The Kronecker operator enables you to perform horizontal and vertical concatenation, perform sequential repetition of rows and columns, and build block matrices with special structure. You can use this article as a "cheat sheet" to remind you about some common uses of the Kronecker operator.

Do you have a favorite way to use the Kronecker product? Leave a comment.

The post 8 ways to use the Kronecker product appeared first on The DO Loop.

6月 242020

If you have ever run a Kolmogorov-Smirnov test for normality, you have encountered the Kolmogorov D statistic. The Kolmogorov D statistic is used to assess whether a random sample was drawn from a specified distribution. Although it is frequently used to test for normality, the statistic is "distribution free" in the sense that it compares the empirical distribution to any specified distribution. I have previously written about how to interpret the Kolmogorov D statistic and how to compute it in SAS.

If you can compute a statistic, you can use simulation to approximate its sampling distribution and to approximate its critical values. Although I am a fan of simulation, in this article I show how to compute the exact distribution and critical values for Kolmogorov D statistic. The technique used in this article is from Facchinetti (2009), "A Procedure to Find Exact Critical Values of Kolmogorov-Smirnov Test."

An overview of the Facchinetti method

Suppose you have a sample of size n and you perform a Kolmogorov-Smirnov test that compares the empirical distribution to a reference distribution. You obtain the test statistic D. To know whether you should reject the null hypothesis (that the test came from the reference distribution), you need to be able to compute the probability CDF(D) = Pr(Dn ≤ D). Facchinetti shows that you can compute the probability for any D by solving a system of linear equations. If a random sample contains n observations, you can form a certain (2n+2) x (2n+2) matrix whose entries are composed of binomial probabilities. You can use a discrete heat map to display the structure of the matrix that is used in the Facchinetti method. One example is shown below. For n=20, the matrix is 42 x 42. As Facchinetti shows in Figure 6 of her paper, the matrix has a block structure, with triangular matrices in four blocks.

The right-hand side of the linear system is also composed of binomial probabilities. The system is singular, but you can use a generalized inverse to obtain a solution. From the solution, you can compute the probability. For the details, see Facchinetti (2009).

Facchinetti includes a MATLAB program in the appendix of her paper. I converted the MATLAB program into SAS/IML and improved the efficiency by vectorizing some of the computations. You can download the SAS/IML program that computes the CDF of the Kolmogorov D statistic and all the graphs in this article. The name of the SAS/IML function is KolmogorovCDF, which requires two arguments, n (sample size) and D (value of the test statistic).

Examples of using the CDF of the Kolmorov distribution

Suppose you have a sample of size 20. You can use a statistical table to look up the critical value of the Kolmogorov D statistic. For α = 0.05, the (two-sided) critical value for n=20 is D = 0.294. If you load and call the KolmogorovCDF function, you can confirm that the CDF for the Kolmogorov D distribution is very close to 0.95 when D=0.294:

proc iml;
load  module=KolmogorovCDF;      /* load the function that implements the Facchinetti method */
/* simple example of calling the function */
n = 20;
D = 0.29408;
cdf = KolmogorovCDF(n, D);
print n D cdf;

The interpretation of this result is that if you draw a random samples of size 20 from the reference distribution, there is a 5% chance that the Kolmogorov D value will exceed 0.294. A test statistic that exceeds 0.294 implies that you should reject the null hypothesis at the 0.05 significance level.

Visualize the Kolmogorov D distribution

If you can compute the CDF for one value of D, you can compute it for a sequence of D values. In this way, you can visualize the CDF curve. The following SAS/IML statements compute the CDF for the Kolmogorov D distribution when n=20 by running the computation on a sequence of D values in the open interval (0, 1):

D = do(0.01, 0.99, 0.01);             /* sequence of D values */
CDF = j(1, ncol(D), .);
do i = 1 to ncol(D);
   CDF[i] = KolmogorovCDF(n, D[i]);   /* CDF for each D */ 
title "Kolmogorov CDF, n=20";
call series(D, CDF) grid={x y};       /* create a graph of the CDF */

The graph enables you to read probabilities and quantiles for the D20 distribution. The graph shows that almost all random samples of size 20 (drawn from the reference distribution) will have a D statistic of less than 0.4. Geometrically, the D statistic represents the largest gap between the empirical distribution and the reference distribution. A large gap indicates that the sample was probably not drawn from the reference distribution.

Visualize a family of Kolmogorov D distributions

The previous graph was for n=20, but you can repeat the computation for other values of n to obtain a family of CDF curves for the Kolmogorov D statistic. (Facchinetti writes Dn to emphasize that the statistic depends on the sample size.) The computation is available in the program that accompanies this article. The results are shown below:

This graph enables you to estimate probabilities and quantiles for several Dn distributions.

Visualize a family of density curves

Because the PDF is the derivative of the CDF, you can use a forward difference approximation of the derivative to also plot the density of the distribution. The following density curves are derived from the previous CDF curves:

A table of critical values

If you can compute the CDF, you can compute quantiles by using the inverse CDF. You can use a root-finding technique to obtain a quantile.

In the SAS/IML language, the FROOT function can find univariate roots. The critical values of the Dn statistic are usually tabulated by listing the sample size down columns and the significance level (α) or confidence level (1-α) across the columns. Facchinetti (p. 352) includes a table of critical values in her paper. The following SAS/IML statements show how to recreate the table. For simplicity, I show only a subset of the table for n=20, 25, 20, and 35, and for four common values of α.

/* A critical value is the root of the function
   f(D) = KolmogorovCDF(n, D) - (1-alpha)
   for any specified n and alpha value.
start KolmogorovQuantile(D) global (n, alpha);
   return  KolmogorovCDF(n, D) - (1-alpha);
/* create a table of critical values for the following vector of n and alpha values */
nVals = do(20, 35, 5);
alphaVals = {0.001 0.01 0.05 0.10};
CritVal = j(ncol(nVals), ncol(alphaVals), .);
do i = 1 to ncol(nVals);
   n = nVals[i];
   do j = 1 to ncol(alphaVals);
      alpha = alphaVals[j];
      CritVal[i,j] = froot("KolmogorovQuantile", {0.01, 0.99});
print CritVal[r=nVals c=alphaVals format=7.5 L="Critical Values (n,alpha)"];

For each sample size, the corresponding row of the table shows the critical values of D for which CDF(D) = 1-α. Of course, a printed table is unnecessary when you have a programmatic way to compute the quantiles.


Facchinetti (2009) shows how to compute the CDF for the Kolmogorov D distribution for any value of n and D. The Kolmogorov D statistic is often used for goodness-of-fit tests. Facchinetti's method consists of forming and solving a system of linear equations. To make her work available to the SAS statistical programmer, I translated her MATLAB program into a SAS/IML program that computes the CDF of the Kolmogorov D distribution. This article demonstrates how to use the KolmogorovCDF in SAS/IML to compute and visualize the CDF and density of the Kolmogorov D statistic. It also shows how to use the computation in conjunction with a root-finding method to obtain quantiles and exact critical values of the Kolmogorov D distribution.

The post The Kolmogorov D distribution and exact critical values appeared first on The DO Loop.

6月 222020

Sometimes in matrix computations, it is important to display the nonzero elements of a matrix. This can be useful for visualizing the structure of a sparse matrix (one that has many zeros) and it is also useful when describing a matrix algorithm (such as Gaussian elimination) that introduces zeros at each step of the algorithm.

Let A be the matrix you want to visualize. You can visualize the nonzero elements of the matrix A in a graph or in a table:

  • Use a custom SAS format that prints an asterisk (*) for nonzero values and a blank for zero values.
  • Create a discrete heat map of the binary matrix A^=0, which replaces the nonzero cells of A with the value 1.

A text display method

For small matrices, you can use a custom SAS format to print the matrix. The format is defined by using PROC FORMAT. This format displays a star (*) for nonzero values and displays a blank for zero values:

proc format;
value SparseFmt   
      0 = " "      /* display blank instead of 0 */
      other = "*"; /* display '*' instead of number */
proc iml;
A = {1.0  0.2  0.0  0.0  0.0  0.0 1e-6,
     0.2  1.0  0.2  0.0  0.0  0.0 0.0,
     0.0  0.2  1.0  0.2  0.0  0.0 0.0,
     0.0  0.0  0.2  1.0  0.2  0.0 0.0,
     1e-6 0.0  0.0  0.2  1.0  0.2 0.0,
     2e-6 1e-6 0.0  0.0  0.2  1.0 0.2,
     3e-6 2e-6 1e-6 0.0  0.0  0.2 1.0 };
/* use a custom format to print * or blank in each cell */
print A[format=SparseFmt.];
Visualize a sparse matrix by using a custom format that print a blank or an asterisk

You can see the banded structure of the matrix and also see that elements near the corners are nonzero. This textual visualization is concise and is used in textbooks about numerical linear algebra.

A graphical method

For larger matrices, a graphical method is probably better. You can form the binary logical matrix B=(A^=0) and then plot B by using a two-color heat map. In the SAS/IML language, you can use the HEATMAPDISC call to create a discrete heat map of a matrix. I like to use white for the zeroes. The following SAS/IML statements define a 7 x 7 matrix and use a discrete heat map to display the nonzero values:

/* discrete heat map of binary matrix A^=0 */
ods graphics / width=400px height=400px;
call heatmapdisc(A^=0) colorramp={white steelblue}
         title="Discrete heat map of binary matrix";
Visualize a sparse matrix by using a heat map of a binary matrix where each cell is zero (0) or nonzero (1)

The information in the heat map is identical to the printed display. However, for large matrices, the heat map is easier to view. As shown, you can also use the ODS GRAPHICS statement to control the dimensions of the graph. So, for example, if you are visualizing a square matrix, you can make the graph square.

The post Visualize the structure of a sparse matrix appeared first on The DO Loop.

12月 162019
Rockin' around the Christmas tree
At the Christmas party hop.
     – Brenda Lee
Christmas tree image with added noise

Last Christmas, I saw a fun blog post that used optimization methods to de-noise an image of a Christmas tree. Although there are specialized algorithms that remove random noise from an image, I am not going to use a specialized de-noising algorithm. Instead, I will use the SVD algorithm, which is a general-purpose matrix algorithm that has many applications. One application is to construct a low-rank approximation of a matrix. This article applies the SVD algorithm to a noisy image of a Christmas tree. The result is recognizable as a Christmas tree, but much of the noise has been eliminated.

A noisy Christmas tree image

The image to the right is a heat map of a binary (0/1) matrix that has 101 rows and 100 columns. First, I put the value 1 in certain cells so that the heat map resembles a Christmas tree. About 41% of the cells have the value 1. We will call these cells "the tree".

To add random noise to the image, I randomly switched 10% of the cells. The result is an image that is recognizable as a Christmas tree but is very noisy.

Apply the SVD to a matrix

As mentioned earlier, I will attempt to de-noise the matrix (image) by using the general-purpose SVD algorithm, which is available in SAS/IML software. If M is the 101 x 100 binary matrix, then the following SAS/IML statements factor the matrix by using a singular-value decomposition. A series plot displays the first 20 singular values (ordered by magnitude) of the noisy Christmas tree matrix:

call svd(U, D, V, M);                             /* A = U*diag(D)*V` */
call series(1:20, D) grid={x y} xvalues=1:nrow(D) label={"Component" "Singular Value"};

The graph (which is similar to a scree plot in principal component analysis) indicates that the first four singular values are somewhat larger than the remainder. The following statements approximate M by using rank-4 matrix. The rank-4 matrix is not a binary matrix, but you can visualize the rank-4 approximation matrix by using a continuous heat map. For convenience, the matrix elements are shifted and scaled into the interval [0,1].

keep = 4;        /* how many components to keep? */
idx = 1:keep;
Ak = U[ ,idx] * diag(D[idx]) * V[ ,idx]`;
Ak = (Ak - min(Ak)) / range(Ak); /* for plotting, standardize into [0,1] */
s = "Rank " + strip(char(keep)) + " Approximation of Noisy Christmas Tree Image";
call heatmapcont(Ak) colorramp=ramp displayoutlines=0 showlegend=0 title=s range={0, 1};

Apply a threshold cutoff to classify pixels

The continuous heat map shows a dark image of the Christmas tree surrounded by a light-green "fog". The dark-green pixels represent cells that have values near 1. The light-green pixels have smaller values. You can use a histogram to show the distribution of values in the rank-4 matrix:

ods graphics / width=400px height=300px;
call histogram(Ak);

You can think of the cell values as being a "score" that predicts whether each cell belongs to the Christmas tree (green) or not (white). The histogram indicates that the values in the matrix appear to be a mixture of two distributions. The high values in the histogram correspond to dark-green pixels (cells); the low values correspond to light-green pixels. To remove the light-green "fog" in the rank-4 image, you can use a "threshold value" to convert the continuous values to binary (0/1) values. Essentially, this operation predicts which cells are part of the tree and which are not.

For every choice of the threshold parameter, there will be correct and incorrect predictions:

  • A "true positive" is a cell that belongs to the tree and is colored green.
  • A "false positive" is a cell that does not belong to the tree but is colored green.
  • A "true negative" is a cell that does not belong to the tree and is colored white.
  • A "false negative" is a cell that belongs to the tree but is colored white.

By looking at the histogram, you might guess that a threshold value of 0.5 will do a good job of separating the low and high values. The following statements use 0.5 to convert the rank-4 matrix into a binary matrix, which you can think of as the predicted values of the de-noising process:

t = 0.5;  /* threshold parameter */
s = "Discrete Filter: Threshold = " + strip(char(t,4,2)) ;
call HeatmapDisc(Ak > t) colorramp=ramp displayoutlines=0 showlegend=0 title=s;

Considering that 10% of the original image was corrupted, the heat map of the reconstructed matrix is impressive. It looks like a Christmas tree, but has much less noise. The reconstructed matrix agrees with the original (pre-noise) matrix for 98% of the cells. In addition:

  • There are only a few false positives (green dots) that are far from the tree.
  • There are some false negatives in the center of the tree, but many fewer than were in the original noisy image.
  • The locations where the image is the most ragged is along the edge and trunk of the Christmas tree. In that region, the de-noising could not tell the difference between tree and non-tree.

The effect of the threshold parameter

You might wonder what the reconstruction would look like for different choices of the cutoff threshold. The following statements create two other heat maps, one for the threshold value 0.4 and the other for the threshold value 0.6. As you might have guessed, smaller threshold values result in more false positives (green pixels away from the tree), whereas higher threshold values result in more false negatives (white pixels inside the tree).

do t = 0.4 to 0.6 by 0.2;
   s = "Discrete Filter: Threshold = " + strip(char(t,4,2)) ;
   call HeatmapDisc(Ak > t) colorramp=ramp displayoutlines=0 showlegend=0 title=s;


Although I have presented this experiment in terms of an image of a Christmas tree, it is really a result about matrices and low-rank approximations. I started with a binary 0/1 matrix, and I added noise to 10% of the cells. The SVD factorization enables you to approximate the matrix by using a rank-4 approximation. By using a cutoff threshold, you can approximate the original matrix before the random noise was added. Although the SVD algorithm is a general-purpose algorithm that was not designed for de-noising images, it does a good job eliminating the noise and estimating the original matrix.

If you are interested in exploring these ideas for yourself, you can download the complete SAS program that creates the graphs in this article.

The post Math-ing around the Christmas tree: Can the SVD de-noise an image? appeared first on The DO Loop.

12月 112019

Binary matrices are used for many purposes. I have previously written about how to use binary matrices to visualize missing values in a data matrix. They are also used to indicate the co-occurrence of two events. In ecology, binary matrices are used to indicate which species of an animal are present in which ecological site. For example, if you remember your study of Darwin's finches in high school biology class, you might recall that certain finches (species) are present or absent on various Galapagos islands (sites). You can use a binary matrix to indicate which finches are present on which islands.

Recently I was involved in a project that required performing a permutation test on rows of a binary matrix. As part of the project, I had to solve three smaller problems involving rows of a binary matrix:

  1. Given two rows, find the locations at which the rows differ.
  2. Given two binary matrices, determine how similar the matrices are by computing the proportion of elements that are the same.
  3. Given two rows, swap some of the elements that differ.

This article shows how to solve each problem by using the SAS/IML matrix language. A future article will discuss permutation tests for binary matrices. For clarity, I introduce the following macro that uses a temporary variable to swap two sets of values:

/* swap values of A and B. You can use this macro in the DATA step or in the SAS/IML language */
%macro SWAP(A, B, TMP=_tmp);
   &TMP = &A; &A = &B; &B = &TMP;

Where do binary matrices differ?

The SAS/IML matrix language enables you to treat matrices as high-level objects. You often can answer questions about matrices without writing loops that iterate over the elements of the matrices. For example, if you have two matrices of the same dimensions, you can determine the cells at which the matrices are unequal by using the "not equal" (^=) operator. The following SAS/IML statements define two 2 x 10 matrices and use the "not equal" operator to find the cells that are different:

proc iml;
A = {0 0 1 0 0 1 0 1 0 0 ,
     1 0 0 0 0 0 1 1 0 1 };
B = {0 0 1 0 0 0 0 1 0 1 ,
     1 0 0 0 0 1 1 1 0 0 };
colLabel = "Sp1":("Sp"+strip(char(ncol(A))));
rowLabel = "A1":("A"+strip(char(nrow(A))));
/* 1. Find locations where binary matrices differ */
Diff = (A^=B);
print Diff[c=colLabel r=rowLabel];

The matrices A and B are similar. They both have three 1s in the first row and fours 1s in the second row. However, the output shows that the matrices are different for the four elements in the sixth and tenth columns. Although I used entire matrices for this example, the same operations work on row vectors.

The proportion of elements in common

You can use various statistics to measure the similarity between the A and B matrices. A simple statistic is the proportion of elements that are in common. These matrices have 20 elements, and 16/20 = 0.8 are common to both matrices. You can compute the proportion in common by using the express (A=B)[:], or you can use the following statements if you have previously computed the Diff matrix:

/* 2. the proportion of elements in B that are the same as in A */
propDiff = 1 - Diff[:];
print propDiff;

As a reminder, the mean subscript reduction operator ([:]) computes the mean value of the elements of the matrix. For a binary matrix, the mean value is the proportion of ones.

Swap elements of rows

The first two tasks were easy. A more complicated task is swapping values that differ between rows. The swapping operation is not difficult, but it requires finding the k locations where the rows differ and then swapping all or some of those values. In a permutation test, the number of elements that you swap is a random integer between 1 and k, but for simplicity, this example uses the SWAP macro to swap two cells that differ. For clarity, the following example uses temporary variables such as x1, x2, d1, and d2 to swap elements in the matrix A:

/* specify the rows whose value you want to swap */
i1 = 1;                         /* index of first row to compare and swap */
i2 = 2;                         /* index of second row to compare and swap */
/* For clarity, use temp vars x1 & x2 instead of A[i1, ] and A[i2, ] */
x1 = A[ i1, ];                  /* get one row */
x2 = A[ i2, ];                  /* get another row */
idx = loc( x1^=x2 );            /* find the locations where rows differ */
if ncol(idx) > 0 then do;       /* do the rows differ? */
   d1 = x1[ ,idx];              /* values at the locations that differ */
   d2 = x2[ ,idx];
   print (d1//d2)[r={'r1' 'r2'} L='Original'];
   /* For a permutation test, choose a random number of locations to swap */
   /* numSwaps = randfun(1, "Integer", 1, n);
      idxSwap = sample(1:ncol(idx), numSwaps, "NoReplace");  */
   idxSwap = {2 4};              /* for now, hard-code locations to swap */
   %SWAP(d1[idxSwap], d2[idxSwap]);
   print (d1//d2)[r={'d1' 'd2'} L='New'];
   /* update matrix */
   A[ i1, idx] = d1;
   A[ i2, idx] = d2;
print A[c=colLabel r=rowLabel];

The vectors x1 and x2 are the rows of A to compare. The vectors d1 and d2 are the subvectors that contain only the elements of x1 and x2 that differ. The example swaps the second and fourth columns of d1 and d2. The new values are then inserted back into the matrix A. You can compare the final matrix to the original to see that the process swapped two elements in each of two rows.

Although the examples are for binary matrices, these techniques work for any numerical matrices.

The post Swap elements in binary matrices appeared first on The DO Loop.

11月 272019
Visualization of a quadratic function and a linear subspace

This article discusses how to restrict a multivariate function to a linear subspace. This is a useful technique in many situations, including visualizing an objective function that is constrained by linear equalities. For example, the graph to the right is from a previous article about how to evaluate quadratic polynomials. The graph shows a heat map for a quadratic polynomial of two variables. The diagonal line represents a linear constraint between the X and Y variables. If you restrict the polynomial to the diagonal line, you obtain a one-dimensional function that you can easily graph and visualize. By repeating this process for other lines, you can construct a "stack" of lower-dimensional slices that enable you to understand the graph of the original bivariate function.

This example generalizes. For a function of many variables, you can restrict the function to a lower-dimensional subspace. By visualizing the restricted function, you can understand the high-dimensional function better. I previously demonstrated this technique for multidimensional regression models by using the SLICEFIT option in the EFFECTPLOT statement in SAS. This article shows how to use ideas from vector calculus to restrict a function to a parameterized linear subspace.

Visualize high-dimensional data and functions

There are basically two techniques for visualizing high-dimensional objects: projection and slicing. Many methods in multivariate statistics compute a linear subspace such that the projection of the data onto the subspace has a desirable property. One example is principal component analysis, which endeavors to find a low dimensional subspace that captures most of the variation in the data. Another example is linear discriminant analysis, which aims to find a linear subspace that maximally separates groups in the data. In both cases, the data are projected onto the computed subspaces to reveal characteristics of the data. There are also statistical methods that attempt to find nonlinear subspaces.

Whereas projection is used for data, slicing is often used to visualize the graphs of functions. In regression, you model a response variable as a function of the explanatory variables. Slicing the function along a linear subspace of the domain can reveal important characteristics about the function. That is the method used by the SLICEFIT option in the EFFECTPLOT statement, which enables you to visualize multidimensional regression models. The most common "slice" for a regression model is to specify constant values for all but one continuous variable in the model.

When visualizing an objective function that is constrained by a set of linear equalities, the idea is similar. The main difference is that the "slice" is usually determined by a general linear combination of variables.

Restrict a function to a one-dimensional subspace

A common exercise in multivariate calculus is to restrict a bivariate function to a line. (This is equivalent to a "slice" of the bivariate function.) Suppose that you choose a point, x0, and a unit vector, u, which represents the "direction". Then a parametric line through x0 in the direction of u is given by the vector expression v(t) = x0 + t u, where t is any real number. If you restrict a multivariate function to the image of v, you obtain a one-dimensional function.

For example, consider the quadratic polynomial in two variables:
f(x,y) = (9*x##2 + x#y + 4*y##2) - 12*x - 4*y + 6
The function is visualized by the heat map at the top of this article. Suppose x0 = {0, -1} and choose u to be the unit vector in the direction of the vector {3, 1}. (This choice corresponds to a linear constraint of the form x – 3y = 3.)

The following SAS/IML program constructs the parameterized line v(t) for a uniform set of t values. The quadratic function is evaluated at this set of values and the resulting one-dimensional function is graphed:

proc iml;
/* Evaluate the quadratic function at each column of X and return a row vector. */
start Func(XY);
   x = XY[1,]; y = XY[2,];
   return (9*x##2  + x#y + 4*y##2) - 12*x - 4*y + 6;
x0 = {0, -1};          /* evaluate polynomial at this point */
d  = {3, 1};           /* vector that determines direction */
u  = d / norm(d);      /* unit vector (direction) */
t = do(-2, 2, 0.1);    /* parameter values */
v = x0 + t @ u;        /* evenly spaced points along the linear subspace */
f = Func(v);           /* evaluate the 2-D function along the 1-D subspace */
title "Quadratic Form Evaluated on Linear Subspace";
call series(t, f) grid={x y};
Quadratic function restricted to linear subspace

The graph shows the restriction of the 2-D function to the 1-D subspace. According to the graph, the restricted function reaches a minimum value for t ≈ 0.92. The corresponding (x, y) values and the corresponding function value are shown below:

tMin = 0.92;           /* t* = parameter near the minimum */
vMin = x0 + tMin * u;  /* corresponding (x(t*), y(t*)) */
fMin = Func(vMin);     /* f(x(t*), y(t*)) */
print tMin vMin fMin;

If you look back to the heat map at the top of this article, you will see that the value (x,y) = (0.87,-0.71) correspond to the location for which the function achieves a minimum value when restricted to the linear subspace. The value of the function at that (x,y) value is approximately 6.6.

The SAS/IML program uses the Kronecker direct-product operator (@) to create points along the line. The operation is v = x0 + t @ u. The symbols x0 and u are both column vectors with two elements. The Kronecker product creates a 2 x k matrix, where k is the number of elements in the row vector t. The Kronecker product enables you to vectorize the program instead of looping over the elements of t and forming the individual vectors x0 + t[i]*u.

You can generalize this example to evaluate a multivariate function on a two-dimensional subspace. If u1 and u2 are two orthogonal, p-dimensional, unit vectors, the expression x0 + s*u1 + t*u2 spans a two-dimensional subspace of Rp. You can use the ExpandGrid function in SAS/IML to generate ordered pairs (s, t) on a two-dimensional grid.


In summary, you can slice a multivariate function along a linear subspace. Usually, 1-D or 2-D subspaces are used and the resulting (restricted) function is graphed. This helps you to understand the function. You can choose a series of slices (often parallel to each other) and "stack" the slices in order to visualize the multivariate function. Typically, this technique works best for functions that have three or four continuous variables.

You can download the SAS program that creates the graphs in this article.

The post Evaluate a function on a linear subspace appeared first on The DO Loop.

11月 252019

What is an efficient way to evaluate a multivariate quadratic polynomial in p variables? The answer is to use matrix computations! A multivariate quadratic polynomial can be written as the sum of a purely quadratic term (degree 2), a purely linear term (degree 1), and a constant term (degree 0). The purely quadratic term is called a quadratic form. This article shows how to use matrix computations to efficiently evaluate a multivariate quadratic polynomial.

Quadratic polynomials and matrix expressions

Visualization of a quadratic polynomial in SAS

As I wrote in a previous article about optimizing a quadratic function, the matrix of second derivatives and the gradient of first derivatives appear in the matrix representation of a quadratic polynomial. I will use the same example as in my previous article. Namely, the following quadratic polynomial in two variables:
f(x,y) = (9*x##2 + x#y + 4*y##2) - 12*x - 4*y + 6
       = (1/2) x` Q x + L` x + 6
where Q = {18  1, 1  8} is a 2x2 symmetric matrix, L = {-12  -4} is a column vector, and x = {x, y} is a column vector that represents the coordinates at which to evaluate the quadratic function.

The graph at the right visualizes this quadratic function by using a heat map. Small values of the function are shown in white. Larger values of the function are shown in blues and greens. The largest values are shown in reds and blacks. The global minimum of this function is approximately (x, y) = (0.64, 0.42), and that point is indicated by a star.

This example generalizes. Every quadratic function in p variables can be written as the sum of a quadratic form (1/2 x` Q x), a linear term (L` x) and a constant. Here Q is a p x p symmetric matrix of second derivatives (the Hessian matrix of f) and L and x are p-dimensional column vectors.

Evaluate quadratic polynomial in SAS

Because the computation involves vectors and matrices, the SAS/IML language is the natural place to use matrices to evaluate a quadratic function. The following SAS/IML statements implement a simple function to evaluate a quadratic polynomial (given in terms of Q, L, and a constant) at an arbitrary two-dimensional vector, x:

proc iml;
/* Evaluate  f(x) = 0.5 * x` * Q * x + L`*x + const, where
   Q is p x p symmetric matrix, 
   L and x are col vector with p elements.
   This version evaluates ONE vector x and returns a scalar value. */
start EvalQuad(x, Q, L, const=0);
   return 0.5 * x`*Q*x + L`*x + const;
/* compute Q and L for f(x,y)= 9*x##2 + x#y + 4*y##2) - 12*x - 4*y + 6 */
Q = {18 1,             /* matrix of second derivatives */
      1 8};
L = { -12, -4};        /* use column vectors for L and x */
const = 6;
x0 = {0, -1};          /* evaluate polynomial at this point */
f = EvalQuad(x0, Q, L, const);
print f[L="f(0,-1)"];

Evaluate a quadratic polynomial at multiple points

When I implement a function in the SAS/IML language, I try to "vectorize" it so that it can evaluate multiple points in a single call. Often you can use matrix operations to vectorize a function evaluation, but I don't see how to make the math work for this problem. The natural way to evaluate a quadratic polynomial at k vectors X1, X2, ..., Xk, is to pack those vectors into a p x k matrix X such that each column of X is a point at which to evaluate the polynomial. Unfortunately, the matrix computation of the quadratic form M = 0.5 * X`*Q*X results in a k x k matrix. Only the k diagonal elements are needed for evaluating the polynomial on the k input vectors, so although it is possible to compute M, doing so would be very inefficient.

In this case, it seems more efficient to loop over the columns of X. The following function implements a SAS/IML module that evaluates a quadratic polynomial at every column of X and returns a row vector of the results. The module is demonstrated by calling it on a matrix that has 5 columns.

/* Evaluate the quadratic function at each column of X and return a row vector. */
start EvalQuadVec(X, Q, L, const=0);
   f = j(1, ncol(X), .);
   do i = 1 to ncol(X);
      v = X[,i];
      f[i] = 0.5 * v`*Q*v + L`*v + const;
   return f;
/*    X1  X2  X3 X4  X5  */
vx = {-1 -0.5 0  0.5 1 ,
      -3 -2  -1  0   1 };
f = EvalQuadVec(vx, Q, L, const=0);
print (vx // f)[r={'x' 'y' 'f(x,y)'} c=('X1':'X5')];

Evaluate a quadratic polynomial on a uniform grid of points

You can use the EvalQuadVec function to evaluate a quadratic polynomial on any set of multiple points. In particular, you can use the ExpandGrid function to construct a regular 2-D grid of points. By evaluating the function at each point on the grid, you can visualize the function. The following statements create a heat map of the function on a regular grid. The heat map is shown at the top of this article.

x = do(-1, 1, 0.1); 
y = do(-3, 1.5, 0.1);
xy = expandgrid(x, y);             /* 966 x 2 matrix */
f = EvalQuadVec(xy`, Q, L, const); /* evaluate polynomial at all points */
/* write results to a SAS data set and visualize the function by using a heat map */
M = xy || f`;
create Heatmap from M[c={'x' 'y' 'f'}];  append from M;  close;
data optimal;
   xx=0.64; yy=0.42;  /* optional: add the optimal (x,y) value */
data All;  set Heatmap optimal; run;
title "Heat Map of Quadratic Function";
proc sgplot data=All;
   heatmapparm x=x y=y colorresponse=f / colormodel= (WHITE CYAN YELLOW RED BLACK);
   scatter x=xx y=yy / markerattrs=(symbol=StarFilled);
   xaxis offsetmin=0 offsetmax=0;
   yaxis offsetmin=0 offsetmax=0;

Quadratic approximations

Perhaps you do not often use quadratic polynomials. This technique is useful even for general nonlinear functions because it enables you to find the best quadratic approximation to a multivariate function at any point x0. The multivariate Taylor series at the point x0, truncated at second order, is
f(x) ≈ f(x0) + L` · (xx0) + (1/2) (xx0)` · Q · (xx0)
where L = ∇f(x0) is the gradient of f evaluate at x0 and Q = D2f(x0) is the symmetric Hessian matrix of second derivatives of f evaluated at x0.


In summary, you can use matrix computations to evaluate a multivariate quadratic polynomial. This article shows how to evaluate a quadratic polynomial at multiple points. For a polynomial of two variables, you can use this technique to visualize quadratic functions.

The post Evaluate a quadratic polynomial in SAS appeared first on The DO Loop.

11月 132019

Biplots are two-dimensional plots that help to visualize relationships in high dimensional data. A previous article discusses how to interpret biplots for continuous variables. The biplot projects observations and variables onto the span of the first two principal components. The observations are plotted as markers; the variables are plotted as vectors. The observations and/or vectors are not usually on the same scale, so they need to be rescaled so that they fit on the same plot. There are four common scalings (GH, COV, JK, and SYM), which are discussed in the previous article.

This article shows how to create biplots in SAS. In particular, the goal is to create the biplots by using modern ODS statistical graphics. You can obtain biplots that use the traditional SAS/GRAPH system by using the %BIPLOT macro by Michael Friendly. The %BIPLOT macro is very powerful and flexible; it is discussed later in this article.

There are four ways to create biplots in SAS by using ODS statistical graphics:

  • You can use PROC PRINQUAL in SAS/STAT software to create the COV biplot.
  • If you have a license for SAS/GRAPH software (and SAS/IML software), you can use Friendly's %BIPLOT macro and use the OUT= option in the macro to save the coordinates of the markers and vectors. You can then use PROC SGPLOT to create a modern version of Friendly's biplot.
  • You can use the matrix computations in SAS/IML to "manually" compute the coordinates of the markers and vectors. (These same computations are performed internally by the %BIPLOT macro.) You can use the Biplot module to create a biplot, or you can use the WriteBiplot module to create a SAS data set that contains the biplot coordinates. You can then use PROC SGPLOT to create the biplot.

For consistency with the previous article, all methods in this article standardize the input variables to have mean zero and unit variance (use the SCALE=STD option in the %BIPLOT macro). All biplots show projections of the same four-dimensional Fisher's iris data. The following DATA step assigns a blank label. If you do not supply an ID variable, some biplots display observations numbers.

data iris;
   set Sashelp.iris;
   id = " ";        /* create an empty label variable */

Use PROC PRINQUAL to compute the COV biplot

The PRINQUAL procedure can perform a multidimensional preference analysis, which is visualized by using a MDPREF plot. The MDPREF plot is closely related to biplot (Jackson (1991), A User’s Guide to Principal Components, p. 204). You can get PROC PRINQUAL to produce a COV biplot by doing the following:

  • Use the N=2 option to specify you want to compute two principal components.
  • Use the MDPREF=1 option to specify that the procedure should not rescale the vectors in the biplot. By default, MDPREF=2.5 and the vectors appear 2.5 larger than they should be. (More on scaling vectors later.)
  • Use the IDENTITY transformation so that the variables are not transformed in a nonlinear manner.

The following PROC PRINQUAL statements produce a COV biplot (click to enlarge):

proc prinqual data=iris plots=(MDPref) 
              n=2       /* project onto Prin1 and Prin2 */
              mdpref=1; /* use COV scaling */
   transform identity(SepalLength SepalWidth PetalLength PetalWidth);  /* identity transform */
   id ID;
   ods select MDPrefPlot;
COV biplot, created in SAS by using PROC PRINQUAL

Use Friendly's %BIPLOT macro

Friendly's books [SAS System for Statistical Graphics (1991) and Visualizing Categorical Data (2000)] introduced many SAS data analysts to the power of using visualization to accompany statistical analysis, and especially the analysis of multivariate data. His macros use traditional SAS/GRAPH graphics from the 1990s. In the mid-2000s, SAS introduced ODS statistical graphics, which were released with SAS 9.2. Although the %BIPLOT macro does not use ODS statistical graphics directly, the macro supports the OUT= option, which enables you to create an output data set that contains all the coordinates for creating a biplot.

The following example assumes that you have downloaded the %BIPLOT macro and the %EQUATE macro from Michael Friendly's web site.

/* A. Use %BIPLOT macro, which uses SAS/IML to compute the biplot coordinates. 
      Use the OUT= option to get the coordinates for the markers and vectors.
   B. Transpose the data from long to wide form.
   C. Use PROC SGPLOT to create the biplot
%let FACTYPE = SYM;   /* options are GH, COV, JK, SYM */
title "Biplot: &FACTYPE, STD";
        var=SepalLength SepalWidth PetalLength PetalWidth, 
        factype=&FACTYPE,  /* GH, COV, JK, SYM */
        std=std,           /* NONE, MEAN, STD  */
        scale=1,           /* if you do not specify SCALE=1, vectors are auto-scaled */
        out=biplotFriendly,/* write SAS data set with results */ 
        symbols=circle dot, inc=1);
/* transpose from long to wide */
data Biplot;
set biplotFriendly(where=(_TYPE_='OBS') rename=(dim1=Prin1 dim2=Prin2 _Name_=_ID_))
    biplotFriendly(where=(_TYPE_='VAR') rename=(dim1=vx dim2=vy _Name_=_Variable_));
proc sgplot data=Biplot aspect=1 noautolegend;
   refline 0 / axis=x; refline 0 / axis=y;
   scatter x=Prin1 y=Prin2 / datalabel=_ID_;
   vector  x=vx    y=vy    / datalabel=_Variable_
                             lineattrs=GraphData2 datalabelattrs=GraphData2;
   xaxis grid offsetmin=0.1 offsetmax=0.2;
   yaxis grid;
SYM biplot, created in SAS by using ODS statistical graphics

Because you are using PROC SGPLOT to display the biplot, you can easily configure the graph. For example, I added grid lines, which are not part of the output from the %BIPLOT macro. You could easily change attributes such as the size of the fonts or add additional features such as an inset. With a little more work, you can merge the original data and the biplot data and color-code the markers by a grouping variable (such as Species) or by a continuous response variable.

Notice that the %BUPLOT macro supports a SCALE= option. The SCALE= option applies an additional linear scaling to the vectors. You can use this option to increase or decrease the lengths of the vectors in the biplot. For example, in the SYM biplot, shown above, the vectors are long relative to the range of the data. If you want to display vectors that are only 25% as long, you can specify SCALE=0.25. You can specify numbers greater than 1 to increase the vector lengths. For example, SCALE=2 will double the lengths of the vectors. If you omit the SCALE= option or set SCALE=0, then the %BIPLOT macro automatically scales the vectors to the range of the data. If you use the SCALE= option, you should tell the reader that you did so.

SAS/IML modules that compute biplots

The %BIPLOT macro uses SAS/IML software to compute the locations of the markers and vectors for each type of biplot. I wrote three SAS/IML modules that perform the three steps of creating a biplot:

  • The CalcBiplot module computes the projections of the observations and scores onto the first few principal components. This module (formerly named CalcPrinCompBiplot) was written in the mid-2000s and has been distributed as part of the SAS/IML Studio application. It returns the scores and vectors as SAS/IML matrices.
  • The WriteBiplot module calls the CalcBiplot module and then writes the scores to a SAS data set called _SCORES and the vectors (loadings) to a SAS data set called _VECTORS. It also creates two macro variables, MinAxis and MaxAxis, which you can use if you want to equate the horizontal and vertical scales of the biplot axes.
  • The Biplot function calls the WriteBiplot module and then calls PROC SGPLOT to create a biplot. It is the "raw SAS/IML" version of the %BIPLOT macro.

You can use the CalcBiplot module to compute the scores and vectors and return them in IML matrices. You can use the WriteBiplot module if you want that information in SAS data sets so that you can create your own custom biplot. You can use the Biplot module to create standard biplots. The Biplot and WriteBiplot modules are demonstrated in the next sections.

Use the Biplot module in SAS/IML

The syntax of the Biplot module is similar to the %BIPLOT macro for most arguments. The input arguments are as follows:

  • X: The numerical data matrix
  • ID: A character vector of values used to label rows of X. If you pass in an empty matrix, observation numbers are used to label the markers. This argument is ignored if labelPoints=0.
  • varNames: A character vector that contains the names of the columns of X.
  • FacType: The type of biplot: 'GH', 'COV', 'JK', or 'SYM'.
  • StdMethod: How the original variables are scaled: 'None', 'Mean', or 'Std'.
  • Scale: A numerical scalar that specifies additional scaling applied to vectors. By default, SCALE=1, which means the vectors are not scaled. To shrink the vectors, specify a value less than 1. To lengthen the vectors, specify a value greater than 1. (Note: The %BIPLOT macro uses SCALE=0 as its default.)
  • labelPoints: A binary 0/1 value. If 0 (the default) points are not labeled. If 1, points are labeled by the ID values. (Note: The %BIPLOT macro always labels points.)

The last two arguments are optional. You can specify them as keyword-value pairs outside of the parentheses. The following examples show how you can call the Biplot module in a SAS/IML program to create a biplot:

ods graphics / width=480px height=480px;
proc iml;
/* assumes the modules have been previously stored */
load module=(CalcBiplot WriteBiplot Biplot);
use sashelp.iris;
read all var _NUM_ into X[rowname=Species colname=varNames];
title "COV Biplot with Scaled Vectors and Labels";
run Biplot(X, Species, varNames, "COV", "Std") labelPoints=1;   /* label obs */
title "JK Biplot: Relationships between Observations";
run Biplot(X, NULL, varNames, "JK", "Std");
title "JK Biplot: Automatic Scaling of Vectors";
run Biplot(X, NULL, varNames, "JK", "Std") scale=0;            /* auto scale; empty ID var */
title "SYM Biplot: Vectors Scaled by 0.25";
run Biplot(X, NULL, varNames, "SYM", "Std") scale=0.25;        /* scale vectors by 0.25 */
SYM biplot with auto-scaled vectors. Biplot created by using ODS statistical graphics

The program creates four biplots, but only the last one is shown. The last plot uses the SCALE=0.25 option to rescale the vectors of the SYM biplot. You can compare this biplot to the SYM biplot in the previous section, which did not rescale the length of the vectors.

Use the WriteBiplot module in SAS/IML

If you prefer to write an output data set and then create the biplot yourself, use the WriteBiplot module. After loading the modules and the data (see the previous section), you can write the biplot coordinates to the _Scores and _Vectors data sets, as follows. A simple DATA step appends the two data sets into a form that is easy to graph:

run WriteBiplot(X, NULL, varNames, "JK", "Std") scale=0;   /* auto scale vectors */
data Biplot;
   set _Scores _Vectors;    /* append the two data sets created by the WriteBiplot module */
title "JK Biplot: Automatic Scaling of Vectors";
title2 "FacType=JK; Std=Std";
proc sgplot data=Biplot aspect=1 noautolegend;
   refline 0 / axis=x; refline 0 / axis=y;
   scatter x=Prin1 y=Prin2 / ; 
   vector  x=vx    y=vy    / datalabel=_Variable_
                             lineattrs=GraphData2 datalabelattrs=GraphData2;
   xaxis grid offsetmin=0.1 offsetmax=0.1 min=&minAxis max=&maxAxis;
   yaxis grid min=&minAxis max=&maxAxis;
JK biplot, created in SAS by using ODS statistical graphics

In the program that accompanies this article, there is an additional example in which the biplot data is merged with the original data so that you can color-code the observations by using the Species variable.


This article shows four ways to use modern ODS statistical graphics to create a biplot in SAS. You can create a COV biplot by using the PRINQUAL procedure. If you have a license for SAS/IML and SAS/GRAPH, you can use Friendly's %BIPLOT macro to write the biplot coordinates to a SAS data set, then use PROC SGPLOT to create the biplot. This article also presents SAS/IML modules that compute the same biplots as the %BIPLOT macro. The WriteBiplot module writes the data to two SAS data sets (_Score and _Vector), which can be appended and used to plot a biplot. This gives you complete control over the attributes of the biplot. Or, if you prefer, you can use the Biplot module in SAS/IML to automatically create biplots that are similar to Friendly's but are displayed by using ODS statistical graphics.

You can download the complete SAS program that is used in this article. For convenience, I have also created a separate file that defines the SAS/IML modules that create biplots.

The post Create biplots in SAS appeared first on The DO Loop.