Matrix Computations

December 11, 2017
 
Self-similar Christmas tree created in SAS

Happy holidays to all my readers! My greeting-card to you is an image of a self-similar Christmas tree. The image (click to enlarge) was created in SAS by using two features that I blog about regularly: matrix computations and ODS statistical graphics.

Self-similarity in Kronecker products

I have previously shown that the repeated Kronecker product of a binary matrix leads to self-similar structures.

Specifically, if M is a binary matrix then the Kronecker product M@M is a block matrix in which each 1 in the original matrix is replaced with a copy of M and each 0 is replaced by a zero block. (Here '@' is the Kronecker product (or direct product) operator, which is implemented in SAS/IML software.) If you repeat this process three or more times and create a heat map of the product matrix, the self-similar structure is apparent.
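As a minimal illustration of the block structure, here is the Kronecker square of a 2 x 2 binary matrix (a small sketch, separate from the main example below):

proc iml;
M = {1 0,
     1 1};
K = M @ M;    /* each 1 in M is replaced by a copy of M; each 0 by a zero block */
print K;      /* K = {1 0 0 0, 1 1 0 0, 1 0 1 0, 1 1 1 1} */
quit;

Even at this tiny scale, the 1s in K hint at the self-similar pattern that emerges after repeated products.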

For example, the following 5 x 7 binary matrix has a 1 in positions that approximate the shape of a Christmas tree. The direct product M@M@M is a matrix that has 5^3 = 125 rows and 7^3 = 343 columns. If you use the HEATMAPDISC subroutine in SAS/IML to create a heat map of the direct product, you obtain the following image in which a green cell represents a 1:

proc iml;
M = {0 0 0 1 0 0 0,
     0 0 1 1 1 0 0,
     0 1 1 1 1 1 0,
     1 1 1 1 1 1 1,
     0 0 0 1 0 0 0};
Tree = M @ M @ M;     /* M is 5 x 7, so Tree is 125 x 343 */
 
ods graphics / width=300px height=450px  ANTIALIASMAX=50000;
call heatmapdisc(Tree) title="Happy Holidays to All my Readers!"
     colorramp={White Green} displayoutlines=0 ShowLegend=0;

That's not a bad result for three SAS/IML statements! You can see that the "tree" has a self-similar structure: the large tree is composed of 17 smaller trees, each of which is composed of 17 mini trees.
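You can verify the count of 17 numerically: the SUM function counts the 1s in a binary matrix, and each Kronecker factor multiplies that count by 17. (This snippet continues the PROC IML session above.)

nTree = sum(M);                                    /* 17 cells form one tree */
print nTree (sum(Tree))[label="Cells in Tree"];    /* 17 and 17**3 = 4913 */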

Adding a star

My readers are worth extra effort, so I want to put a star on the top of the tree. One way to do this is to use a scatter plot and plot a special observation by using a star symbol. To plot the "tree" by using a scatter plot, it is necessary to "unpack" the 125 x 343 matrix into a list of (x, y) values (this is a wide-to-long conversion of the matrix). You can use the NDX2SUB function to extract the row and column of each 1 in the matrix, as follows:

/* extract (row, column) pairs for matrix elements that are '1' */
idx = loc(Tree = 1);                /* indices for the 1s */
s = ndx2sub(dimension(Tree), idx);  /* convert indices to subscripts */
create Tree from s[c={"RNames" "CNames"}];
append from s;
close;
 
call symputx("XPos", range(s[,2])/2); /* midrange = horiz position of star */
quit;

The previous statements create a SAS data set called TREE that contains the (x, y) positions of the 1s in the direct product matrix. The statements also saved the midrange value to a macro variable. The following SAS procedures create the location of the star, concatenate the two data sets, and create a scatter plot of the result, which is shown at the top of this article.

data star;
x = &XPos; y = 0; 
run;
 
data Tree2;
set Tree star;
run;
 
ods graphics / width=600px height=800px  ANTIALIASMAX=50000;
title "Happy Holidays";
proc sgplot data=Tree2 noborder noautolegend;
scatter x=CNames y=RNames / markerattrs=(color=ForestGreen symbol=SquareFilled size=3);
scatter x=x y=y / markerattrs=(symbol=StarFilled size=20)             /* yellow star */
                  filledoutlinedmarkers markerfillattrs=(color=yellow);
yaxis reverse display=none;
xaxis display=none;
run;

Previous SAS-created Christmas cards

All year long I blog about how to use SAS for serious pursuits like statistical modeling, data analysis, optimization, and simulation. Consequently, I enjoy my occasional forays into a fun and frivolous topic such as how to create a greeting card with SAS. If you like seeing geeky SAS-generated images, here is a list of my efforts from previous years:

Wherever you are and whatever holiday you celebrate, may you enjoy peace and happiness this season and in the New Year.

The post A self-similar Christmas tree appeared first on The DO Loop.

August 30, 2017
 

A previous article discussed the mathematical properties of the singular value decomposition (SVD) and showed how to use the SVD subroutine in SAS/IML software. This article uses the SVD to construct a low-rank approximation to an image. Applications include image compression and denoising an image.

Construct a grayscale image

The value of each pixel in a grayscale image can be stored in a matrix where each element of the matrix is a value between 0 (off) and 1 (full intensity). I want to create a small example that is easy to view, so I'll create a small matrix that contains information for a low-resolution image of the capital letters "SVD." Each letter will be five pixels high and three pixels wide, arranged in a 7 x 13 array of 0/1 values. The following SAS/IML program creates the array and displays a heat map of the data:

ods graphics / width=390px height=210px;
proc iml;
SVD = {0 0 0 0 0 0 0 0 0 0 0 0 0,
       0 1 1 1 0 1 0 1 0 1 1 0 0,
       0 1 0 0 0 1 0 1 0 1 0 1 0,
       0 1 1 1 0 1 0 1 0 1 0 1 0,
       0 0 0 1 0 1 0 1 0 1 0 1 0,
       0 1 1 1 0 0 1 0 0 1 1 0 0,
       0 0 0 0 0 0 0 0 0 0 0 0 0 };
A = SVD;
/* ColorRamp:  White    Gray1    Gray2    Gray3    Black   */
ramp =        {CXFFFFFF CXF0F0F0 CXBDBDBD CX636363 CX000000};
call heatmapcont(A) colorramp=ramp showlegend=0 title="Original Image" range={0,1};
Rank-5 data matrix for singular value decomposition (SVD)

A low-rank approximation to an image

Because the data matrix contains only five non-zero rows, the rank of the A matrix cannot be more than 5. The following statements compute the SVD of the data matrix and create a plot of the singular values. The plot of the singular values is similar to the scree plot in principal component analysis, and you can use the plot to help choose the number of components to retain when approximating the data.

call svd(U, D, V, A);  /* A = U*diag(D)*V` */
title "Singular Values";
call series(1:nrow(D), D) grid={x y} xvalues=1:nrow(D) label={"Component" "Singular Value"};
Singular values of rank-5 data matrix

For this example, it looks like retaining three or five components would be good choices for approximating the data. To see how low-rank approximations work, let's generate and view the rank-1, rank-2, and rank-3 approximations:

keep = 1:3;          /* use keep=1:5 to see all approximations */
do i = 1 to ncol(keep);
   idx = 1:keep[i];                           /* components to keep */
   Ak = U[ ,idx] * diag(D[idx]) * V[ ,idx]`;  /* rank k approximation */
   Ak = (Ak - min(Ak)) / range(Ak);           /* for plotting, stdize into [0,1] */
   s = "Rank = " + char(keep[i],1);
   call heatmapcont(Ak) colorramp=ramp showlegend=0 title=s range={0, 1};
end;
Low-rank approximation: Rank-1 approximation via SVD of a data matrix

The rank-1 approximation does a good job of distinguishing the columns that contain parts of the letters from the columns that do not. The approximation also picks out two rows that contain the horizontal strokes of the capital "S."

Low-rank approximation: Rank-2 approximation via SVD of a data matrix

The rank-2 approximation refines the image and adds additional details. You can begin to make out the letters "SVD." In particular, all three horizontal strokes for the "S" are visible and you can begin to see the hole in the capital "D."

Low-rank approximation: Rank-3 approximation via SVD of a data matrix

The rank-3 approximation contains enough details that someone unfamiliar with the message can read it. The "S" is reconstructed almost perfectly and the space inside the "V" and "D" is almost empty. Even though this data is five-dimensional, this three-dimensional approximation is very good.

Not only is a low-rank approximation easier to work with than the original five-dimensional data, but a low-rank approximation represents a compression of the data. The original image contains 7 x 13 = 91 values. For the rank-3 approximation, three columns of the U matrix contain 21 numbers, three columns of the V matrix contain 39 numbers, and there are three singular values. So the total number of values required to represent the rank-3 approximation is 63, which is about 70% of the number of values in the original image.
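You can also quantify the approximation error. The Eckart-Young theorem says that the Frobenius norm of the rank-k error equals the root sum of squares of the discarded singular values. A quick check, reusing the U, D, and V matrices from the earlier CALL SVD statement:

k = 3;   idx = 1:k;
Ak = U[ ,idx] * diag(D[idx]) * V[ ,idx]`;    /* rank-3 approximation */
errF = sqrt( ssq(A - Ak) );                  /* Frobenius norm of the error */
errD = sqrt( ssq(D[k+1:nrow(D)]) );          /* RSS of discarded singular values */
print errF errD;                             /* the two numbers agree */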

Denoising an image

You can use the singular value decomposition and low-rank approximations to try to eliminate random noise that has corrupted an image. Every TV detective series has shown an episode in which the police obtain a blurry image of a suspect's face or license plate. The detective asks the computer technician if she can enhance the image. With the push of a button, the blurry image is replaced by a crystal clear image that enables the police to identify and apprehend the criminal.

The image reconstruction algorithms used in modern law enforcement are more sophisticated than the SVD. Nevertheless, the SVD can do a reasonable job of removing small random noise from an image, thereby making certain features easier to see. The SVD has to have enough data to work with, so the following statements duplicate the "SVD" image four times before adding random Gaussian noise to the data. The noise has a standard deviation equal to 25% of the range of the data. The noisy data is displayed as a heat map on the range [-0.5, 1.5] by using a color ramp that includes yellow for negative values and blue for values greater than 1.

call randseed(12345);
A = (SVD // SVD) || (SVD // SVD);                     /* duplicate the image */
A = A + randfun( dimension(A), "Normal", 0, 0.25);    /* add Gaussian noise */
/*       Yellow   White    Gray1    Gray2    Gray3    Black    Blue    */
ramp2 = {CXFFEDA0 CXFFFFFF CXF0F0F0 CXBDBDBD CX636363 CX000000 CX3182BD};
call heatmapcont(A) colorramp=ramp2 showlegend=0 title="Noisy Image" range={-0.5, 1.5};
Image corrupted by Gaussian noise

I think most police officers would be able to read this message in spite of the added noise, but let's see if the SVD can clean it up. The following statements compute the SVD and create a plot of the singular values:

call svd(U, D, V, A);  /* A = U*diag(D)*V` */
call series(1:nrow(D), D) grid={x y} xvalues=1:nrow(D) label={"Component" "Singular Value"};
Plot of singular values for a small image that is corrupted by Gaussian noise

There are 14 non-zero singular values. In theory, the main signal is contained in the components that correspond to the largest singular values whereas the noise is captured by the components for the smallest singular values. For these data, the plot of the singular values suggests that three or five (or nine) components might capture the main signal while ignoring the noise. The following statements create and display the rank-3 and rank-5 approximations. Only the rank-5 approximation is shown.

keep = {3 5};        /* which components to examine? */
do i = 1 to ncol(keep);
   idx = 1:keep[i];
   Ak = U[ ,idx] * diag(D[idx]) * V[ ,idx]`;
   Ak = (Ak - min(Ak)) / range(Ak); /* for plotting, standardize into [0,1] */
   s = "Rank = " + char(keep[i],2);
   call heatmapcont(Ak) colorramp=ramp showlegend=0 title=s range={0, 1};
end;
Low-rank approximation: Rank-5 reconstruction via SVD of a small image that is corrupted by Gaussian noise

The denoised low-rank image is not as clear as Hollywood depicts, but it is quite readable. It is certainly good enough to identify a potential suspect. I can hear the detective mutter, "Book 'em, Danno! Murder One."

Summary and further reading

In summary, the singular value decomposition (SVD) enables you to approximate a data matrix by using a low-rank approximation. This article uses a small example for which the full data matrix is rank-5. A plot of the singular values can help you choose the number of components to retain. For this example, a rank-3 approximation represents the essential features of the data matrix.

For similar analyses and examples that use the singular value decomposition, see

In SAS software, the SVD is heavily used in the SAS Text Miner product. For an overview of how a company can use text mining to analyze customer support issues, see Sanders and DeVault (2004) "Using SAS at SAS: The Mining of SAS Technical Support."

The post The singular value decomposition and low-rank approximations appeared first on The DO Loop.

July 31, 2017
 

A SAS user needed to convert a program from MATLAB into the SAS/IML matrix language and asked whether there is a SAS/IML equivalent to the fliplr and flipud functions in MATLAB. These functions flip the columns or rows (respectively) of a matrix; "LR" stands for "left-right" and "UD" stands for "up-down."

No, SAS/IML does not have equivalent built-in functions, but as Devo once sang: "When a problem comes along, you must whip it." This article shows how to implement these functions in SAS/IML. Or, to paraphrase Devo's maxim: "When a matrix comes along, you must flip it. It's not too late to flip it. Flip it good!"

The FLIPLR and FLIPUD functions in SAS/IML

The key to reversing the columns or rows of a matrix is to use subscript notation. Recall that in the SAS/IML language, the expression A[ ,1] extracts the first column of A. (When the row subscript is not specified, it means "use all rows.") Similarly, A[ ,1:3] extracts the first three columns of A (in order), whereas A[ ,3:1] extracts the first three columns of A in reverse order. A similar notation is valid for rows. The expression A[3:1, ] extracts the first three rows of A in reverse order. Therefore, the following SAS/IML functions implement the functionality of the fliplr and flipud functions for matrices:

proc iml;
/* reverse columns of a matrix */
start fliplr(A);
   return A[ , ncol(A):1];
finish;
 
/* reverse rows of a matrix */
start flipud(A);
   return A[nrow(A):1, ];
finish;
 
/* test the functions on a 2x3 matrix */
x = {1 2 3, 4 5 6};
lr = fliplr(x);
ud = flipud(x);
print x, lr, ud;

Reversing all elements

You can also reverse all elements of a matrix. Recall that a SAS/IML matrix is stored in row-major order, which means that the elements are enumerated by counting across the first row, then across the second row, and so forth. Thus if A is an n x m matrix, then A[n*m:1] returns the elements of A in reverse order. However, the elements are returned in a column vector. If you want the elements in a matrix that has the same dimensions as A, you can "shape it up" by using the SHAPE function, as follows:

/* reverse elements of a matrix */
start reverse(A);
   n = nrow(A); m = ncol(A);
   return shape(A[n*m:1], n, m);     /* Now flip it into SHAPE */
finish;
 
rev = reverse(x);
print rev;

The MATLAB language also supports the rot90 function, which rotates a matrix. I have previously shown how to rotate the elements of a matrix in the SAS/IML language.
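If you want the rot90 functionality as well, one sketch (not a built-in function) composes the modules above: transpose the matrix, then flip it up-down, which rotates it 90 degrees counterclockwise.

/* rotate a matrix 90 degrees counterclockwise (sketch; reuses flipud) */
start rot90(A);
   return flipud(A`);
finish;
 
r = rot90(x);     /* for x = {1 2 3, 4 5 6}, returns {3 6, 2 5, 1 4} */
print r;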

In conclusion, although SAS/IML does not contain built-in functions for reversing the rows and columns of a matrix, these functions are easily defined in terms of matrix subscript operations. In short, when given a matrix, it is not hard to flip it, flip it good!

The post Flip it. Flip it good. appeared first on The DO Loop.

July 24, 2017
 

For a time series { y[1], y[2], ..., y[N] }, the difference operator computes the difference between two observations. The kth-order difference is the series { y[k+1] - y[1], ..., y[N] - y[N-k] }. In SAS, the DIF function in the SAS/IML language takes a column vector of values and returns a vector of differences.

For example, the following SAS/IML statements define a column vector that has five observations and calls the DIF function to compute the first-order differences between adjacent observations. By convention, the DIF function returns a vector that is the same size as the input vector and inserts a missing value in the first element.

proc iml;
x = {0, 0.1, 0.3, 0.7, 1};
dif = dif(x);            /* by default DIF(x, 1) ==> first-order differences */
print x dif;
First-order difference of time series

The difference operator is a linear operator that can be represented by a matrix. The first nonmissing value of the difference is x[2]-x[1], followed by x[3]-x[2], and so forth. Thus the linear operator can be represented by the matrix that has -1 on the main diagonal and +1 on the super-diagonal (above the diagonal). An efficient way to construct the difference operator is to start with the zero matrix and insert ±1 on the diagonal and super-diagonal elements. You can use the DO function to construct the indices for the diagonal and super-diagonal elements in a matrix:

start DifOp(dim);
   D = j(dim-1, dim, 0);        /* allocate zero matrix */
   n = nrow(D); m = ncol(D);
   diagIdx = do(1,n*m, m+1);    /* index diagonal elements */
   superIdx  = do(2,n*m, m+1);  /* index superdiagonal elements */
   *subIdx  = do(m+1,n*m, m+1); /* index subdiagonal elements (optional) */
   D[diagIdx] = -1;             /* assign -1 to diagonal elements */
   D[superIdx] = 1;             /* assign +1 to super-diagonal elements */
   return D;
finish;
 
B = DifOp(nrow(x));
d = B*x;
print B, d[L="Difference"];
Difference operator and first-order difference of a time series

You can see that the DifOp function constructs an (n-1) x n matrix, which is the correct dimension for transforming an n-dimensional vector into an (n-1)-dimensional vector. Notice that the matrix multiplication omits the element that previously held a missing value.

You probably would not use a matrix multiplication in place of the DIF function if you needed the first-order difference for a single time series. However, the matrix formulation makes it possible to use one matrix multiplication to find the difference for many time series.

The following matrix contains three time-series, one in each column. The B matrix computes the first-order difference for all columns by using a single matrix-matrix multiplication. The same SAS/IML code is valid whether the X matrix has three columns or three million columns.

/* The matrix can operate on a matrix where each column is a time series */
x = {0    0    0,
     0.1  0.2  0.3,
     0.3  0.8  0.5,
     0.7  0.9  0.8,
     1    1    1   };
B = DifOp(nrow(x));
d = B*x;                        /* apply the difference operator */
print d[L="Difference of Columns"];
First-order differences for multiple time series

Other operators in time series analysis can also be represented by matrices. For example, the first-order lag operator is represented by a matrix that has +1 on the super-diagonal. Moving average operators also have matrix representations.
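The same indexing pattern extends to the kth-order difference operator, which has -1 on the main diagonal and +1 on the kth superdiagonal. The following sketch defines a hypothetical LagDifOp function (a generalization of DifOp, not part of the original program):

start LagDifOp(dim, k);
   D = j(dim-k, dim, 0);           /* allocate zero matrix */
   n = nrow(D); m = ncol(D);
   D[ do(1,   n*m, m+1) ] = -1;    /* diagonal elements */
   D[ do(k+1, n*m, m+1) ] =  1;    /* kth superdiagonal elements */
   return D;
finish;
 
B2 = LagDifOp(nrow(x), 2);         /* lag-2 differences: x[i+2] - x[i] */
print (B2*x)[L="Lag-2 Differences"];

For k=1, the function reproduces the DifOp matrix.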

The matrix formulation is efficient for short time series but is not efficient for a time series that contains thousands of elements. If the time series contains n elements, then the dense-matrix representation of the difference operator contains about n^2 elements, which consumes a lot of RAM when n is large. However, as we have seen, the matrix representation of an operator is advantageous when you want to operate on a large number of short time series, as might arise in a simulation.

The post Difference operators as matrices appeared first on The DO Loop.

November 7, 2016
 

Rotation matrices are used in computer graphics and in statistical analyses. A rotation matrix is especially easy to implement in a matrix language such as the SAS Interactive Matrix Language (SAS/IML). This article shows how to implement three-dimensional rotation matrices and use them to rotate a 3-D point cloud.

Define a 3-D rotation matrix

In three dimensions there are three canonical rotation matrices:

  • The matrix Rx(α) rotates points counterclockwise by the angle α about the X axis. Equivalently, the rotation occurs in the (y, z) plane.
  • The matrix Ry(α) rotates points counterclockwise by the angle α in the (x, z) plane.
  • The matrix Rz(α) rotates points counterclockwise by the angle α in the (x, y) plane.

Each of the following SAS/IML functions returns a rotation matrix. The RotPlane function takes an angle and a pair of integers. It returns the rotation matrix that corresponds to a counterclockwise rotation in the (xi, xj) plane. The Rot3D function has a simpler calling syntax. You specify an axis (X, Y, or Z) to get a rotation matrix in the plane that is perpendicular to the specified axis:

proc iml;
/* Rotate a vector by a counterclockwise angle in a coordinate plane. 
        [ 1    0       0   ]
   Rx = [ 0  cos(a) -sin(a)]        ==> Rotate in the (y,z)-plane
        [ 0  sin(a)  cos(a)]
 
        [ cos(a)  0   -sin(a)]
   Ry = [   0     1      0   ]      ==> Rotate in the (x,z)-plane
        [ sin(a)  0    cos(a)]
 
        [ cos(a) -sin(a) 0]
   Rz = [ sin(a)  cos(a) 0]         ==> Rotate in the (x,y)-plane
        [   0       0    1]
*/
start RotPlane(a, i, j);
   R = I(3);  
   c = cos(a); s = sin(a);
   R[i,i] = c;  R[i,j] = -s;
   R[j,i] = s;  R[j,j] =  c;
   return R;
finish;
 
start Rot3D(a, axis);   /* rotation in plane perpendicular to axis */
   if upcase(axis)="X" then       
      return RotPlane(a, 2, 3);
   else if upcase(axis)="Y" then
      return RotPlane(a, 1, 3);
   else if upcase(axis)="Z" then
      return RotPlane(a, 1, 2);
   else return I(3);
finish;
store module=(RotPlane Rot3D);
quit;

NOTE: Some sources define rotation matrices by leaving the object still and rotating a camera (or observer). This is mathematically equivalent to rotating the object in the opposite direction, so if you prefer a camera-based rotation matrix, use the definitions above but specify the angle -α. Note also that some authors change the sign for the Ry matrix; the sign depends on whether you are rotating about the positive or negative Y axis.
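Because each rotation matrix is orthogonal, its inverse equals its transpose, so the camera convention R(-α) is simply R(α)`. A quick numerical check (assumes the modules above have been stored):

proc iml;
load module=(RotPlane Rot3D);
pi = constant('pi');
R = Rot3D(pi/5, "X");
print (R*R`)[label="R*R` (identity)"];
maxDiff = max(abs( Rot3D(-pi/5, "X") - R` ));
print maxDiff[label="max |R(-a) - R`|"];   /* zero, up to rounding */
quit;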




Applying rotations to data

Every rotation is a composition of rotations in coordinate planes. You can compute a composition by using matrix multiplication. Let's see how rotations work by defining and rotating some 3-D data. The following SAS DATA step defines a point at the origin and 10 points along a unit vector in each coordinate direction:

data MyData;               /* define points on coordinate axes */
x = 0; y = 0; z = 0; Axis="O"; output;    /* origin */
Axis = "X";
do x = 0.1 to 1 by 0.1;    /* points along unit vector in x direction */
   output;
end;
x = 0; Axis = "Y";
do y = 0.1 to 1 by 0.1;    /* points along unit vector in y direction */
   output;
end;
y = 0; Axis = "Z";
do z = 0.1 to 1 by 0.1;    /* points along unit vector in z direction */
   output;
end;
run;
 
proc sgscatter data=Mydata;
matrix X Y Z;
run;

If you use PROC SGSCATTER to visualize the data, the results (not shown) are not very enlightening. Because the data are aligned with the coordinate directions, the projection of the 3-D data onto the coordinate planes always projects 10 points onto the origin. The projected data does not look very three-dimensional.

However, you can slightly rotate the data to obtain nondegenerate projections onto the coordinate planes. The following computations form a matrix P which represents a rotation of the data by -π/6 in one coordinate plane followed by a rotation by -π/3 in another coordinate plane:

proc iml;
/* choose any 3D projection matrix as product of rotations */
load module=Rot3D;
pi = constant('pi');
Rz = Rot3D(-pi/6, "Z");    /* rotation matrix for (x,y) plane */
Rx = Rot3D(-pi/3, "X");    /* rotation matrix for (y,z) plane */ 
Ry = Rot3D(    0, "Y");    /* rotation matrix for (x,z) plane */
P = Rx*Ry*Rz;              /* cumulative rotation */
print P;
A rotation matrix is a product of canonical rotations

For a column vector, v, the rotated vector is P*v. However, the data in the SAS data set is in row vectors, so use the transposed matrix to rotate all observations with a single multiplication, as follows:

use MyData;
read all var {x y z} into M;
read all var "Axis";
close;
RDat = M * P`;                    /* rotated data */

Yes, that's it. That one line rotates the entire set of 3-D data. You can confirm the rotation by plotting the projection of the data onto the first two coordinates:

title "Rotation and Projection of Data";
Px = RDat[,1]; Py = RDat[,2]; 
call scatter(Px, Py) group=Axis 
   option="markerattrs=(size=12 symbol=CircleFilled)";
Scatter plot of rotated and projected 3-D data

Alternatively, you can write the rotated data to a SAS data set. You can add reference axes to the plot if you write the columns of the P` matrix to the same SAS data set. The columns are the rotated unit vectors in the coordinate directions, so plotting those coordinates by using the VECTOR statement adds reference axes:

create RotData from RDat[colname={"Px" "Py" "Pz"}];
append from RDat;
close;
 
A = P`;          /* rotation of X, Y, and Z unit vectors */
create Axes from A[colname={"Ax" "Ay" "Az"}];  append from A; close;
Labels = "X":"Z";
create AxisLabels var "Labels";  append; close;
QUIT;
 
data RotData;    /* merge all data sets */
merge MyData RotData Axes AxisLabels;
run;
 
proc sgplot data=RotData;
   vector x=Ax y=Ay / lineattrs=(thickness=3) datalabel=Labels;
   scatter x=Px y=Py / group=Axis markerattrs=(size=12 symbol=CircleFilled);
   xaxis offsetmax=0.1; yaxis offsetmax=0.1; 
run;
Scatter plot of rotated and projected 3-D data

All the data points are visible in this projection of the (rotated) data onto a plane. The use of the VECTOR statement to add coordinate axes is not necessary, but I think it's a nice touch.

Visualizing clouds of 3-D data

This article is about rotation matrices, and I showed how to use matrices to rotate a 3-D cloud of observations. However, I don't want to give the impression that you have to use matrix operations to plot 3-D data! SAS has several "automatic" 3-D visualization methods that are more convenient and do not require that you program rotation matrices.

I also want to mention that Sanjay Matange created a 3-D scatter plot macro that uses ODS graphics to visualize a 3-D point cloud. Sanjay also uses rotation matrices, but because he uses the DATA step and PROC FCMP, his implementation is longer and less intuitive than the equivalent operations in the SAS/IML matrix language. In his blog he says that his macro "is provided for illustration purposes."

In summary, the SAS/IML language makes it convenient to define and use rotation matrices. An application is rotating a 3-D cloud of points. In my next blog post I will present a more interesting visualization example.

tags: Matrix Computations, Statistical Graphics

The post Rotation matrices and 3-D data appeared first on The DO Loop.

October 31, 2016
 

Every year near Halloween I write an article in which I demonstrate a simple programming trick that is a real treat to use. This year's trick (which features the CMISS function and the crossproducts matrix in SAS/IML) enables you to count the number of observations that are missing for pairs of columns. In other words, this trick counts how many missing values are in columns 1 and 2, columns 1 and 3, and so on for all "k choose 2" pairwise combinations of k columns.

This trick generalizes, so keep reading even if you don't care about counting missing values. Given ANY indicator matrix, you can use this trick to count the pairwise occurrence of events.




Counting missing values

Of course, you can count combinations of missing values without using SAS/IML or matrices. I've previously shown how to use PROC MI to count all combinations of missing values among a set of k variables. However, as I said, this SAS/IML trick enables you to count more than just missing values: it generalizes to count ANY pairwise event.

To get started, read data into a SAS/IML matrix. The following statements read 5209 observations and eight variables into the matrix X:

proc iml;
varNames = {"AgeAtStart" "Diastolic" "Systolic" 
            "Height" "Weight" "MRW" "Smoking" "Cholesterol"};
use Sashelp.Heart;               /* open data set */
read all var varNames into X;    /* create numeric data matrix, X */
close Sashelp.Heart;

The first part of the trick is to convert the matrix into a 0/1 indicator matrix, where 1 indicates that a data value is missing and 0 indicates nonmissing. I like to use the CMISS function in Base SAS for this task.

The second part of the trick is to recognize that if C is an indicator matrix, the crossproducts matrix P=C`C is a matrix that gives the counts of the pairwise events in C. The element P[i,j] is the inner product of the i-th and j-th columns of C, and because C is an indicator matrix the inner product counts the number of simultaneous "events" for column i and column j.

In the SAS/IML language, the code to compute pairwise counts of missing values requires only two lines:

C = cmiss(X);                    /* create 0/1 indicator matrix */
P = C`*C;                        /* pairwise counts */
print P[c=varNames r=varNames label="Pairwise Missing Counts"];
pairwise1

The P matrix is symmetric. The diagonal elements P[i,i] show the number of missing values for the i-th variable. For example, the Height, Weight, and MRW variables have 6 missing values whereas the Cholesterol variable has 152 missing values. The off-diagonal elements P[i,j] show the number of observations that are simultaneously missing for the i-th and j-th variables. For example, 28 observations have missing values for both the Smoking and Cholesterol variables; two observations have missing values for both the Height and Weight variables.
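The same trick applies to the complementary indicator matrix. Because the matrix 1-C marks the nonmissing cells, its crossproducts matrix counts the observations that are complete for each pair of variables. (A sketch that continues the session above.)

N = (1-C)`*(1-C);      /* pairwise counts of complete observations */
print N[c=varNames r=varNames label="Pairwise Nonmissing Counts"];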

Counting any pair of events

If C is any indicator matrix, the crossproducts matrix P=C`C counts the number of observations for which two columns of C are simultaneously 1.

For example, the following statements standardize the data to create z-scores. You can form an indicator matrix that has the value 1 if a z-score exceeds 3 in absolute value; these are outliers for normally distributed data. The crossproducts matrix counts the number of observations that are univariate outliers (on the diagonal) and the number that are outliers for a pair of variables:

Z = (X - mean(X)) / std(X);      /* standardize data */
L = (abs(Z) > 3);                /* L indicates outliers */
Q = L`*L;                        /* counts for pairs of outliers */
print Q[c=varNames r=varNames label="Pairwise Outliers"];
pairwise2

The table indicates that 52 patients have a diastolic blood pressure (BP) that is more than three standard deviations above the mean. Similarly, 83 patients are outliers for systolic BP, and 35 patients are outliers for both measurements.

The following graph visualizes the patients that have high blood pressure in each category:

Outlier = j(nrow(L), 1, "No         ");       /* create grouping variable */
Outlier[ loc(L[,2] | L[,3]) ] = "Univariate";
Outlier[ loc(L[,2] & L[,3]) ] = "Bivariate";
title "Standardized Blood Pressure";
title2 "Outliers More than Three Standard Deviations from Mean";
call scatter(Z[,2], Z[,3]) group=Outlier 
         label={"Standardized Diastolic" "Standardized Systolic"}
         option="transparency=0.5 markerattrs=(symbol=CircleFilled)"
         other="refline 3/axis=x; refline 3/axis=y;";
pairwise3

In conclusion, you can create an indicator matrix for any set of conditions. By forming the crossproducts matrix, you get the counts of all observations that satisfy each condition individually and in pairs. I think this trick is a real treat!

tags: Data Analysis, Matrix Computations, Tips and Techniques

The post Counting observations for which two events occur appeared first on The DO Loop.

October 5, 2016
 

What is weighted regression? How does it differ from ordinary (unweighted) regression? This article describes how to compute and score weighted regression models.

Visualize a weighted regression

Technically, an "unweighted" regression should be called an "equally weighted" regression because ordinary least squares (OLS) regression weights each observation equally. Similarly, an "unweighted mean" is really an equally weighted mean.

Recall that weights are not the same as frequencies. When talking about weights, it is often convenient to assume that the weights sum to unity. This article uses standardized weights, although you can specify any set of weights when you use a WEIGHT statement in a SAS procedure.

One way to look at a weighted regression is to assume that a weight is related to the variance of an observation. Namely, the i_th weight, w_i, indicates that the variance of the i_th value of the dependent variable is σ²/w_i, where σ² is a common variance. Notice that an observation that has a small weight (near zero) has a relatively large variance. Intuitively, the observed response is not known with much precision; a weighted analysis makes that observation less influential.



For example, the following SAS data set defines (x,y) values and weights (w) for 11 observations. Observations whose X value is close to x=10 have relatively large weights. Observations far from x=10 have small weights. The last observation (x=14) is assigned a weight of zero, which means that it will be completely excluded from the analysis.

data RegData;
input y x w;
datalines;
2.3  7.4  0.058 
3.0  7.6  0.073 
2.9  8.2  0.114 
4.8  9.0  0.144 
1.3 10.4  0.151 
3.6 11.7  0.119 
2.3 11.7  0.119 
4.6 11.8  0.114 
3.0 12.4  0.073 
5.4 12.9  0.035 
6.4 14.0  0 
;

You can use PROC SGPLOT to visualize the observations and weights. In fact, PROC SGPLOT supports a REG statement that adds a regression line to the plot. The REG statement supports adding both weighted and unweighted regression lines:

proc sgplot data=RegData;
scatter x=x y=y / filledoutlinedmarkers markerattrs=(size=12 symbol=CircleFilled)
                colorresponse=w colormodel=TwoColorRamp;     /* color markers by weight */
reg x=x y=y / nomarkers legendlabel="Unweighted Regression"; /* usual OLS regression */
reg x=x y=y / weight=w nomarkers legendlabel="Weighted Regression"; /* weighted regression */
xaxis grid; yaxis grid;
run;
Weighted regression curves in PROC SGPLOT

Each marker is colored by its weight. Dark blue markers (observations for which 8 < x < 12) have relatively large weights. Light blue markers have small weights and do not affect the weighted regression model very much.

You can see the effect of the weights by comparing the weighted and unweighted regression lines. The unweighted regression line (in blue) is pulled upward by the observations near x=13 and x=14. These observations have small weights, so the weighted regression line (in red) is not pulled upwards.

Weighted regression by using PROC REG

If you want to compute parameter estimates and other statistics, you can use the REG procedure in SAS/STAT to run the weighted and unweighted regression models. The following statements run PROC REG twice: once without a WEIGHT statement and once with a WEIGHT statement:

proc reg data=RegData plots(only)=FitPlot;
   Unweighted: model y = x;
   ods select NObs ParameterEstimates  FitPlot;
quit;
 
proc reg data=RegData plots(only)=FitPlot;
   Weighted: model y = x;
   weight w;
   ods select NObs ParameterEstimates  FitPlot;
quit;

The output (not shown) indicates that the unweighted regression model is Y = -0.23 + 0.36*X. In contrast, the weighted regression model is Y = 2.3 + 0.085*X. This confirms that the slope of the weighted regression line is smaller than the slope of the unweighted line.

A weighted regression module in SAS/IML

You can read the SAS documentation to find the formulas that are used for a weighted OLS regression model. The formula for the parameter estimates is a weighted version of the normal equations: b = (X`WX)^(-1)(X`WY), where Y is the vector of observed responses, X is the design matrix, and W is the diagonal matrix of weights.

In SAS/IML it is more efficient to use elementwise multiplication than to multiply with a diagonal matrix. If you work through the matrix equations, you will discover that weighted regression is easily accomplished by using the normal equations on the matrices that result from multiplying the rows of Y and X by the square root of the weights. This is implemented in the following SAS/IML module:

proc  iml;
/* weighted polynomial regression for one regressor. Y, X, and w are col vectors */
start PolyRegEst(Y, X, w, deg);
   Yw = sqrt(w)#Y;                  /* mult rows of Y by sqrt(w) */
   XDesign = j(nrow(X), deg+1);
   do j = 0 to deg;                 /* design matrix for polynomial regression */
      Xdesign[,j+1] = X##j;
   end;
   Xw = sqrt(w)#Xdesign;            /* mult rows of X by sqrt(w) */
   b = solve(Xw`*Xw, Xw`*Yw);       /* solve normal equation */
   return b;
finish;
 
use RegData; read all var {Y X w}; close;    /* read data and weights */
 
/* loop to perform one-variable polynomial regression for deg=0, 1, and 2 */
do deg = 0 to 2;                    
   b = PolyRegEst(Y, X, w, deg);
   d = char(deg,1);                          /* '0', '1', or '2' */
   labl = "Estimates (deg=" + d + ")";
   print b[L=labl rowname=("b0":("b"+d))];   /* print estimates for each model */
end;
Parameter estimates for weighted regression models

The output shows the parameter estimates for three regression models: a "mean model" (degree 0), a linear model (degree 1), and a quadratic model (degree 2). Notice that the parameter estimates for the weighted linear regression are the same as estimates computed by PROC REG in the previous section.
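
The sqrt(w) trick is easy to verify numerically outside of SAS. The following Python/NumPy sketch (an illustrative cross-check, not part of the original program; the data values are copied from the RegData DATA step above) computes the weighted linear estimates two ways: with an explicit diagonal weight matrix, and with elementwise scaling of the rows by sqrt(w). The two methods agree:

```python
import numpy as np

# Data from the RegData DATA step; the last observation has weight 0
y = np.array([2.3, 3.0, 2.9, 4.8, 1.3, 3.6, 2.3, 4.6, 3.0, 5.4, 6.4])
x = np.array([7.4, 7.6, 8.2, 9.0, 10.4, 11.7, 11.7, 11.8, 12.4, 12.9, 14.0])
w = np.array([0.058, 0.073, 0.114, 0.144, 0.151, 0.119,
              0.119, 0.114, 0.073, 0.035, 0.0])

X = np.column_stack([np.ones_like(x), x])   # design matrix for a linear model

# Method 1: weighted normal equations with an explicit diagonal weight matrix
W = np.diag(w)
b1 = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Method 2: scale rows of X and y by sqrt(w), then use ordinary normal equations
sw = np.sqrt(w)
Xw = sw[:, None] * X
yw = sw * y
b2 = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

print(b1)   # weighted intercept and slope
print(b2)   # identical estimates from the sqrt(w) formulation
```

Both vectors match the weighted estimates reported by PROC REG (intercept near 2.3, slope near 0.085).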

Score the weighted regression models

The previous section describes how to use SAS/IML to compute parameter estimates of weighted regression models, and you can also use SAS/IML to score the regression models. The scoring does not require knowing the weight variable, only the parameter estimates. The following module uses Horner's method to evaluate a polynomial on a grid of points:

/* Score regression fit on column vector x */
start PolyRegScore(x, coef);
   p = nrow(coef);
   y = j(nrow(x), 1, coef[p]);            /* initialize to coef[p] */
   do j = p-1 to 1 by -1; 
      y = y # x + coef[j];
   end;
   return(y);
finish;
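
For readers who want to see Horner's scheme outside of the IML module, here is a minimal Python sketch (illustrative only; the function name is my own) that evaluates a polynomial whose coefficients are stored in ascending order of degree, just as PolyRegScore does:

```python
import numpy as np

def poly_score(x, coef):
    """Evaluate coef[0] + coef[1]*x + coef[2]*x**2 + ... by Horner's method."""
    x = np.asarray(x, dtype=float)
    y = np.full_like(x, coef[-1])    # initialize to the highest-degree coefficient
    for c in coef[-2::-1]:           # fold in coefficients from high degree down to coef[0]
        y = y * x + c
    return y

print(poly_score([2.0], [1.0, 2.0, 3.0]))   # 1 + 2*2 + 3*4 = 17
```

The loop performs one multiplication and one addition per coefficient, which is the efficiency that makes Horner's method attractive for scoring on a grid.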

You can compute predicted values for each model on a grid of points. You can then write the predicted values to a SAS data set and combine the predicted values and the original data. Lastly, you can use the SERIES statement in PROC SGPLOT to overlay the three regression models on the original data:

t = T( do(min(x), max(x), (max(x)-min(x))/25) ); /* uniform grid in range(X) */ 
Yhat = j(nrow(t), 3);
do d = 0 to 2;
   b = PolyRegEst(Y, X, w, d);                 /* weighted regression model of degree d */
   Yhat[,d+1] = PolyRegScore(t, b);            /* score model on grid */
end;
 
Z = t || Yhat;                                 /* write three predicted curves to data set */
create RegFit from Z[c={"t" "Pred0" "Pred1" "Pred2"}];
append from Z;
QUIT;
 
data RegAll;                                   /* merge predicted curves with original data */
label w="Weight" Pred0="Weighted Mean" 
      Pred1="Weighted Linear Fit" Pred2="Weighted Quadratic Fit";
merge RegData RegFit;
run;
 
title "Weighted Regression Models";            /* overlay weighted regression curves and data */
proc sgplot data=RegAll;
scatter x=x y=y / filledoutlinedmarkers markerattrs=(size=12 symbol=CircleFilled)
                  colorresponse=w colormodel=TwoColorRamp;
series x=t  y=Pred0 / curvelabel;
series x=t  y=Pred1 / curvelabel;
series x=t  y=Pred2 / curvelabel;
xaxis grid; yaxis grid;
run;
Predicted values for weighted regression models

A visualization of the weighted regression models is shown to the left. The weighted linear fit is the same line that was shown in the earlier graph. The weighted mean and the weighted quadratic fit are the zero-degree and second-degree polynomial models, respectively. Of course, you could also create these curves in SAS by using PROC REG or by using the REG statement in PROC SGPLOT.

Summary

Weighted regression has many applications. The application featured in this article is to fit a model in which some response values are known with more precision than others. We saw that it is easy to create a weighted regression model in SAS by using PROC REG or PROC SGPLOT. It is only slightly harder to write a SAS/IML function to use matrix equations to "manually" compute the parameter estimates. No matter how you compute the model, observations that have relatively large weights are more influential in determining the parameter estimates than observations that have small weights.

tags: Data Analysis, Matrix Computations

The post Visualize a weighted regression appeared first on The DO Loop.

July 13, 2016
 

Last week I showed how to represent a Markov transition matrix in the SAS/IML matrix language. I also showed how to use matrix multiplication to iterate a state vector, thereby producing a discrete-time forecast of the state of the Markov chain system. This article shows that the expected behavior of a Markov chain can often be determined just by performing linear algebraic operations on the transition matrix.




Absorbing Markov chains

Whereas the system in my previous article had four states, this article uses an example that has five states. The ith row of the following transition matrix gives the probabilities of transitioning from State i to any other state:

proc iml;
/* i_th row contains transition probabilities from State i */
P = { 0    0.5  0    0.5  0,
      0.5  0    0.5  0    0,
      0    0.5  0    0    0.5,
      0    0    0    1    0,
      0    0    0    0    1 };

For example, the last row of the matrix indicates that if the system is in State 5, the probability is 1 that it stays in State 5. This is the definition of an absorbing state. An absorbing state is common for many Markov chains in the life sciences. For example, if you are modeling how a population of cancer patients might respond to a treatment, possible states include remission, progression, or death. Death is an absorbing state because dead patients have probability 1 that they remain dead. The non-absorbing states are called transient. The current example has three transient states (1, 2, and 3) and two absorbing states (4 and 5).

If a Markov chain has an absorbing state and every initial state has a nonzero probability of transitioning to an absorbing state, then the chain is called an absorbing Markov chain. The Markov chain determined by the P matrix is absorbing. For an absorbing Markov chain, you can discover many statistical properties of the system just by using linear algebra. The formulas and examples used in this article are taken from the online textbook by Grinstead and Snell.

The first step for analyzing an absorbing chain is to permute the rows and columns of the transition matrix so that all of the transient states are listed first and the absorbing states are listed last. (The P matrix is already in this form.) If there are k absorbing states, the transition matrix in block form looks like the following:

Block form of an absorbing Markov transition matrix

The bottom right block of the transition matrix is a k x k identity matrix and represents the k absorbing states. The top left block contains the probabilities of transitioning between transient states. The upper right block contains the probabilities of transitioning from a transient state to an absorbing state. The lower left block contains zeros because there is no chance of transitioning from an absorbing state to any other state.

The following SAS/IML statements show how to extract the Q and R matrices from the P matrix:

k = sum(vecdiag(P)=1);      /* number of absorbing states */
nt = ncol(P) - k;           /* number of transient states */
 
Q = P[1:nt, 1:nt];          /* prob of transitions between transient states */ 
R = P[1:nt, nt+1:ncol(P)];  /* prob of transitions to absorbing state */

Expected behavior of absorbing Markov chains

By definition, all initial states for an absorbing system will eventually end up in one of the absorbing states. The following questions are of interest. If the system starts in the transient state i, then:

  1. What is the expected number of steps the system spends in transient state j?
  2. What is the expected number of steps before entering an absorbing state?
  3. What is the probability that the system will be absorbed into the jth absorbing state?

The answers to these questions are obtained from the so-called fundamental matrix, N = (I-Q)^(-1). The fundamental matrix answers the first question: the (i,j)th entry of N is the expected number of steps that the system spends in transient state j if it starts in transient state i:

transState = char(1:nt);        /* labels of transient states */
absState = char(nt+1:ncol(P));  /* labels of absorbing states */
 
/* Fundamental matrix gives the expected time that the system is 
   in transient state j if it starts in transient state i */
N = inv(I(nt) - Q);  
print N[L="Expected Time in State j" r=transState c=transState];
Expected time in State j for an absorbing Markov chain

The first row indicates that if the system starts in State 1, then on average it will spend 1.5 units of time in State 1 (including the initial time period), 1 unit of time in State 2, and 0.5 units of time in State 3 before transitioning to an absorbing state. Similarly, if the system starts in State 2, you can expect 1 unit, 2 units, and 1 unit of time in States 1, 2, and 3, respectively, before transitioning to an absorbing state.

Obviously, if you sum the values in each row, you get the expected number of steps until the system transitions to an absorbing state:

/* Expected time before entering an absorbing state if the system
   starts in the transient state i */
t = N[,+];
print t[L="Expected Time until Absorption" r=transState];
Expected time until absorption for an absorbing Markov chain

The remaining question is, to me, the most interesting. If the system starts in a transient state, you know that it will eventually transition into one of the k absorbing states. But which one? What is the probability that the system ends in the jth absorbing state if it starts in the ith transient state? The answer is obtained by the matrix product N*R, because N contains the expected number of steps in each transient state and R contains the probabilities of transitioning from a transient state to an absorbing state. For our example, the computation is as follows:

/* The probability that an absorbing chain will be absorbed
   into j_th absorbing state if it starts in i_th transient state */
B = N*R;
print B[L="Absorption Probabilities" r=transState c=absState];
Absorption probabilities for an absorbing Markov chain with two absorbing states

The first row of the matrix indicates that if the system starts in State 1, it will end up in State 4 three quarters of the time and in State 5 one quarter of the time. The second row indicates that if the system starts in State 2, there is a 50-50 chance that it will end up in State 4 or State 5.

Because this Markov chain is a stochastic process, you cannot say with certainty whether the system will eventually be absorbed into State 4 or State 5. However, starting the system in State 1 means that there is a higher probability that the final state will be State 4. Similarly, starting in State 3 means a higher probability that the final state will be in State 5.
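
If you want to check these computations outside of SAS/IML, the linear algebra is identical in any matrix language. The following Python/NumPy sketch (an illustrative cross-check of the same example, not SAS code) extracts Q and R, forms the fundamental matrix, and reproduces the expected times and absorption probabilities:

```python
import numpy as np

# Transition matrix from the article: states 1-3 are transient, states 4-5 absorbing
P = np.array([[0,   0.5, 0,   0.5, 0  ],
              [0.5, 0,   0.5, 0,   0  ],
              [0,   0.5, 0,   0,   0.5],
              [0,   0,   0,   1,   0  ],
              [0,   0,   0,   0,   1  ]])

k  = int(np.sum(np.diag(P) == 1))    # number of absorbing states
nt = P.shape[0] - k                  # number of transient states

Q = P[:nt, :nt]                      # transient-to-transient probabilities
R = P[:nt, nt:]                      # transient-to-absorbing probabilities

N = np.linalg.inv(np.eye(nt) - Q)    # fundamental matrix: expected visits to state j
t = N.sum(axis=1)                    # expected steps until absorption
B = N @ R                            # absorption probabilities

print(N)    # [[1.5 1. 0.5], [1. 2. 1.], [0.5 1. 1.5]]
print(t)    # [3. 4. 3.]
print(B)    # [[0.75 0.25], [0.5 0.5], [0.25 0.75]]
```

The rows of B match the discussion above: starting in State 1 leads to State 4 with probability 0.75, whereas starting in State 2 gives a 50-50 split.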

There are many other statistics that you can compute for absorbing Markov chains. Refer to the references for additional computations.

References

Grinstead, C. M., and Snell, J. L., Introduction to Probability, American Mathematical Society. (Available online.)
tags: Matrix Computations, Time series

The post Absorbing Markov chains in SAS appeared first on The DO Loop.

July 7, 2016
 

Many computations in elementary probability assume that the probability of an event is independent of previous trials. For example, if you toss a coin twice, the probability of observing "heads" on the second toss does not depend on the result of the first toss.

However, there are situations in which the current state of the system determines the probability of the next state. A stochastic process in which the probabilities depend on the current state is called a Markov chain.

A Markov transition matrix models the way that the system transitions between states. A transition matrix is a square matrix in which the (i,j)th element is the probability of transitioning from state i into state j. The sum of each row is 1. For reference, Markov chains and transition matrices are discussed in Chapter 11 of Grinstead and Snell's Introduction to Probability.




A transition matrix of probabilities

A Wikipedia article on Markov chains uses a sequence of coin flips to illustrate transitioning between states. This blog post uses the same example, which is described below.

Imagine a game in which you toss a fair coin until the sequence heads-tails-heads (HTH) appears. The process has the following four states:

  • State 1: No elements of the sequence are in order. If the next toss is tails (T), the system stays at State 1. If the next toss is heads (H), the system transitions to State 2.
  • State 2: H. The first element of the sequence is in order. If the next toss is H, the system stays at State 2. If the next toss is T, the system transitions to State 3.
  • State 3: HT. The first two elements of the sequence are in order. If the next toss is H, transition to State 4. If the next toss is T, transition to State 1.
  • State 4: HTH. The game is over. Stay in this state.

The transition matrix is given by the following SAS/IML matrix. The first row contains the transition probabilities for State 1, the second row contains the probabilities for State 2, and so on.

proc iml;
/* Transition matrix. Columns are next state; rows are current state */
/*     Null  H   HT  HTH */
P =    {0.5  0.5 0   0,   /* Null */
        0    0.5 0.5 0,   /* H    */
        0.5  0   0   0.5, /* HT   */
        0    0   0   1};  /* HTH  */
states = "1":"4";
print P[r=states c=states L="Transition Matrix"];

Analyzing sequences of coin tosses can be interesting and sometimes counterintuitive. Let's describe the expected behavior of this system.

The state vector

You can use a four-element row vector to represent the probabilities that the system is in each state. If the system is in the ith state, put a 1 in the ith element and zero elsewhere. Thus the state vector {1 0 0 0} indicates that the system is in State 1.

You can also use the state vector to represent probabilities. If the system has a 50% chance of being in State 1 and a 50% chance of being in State 2, the state of the system is represented by the state vector {0.5 0.5 0 0}.

The time-evolution of the system can be studied by multiplying the state vector and the transition matrix. If s is the current state of the system, then s*P gives the vector of probabilities for the next state of the system. Similarly, (s*P)*P = s*P^2 describes the probabilities of the system being in each state after two tosses.

For the HTH-sequence game, suppose that you start a new game. The game starts in State 1. The following computation describes the evolution of the state vector:

s0 = {1 0 0 0};    /* a new game is in State 1 */
s1 = s0 * P;       /* probability distribution of states after 1 toss */
s2 = s1 * P;       /* probability distribution of states after 2 tosses */
print (s1//s2)[L="Prob of State" c=("1":"4") r={"1 toss" "2 tosses"}];
First two states in a Markov chain

The first row of the output gives the state probabilities for the system after one coin toss. The system will be in State 1 with probability 0.5 (if you toss T) and will be in State 2 with probability 0.5 (if you toss H). The second row is more interesting. The computation uses the probabilities from the first toss to compute probabilities for the second toss. After two tosses, the probability is 0.25 for being in State 1 (TT), the probability is 0.5 for being in State 2 (TH or HH), and the probability is 0.25 for being in State 3 (HT).
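
You can reproduce this evolution in any matrix language. The following Python/NumPy sketch (an illustration, not part of the original SAS program) iterates the state vector for the HTH game and recovers the probabilities after one and two tosses:

```python
import numpy as np

# Transition matrix for the HTH coin-toss game (rows = current state)
P = np.array([[0.5, 0.5, 0,   0  ],
              [0,   0.5, 0.5, 0  ],
              [0.5, 0,   0,   0.5],
              [0,   0,   0,   1  ]])

s0 = np.array([1.0, 0, 0, 0])   # a new game starts in State 1
s1 = s0 @ P                     # distribution after 1 toss: 0.5, 0.5, 0, 0
s2 = s1 @ P                     # distribution after 2 tosses: 0.25, 0.5, 0.25, 0
print(s1)
print(s2)
```

To look further ahead, you can multiply by a matrix power, for example s0 @ np.linalg.matrix_power(P, 10) for the distribution after 10 tosses.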

Modeling a sequence of coin tosses

In general you can repeatedly multiply the state vector by the transition matrix to find the state probabilities after k time periods. For efficiency, avoid concatenating results inside a loop. Instead, allocate a matrix and use the ith row to hold the ith state vector, as follows:

/* Iterate to see how the probability distribution evolves */
numTosses = 10;
s0 = {1 0 0 0};                     /* initial state */
s = j(numTosses+1, ncol(P), .);     /* allocate room for all tosses */
s[1,] = s0;                         /* store initial state */
do i = 2 to nrow(s);
   s[i,] = s[i-1,] * P;             /* i_th row = state after (i-1) iterations */
end;
iteration = 0:numTosses;            /* iteration numbers */
print s[L="Prob of State" c=("1":"4") r=(char(iteration))];
First 10 states in a Markov chain with one absorbing state

The output shows the state probabilities for a sequence of 10 coin tosses. Recall that the last row of the transition matrix ensures that the sequence stays in State 4 after it reaches State 4. Therefore the probability of the system being in State 4 is nondecreasing. In particular, there is a 65.7% chance that the sequence HTH will appear in 10 tosses or less.

You can visualize the evolution of the probability distributions by making a series plot for each column of this output. You can download the SAS program that creates the plot and contains all of the computations in this article. The plot is shown below:

Predicted states for a Markov chain by iterating a transition matrix

The plot shows the probability distributions after each toss when the system starts in State 1. After three time periods the system can be in any state. Over the long term, the system has a high probability of being in State 4 and a negligible chance of being in the other three states.

Modeling transitions in a population

You can also apply the transition matrix to a population of games. For example, suppose that many students are playing the game. At a certain instant, 50% of the games are in State 1, 30% are in State 2, and 20% are in State 3. You can represent that population of initial states by using the initial vector

s0 = {0.5 0.3 0.2 0};

The following graph gives the probability of the states for the population for the next 30 coin tosses:

Predicted states for a Markov chain by iterating a transition matrix

The initial portion of the graph looks different from the previous graph because the population starts in a different initial distribution of states. However, the long-term behavior of this system is the same: all games eventually end in State 4. For this initial population, the graph shows that you should expect 80% of the games to be in State 4 by the 13th toss.

A real example: Predicting caseloads for social workers and parole officers

An interesting application of using Markov chains was presented by Gongwei Chen at SAS Global Forum 2016. Chen built a discrete-time Markov chain model to forecast workloads at the Washington State Department of Corrections. Social workers and parole officers supervise convicted offenders who have been released from prison or who were sentenced to supervision by the court system. Individuals who exhibit good behavior can transition from a highly supervised situation into less supervision. Other individuals might commit a new offense that requires an increase in supervision. Chen used historical data to estimate the transition matrix between the different supervisory states. His model helps the State of Washington forecast the resources needed to supervise offenders.

Summary

In summary, it is easy to represent a transition matrix and a state vector in SAS/IML. By using matrix multiplication, you can iterate the initial distribution of states to forecast the state of the system after an arbitrary number of time periods.

A Markov chain model involves iterating a linear dynamical system. The qualitative asymptotic behavior of such systems can be described by using the tools of linear algebra. In a future article, I will describe how you can compute statistical properties of Markov chain models from the transition matrix.

tags: Matrix Computations, Time series

The post Markov transition matrices in SAS/IML appeared first on The DO Loop.