12月 082010
 
Sample covariance matrices and correlation matrices are used frequently in multivariate statistics. This post shows how to compute these matrices in SAS and use them in a SAS/IML program. There are two ways to compute these matrices:
  1. Compute the covariance and correlation with PROC CORR and read the results into PROC IML
  2. Compute the matrices entirely with PROC IML

Computing a Covariance and Correlation Matrix with PROC CORR

You can use PROC CORR to compute the correlation matrix (or, more correctly, the "Pearson product-moment correlation matrix," since there are other measures of correlation which you can also compute with PROC CORR). The following statements compute the covariance matrix and the correlation matrix for the three numerical variables in the SASHELP.CLASS data set.

ods select Cov PearsonCorr;
proc corr data=sashelp.class noprob outp=OutCorr /** store results **/
          nomiss /** listwise deletion of missing values **/
          cov;   /**  include covariances **/
var Height Weight Age;
run;

The OutCorr data set contains various statistics about the data, as shown by running PROC PRINT:

proc print data=OutCorr; run;

Obs

_TYPE_

_NAME_

Height

Weight

Age

1

COV

Height

26.287

102.493

6.2099

2

COV

Weight

102.493

518.652

25.1857

3

COV

Age

6.210

25.186

2.2281

4

MEAN

62.337

100.026

13.3158

5

STD

5.127

22.774

1.4927

6

N

19.000

19.000

19.0000

7

CORR

Height

1.000

0.878

0.8114

8

CORR

Weight

0.878

1.000

0.7409

9

CORR

Age

0.811

0.741

1.0000

If you want to use the covariance or correlation matrix in PROC IML, you can read the appropriate values into a SAS/IML matrix by using a WHERE clause on the USE statement:


proc iml;
use OutCorr where(_TYPE_="COV");
read all var _NUM_ into cov[colname=varNames];

use OutCorr where(_TYPE_="CORR");
read all var _NUM_ into corr[colname=varNames];
close OutCorr;

Notice that this SAS/IML code is independent of the number of variables in the data set.

Computation of the Covariance and Correlation Matrix in PROC IML

If the data are in SAS/IML vectors, you can compute the covariance and correlation matrices by using matrix multiplication to form the matrix that contains the corrected sum of squares of cross products (CSSCP).

Suppose you are given p SAS/IML vectors x1, x2, ..., xp. To form the covariance matrix for these data:

  1. Use the horizontal concatenation operator to concatenate the vectors into a matrix whose columns are the vectors.
  2. Center each vector by subtracting the sample mean.
  3. Form the CSSCP matrix (also called the "X-prime-X matrix") by multiplying the matrix transpose and the matrix.
  4. Divide by n-1 where n is the number of observations in the vectors.
This process assumes that there are no missing values in the data. Otherwise, it needs to be slightly amended. Formulas for various matrix quantities are given in the SAS/STAT User's Guide.

The following SAS/IML statements define a SAS/IML module that computes the sample covariance matrix of a data matrix. For this example the data are read from a SAS data set, so Step 1 (horizontal concatenation of vectors) is skipped.

proc iml;
start Cov(A);             /** define module to compute a covariance matrix **/
   n = nrow(A);           /** assume no missing values **/
   C = A - A[:,];         /** subtract mean to center the data **/
   return( (C` * C) / (n-1) );
finish;

/** read or enter data matrix into X **/
varNames = {"Height" "Weight" "Age"};
use sashelp.class; read all var varNames into X; close sashelp.class;

cov = Cov(X);
print cov[c=varNames r=varNames];

cov

Height

Weight

Age

Height

26.286901

102.49342

6.2099415

Weight

102.49342

518.65205

25.185673

Age

6.2099415

25.185673

2.2280702

Computing the Pearson correlation matrix requires the same steps, but also that the columns of the centered data matrix be scaled to have unit standard deviation. SAS/IML software already has a built-in CORR function, so it is not necessary to define a Corr module, but it is nevertheless instructive to see how such a module might be written:

start MyCorr(A);
   n = nrow(A);                   /** assume no missing values     **/
   C = A - A[:,];                 /** center the data              **/
   stdCol = sqrt(C[##,] / (n-1)); /** std deviation of columns     **/
   stdC = C / stdCol;             /** assume data are not constant **/
   return( (stdC` * stdC) / (n-1) );
finish;

corr = MyCorr(X);
print corr[c=varNames r=varNames];

corr

Height

Weight

Age

Height

1

0.8777852

0.8114343

Weight

0.8777852

1

0.7408855

Age

0.8114343

0.7408855

1

You should use the built-in CORR function instead of the previous module, because the built-in function handles the case of constant data.

Computation of the Covariance and Correlation Matrix in PROC IML (post-9.2)

In November 2010, SAS released the 9.22 (pronounced "nine point twenty-two") release of SAS/IML software. This release includes the following:

  • a built-in COV function which handles missing values in either of two ways
  • new features for the built-in CORR function including
    • handling missing values in either of two ways
    • support for different measures of correlation, including rank-based correlations

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

(required)

(required)