- Compute the covariance and correlation with PROC CORR and read the results into PROC IML
- Compute the matrices entirely with PROC IML
Computing a Covariance and Correlation Matrix with PROC CORR
You can use PROC CORR to compute the correlation matrix (or, more correctly, the "Pearson product-moment correlation matrix," since PROC CORR can also compute other measures of correlation). The following statements compute the covariance matrix and the correlation matrix for the three numeric variables in the SASHELP.CLASS data set.
    ods select Cov PearsonCorr;
    proc corr data=sashelp.class noprob
              outp=OutCorr   /** store results **/
              nomiss         /** listwise deletion of missing values **/
              cov;           /** include covariances **/
       var Height Weight Age;
    run;
The OutCorr data set contains various statistics about the data, as shown by running PROC PRINT:
proc print data=OutCorr; run;
| Obs | _TYPE_ | _NAME_ | Height | Weight | Age |
|---|---|---|---|---|---|
| 1 | COV | Height | 26.287 | 102.493 | 6.2099 |
| 2 | COV | Weight | 102.493 | 518.652 | 25.1857 |
| 3 | COV | Age | 6.210 | 25.186 | 2.2281 |
| 4 | MEAN | | 62.337 | 100.026 | 13.3158 |
| 5 | STD | | 5.127 | 22.774 | 1.4927 |
| 6 | N | | 19.000 | 19.000 | 19.0000 |
| 7 | CORR | Height | 1.000 | 0.878 | 0.8114 |
| 8 | CORR | Weight | 0.878 | 1.000 | 0.7409 |
| 9 | CORR | Age | 0.811 | 0.741 | 1.0000 |
If you want to use the covariance or correlation matrix in PROC IML, you can read the appropriate values into a SAS/IML matrix by using a WHERE clause on the USE statement:
    proc iml;
    use OutCorr where(_TYPE_="COV");
    read all var _NUM_ into cov[colname=varNames];
    use OutCorr where(_TYPE_="CORR");
    read all var _NUM_ into corr[colname=varNames];
    close OutCorr;
Notice that this SAS/IML code is independent of the number of variables in the data set.
Computation of the Covariance and Correlation Matrix in PROC IML
If the data are in SAS/IML vectors, you can compute the covariance and correlation matrices by using matrix multiplication to form the matrix that contains the corrected sums of squares and crossproducts (CSSCP).
Suppose you are given p SAS/IML vectors x_{1}, x_{2}, ..., x_{p}. To form the covariance matrix for these data:
1. Use the horizontal concatenation operator to concatenate the vectors into a matrix whose columns are the vectors.
2. Center each column by subtracting its sample mean.
3. Form the CSSCP matrix (also called the "X-prime-X matrix") by multiplying the transpose of the centered matrix by the centered matrix.
4. Divide by n-1, where n is the number of observations in each vector.
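In matrix notation, the steps above amount to the following. The symbols X, \bar{x}, X_c, and S are labels introduced here for illustration; they are not names taken from the SAS/IML code.

```latex
X = [\, x_1 \;\; x_2 \;\; \cdots \;\; x_p \,] \in \mathbb{R}^{n \times p}, \qquad
\bar{x} = \tfrac{1}{n}\, X^{\mathsf{T}} \mathbf{1}_n, \qquad
X_c = X - \mathbf{1}_n \bar{x}^{\mathsf{T}}, \qquad
S = \frac{1}{n-1}\, X_c^{\mathsf{T}} X_c
```

Here \mathbf{1}_n is a column vector of n ones, X_c^{\mathsf{T}} X_c is the CSSCP matrix, and S is the sample covariance matrix.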
The following SAS/IML statements define a SAS/IML module that computes the sample covariance matrix of a data matrix. For this example the data are read from a SAS data set, so Step 1 (horizontal concatenation of vectors) is skipped.
    proc iml;
    start Cov(A);            /** define module to compute a covariance matrix **/
       n = nrow(A);          /** assume no missing values **/
       C = A - A[:,];        /** subtract mean to center the data **/
       return( (C` * C) / (n-1) );
    finish;

    /** read or enter data matrix into X **/
    varNames = {"Height" "Weight" "Age"};
    use sashelp.class;
    read all var varNames into X;
    close sashelp.class;

    cov = Cov(X);
    print cov[c=varNames r=varNames];
| cov | Height | Weight | Age |
|---|---|---|---|
| Height | 26.286901 | 102.49342 | 6.2099415 |
| Weight | 102.49342 | 518.65205 | 25.185673 |
| Age | 6.2099415 | 25.185673 | 2.2280702 |
Computing the Pearson correlation matrix requires the same steps, plus one more: the columns of the centered data matrix must be scaled to have unit standard deviation. SAS/IML software already has a built-in CORR function, so it is not necessary to define a Corr module, but it is nevertheless instructive to see how such a module might be written:
    start MyCorr(A);
       n = nrow(A);                        /** assume no missing values **/
       C = A - A[:,];                      /** center the data **/
       stdCol = sqrt(C[##,] / (n-1));      /** std deviation of columns **/
       stdC = C / stdCol;                  /** assume data are not constant **/
       return( (stdC` * stdC) / (n-1) );
    finish;

    corr = MyCorr(X);
    print corr[c=varNames r=varNames];
| corr | Height | Weight | Age |
|---|---|---|---|
| Height | 1 | 0.8777852 | 0.8114343 |
| Weight | 0.8777852 | 1 | 0.7408855 |
| Age | 0.8114343 | 0.7408855 | 1 |
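The scaling step can also be written compactly in terms of the covariance matrix. In this sketch, S denotes the sample covariance matrix, s_j the sample standard deviation of column j, and D the diagonal matrix of those standard deviations. These symbols are introduced here for illustration and do not appear in the module.

```latex
s_j = \sqrt{S_{jj}}, \qquad
D = \operatorname{diag}(s_1, s_2, \ldots, s_p), \qquad
R = D^{-1} S \, D^{-1}, \qquad
R_{jk} = \frac{S_{jk}}{s_j \, s_k}
```

For example, the Height-Weight entry is approximately 102.49342 / (5.127 × 22.774) ≈ 0.878, which agrees with the printed matrix. Because every entry divides by s_j s_k, the expression is undefined whenever a column is constant (s_j = 0).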
You should use the built-in CORR function instead of the previous module, because the built-in function handles the case of constant data.
Computation of the Covariance and Correlation Matrix in PROC IML (post-9.2)
In November 2010, SAS shipped the 9.22 (pronounced "nine point twenty-two") release of SAS/IML software. This release includes the following:
- a built-in COV function which handles missing values in either of two ways
- new features for the built-in CORR function, including:
  - handling missing values in either of two ways
  - support for different measures of correlation, including rank-based correlations
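As a minimal sketch of how these built-in functions might be called, the following statements assume SAS/IML 9.22 or later and reuse the SASHELP.CLASS data from the earlier examples. The result names (S, SPair, R, RSpearman, RPair) are illustrative only, and the optional arguments follow the documented COV and CORR syntax as best understood here, so check them against the documentation for your release.

```sas
proc iml;
varNames = {"Height" "Weight" "Age"};
use sashelp.class;
read all var varNames into X;
close sashelp.class;

S         = cov(X);                         /** sample covariance matrix **/
SPair     = cov(X, "pairwise");             /** pairwise deletion of missing values **/
R         = corr(X);                        /** Pearson correlation matrix **/
RSpearman = corr(X, "Spearman");            /** rank-based correlation **/
RPair     = corr(X, "Pearson", "pairwise"); /** Pearson, pairwise deletion **/

print S[c=varNames r=varNames], R[c=varNames r=varNames];
quit;
```

SASHELP.CLASS has no missing values for these three variables, so the listwise and pairwise results should agree here; the distinction matters only for data with missing values.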