11月 232019
 

“Ocean acidification is sometimes referred to as global warming's equally evil twin.” ~ Elizabeth Kolbert This is the second post in my two-part series about climate change. You can read part 1 of this series here. When engaging in data exploration for insights, it’s good practice to start with a [...]

Climate change: Hurricanes, acidification and extinction was published on SAS Voices by Leo Sadovy

11月 222019
 

sxlion

大家有一个普通的印象:SAS的更新很慢,很老很落后。可能跟它的版本命名有关,SAS9.0是2004年出来的,到现在都快20年了,版本号 还停留在9字头,并且还没有继续更新的迹象。当然这个与SAS公司的明面

原创文章: ”难道是SAS10?云分析服务时代的到来“,转载请注明: 转自SAS资源资讯列表

本文链接地址: http://saslist.net/archives/454

11月 222019
 

sxlion

大家有一个普通的印象:SAS的更新很慢,很老很落后。可能跟它的版本命名有关,SAS9.0是2004年出来的,到现在都快20年了,版本号 还停留在9字头,并且还没有继续更新的迹象。当然这个与SAS公司的明面

原创文章: ”难道是SAS10?云分析服务时代的到来“,转载请注明: 转自SAS资源资讯列表

本文链接地址: http://saslist.net/archives/454

11月 202019
 

Colleges and universities have access to enormous stores of data and analytics has the power to help higher education tackle some of its biggest challenges. Larry Burns, Assistant Director of Institutional Research and Information Management (IRIM), Oklahoma State University (OSU) knows a great deal about the power of analytics to [...]

Oklahoma State University’s analytics journey was published on SAS Voices by Georgia Mariani

11月 202019
 

In a linear regression model, the predicted values are on the same scale as the response variable. You can plot the observed and predicted responses to visualize how well the model agrees with the data, However, for generalized linear models, there is a potential source of confusion. Recall that a generalized linear model (GLIM) has two components: a linear predictor, which is a linear combination of the explanatory variables, and a transformation (called the inverse link function) that maps the linear predictor onto the scale of the data. Consequently, SAS regression procedures support two types of predicted values and prediction limits. In the SAS documentation, the first type is called "predictions on the linear scale" whereas the second type is called "predictions on the data scale."

For many SAS procedures, the default is to compute predicted values on the linear scale. However, for GLIMs that model nonnormal response variables, it is more intuitive to predict on the data scale. The ILINK option, which is shorthand for "apply the inverse link transformation," converts the predicted values to the data scale. This article shows how the ILINK option works by providing an example for a logistic regression model, which is the most familiar generalized linear models.

Review of generalized linear models

The SAS documentation provides an overview of GLIMs and link functions. The documentation for PROC GENMOD provides a list of link functions for common regression models, including logistic regression, Poisson regression, and negative binomial regression.

Briefly, the linear predictor is
η = X*β
where X is the design matrix and β is the vector of regression coefficients. The link function (g) is a monotonic function that relates the linear predictor to the conditional mean of the response. Sometimes the symbol μ is used to denote the conditional mean of the response (μ = E[Y|x]), which leads to the formula
g(μ) = X*β

In SAS, you will often see options and variables names (in output data sets) that contains the substring 'XBETA'. When you see 'XBETA', it indicates that the statistic or variable is related to the LINEAR predictor. Because the link function, g, is monotonic, it has an inverse, g-1. For generalized linear models, the inverse link function maps the linear-scale predictions to data-scale predictions: if η = x β is a predicted value on the linear scale, then g-1(η) is the predicted value for x on the data scale.

When the response variable is binary, the GLIM is the logistic model. If you use the convention that Y=1 indicates an event and Y=0 indicates the absence of an event, then the "data scale" is [0, 1] and the GLIM predicts the probability that the event occurs. For the logistic GLIM, the link function is the logit function:
g(μ) = logit(μ) = log( μ / (1 - μ) )
The inverse of the logit function is called the logistic function:
g-1(η) = logistic(η) = 1 / (1 + exp(-η))

To demonstrate the ILINK option, the next sections perform the following tasks:

  1. Use PROC LOGISTIC to fit a logistic model to data. You can use the STORE statement to store the model to an item store.
  2. Use the SCORE statement in PROC PLM to score new data. This example scores data by using the ILINK option.
  3. Score the data again, but this time do not use the ILINK option. Apply the logistic transformation to the linear estimates to demonstrate that relationship between the linear scale and the data scale.

The transformation between the linear scale and the data scale is illustrated by the following graph:

Fit a logistic model

The following data are from the documentation for PROC LOGISTIC. The model predicts the probability of Pain="Yes" (the event) for patients in a study, based on the patients' sex, age, and treatment method ('A', 'B', or 'P'). The STORE statement in PROC LOGISTIC creates an item store that can be used for many purposes, including scoring new observations.

data Neuralgia;
   input Treatment $ Sex $ Age Duration Pain $ @@;
   DROP Duration;
   datalines;
P F 68  1 No  B M 74 16 No  P F 67 30 No  P M 66 26 Yes B F 67 28 No  B F 77 16 No
A F 71 12 No  B F 72 50 No  B F 76  9 Yes A M 71 17 Yes A F 63 27 No  A F 69 18 Yes
B F 66 12 No  A M 62 42 No  P F 64  1 Yes A F 64 17 No  P M 74  4 No  A F 72 25 No
P M 70  1 Yes B M 66 19 No  B M 59 29 No  A F 64 30 No  A M 70 28 No  A M 69  1 No
B F 78  1 No  P M 83  1 Yes B F 69 42 No  B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes
A M 70 12 No  A F 69 12 No  B F 65 14 No  B M 70  1 No  B M 67 23 No  A M 76 25 Yes
P M 78 12 Yes B M 77  1 Yes B F 69 24 No  P M 66  4 Yes P F 65 29 No  P M 60 26 Yes
A M 78 15 Yes B M 75 21 Yes A F 67 11 No  P F 72 27 No  P F 70 13 Yes A M 75  6 Yes
B F 65  7 No  P F 68 27 Yes P M 68 11 Yes P M 67 17 Yes B M 70 22 No  A M 65 15 No
P F 67  1 Yes A M 67 10 No  P F 72 11 Yes A F 74  1 No  B M 80 21 Yes A F 69  3 No
;
 
title 'Logistic Model on Neuralgia';
proc logistic data=Neuralgia;
   class Sex Treatment;
   model Pain(Event='Yes')= Sex Age Treatment;
   store PainModel / label='Neuralgia Study';  /* store model for post-fit analysis */
run;

Score new data by using the ILINK option

There are many reasons to use PROC PLM, but an important purpose of PROC PLM is to score new observations. Given information about new patients, you can use PROC PLM to predict the probability of pain if these patients are given a specific treatment. The following DATA step defines the characteristics of four patients who will receive Treatment B. The call to PROC PLM scores and uses the ILINK option to predict the probability of pain:

/* Use PLM to score new patients */
data NewPatients;
   input Treatment $ Sex $ Age Duration;
   DROP Duration;
   datalines;
B M 67 15 
B F 73  5 
B M 74 12 
B F 79 16 
;
 
/* predictions on the DATA scale */
proc plm restore=PainModel noprint;
   score data=NewPatients out=ScoreILink
         predicted lclm uclm / ilink; /* ILINK gives probabilities */
run;
 
proc print data=ScoreILink; run;

The Predicted column contains probabilities in the interval [0, 1]. The 95% prediction limits for the predictions are given by the LCLM and UCLM columns. For example, the prediction interval for the 67-year-old man is approximately [0.03, 0.48].

These values and intervals are transformations of analogous quantities on the linear scale. The logit transformation maps the predicted probabilities to the linear estimates. The inverse logit (logistic) transformation maps the linear estimates to the predicted probabilities.

Linear estimates and the logistic transformation

The linear scale is important because effects are additive on this scale. If you are testing the difference of means between groups, the tests are performed on the linear scale. For example, the ESTIMATE, LSMEANS, and LSMESTIMATE statements in SAS perform hypothesis testing on the linear estimates. Each of these statements supports the ILINK option, which enables you to display predicted values on the data scale.

To demonstrate the connection between the predicted values on the linear and data scale, the following call to PROC PLM scores the same data according to the same model. However, this time the ILINK option is omitted, so the predictions are on the linear scale.

/* predictions on the LINEAR scale */
proc plm restore=PainModel noprint;
   score data=NewPatients out=ScoreXBeta(
         rename=(Predicted=XBeta LCLM=LowerXB UCLM=UpperXB))
         predicted lclm uclm;         /* ILINK not used, so linear predictor */
run;
 
proc print data=ScoreXBeta; run;

I have renamed the variables that PROC PLM creates for the estimates on the linear scale. The XBeta column shows the predicted values. The LowerXB and UpperXB columns show the prediction interval for each patient. The XBeta column shows the values you would obtain if you use the parameter estimates table from PROC LOGISTIC and apply those estimates to the observations in the NewPatients data.

To demonstrate that the linear estimates are related to the estimates in the previous section, the following SAS DATA step uses the logistic (inverse logit) transformation to convert the linear estimates onto the predicted probabilities:

/* Use the logistic (inverse logit) to transform the linear estimates
     (XBeta) into probability estimates in [0,1], which is the data scale.
   You can use the logit transformation to go the other way. */
data LinearToData;
   set ScoreXBeta;   /* predictions on linear scale */
   PredProb   = logistic(XBeta);
   LCLProb    = logistic(LowerXB);
   UCLProb    = logistic(UpperXB); 
run;
 
proc print data=LinearToData;
   var Treatment Sex Age PredProb LCLProb UCLProb;
run;

The transformation of the linear estimates gives the same values as the estimates that were obtained by using the ILINK option in the previous section.

Summary

In summary, there are two different scales for predicting values for a generalized linear model. When you report predicted values, it is important to specify the scale you are using. The data scale makes intuitive sense because it is the same scale as the response variable. You can use the ILINK option in SAS to get predicted values and prediction intervals on the data scale.

The post Predicted values in generalized linear models: The ILINK option in SAS appeared first on The DO Loop.

11月 182019
 

Getting value from analytics is becoming top of mind for businesses. Organizations have invested millions of dollars in data, people and technology and are looking for a return on their investment. That requires  operationalizing analytics so that it can be used for strategic decision making -- often referred to as [...]

Resources to help you conquer the 'last mile' of analytics was published on SAS Voices by Sarah Gates

11月 182019
 

My colleague, Mike Drutar, recently showed how to create a "strip plot" that shows the distribution of temperatures for each calendar month at a particular location. Mike created the strip plot in SAS Visual Analytics by using a point-and-click interface. This article shows how to create a similar graph by using SAS programming statements and the SGPLOT procedure in Base SAS. Along the way, I'll point out some tips and best practices for creating a strip plot.

Daily temperature data for Albany, NY

The data in this article is 25 years of daily temperatures in Albany, NY, from 1995 to 2019. I have analyzed this data previously when discussing how to model periodic data. The following DATA step downloads the data from the internet and adds a SAS date variable:

/* Read the data directly from the Internet */
filename webfile url "http://academic.udayton.edu/kissock/http/Weather/gsod95-current/NYALBANY.txt" 
 /* some corporate users might need to add  proxy='http://...:80' */;
data TempData;
infile webfile;
input month day year Temperature;
format Date date9.;
Date = MDY(month, day, year);
if Temperature=-99 then delete;   /* -99 is used for missing values */
run;

A basic strip plot

In a basic strip plot, a continuous variable is plotted against levels of a categorical variable. If the values of the continuous variable are distinct, this technique theoretically can show all data values. In practice, however, there will be overplotting of markers, especially for large data sets. You can use a jittering and semi-transparent markers to reduce the effect of overplotting. The SCATTER statement in PROC SGPLOT supports the JITTER option and the TRANSPARENCY= option.

Drutar's strip plot displays temperatures for each month of the year. For the Albany temperature data, you might assume that you need to create a new categorical variable that has the values 'Jan', 'Feb', ..., 'Dec'. However, you do not need to create a new variable. You can use the FORMAT statement and the MONNAMEw. format to convert the Date variable into discrete values "on the fly." This technique is described in the article "Use SAS formats to bin numerical variables." If you use the TYPE=DISCRETE option on the XAXIS statement, you obtain a basic strip plot of temperature versus each calendar month.

/* Create a strip plot.
   Use a format to bin dates into months:
   https://blogs.sas.com/content/iml/2016/08/08/sas-formats-bin-numerical-variables.html
*/
proc sgplot data=TempData;
   format Date MONNAME3.;                 /* bin Date into 12 months */
   scatter x=Date y=Temperature / 
           jitter transparency=0.85       /* handle overplotting */
           markerattrs=(symbol=CircleFilled) legendlabel="Daily Temperature";
   xaxis type=discrete display=(nolabel); /* create categorical axis */
   yaxis grid label="Temperature (F)"; 
run;
Basic strip plot in SAS

You can see dark areas in the graph. These indicate high-density regions where the daily temperatures are similar. For some applications, it is useful to further emphasize these regions by overlaying statistical estimates that show the average and range of each strip, as shown in the next section.

Of course, if your data already contains a categorical variable, you can create a strip plot directly. You will not need to use the FORMAT trick.

Overlay a visualization of the center and variation

To indicate the center of each month's temperature distribution, Drutar displays the median value for each month. He also overlays a line segment that shows the data range (min to max). In my strip plot, I will overlay the median but will use the interquartile range (Q1 to Q3) to display the variation in the data. You can use PROC MEANS to create a SAS data set that contains the statistics for each month:

/* write statistics for each month to a data set */
proc means data=Tempdata noprint;
   format Date MONNAME3.;            /* bin Date into 12 months */
   class Date;                       /* output the statistics for each month */
   var Temperature;
   output out=MeanOut(where=(_TYPE_=1)) median=Median Q1=Q1 Q3=Q3;
run;

To create a new graph that overlays the statistics, append the statistics and the data. You can then use a high-low plot to show the variation in the data and a second SCATTER statement to overlay the median values, as follows:

data StripPlot;
   set TempData MeanOut;                  /* append statistics to data */
run;
 
proc sgplot data=StripPlot;
   format Date MONNAME3.;                 /* bin Date into 12 months */
   scatter x=Date y=Temperature / 
           jitter transparency=0.85       /* handle overplotting */
           markerattrs=(symbol=CircleFilled) legendlabel="Daily Temperature";
   highlow x=date low=Q1 high=Q3 /        /* use high-low plot to display range of data */
           legendlabel="IQR" lineattrs=GraphData3(thickness=5);
   scatter x=date y=Median /              /* plot the median value for each strip */
           markerattrs=GraphData2(size=12 symbol=CircleFilled);
   xaxis type=discrete display=(nolabel); /* create categorical axis */
   yaxis grid label="Temperature (F)"; 
run;
Strip plot in SAS with overlays of median and interquartile range statistics

In summary, you can use the SCATTER statement in the SGPLOT procedure to create a basic strip plot. One axis will be the continuous variable, the other will be a discrete (categorical) variable. You can use the JITTER option to reduce overplotting. For data that contain thousands of observations, you might also want to use the TRANSPARENCY= option to display semi-transparent markers. Typically, you will use higher transparency values for larger data. Finally, you can use PROC MEANS to create an output data set that contains summary statistics for each strip. This article computes and displays the median and interquartile range for each strip, but you could also use the mean and standard deviation.

The post Create a strip plot in SAS appeared first on The DO Loop.

11月 152019
 

“The future is already here — it's just not very evenly distributed.”  ~ William Gibson, author The same can be said for climate change – global warming is here, in a big way, but its effects are still an arm's length away for many of us. How is climate change [...]

Climate change: It's all about the CO2 was published on SAS Voices by Leo Sadovy

11月 152019
 

Designing interactive reports can be a fun and unique challenge. As user interface experience designers can attest, there are several aspects that go into developing a successful and effective self-service tool. Granted I’m not designing the actual software, but reports require a similar approach to be sure that visualizations are clear and that users can get to the answers they are looking for. Enter prompts.

Reports prompt users to better understand trends, how their data points compare to the whole, and to narrow the scope of data. Being able to pick the placement of these prompts quickly and easily will open the possibilities of your report layouts! I’m specifically speaking about Report and Page level prompts. Traditionally, these global prompt controls were only able to be placed at the top; see the yellow highlighted areas below.

Let’s take a look at an example report with the traditional Report and Page prompt layout. The Report prompts are extremely easy to pick out, since they sit above the pages, but the Page prompts can sometimes blend in with other prompts contained in the report body.

Introduced in the SAS Visual Analytics 8.4 release is the ability to control the layout position of these prompts. Using my example report, let’s change the placement of these prompts. In Edit mode, open the Options pane and use the top level drop-down to select the report name. This will activate the report level, and the report level Options will display. Next, under the Report Controls subgroup, move the placement radio button to the west cardinal point.

Depending on the type of control objects you are using in your report, you may not like this layout yet. For instance, you can see here that my date slider is taking up too much space.

When you activate the slide control, use the Options pane to alter the Slider Direction and Layout. You can even use the Style option to change the font size. You can see that after these modifications, the Report prompt space can be configured to your liking.

Next, let’s change the placement for the Page prompts, for demonstration purposes. From the Options pane, use the top drop-down to select the page name. This will activate the page level, and the page level Options will display. Next, under the Page Controls subgroup, move the placement radio button to the west cardinal position.

You can see that the direction of the button bar control was automatically changed to vertical. Now we can clearly see which prompts belong to the page level.

If I switch to view mode, and adjust the browser size, you can get a better feel for the Report and Page prompt layout changes.

But as with many things, just because you can, doesn’t mean you should. This is where the report designer’s creativity and style can really take flight. Here is the same report, but with my preferred styling.

Notice that I kept the Report prompts along the top but moved the Page prompts to the left of the report. I also added two containers and configured a gray border for each container to better separate the objects. This helps the user quickly see that the drop-down will filter the word cloud is only. I also used the yellow highlighting through styling and a display rule to emphasize the selected continent. The bar chart is fed from an aggregated data source which is why the report prompt is not filtering out the other continents.

Feel free to send me your latest report design ideas!

Additional material related to Report and Page prompts:

New control prompt placement option in SAS Visual Analytics was published on SAS Users.

11月 132019
 

Biplots are two-dimensional plots that help to visualize relationships in high dimensional data. A previous article discusses how to interpret biplots for continuous variables. The biplot projects observations and variables onto the span of the first two principal components. The observations are plotted as markers; the variables are plotted as vectors. The observations and/or vectors are not usually on the same scale, so they need to be rescaled so that they fit on the same plot. There are four common scalings (GH, COV, JK, and SYM), which are discussed in the previous article.

This article shows how to create biplots in SAS. In particular, the goal is to create the biplots by using modern ODS statistical graphics. You can obtain biplots that use the traditional SAS/GRAPH system by using the %BIPLOT macro by Michael Friendly. The %BIPLOT macro is very powerful and flexible; it is discussed later in this article.

There are four ways to create biplots in SAS by using ODS statistical graphics:

  • You can use PROC PRINQUAL in SAS/STAT software to create the COV biplot.
  • If you have a license for SAS/GRAPH software (and SAS/IML software), you can use Friendly's %BIPLOT macro and use the OUT= option in the macro to save the coordinates of the markers and vectors. You can then use PROC SGPLOT to create a modern version of Friendly's biplot.
  • You can use the matrix computations in SAS/IML to "manually" compute the coordinates of the markers and vectors. (These same computations are performed internally by the %BIPLOT macro.) You can use the Biplot module to create a biplot, or you can use the WriteBiplot module to create a SAS data set that contains the biplot coordinates. You can then use PROC SGPLOT to create the biplot.

For consistency with the previous article, all methods in this article standardize the input variables to have mean zero and unit variance (use the SCALE=STD option in the %BIPLOT macro). All biplots show projections of the same four-dimensional Fisher's iris data. The following DATA step assigns a blank label. If you do not supply an ID variable, some biplots display observations numbers.

data iris;
   set Sashelp.iris;
   id = " ";        /* create an empty label variable */
run;

Use PROC PRINQUAL to compute the COV biplot

The PRINQUAL procedure can perform a multidimensional preference analysis, which is visualized by using a MDPREF plot. The MDPREF plot is closely related to biplot (Jackson (1991), A User’s Guide to Principal Components, p. 204). You can get PROC PRINQUAL to produce a COV biplot by doing the following:

  • Use the N=2 option to specify you want to compute two principal components.
  • Use the MDPREF=1 option to specify that the procedure should not rescale the vectors in the biplot. By default, MDPREF=2.5 and the vectors appear 2.5 larger than they should be. (More on scaling vectors later.)
  • Use the IDENTITY transformation so that the variables are not transformed in a nonlinear manner.

The following PROC PRINQUAL statements produce a COV biplot (click to enlarge):

proc prinqual data=iris plots=(MDPref) 
              n=2       /* project onto Prin1 and Prin2 */
              mdpref=1; /* use COV scaling */
   transform identity(SepalLength SepalWidth PetalLength PetalWidth);  /* identity transform */
   id ID;
   ods select MDPrefPlot;
run;
COV biplot, created in SAS by using PROC PRINQUAL

Use Friendly's %BIPLOT macro

Friendly's books [SAS System for Statistical Graphics (1991) and Visualizing Categorical Data (2000)] introduced many SAS data analysts to the power of using visualization to accompany statistical analysis, and especially the analysis of multivariate data. His macros use traditional SAS/GRAPH graphics from the 1990s. In the mid-2000s, SAS introduced ODS statistical graphics, which were released with SAS 9.2. Although the %BIPLOT macro does not use ODS statistical graphics directly, the macro supports the OUT= option, which enables you to create an output data set that contains all the coordinates for creating a biplot.

The following example assumes that you have downloaded the %BIPLOT macro and the %EQUATE macro from Michael Friendly's web site.

/* A. Use %BIPLOT macro, which uses SAS/IML to compute the biplot coordinates. 
      Use the OUT= option to get the coordinates for the markers and vectors.
   B. Transpose the data from long to wide form.
   C. Use PROC SGPLOT to create the biplot
*/
%let FACTYPE = SYM;   /* options are GH, COV, JK, SYM */
title "Biplot: &FACTYPE, STD";
%biplot(data=iris, 
        var=SepalLength SepalWidth PetalLength PetalWidth, 
        id=id,
        factype=&FACTYPE,  /* GH, COV, JK, SYM */
        std=std,           /* NONE, MEAN, STD  */
        scale=1,           /* if you do not specify SCALE=1, vectors are auto-scaled */
        out=biplotFriendly,/* write SAS data set with results */ 
        symbols=circle dot, inc=1);
 
/* transpose from long to wide */
data Biplot;
set biplotFriendly(where=(_TYPE_='OBS') rename=(dim1=Prin1 dim2=Prin2 _Name_=_ID_))
    biplotFriendly(where=(_TYPE_='VAR') rename=(dim1=vx dim2=vy _Name_=_Variable_));
run;
 
proc sgplot data=Biplot aspect=1 noautolegend;
   refline 0 / axis=x; refline 0 / axis=y;
   scatter x=Prin1 y=Prin2 / datalabel=_ID_;
   vector  x=vx    y=vy    / datalabel=_Variable_
                             lineattrs=GraphData2 datalabelattrs=GraphData2;
   xaxis grid offsetmin=0.1 offsetmax=0.2;
   yaxis grid;
run;
SYM biplot, created in SAS by using ODS statistical graphics

Because you are using PROC SGPLOT to display the biplot, you can easily configure the graph. For example, I added grid lines, which are not part of the output from the %BIPLOT macro. You could easily change attributes such as the size of the fonts or add additional features such as an inset. With a little more work, you can merge the original data and the biplot data and color-code the markers by a grouping variable (such as Species) or by a continuous response variable.

Notice that the %BUPLOT macro supports a SCALE= option. The SCALE= option applies an additional linear scaling to the vectors. You can use this option to increase or decrease the lengths of the vectors in the biplot. For example, in the SYM biplot, shown above, the vectors are long relative to the range of the data. If you want to display vectors that are only 25% as long, you can specify SCALE=0.25. You can specify numbers greater than 1 to increase the vector lengths. For example, SCALE=2 will double the lengths of the vectors. If you omit the SCALE= option or set SCALE=0, then the %BIPLOT macro automatically scales the vectors to the range of the data. If you use the SCALE= option, you should tell the reader that you did so.

SAS/IML modules that compute biplots

The %BIPLOT macro uses SAS/IML software to compute the locations of the markers and vectors for each type of biplot. I wrote three SAS/IML modules that perform the three steps of creating a biplot:

  • The CalcBiplot module computes the projections of the observations and scores onto the first few principal components. This module (formerly named CalcPrinCompBiplot) was written in the mid-2000s and has been distributed as part of the SAS/IML Studio application. It returns the scores and vectors as SAS/IML matrices.
  • The WriteBiplot module calls the CalcBiplot module and then writes the scores to a SAS data set called _SCORES and the vectors (loadings) to a SAS data set called _VECTORS. It also creates two macro variables, MinAxis and MaxAxis, which you can use if you want to equate the horizontal and vertical scales of the biplot axes.
  • The Biplot function calls the WriteBiplot module and then calls PROC SGPLOT to create a biplot. It is the "raw SAS/IML" version of the %BIPLOT macro.

You can use the CalcBiplot module to compute the scores and vectors and return them in IML matrices. You can use the WriteBiplot module if you want that information in SAS data sets so that you can create your own custom biplot. You can use the Biplot module to create standard biplots. The Biplot and WriteBiplot modules are demonstrated in the next sections.

Use the Biplot module in SAS/IML

The syntax of the Biplot module is similar to the %BIPLOT macro for most arguments. The input arguments are as follows:

  • X: The numerical data matrix
  • ID: A character vector of values used to label rows of X. If you pass in an empty matrix, observation numbers are used to label the markers. This argument is ignored if labelPoints=0.
  • varNames: A character vector that contains the names of the columns of X.
  • FacType: The type of biplot: 'GH', 'COV', 'JK', or 'SYM'.
  • StdMethod: How the original variables are scaled: 'None', 'Mean', or 'Std'.
  • Scale: A numerical scalar that specifies additional scaling applied to vectors. By default, SCALE=1, which means the vectors are not scaled. To shrink the vectors, specify a value less than 1. To lengthen the vectors, specify a value greater than 1. (Note: The %BIPLOT macro uses SCALE=0 as its default.)
  • labelPoints: A binary 0/1 value. If 0 (the default) points are not labeled. If 1, points are labeled by the ID values. (Note: The %BIPLOT macro always labels points.)

The last two arguments are optional. You can specify them as keyword-value pairs outside of the parentheses. The following examples show how you can call the Biplot module in a SAS/IML program to create a biplot:

ods graphics / width=480px height=480px;
proc iml;
/* assumes the modules have been previously stored */
load module=(CalcBiplot WriteBiplot Biplot);
use sashelp.iris;
read all var _NUM_ into X[rowname=Species colname=varNames];
close;
 
title "COV Biplot with Scaled Vectors and Labels";
run Biplot(X, Species, varNames, "COV", "Std") labelPoints=1;   /* label obs */
 
title "JK Biplot: Relationships between Observations";
run Biplot(X, NULL, varNames, "JK", "Std");
 
title "JK Biplot: Automatic Scaling of Vectors";
run Biplot(X, NULL, varNames, "JK", "Std") scale=0;            /* auto scale; empty ID var */
 
title "SYM Biplot: Vectors Scaled by 0.25";
run Biplot(X, NULL, varNames, "SYM", "Std") scale=0.25;        /* scale vectors by 0.25 */
SYM biplot with auto-scaled vectors. Biplot created by using ODS statistical graphics

The program creates four biplots, but only the last one is shown. The last plot uses the SCALE=0.25 option to rescale the vectors of the SYM biplot. You can compare this biplot to the SYM biplot in the previous section, which did not rescale the length of the vectors.

Use the WriteBiplot module in SAS/IML

If you prefer to write an output data set and then create the biplot yourself, use the WriteBiplot module. After loading the modules and the data (see the previous section), you can write the biplot coordinates to the _Scores and _Vectors data sets, as follows. A simple DATA step appends the two data sets into a form that is easy to graph:

run WriteBiplot(X, NULL, varNames, "JK", "Std") scale=0;   /* auto scale vectors */
QUIT;
 
data Biplot;
   set _Scores _Vectors;    /* append the two data sets created by the WriteBiplot module */
run;
 
title "JK Biplot: Automatic Scaling of Vectors";
title2 "FacType=JK; Std=Std";
proc sgplot data=Biplot aspect=1 noautolegend;
   refline 0 / axis=x; refline 0 / axis=y;
   scatter x=Prin1 y=Prin2 / ; 
   vector  x=vx    y=vy    / datalabel=_Variable_
                             lineattrs=GraphData2 datalabelattrs=GraphData2;
   xaxis grid offsetmin=0.1 offsetmax=0.1 min=&minAxis max=&maxAxis;
   yaxis grid min=&minAxis max=&maxAxis;
run;
JK biplot, created in SAS by using ODS statistical graphics

In the program that accompanies this article, there is an additional example in which the biplot data is merged with the original data so that you can color-code the observations by using the Species variable.

Summary

This article shows four ways to use modern ODS statistical graphics to create a biplot in SAS. You can create a COV biplot by using the PRINQUAL procedure. If you have a license for SAS/IML and SAS/GRAPH, you can use Friendly's %BIPLOT macro to write the biplot coordinates to a SAS data set, then use PROC SGPLOT to create the biplot. This article also presents SAS/IML modules that compute the same biplots as the %BIPLOT macro. The WriteBiplot module writes the data to two SAS data sets (_Score and _Vector), which can be appended and used to plot a biplot. This gives you complete control over the attributes of the biplot. Or, if you prefer, you can use the Biplot module in SAS/IML to automatically create biplots that are similar to Friendly's but are displayed by using ODS statistical graphics.

You can download the complete SAS program that is used in this article. For convenience, I have also created a separate file that defines the SAS/IML modules that create biplots.

The post Create biplots in SAS appeared first on The DO Loop.