June 11, 2020

Whether you enjoy debugging or hate it, for programmers, debugging is a fact of life. It’s easy to misspell a keyword, scramble your array subscripts, or (heaven forbid!) forget a semicolon. That’s why we include a chapter on debugging in The Little SAS® Book and its companion book, Exercises and Projects for the Little SAS® Book. We believe that learning to debug makes you a better programmer. Once you understand a bug, you will be better prepared to avoid it in the future.

To help hone your debugging skills, here is an example of the type of problems you can find in our book of exercises and projects. See if you can find the bugs.

Programming exercise

  1. A friend tells you that she is learning SAS and wrote the following program. Unfortunately, the program won’t run. Help her improve her programming skills by finding the mistakes.

TITLE Height, Weight, and BMI;
TITLE2 by Sex and Age Group;
PROC CONTENT DATA = SASHELP.class; RUN;
DATA; SET SASHELP.class;
Height_m = Heigth * 0.0254;
Weight_kg = Weight * 0.4536;
BMI = Weight_kg / Height_m**2;
PROC FORMAT; VALUE
$sex 'M' = 'Boys' 'F' = 'Girls';
VALUE agegp 11-12 = 'Preteens
13-16 = 'Teens';
PROC TABULATE;
CLASS Sex Age; VAR Height_m Weight_kg;
TABLES (Height_m Weight_kg BMI)*
MEAN, Sex Age ALL;
FORMAT Sex $sex. Age agegp.;
RUN;
QUIT;

 

  • a. Examine the SAS data set SASHELP.CLASS including variable attributes.
  • b. Clean up the formatting of the program by adding appropriate indention and line spacing to show the structure of the DATA and PROC steps. Make changes as needed to make the program conform to standard best practices.
  • c. Fix any errors in the code so that the program will run correctly.
  • d. Add comments to the revised program for each bug that you fix so that your friend can understand her mistakes.

Solution

In the book, we provide solutions for odd-numbered multiple choice and short answer questions, and hints for the programming exercises. So here is a hint for this exercise:

  1. Hint: This program contains four bugs. It also contains “red herrings” that are unusual for SAS code, but nonetheless do run properly and so are not actual bugs. Be sure you know how SAS handles data set names by default. SAS Enterprise Guide can format code for you; right-click the Program window and select Format Code from the pop-up menu. To format code in SAS Studio, click the Format Code icon at the top of the Program window.


What's wrong with this code? was published on SAS Users.

June 11, 2020

Rapid demand response forecasting techniques are forecasting processes that can incorporate key information quickly enough for agile supply chains to act on it in real time. Retailers and consumer goods suppliers are urgently trying to determine how changes in consumer behavior will affect their regions, channels, categories, brands and products during [...]

Rapid demand response forecasting helps retailers adapt during COVID-19 was published on SAS Voices by Charlie Chase

June 10, 2020

What do two health systems 4,000 miles (6,600 kilometers) apart have in common in their COVID-19 response? Despite different policy responses to the pandemic, they are more alike than you might think. Brought together by SAS and the FutureNHS COVID-19 analytics huddle, these two health systems recently shared their experience [...]

More alike than different: Two health systems share their COVID-19 response was published on SAS Voices by Alyssa Farrell

June 10, 2020

This article introduces the iml action, which is available in SAS Viya 3.5. The iml action supports most of the same syntax and functionality as the SAS/IML matrix language, which is implemented in PROC IML. With minimal changes, most programs that run in PROC IML also run in the iml action. In addition, the iml action supports new programming features for parallel programming.

Most actions in SAS Viya perform a specific task, but the iml action is different. The iml action provides a set of general programming tools that you can use to implement a custom parallel algorithm. The programmer can control many aspects of the computation, including how the computation is distributed among nodes and threads on a cluster of machines (or threads on a single machine).

Future articles will address the parallel programming capabilities of the iml action. This article provides an overview of the iml action. What is it? How do you get access to it? How is it similar to and different from PROC IML?

What is the iml action?

Recall that the SAS/IML language is a matrix-vector programming language that supports a rich library of functions in statistics, data analysis, matrix computations, numerical analysis, simulation, and optimization. In SAS 9 ("traditional SAS"), you can access the SAS/IML language by licensing the SAS/IML product and calling the IML procedure. PROC IML is also available in the SAS University Edition.

In SAS Viya 3.5, you get access to the SAS/IML language by licensing the SAS IML product. (Notice that there is no “slash” in the product name.) The SAS IML product gives you access to the iml action and to the IML procedure. Thus, in Viya, you can run all existing PROC IML programs, and you can also write new programs that run in the iml action and use SAS Cloud Analytic Services (CAS).

The iml action belongs to the iml action set. In addition to supporting most of the statements and functions in the SAS/IML language, the iml action supports new functionality that enables you to take advantage of the distributed computational resources in SAS Viya. In particular, you can use the iml action to implement custom parallel algorithms that use multiple nodes and threads on a cluster of machines. Even on one machine, you can run custom parallel programs on a multicore processor.

How is the iml action similar to PROC IML?

The iml action and the IML procedure share a common syntax. The mathematical and statistical function library is essentially the same in the action and in the procedure. Both environments support arithmetic and linear algebraic operations on matrices, operations to subset and query matrices, and programming features such as writing loops and using IF-THEN/ELSE logic.

Of the 300 functions and statements in the SAS/IML run-time library, only a handful of statements are not supported in the iml action. Most differences are related to the difference between the SAS 9 and SAS Viya environments. PROC IML interacts with traditional SAS constructs (such as data sets and catalogs) and supports calling SAS procedures and interacting with files on your local computer. The iml action interacts with analogous constructs in the Viya environment: it can read and write CAS tables, write analytic stores (astores), and call other Viya actions.
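To make the shared syntax concrete, here is a minimal sketch (not taken from the original article) that runs the same SAS/IML statements first in PROC IML and then in the iml action. It assumes a SAS Viya 3.5 environment with SAS IML licensed and access to a CAS server; the session name mySess and the matrix values are arbitrary placeholders.

```sas
/* 1. Solve a small linear system in PROC IML */
proc iml;
   A = {2 1, 1 3};              /* 2x2 symmetric matrix */
   b = {1, 2};                  /* right-hand side */
   x = solve(A, b);             /* solve A*x = b; x = {0.2, 0.6} */
   print x;
quit;

/* 2. Run the same statements in the iml action */
cas mySess;                     /* start a CAS session (placeholder name) */
proc cas;
   loadactionset 'iml';         /* load the iml action set */
   source pgm;                  /* store the IML statements in the string pgm */
      A = {2 1, 1 3};
      b = {1, 2};
      x = solve(A, b);
      print x;
   endsource;
   iml / code=pgm;              /* submit the program to the iml action */
run;
quit;
```

In this sketch the IML statements themselves are unchanged; only the wrapper differs. Inside PROC CAS, the program is stored in a SOURCE block and passed to the iml action through its CODE= parameter.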

Why use the iml action?

The iml action runs on a CAS server. Why might you choose to use the iml action instead of PROC IML? Or convert an existing PROC IML program into the iml action? There are two main reasons:

  • You want to use the SAS/IML language as part of a sequence of actions that analyze data that are in CAS tables. By using the iml action, you can read and write CAS tables directly. If you use PROC IML, you need to pull the data from CAS into a SAS data set, run the analysis in PROC IML, and then push the results to a CAS table.
  • You want to take advantage of the capabilities of the CAS server to perform parallel processing. You can use the iml action to create custom parallel computations.

How is the iml action different from PROC IML?

As mentioned previously, the iml action does not support every function and statement that PROC IML supports. The unsupported functions and statements are primarily in four areas:

  • Base SAS functions that are not supported in the CAS DATA step. For example, the old random number generator functions RANUNI and RANNOR are not supported in CAS because they cannot generate independent streams of random numbers in parallel.
  • Statements that read or write SAS data sets or text files.
  • Functions that create graphics. CAS is for computations. You can download the results of the computation to whatever language you are using to call the CAS actions. For example, I download the results to SAS and create SAS graphs, but you could also use Python or R.
  • SAS/IML functions that were deprecated in earlier releases of SAS.
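For programs that still call the older generators, a common update is to replace RANNOR or RANUNI with the RANDSEED and RANDGEN subroutines. The following PROC IML sketch shows the pattern; the seed and vector size are arbitrary, and you should confirm against the iml action documentation that each routine you rely on is supported in your release.

```sas
proc iml;
   /* Old style (not supported in the iml action):
      x = rannor(j(5, 1, 123));   5 normal variates, seed 123 */

   /* Newer style: set the seed once, then fill a vector */
   call randseed(123);          /* initialize the random number stream */
   x = j(5, 1);                 /* allocate a 5x1 vector */
   call randgen(x, "Normal");   /* fill it with standard normal variates */
   print x;
quit;
```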

See the documentation for the iml action for a complete list of the PROC IML functions and statements that are not supported in the iml action.

Will my program run faster in the iml action than PROC IML?

If you take an existing PROC IML program and run it in the iml action, it will take about the same amount of time to run. Sure, it might run a little faster if the machines in your CAS cluster are newer and more powerful than the SAS server or your PC, but that speedup is due to hardware. An existing program does not automatically run faster in the iml action because it runs serially until it encounters a programming statement that can be executed in parallel. There are two main sets of statements that run in parallel:

  • Reading and writing data from CAS tables.
  • Functions that distribute computations. You, the programmer, need to call these functions to write parallel programs.

So, yes, you can get certain programs to run faster in the iml action, but it doesn't happen automatically. You have to add new input/output statements or call functions that execute tasks in parallel.

Should you convert from PROC IML to the iml action?

What does this mean for the SAS/IML programmer whose company is changing from SAS 9 to Viya? Do you need to convert hundreds of existing PROC IML programs to run in the iml action? No, absolutely not. As mentioned previously, when you license SAS IML on Viya, you get both PROC IML and the iml action. The existing programs that you wrote in SAS 9 will continue to run in PROC IML in Viya.

The Viya platform provides an opportunity to use the iml action but does not require it. Under what circumstances might you want to convert a program from PROC IML to the iml action? Or write a new program in the iml action instead of using PROC IML? In my opinion, it comes down to two issues: workflow and performance.

  • Workflow: If your company is using CAS actions and CAS-enabled procedures, and if your data are stored in CAS tables, then it makes sense to use the iml action instead of the IML procedure. The SAS/IML language is often used to pre- or post-process data for other procedures or actions. The iml action can read and write CAS tables that are created by other CAS actions or that will be consumed by other CAS actions.
  • Performance: Suppose that you have a computation that takes a long time to process in PROC IML, but the computation is "embarrassingly parallel." An embarrassingly parallel problem is one that consists of many identical independent subtasks. The iml action supports several functions for distributing a computation to multiple threads. Examples of embarrassingly parallel computations in statistics and machine learning include Monte Carlo simulation, resampling methods such as the bootstrap, ensemble models, and many "brute force" computations.
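To illustrate what "identical independent subtasks" means, here is a serial PROC IML sketch of the bootstrap. It does not use the iml action's parallel functions; it only shows that each resample is computed independently of the others, so the loop iterations could in principle be distributed. The Weight variable in SASHELP.CLASS, the seed, and the number of resamples are arbitrary choices for illustration.

```sas
/* Serial bootstrap of the mean: each loop iteration is an independent subtask */
proc iml;
   use sashelp.class;
   read all var {"Weight"} into x;   /* x is a column vector */
   close sashelp.class;

   call randseed(12345);             /* arbitrary seed */
   NumResamples = 1000;              /* number of bootstrap resamples (arbitrary) */
   n = nrow(x);
   bootMeans = j(NumResamples, 1, .);
   do i = 1 to NumResamples;
      s = sample(x, n);              /* draw n values with replacement */
      bootMeans[i] = s[:];           /* mean of all elements of the resample */
   end;

   call qntl(ci, bootMeans, {0.025, 0.975});   /* percentile interval */
   print ci[label="95% bootstrap percentile interval for mean Weight"];

   create BootMeans var {"bootMeans"};          /* save the bootstrap distribution */
   append;
   close BootMeans;
quit;
```

No iteration reads a result from another iteration, which is the property that makes this kind of computation a candidate for parallel execution.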

Further reading

I continue this exploration of the iml action in subsequent articles. Related articles include:

  • A Getting Started example that shows how to call the iml action and discusses how the action is similar to and different from PROC IML.
  • An example that shows how to read and write CAS tables from the iml action.
  • The MAPREDUCE function, which enables you to distribute a computation across threads and nodes.
  • The PARTASKS function, which enables you to distribute multiple independent computations across threads and nodes.
  • The SCORE function, which enables you to evaluate a function in parallel on every row of a CAS table.

For more information and examples, see Wicklin and Banadaki (2020), "Write Custom Parallel Programs by Using the iml Action," which is the basis for these blog posts. Another source is the SAS IML Programming Guide, which includes documentation and examples for the iml action.

The post An introduction to the iml action in SAS Viya appeared first on The DO Loop.

June 8, 2020

Critics of sports analytics (and there are some entertaining ones) love to point out that analytics isn’t capable of capturing the things that don’t show up on a box score. A player who dives on the floor to save a loose ball, a quarterback strategically misleading a defender to free [...]

Going beyond the box score: Text analysis in sports was published on SAS Voices by Frank Silva

June 8, 2020

A SAS customer asked how to specify interaction effects between a classification variable and a spline effect in a SAS regression procedure. There are at least two ways to do this. If the SAS procedure supports the EFFECT statement, you can build the interaction term in the MODEL statement. For procedures that do not support the EFFECT statement, you can generate the interaction effects yourself. Because the customer's question was specifically about the GAMPL procedure, which fits generalized additive models, I also show a third technique, which enables you to model interactions in the GAMPL procedure.

To illustrate the problem, consider the following SAS DATA step, which simulates data from two different response functions:

data Have;
call streaminit(1);
Group="A"; 
do x = 1 to 10 by .1;
   y =    2*x + sin(x) + rand("Normal"); output;
end;
Group="B";
do x = 1 to 10 by .1;
   y = 10 - x - sin(x) + rand("Normal"); output;
end;
run;

The graph shows the data and overlays a regression model. The model predicts the response for each level of the Group variable. Notice that the two curves have different shapes, which means that there is an interaction between the Group variable and the x variable. The goal of this article is to show how to use a spline effect to predict the response for each group.

Use the grammar of the procedure to form interactions

Many SAS regression procedures support the EFFECT and CLASS statements and enable you to specify interactions on the MODEL statement. Examples include the GLIMMIX, GLMSELECT, LOGISTIC, QUANTREG, and ROBUSTREG procedures. For example, the following call to PROC GLMSELECT specifies several model effects by using the "stars and bars" syntax:

  • The syntax Group | x includes the classification effect (Group), a linear effect (x), and an interaction effect (Group*x).
  • The syntax Group * spl includes an interaction effect between the classification variable and the spline.
title "GLMSELECT Model with Spline and Interaction Effects";
ods select none;
proc glmselect data=Have
        outdesign(addinputvars fullmodel)=SplineBasis; /* optional: write design matrix to data set */
   class Group;
   effect spl = spline(x);
   model y = Group|x  Group*spl / selection=none;
   output out=SplineOut predicted=p;            /* output predicted values */
quit;
ods select all;
 
%macro PlotInteraction(dsname);
   proc sgplot data=&dsname;
      scatter x=x y=y / group=Group;
      series x=x y=p / group=Group curvelabel;
      keylegend / location=inside position=E;
   run;
%mend;
%PlotInteraction(SplineOut);

The predicted curves are the ones shown in the graph at the beginning of this article. I wrapped the call to PROC SGPLOT in a macro so that I can create the same plot for other models.

Output the design matrix for the interactions

Some SAS procedures do not support the EFFECT statement. For these procedures, you cannot form effects like Group*spline(x) within the procedure. However, the GLMSELECT procedure supports the OUTDESIGN= option, which writes the design matrix to a SAS data set. I used the option in the previous section to create the SplineBasis data set. The data set includes variables for the spline basis and for the interaction between the Group variable and the spline basis. Therefore, you can use the splines and interactions with splines in other SAS procedures.
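If you are not sure which names OUTDESIGN= assigned to the design variables, one way to check (a suggestion, not part of the original workflow) is to list the variables in the SplineBasis data set. This assumes that SplineBasis was created by the PROC GLMSELECT call in the previous section.

```sas
/* List the design variables in the order in which they were created */
proc contents data=SplineBasis varnum short;
run;
```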

For example, the GLM procedure does not support the EFFECT statement, so you cannot generate a spline basis within the procedure. However, you can read the design matrix from the previous call to PROC GLMSELECT. The interaction effects are named spl_Group_1_A, spl_Group_1_B, spl_Group_2_A, ..., where the suffix "_i_j" indicates the interaction effect between the ith spline basis function and the jth level of the Group variable. You can type the names of the design variables, or you can use the "colon notation" to match all variables that begin with the prefix "spl_Group". The following call to PROC GLM computes the same model as in the previous section:

proc glm data=SplineBasis;
   class Group;
   model y = Group|x  spl_Group: ;           /* spl_Group: => spl_Group_1_A,  spl_Group_1_B, ... */
   output out=GLMOut predicted=p;            /* output predicted values */
quit;
 
/* show that the predicted values are the same */
data Comp;
   merge SplineOut GLMOut(keep=p rename=(p=p2));
   diff = p - p2;
run;
proc means data=comp Mean Min Max;  var diff;  run;

The output of PROC MEANS shows that the two models are the same. There is no difference between the predicted values from PROC GLM (which reads the design matrix) and the values from PROC GLMSELECT (which reads the raw data).

The splines of the interactions versus the interactions of the splines

Some nonparametric regression procedures, such as the GAMPL procedure, have their own syntax to generate spline effects. In fact, PROC GAMPL uses thin-plate splines, which are different from the splines that are supported by the EFFECT statement. Recently, a SAS customer asked how he could model interaction terms in PROC GAMPL.

The GAMPL procedure supports semiparametric models. In the parametric portion of the model, it is easy to specify interactions by using the standard "stars and bars" notation in SAS. However, the SPLINE() transformation in PROC GAMPL does not support interactions between a classification variable and a continuous variable. To be specific, consider the following semiparametric model:

title "No Interaction Between Classification and Spline Effects";
proc gampl data=Have; *plots=components;
   class Group;
   model Y = param(Group | x)     /* parametric terms */
             spline(x);           /* nonparametric, but no interaction */
   output out=GamPLOut pred=p r=r;
   id Y X Group;
run;
 
%PlotInteraction(GamPLOut);

The output shows that the model does not capture the nonlinear features of the data. This is because the spline term is computed for all values of x, not separately for each group. When the groups are combined, the spline effect is essentially linear, which is probably not the best model for these data.

Unfortunately, in SAS/STAT 15.1, PROC GAMPL does not support a syntax such as "spline(Group*x)", which would enable the shape of the nonparametric curve to vary between levels of the Group variable. In fact, the 15.1 documentation states, "only continuous variables (not classification variables) can be specified in spline-effects."

However, if your goal is to use a flexible model to fit the data, there is a model that might work. Instead of the interaction between the Group variable and spline(x), you could use the spline effects for the interaction between the Group variable and x. These are not the same models: the interaction with the splines is not equal to the spline of the interactions! However, let's see how to fit this model to these data.
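One way to see why the splines of the interactions can still produce a separate curve for each group is to write out both models. The notation here is introduced for illustration only: g is the level of Group, B_k are the basis functions generated by spline(x), and f_A and f_B are the smooth functions that PROC GAMPL estimates for the interaction variables x_A = x·1[g = A] and x_B = x·1[g = B] (the variables x_Group_A and x_Group_B in the Interact DATA step).

```latex
\begin{align*}
\text{(interaction with the splines)}\quad \hat{y} &= \beta_0 + \gamma_g + (\beta_1 + \delta_g)\,x + \sum_{k=1}^{K} \theta_{g,k}\,B_k(x) \\
\text{(splines of the interactions)}\quad \hat{y} &= \beta_0 + \gamma_g + (\beta_1 + \delta_g)\,x + f_A(x_A) + f_B(x_B) \\
\hat{y}\,\big|_{g = A} &= \underbrace{\beta_0 + \gamma_A + f_B(0)}_{\text{constant in group } A} + (\beta_1 + \delta_A)\,x + f_A(x)
\end{align*}
```

Because f_B(0) is constant within group A (and f_A(0) is constant within group B), it behaves like a group-specific intercept, so each group can have its own nonlinear shape. Roughly speaking, the models still differ because f_A is estimated from values of x_A that include all the zeros from group B, so its basis and smoothing are not the same as those of the group-specific spline terms in the first model.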

The idea is to form the interaction effect Group*x in a separate step. You can use the design matrix (SplineBasis) that was created in the first section, but to make the process as transparent as possible, I am going to manually generate the dummy variables for the interaction effect Group*x:

data Interact; 
set Have; 
x_Group_A = x*(Group='A'); 
x_Group_B = x*(Group='B');
run;

You can use the spline effects for the variables x_Group_A and x_Group_B in regression models. The following model is not the same as the previous models, but it is a flexible model that seems to fit these data:

title "Interaction Between Classification and Spline Effects";
proc gampl data=Interact plots=components;
   class Group;
   model Y = param(Group | x) 
             spline(x_Group_A) spline(x_Group_B);  /* splines of interactions */
   output out=GamPLOut pred=p;
   id Y X Group;
run;
 
title "GAMPL with Interaction Effects";
%PlotInteraction(GamPLOut);

The graph shows that (visually) the model seems to capture the undulations in the data as well as the earlier model. Although PROC GAMPL does not currently support interactions between classification variables and spline effects, this trick provides a way to model the data.

The post Interactions with spline effects in regression models appeared first on The DO Loop.