ods layout

3月 202012
 

Most SAS procedures support the BY statement, which allows you to create a report or analysis for each distinct value of a variable in your data set. The syntax is simple, and SAS procedures are usually tuned to do a good job of processing the data efficiently.

However, the BY statement approach has some limitations:

For example, suppose you want to show a PROC PRINT output followed by a PROC SGPLOT chart for each value of a variable in a data set. Using the BY statement in each of these two steps would produce a series of PROC PRINT results, followed by a series of PROC SGPLOT results. (Oh, and don't forget that you can use SGPLOT and SGPANEL for classification plots without explicit BY processing.)

There is a SAS programming pattern that allows you to extend the concept of BY processing to larger segments of your SAS program. In pseudo-code, this allows you to implement program logic such as:

for each value BY_VAL of VAR in DATASET 
  do;
    PROC step1 (for VAR=BY_VAL)
    run;
    PROC step2 (for VAR=BY_VAL)
    run;
    /* other PROC or DATA steps as needed */
    /* each for the case of VAR=BY_VAL    */
  end;

In real SAS code, the programming pattern uses the SAS macro language. (Of course! Because with SAS macro, there is almost nothing that you can't do.)

Let's take a simple code example through this transformation. Let's combine a PROC FREQ step with a PROC SGPLOT step. Here's the example without BY processing or classification. The results show a report and plot for ALL values of Type within SASHELP.CARS. It's okay, but it doesn't provide much insight into the different classifications of car Type (Hybrid, Truck, Sedan, and so on).

title "Type by Origin in SASHELP.CARS";
proc freq data=sashelp.cars;
  table origin*type /nocum nopercent list;
run;
 
proc sgplot data=sashelp.cars;
  histogram mpg_city;
  density mpg_city / type=kernel;
run;

Step 1. Get your program working for one value using WHERE=

Before introducing any macro statements into the mix, it's a good idea to get your non-macro logic correct. Things are much easier to debug and troubleshoot before you add any macro processing. This modification shows what the program would look like with one known value of Type:

title "HYBRID by Origin in SASHELP.CARS";
proc freq data=sashelp.cars (where=(type="Hybrid"));
  table origin*type /nocum nopercent list;
run;
 
proc sgplot data=sashelp.cars(where=(type="Hybrid"));
  histogram mpg_city;
  density mpg_city / type=kernel;
run;

Example output:

Step 2. Count the distinct values and create macro variables for each

Once you're happy with the output for the single distinct value in you created in Step 1, it's time to gather the information you need to repeat that step for each distinct value in the data. Before you can achieve this with a SAS macro loop, you're going to need two bits of information: how many distinct values are there (for the loop index), and what ARE those distinct values (for each iteration). Here's the pattern of code to figure this out:

/* create macro vars with values and a total count of distinct values */
proc sql noprint;
  select strip(put(count(distinct VAR),15.)) into :varCount from DATASET;
  select distinct VAR into :varVal1- :varVal&varCount  from DATASET;
quit;

This pattern uses the SELECT INTO feature of PROC SQL, which allows you to populate SAS macro variables with the output of any query operation. I use this code pattern often, so I created an editor abbreviation to make it easy to insert into any SAS program as I work. Then I can simply replace the key parts of the statements with the values I need from the program, such as for this example:

proc sql noprint;
  select strip(put(count(distinct TYPE),15.)) into :varCount from SASHELP.CARS;
  select distinct TYPE into :varVal1- :varVal&varCount  from SASHELP.CARS;
quit;

I replaced "VAR" with my class variable, "TYPE". And I replaced "DATASET" with "SASHELP.CARS". That's it.

UPDATE 26Mar2012: With new features in SAS 9.3, the above code pattern becomes even simpler. See how to improve on this SAS programming pattern.

When I run just these statements in SAS Enterprise Guide, I can use my SAS Macro Variable Viewer to see how my macro variables were initialized.

Step 3. Wrap your program logic in a macro function, and add a %DO loop

Now it's time to wrap the program segment in a %MACRO statement, and then add a %DO loop so that the segment is processed varCount times. (That's going to be 6 times, in our example.)

%macro ReportOnEachType;
  %do index = 1 %to &varCount;
    title "HYBRID by Origin in SASHELP.CARS";
    proc freq data=sashelp.cars (where=(type="Hybrid"));
      table origin*type /nocum nopercent list;
    run;
 
    proc sgplot data=sashelp.cars(where=(type="Hybrid"));
      histogram mpg_city;
      density mpg_city / type=kernel;
    run;
  %end;
%mend;
%ReportOnEachType;

Step 4. Fix your WHERE= processing to reference the macro variables

Now we've got the correct number of steps running, but the output is the same for each step -- not very interesting! We need to customize the program statements to use each distinct value of Type.

%macro ReportOnEachType;
  %do index = 1 %to &varCount;
    title "&&varVal&index. by Origin in SASHELP.CARS";
    proc freq data=sashelp.cars (where=(type="&&varVal&index."));
      table origin*type /nocum nopercent list;
    run;
 
    proc sgplot data=sashelp.cars(where=(type="&&varVal&index."));
      histogram mpg_city;
      density mpg_city / type=kernel;
    run;
  %end;
%mend;
%ReportOnEachType;

Note how we need to reference our macro variable that contains the distinct value: "&&varVal&index." You need a "double &" to dereference (fancy word for reference the reference) the correct macro variable for the current index value.

Step 5. (OPTIONAL) Keep tinkering towards perfection

Now you've "rolled your own" BY group processing, but don't stop there! With a few more tweaks we can make it even better:

/* create macro vars with values and a total count of distinct values */
proc sql noprint;
  select strip(put(count(distinct TYPE),15.)) into :varCount from SASHELP.CARS;
  select distinct TYPE into :varVal1- :varVal&varCount  from SASHELP.CARS;
quit;
 
%macro ReportOnEachType;
  title;
  ods noproctitle;
  ods layout start columns=2;
  %do index = 1 %to &varCount;
    ods region;
    proc freq data=sashelp.cars (where=(type="&&varVal&index."));
      SYSECHO "Processing freq for &&varVal&index., &index. of &varCount.";
      table origin*type /nocum nopercent list;
    run;
    ods region;
    ods graphics / width=500 height=400 imagename="plot&&varVal&index.";
    proc sgplot data=sashelp.cars(where=(type="&&varVal&index."));
      SYSECHO "Processing plot for &&varVal&index., &index. of &varCount.";
      histogram mpg_city;
      density mpg_city / type=kernel;
      xaxis label="MPG (City) for &&varVal&index.";
    run;
  %end;
  ods layout end;
%mend;
%ReportOnEachType;

I added ODS LAYOUT statements (officially experimental, but works well in HTML) to generate two-column output, with tables and charts side-by-side. I added SYSECHO statements so that I can track progress of the program as it runs in SAS Enterprise Guide. If you've got lots of data with lots of distinct values, it can take a while. You might appreciate the running status message. And I added some ODS GRAPHICS options to control the name of the output image files, just to make my results easier to track on the file system.

Here's an example of my final report. Can you take it even further? I'll bet that you can!

tags: BY group, macro programming, ods layout, SAS programming, SGPLOT