sas programming

3月 062017
 

After reading my article about how to use BY-group processing to run 1000 regression models, a SAS programmer asked whether it is possible to reorder the output of a BY-group analysis. The answer is yes: you can use the DOCUMENT procedure to replay a portion of your output in any order you like. It is a feature of the SAS ODS system that is not very well known to statisticians. Kevin Smith (2012) has called PROC DOCUMENT "the most underutilized tool in the ODS toolbox."

The end of this article contains links to several papers that explain how PROC DOCUMENT works. I was first introduced to PROC DOCUMENT by Warren Kuhfeld, who uses it to modify the graphical output that comes from statistical procedures.

Reordering output (the topic of this article) is much easier than modifying the output. There is a SAS Sample that shows how to use PROC DOCUMENT to reorder BY groups, but I will show a simpler variation. The SAS Sample mentions that you can also use SAS macros to order the output from BY-group analyses, but the macro method is less efficient, computationally speaking.

Use PROC DOCUMENT to reorder SAS output

Suppose you use multiple procedures in an analysis and each procedure uses BY groups. The SAS output will consist of all the BY groups for the first procedure followed by all the BY groups for the second procedure. Sometimes it is preferable to reorder the output so that the results from all procedures are grouped by the BY-group levels.

A basic reordering of the SAS output uses the following steps:

  1. Use the ODS DOCUMENT statement to store the tables and graphs from the analyses into a destination called the document.
  2. Use the LIST statement in PROC DOCUMENT to examine the names of the tables and graphs within the document.
  3. Use the REPLAY statement in PROC DOCUMENT to send the tables and graphs to other ODS destinations in whatever order you choose.

Step 1: Store the output in a document

The following statements generate descriptive statistics for six clinical variables and for each level of the Smoking_Status variable. The Smoking_Status variable indicates whether a patient in this study is a non-smoker, a light smoker, and so on. PROC MEANS generates statistics for the continuous variable; PROC FREQ generates counts for the levels of the discrete variables. The ODS DOCUMENT statement writes all of the ODS output information to a SAS document (technically an item store) called 'doc':

proc sort data=sashelp.heart out=heart; /* sort for BY group processing */
   by smoking_status;
run;
 
ods document name=doc(write);      /* wrap ODS DOCUMENT around the output */
   proc means data=heart;
      by smoking_status;
      var Cholesterol Diastolic Systolic;         /* 1 table, results for three vars */
   run;
 
   proc freq data=heart;
      by smoking_status;
      tables BP_Status Chol_Status Weight_Status; /* 3 tables, one for each of three vars */
   run;
ods document close;                /* wrap ODS DOCUMENT around the output */

Step 2: Examine the names of the objects in the document

At this point, the output is captured in an ODS document. Each object is assigned a name that is closely related to the ODS name that is displayed if you use ODS TRACE ON. You can use PROC DOCUMENT to list the contents of the SAS item store. This will tell you the names of the ODS objects in the document.

PROC DOCUMENT is an interactive procedure, which means that the procedure continues running until you submit the QUIT statement. You can submit a group of statements followed by a RUN statement to execute the procedure interactively. The following statement runs a LIST statement, but does not exit the procedure:
proc document name=doc(read);
   list / levels=all bygroups;  /* add column for each BY group var */
run;

The output shows the table names. The tables from PROC MEANS begin with "Means" and end with the name of the table, which is "Summary." (The references explain the purpose of the many "#1" strings, which are called sequence numbers.) Similarly, the tables from PROC FREQ begin with "Freq" and end with "OneWayFreqs." Notice that the listing includes the BY-group variable as a column. You can use this variable to list or replay only certain BY groups. For example, to list only the tables for non-smokers, you can run the following PROC DOCUMENT statements:

   /* caret (^) is "current directory," not "negation"! */
   list ^ (where=(Smoking_Status="Non-smoker")) / levels=all bygroups;
run;

The output shows that there are four tables that correspond to the output for the "Non-smoker" level of the BY group.

Step 3: Replay the output in any order

The previous section showed that you can use a WHERE clause to list all the tables (and graphs) in a BY group. The REPLAY Statement also supports a WHERE clause, and you can use the REPLAY statement to send only those tables to the open ODS destinations. The DOCUMENT procedure is still running, so the following statements produce the output:

   replay ^ (where=(Smoking_Status="Non-smoker")); /* levels=all is default */
run;

The output is not shown, but it consists of all tables for which Smoking_Status="Non-smoker," regardless of which procedure created the output.

Notice that the "Non-smoker" group was the fifth level (in alphabetical order) of the Smoking_Status variable. You can use multiple REPLAY statements to send output for other levels in any order. For example, you might want to output the tables only for smokers, and order the output according to how much tobacco the patients consume. The following statement output the tables for smokers, grouped by tobacco usage:

   /* replay output in a different order */
   replay ^(where=(Smoking_Status="Light (1-5)"));
   replay ^(where=(Smoking_Status="Moderate (6-15)"));
   replay ^(where=(Smoking_Status="Heavy (16-25)"));
   replay ^(where=(Smoking_Status="Very Heavy (> 25)"));
run;
quit;

For brevity I have left out many details about the syntax of PROC DOCUMENT, but I hope the main idea is clear: You can store the results of multiple computations in a document and then replay any subset of the results in any order. The references in the next section contain many useful tips and techniques to help you get the most out of PROC DOCUMENT.

References

For an innovative use of PROC DOCUMENT to modify statistical graphics, see the papers and books by Warren Kuhfeld, such as Kuhfeld (2016) "Highly Customized Graphs Using ODS Graphics."

The post Reorder the output from a BY-group analysis in SAS appeared first on The DO Loop.

2月 272017
 

Longtime SAS programmers know that the SAS DATA step and SAS procedures are very tolerant of typographical errors. You can misspell most keywords and SAS will "guess" what you mean. For example, if you mistype "PROC" as "PRC," SAS will run the program but write a warning to the log: WARNING 14-169: Assuming the symbol PROC was misspelled as PRC.

This feature provided a big productivity boost in the days before GUI program editors. Imagine submitting a program from a command line in the early 1980s. If you mistyped one keyword you would have to retype the entire statement. As a convenience, SAS implemented an algorithm that checks the "spelling distance" between the tokens that you submit and a list of valid keywords for the procedure that you are calling. DATA step programmers might be familiar with the SPEDIS function, which measures how close two words are to each other in the English language. The SAS language parser uses the same algorithm.

Not everyone wants this feature. Many companies in regulated industries (such as pharmaceuticals) turn off the autocorrect feature in SAS because they want to force their programmers to type every keyword correctly. You can determine whether AUTOCORRECT option is enabled on your system by running PROC OPTIONS:

proc options option=AUTOCORRECT value;  run;

The AUTOCORRECT option is turned on by default. You can turn off the option by submitting options NOAUTOCORRECT or by putting -NOAUTOCORRECT in a configuration file.

Today I've invited two people to argue for and against using this feature. Larry Literal is a programmer who believes that no program should ever accept a syntax error. Annie Intel sees nothing wrong with programs that self-correct. She argues that it is desirable for programs to interpret the intention of the programmer. Which do you agree with? Do you have something to add? Leave a comment.

Point: A program should not allow ambiguity

My name is Larry Literal and I believe that computer programming should be an exact science. There is no room for ambiguity. A program that runs because it is "close to" a correct program is an abomination. I do not want a computer to change the code that I write!

When my system administrator installs a new version of SAS, the first thing I do is turn off the autocorrect feature. (I've also turned off the autocorrect feature on my phone. What a pain!) My main argument against the AUTOCORRECT option is that it makes code unreadable. Take a look at the following program:

/* The correct program is:
   proc freq dta=sashelp.class order=freq;
      table sex / chisq;
   run;
*/
prc freq dta=sashelp.class ordor=freqq;
   tble sex / chsq;
runn;

Every keyword in this program is mistyped. The only tokens that are specified correctly are the name of the procedure, the name of the data set, and the name of the variable. The program looks more like the Klingon language than the SAS language, yet this program runs if you use the AUTOCORRECT option!

And what happens if SAS introduces a new keyword that is closer to a mistyped word than a previous keyword? Then the procedure might do something different even though I have not changed the program! The autocorrect feature is an abomination and should never be used!

Counterpoint: Computers should interpret what you say

Really, Larry? "An abomination"? What century are you living in?

My name is Annie Intel, but my friends call me "A.I." I think the SAS autocorrect feature was way ahead of its time. Today we have autocorrecting logic on smartphones and word processors. Applying the same techniques to computer programs is no different. In fact, if you use a modern SAS program editor, the editor will suggest valid keywords and flag any keyword that is not valid.

Let's be real: Larry's example is not realistic. No programmer is going to use that garbled call to PROC FREQ in a production job. The autocorrect feature does not "make code unreadable." It is a convenience while developing a program, not an excuse to write nonsense. Any competent programmer will check the log for warning messages and correct the typos.

Larry claims that he doesn't want a computer munging and altering the code he writes. But optimizing compilers have been doing exactly that for decades! Programmers write instructions in a high-level language and an optimizing compiler maps the code to a set of machine instructions. The compiler will sometimes rearrange the structure of the program to get better performance. If it is okay for a compiler to map a program into an optimal version of itself, why is it not okay for a parser to do the same by correcting misspellings?

I want computers to recognize my intentions. When I give a voice command to my smartphone or personal home device, the audio signal is mapped to an action. I am allowed a certain amount of flexibility. "Turn on the lights" and "turn da light on" are equivalent phrases that should be understood and mapped to the same action. The SAS AUTOCORRECT feature is similar. The interpreter has a context (the name of the procedure) which is used to standardize your input. I think it is very cool. In the future, I think more programming languages will accept ambiguities.

The post Point/Counterpoint: Should a programming language accept misspelled keywords? appeared first on The DO Loop.

2月 212017
 

What?!?  You mean a period (.) isn't the only SAS numeric missing value? Well, there are 27 others: .A .B, to .Z and ._ (period underscore). Your first question might be: "Why would you need more than one missing value?"  One situation where multiple missing values are useful involves survey data. Suppose [...]

The post The Other 27 SAS Numeric Missing Values appeared first on SAS Learning Post.

2月 082017
 

Colors are the subject of many romantic poems and songs, but there isn't much romance to be found in their hexadecimal values. With apologies to Van Morrison:

...Skipping and a jumping
In the misty morning fog with
Our hearts a thumpin' and you
My cx662F14 eyed girl

When it comes to specifying colors within a SAS program, you can always rely on the simple color names: red, blue, yellow, and so on. (You know, the colors you might remember from your first box of Crayola crayons.) You can even predict a few more exotic names such as "lightgreen" and "darkyellow" and even "olivedrab". Are you familiar with HTML color name standards? Most of those names work as well. But for true color precision, you might want to use the hexadecimal values or at least the super-descriptive SAS color names.


In SAS Enterprise Guide, when you type a piece of SAS syntax that expects a color value, you'll find that the program editor pops up a helpful "color picker," displaying a long list of acceptable color names and their hex values. You can scroll through the list or use "type ahead" to find the color you want, then click or press Enter to accept it.

There's a keyboard shortcut that will invoke the color picker at any time: Ctrl+Shift+C. Use that when working on a SAS macro program or at any place the SAS program editor might not otherwise predict. By default, the editor will drop in the color name. You can change that behavior by visiting Program->Editor Options, Autocomplete tab. Select between the SAS color name or the more-obscure hex value. (Guaranteed to make your program more difficult to read, and thus helpful for job security.)

Are you using SAS Studio? You can also color your world with just a few keystrokes. This screenshot is from SAS Studio 3.6:

More colorful resources

tags: SAS Enterprise Guide, SAS programming

The post Tip for coding your color values in SAS Enterprise Guide appeared first on The SAS Dummy.

2月 062017
 

Suppose you create a scatter plot in SAS with PROC SGPLOT. What color does PROC SGPLOT use for the markers? If you specify the GROUP= option so that markers are colored by a grouping variable, what colors are used to represent the various groups? The following scatter plot shows the colors that are used by default for the HTMLBlue style. They are shades of blue, red, green, brown, and magenta.

data A;        /* example data with groups 1, 2, ..., 5 */
do Color = 1 to 5;
   x = Color; y = Color;  output;
end;
run;
 
title "Marker Colors Used for GROUP= Option";
title2 "HTMLBlue Style";
proc sgplot data=A;
xaxis grid; yaxis grid;
scatter x=x y=y / group=Color markerattrs=(size=24 symbol=SquareFilled);
run;
stylescolors1

Notice that these marker colors are not fully saturated colors, so they are not the SAS color names RED, BLUE, GREEN, BROWN, and MAGENTA. So what colors are these? What are their RGB values?


What colors does PROC SGPLOT uses for groups? #SASTip
Click To Tweet


Colors come from styles

Colors are defined by styles, and you can use ODS style elements to set marker colors. A style defines elements called GraphDataDefault, GraphData1, GraphData2, GraphData3, and so forth. Each element contains several attributes such as colors and line patterns. The complete list of style elements and attributes for ODS graphics is in the documentation, but for this article, the important fact is that the GraphDatan:ContrastColor attribute determines the marker color for the nth group. These are the colors in the previous scatter plot.

Display an ODS style template

Styles are defined by ODS templates. You can use the SOURCE statement in PROC TEMPLATE to display a template. In the style template, the GraphDatan:ContrastColor attributes are set by using keywords named gcdata1, gcdata2, gcdata3, etc.

If you display the template for the styles.HTMLBlue template, you will see that the HTMLBlue style inherits from the Statistical style, and it is the Statistical style that defines the contrast colors. The following statements display the contents of the Statistical template to the SAS log:

proc template;
source styles.statistical;
quit;

The template is long, and it is hard to scroll through the log to discover which colors are associated with each attribute. But that's no problem: you can use SAS to find and display only the information about contrast colors.

Display the marker colors as RGB and hexadecimal values

For years Warren Kuhfeld has been showing SAS customers how to view, edit, and use ODS templates to customize the graphs that are produced by SAS statistical procedures. A powerful technique that he uses is to write a template to a file and then use the DATA step to modify the template.

I will not modify the template but merely display information from it. The following DATA step writes the template to a text file and then uses the DATA step to find all instances of the keyword 'gcdata' in the template. For lines that contain the string 'gcdata', the program extracts the color for each keyword. The keyword-value pairs are saved to a data set, which is sorted and displayed:

libname temp "C:/temp";
proc template;
source styles.statistical / file='temp.tmp'; /* write template to text file */
quit;
 
data Colors;
keep Num Name Color R G B;
length Name Color $8;
infile 'temp.tmp';                    /* read from text file */
input;
/* example string:  'gcdata1' = cx445694 */
k = find(_infile_,'gcdata','i');      /* if k=0 then string not found */
if k > 0 then do;                     /* Found line that contains 'gcdata' */
   s = substr(_infile_, k);           /* substring from 'gcdata' to end of line */
   j = index(s, "'");                 /* index of closing quote  */
   Name = substr(s, 1, j-1);          /* keyword                 */
   if j = 7 then Num = 0;             /* string is 'gcdata'      */
   else                               /* extract number 1, 2, ... for strings */
      Num = inputn(substr(s, 7, j-7), "best2.");  /* gcdata1, gcdata2,...     */
   j = index(s, "=");                 /* index of equal sign     */
   Color = compress(substr(s, j+1));  /* color value for keyword */
   R = inputn(substr(Color, 3, 2), "HEX2.");   /* convert hex to RGB */
   G = inputn(substr(Color, 5, 2), "HEX2.");
   B = inputn(substr(Color, 7, 2), "HEX2.");
end;
if k > 0;
run;
 
proc sort data=Colors; by Num; run;
 
proc print data=Colors; 
var Name Color R G B;
run;
stylescolors2

Success! The output shows the contrast colors for the HTMLBlue style. The 'gcdata' color is the fill color (a dark blue) for markers when no GROUP= option is specified. The 'gcdatan' colors are used for markers that are colored by group membership. Obviously you could use this same technique to display other style attributes, such as line patterns or bar colors ('gdata').

If you prefer a visual summary of the attributes for an ODS style, see section "ODS Style Comparisons" in the SAS/STAT documentation. That section is part of the chapter "Statistical Graphics Using ODS," which could have been titled "Everything you always wanted to know about ODS graphics but were afraid to ask."

An application of setting marker colors

I prefer to style elements and discrete attribute maps to set colors for markers. But if you are rushed for time, you might want to use the STYLEATTRS statement to set the colors that are used for the GROUP= option. The STYLEATTRS statement requires a color list of hexadecimal colors or SAS color names. The following call to PROC SGPLOT uses the RGB/hex values for GraphData1:ContrastColor and so forth:

/* use colors for HTMLBlue style */
%let gcdata1 = cx445694;        
%let gcdata2 = cxA23A2E;
%let gcdata3 = cx01665E;
title "Origin in {Europe, USA}";
proc sgplot data=sashelp.cars;
where origin^='Asia' && type^="Hybrid";                 /* omit first category */
   styleattrs DataContrastColors = (&gcdata2 &gcdata3); /* use 2nd and 3rd colors */
   scatter x=weight y=mpg_city / group=Origin markerattrs=(symbol=CircleFilled);
   keylegend / location=inside position=TopRight across=1;
run;

It would be great if you could specify a style-independent syntax such as

styleattrs DataContrastColors=(GraphData2:ContrastColor GraphData3:ContrastColor);

Unfortunately, that syntax is not supported. The STYLEATTRS statement requires a list of color values or SAS color names.

Although this trick is interesting, in general I prefer to use styles (rather than hard-coded color values) in production code. However, if you want to know the RGB/hex values for a style, this trick shows how you can get them from an ODS template.

tags: SAS Programming, Statistical Graphics

The post What colors does PROC SGPLOT use for markers? appeared first on The DO Loop.

1月 242017
 

In a previous blog, Random Sampling: What's Efficient?, I discussed the efficiency of various techniques for selecting a simple random sample from a large SAS dataset.  PROC SURVEYSELECT easily does the job: proc surveyselect data=large out=sample method=srs /* simple random sample */ rate=.01; /* 1% sample rate */ run; Note: […]

The post Stratified random sample: What's efficient? appeared first on SAS Learning Post.

1月 162017
 

For SAS programmers, the PUT statement in the DATA step and the %PUT macro statement are useful statements that enable you to display the values of variables and macro variables, respectively. By default, the output appears in the SAS log. This article shares a few tips that help you to use these statements more effectively.

Tip 1: Display the name and value of a variable

The PUT statement supports a "named output" syntax that enables you to easily display a variable name and value. The trick is to put an equal sign immediately after the name of a variable: PUT varname=; For example, the following statement displays the text "z=" followed by the value of z:

data _null_;
x = 9.1; y = 6; z = sqrt(x**2 + y**2);
put z=;           /* display variable and value */
run;
z=10.9

Tip 2: Display values of arrays

You can extend the previous tip to arrays and to sets of variables. The PUT statement enables you to display elements of an array (or multiple variables) by specifying the array name in parentheses, followed by an equal sign in parentheses, as follows:

data _null_;
array x[5];
do k = 1 to dim(x);
   x[k] = k**2;
end;
put (x[*]) (=);     /* put each element of array on separate lines */
put (x1 x3 x5) (=); /* put each variable/value on separate lines */
run;
x1=1 x2=4 x3=9 x4=16 x5=25
x1=1 x3=9 x5=25

This syntax is not supported for _TEMPORARY_ arrays. However, as a workaraoun, you can use the CATQ function to concatenate array values into a character variable, as follows:

temp = catq('d', ',', of x[*]);         /* x can be _TEMPORARY_ array */
put temp=;

Incidentally, if you ever want to apply a format to the values, the format name goes inside the second set of parentheses, after the equal sign: put (x1 x3 x5) (=6.2);

Tip 3: Display values on separate lines

The previous tip displayed all values on a single line. Sometimes it is useful to display each value on its own line. To do that, put a slash after the equal sign, as follows:

...
put (x[*]) (=/);                   /* put each element on separate lines */
...
x1=1
x2=4
x3=9
x4=16
x5=25

Tip 4: Display all name-value pairs

You can display all values of all variables by using the _ALL_ keyword, as follows:

data _null_;
x = 9.1; y = 6; z = sqrt(x**2 + y**2);
A = "SAS"; B = "Statistics";
put _ALL_;              /* display all variables and values */
run;
x=9.1 y=6 z=10.9 A=SAS B=Statistics _ERROR_=0 _N_=1

Notice that in addition to the user-defined variables, the _ALL_ keyword also prints the values of two automatic variables named _ERROR_ and _N_.

Tip 5: Display the name and value of a macro variable

Just as the PUT statement displays the value of an ordinary variable, you can use the %PUT statement to display the value of a macro variable. If you use the special "&=" syntax, SAS will display the name and value of a macro variable. For example, to display your SAS version, you can display the value of the SYSVLONG automatic system macro variable, as follows:

%put &=SYSVLONG;
SYSVLONG=9.04.01M4P110916

The results above are for my system, which is running SAS 9.4M4. Your SAS version might be different.

Tip 6: Display all name-value pairs for macros

You can display the name and value of all user-defined macros by using the _USER_ keyword. You can display the values of all SAS automatic system macros by using the _AUTOMATIC_ keyword.

%let N = 50;
%let NumSamples = 1e4;
%put _USER_;
GLOBAL N 50
GLOBAL NUMSAMPLES 1e4

Conclusion and References

There you have it: six tips to make it easier to display the value of SAS variables and macro variables. Thanks to Jiangtang Hu who pointed out the %PUT &=var syntax in his blog in 2012. For additional features of the PUT and %PUT statements, see:

tags: SAS Programming, Tips and Techniques

The post PUT it there! Six tips for using PUT and %PUT statements in SAS appeared first on The DO Loop.

1月 032017
 

How many of you have been given a SAS data set with variables such as Age, Height, and Weight and some or all of them were stored as character values instead of numeric?  Probably EVERYONE! Yes, we all know how to do the old "swap and drop" (rename and convert), but […]

The post Character to Numeric Conversion in SAS appeared first on SAS Learning Post.

12月 292016
 

This SAS Jedi is very excited about the SAS 9.4 M4 release, which brought many wonderful gifts just in time for Christmas. So in the interest of extending the Christmas spirit, I'm going to blog about some of my favorites! I've long loved the SAS DO statement variant which allows […]

The post SAS Jedi Christmas - SAS 9.4 M4 DS2 Do Loop Upgrade appeared first on SAS Learning Post.

12月 282016
 

SAS temporary arrays are an underutilized jewel in the SAS toolbox. I find that many beginning to intermediate SAS programmers are not familiar with temporary arrays. The good news is that there is nothing complicated about them and they are very useful. First of all, what is a temporary array? […]

The post SAS Temporary Arrays, Not Just for Experts appeared first on SAS Learning Post.