sas programming

January 30, 2012
 

The other day I encountered the following SAS DATA step for generating three normally distributed variables. Study it, and see if you can discover what is unnecessary (and misleading!) about this program:

data points;
drop i;
do i=1 to 10;
   x=rannor(34343);
   y=rannor(12345);
   z=rannor(54321);
   output;
end;
run;

The program creates the POINTS data set. The data set contains three variables, each containing random numbers from the standard normal distribution. I'm guessing that the author of the program thinks that using rannor(12345) to define the y variable makes y independent from the x variable, which is defined by rannor(34343).

Sorry, but that is not correct.

The x, y, and z variables are, indeed, independent samples from a normal distribution, but that fact does not depend on using different seeds in the RANNOR function. In fact, in this DATA step, all random number seeds except the first one are completely ignored! Don't believe me? Run the following DATA step and compare the two data sets, as follows:

data points2;
drop i;
/* change all random number seeds except the first */
x=rannor(34343); y=rannor(1); z=rannor(2); output;
do i=2 to 10;
   x=rannor(10+i);
   y=rannor(100+i);
   z=rannor(1000+i);
   output;
end;
run;
 
proc compare base=points compare=points2;
run;
                           The COMPARE Procedure
                Comparison of WORK.POINTS with WORK.POINTS2
                               (Method=EXACT)
                               
NOTE: No unequal values were found. All values compared are exactly equal.

All values compared are exactly equal. Every observation, every variable, down to the last bit. But except for the first observation of the x variable, the second DATA step uses completely different random number seeds! How can the POINTS2 data set be identical to the POINTS data set?

As I explained in a previous post on random number seeds in SAS, the random number seed for a DATA step (or SAS/IML program) is set by the first call. SAS ignores subsequent seeds within the same DATA step or PROC step. In my previous post, I used the newer (and better) STREAMINIT subroutine and the RAND function instead of the older RANNOR function, but the fact remains that the first random number seed determines the random number stream for the entire DATA step.
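For comparison, here is a minimal sketch of the same simulation written with the newer routines. The data set name POINTS3 is my own choice for illustration. The seed is set once with CALL STREAMINIT, and every subsequent RAND call draws from that single stream, so the program no longer suggests that each variable has its own seed:

```sas
data points3;                 /* hypothetical data set name */
drop i;
call streaminit(34343);       /* set the seed once, before the first RAND call */
do i=1 to 10;
   x=rand("Normal");          /* all three variables draw from the same stream */
   y=rand("Normal");
   z=rand("Normal");
   output;
end;
run;
```

Because RAND uses a different generator than RANNOR, the values in POINTS3 will not match the values in POINTS, even though the seed is the same.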

For further details, see the SAS documentation, which shows an example similar to mine in which three data sets (imaginatively named A, B, and C) contain the same pseudorandom numbers.

Now that I've ranted against using different random number seeds, I will reveal that the DATA step at the beginning of my post is from an example in the SAS Knowledge Base! Yes, even experienced SAS programmers are sometimes confused by the subtleties of random number streams. There is nothing wrong with a program that uses multiple seeds, but such a program makes the reader think that all those seeds are actually doing something. They’re not.

Are you someone who uses different random number seeds for each variable in the same DATA step or PROC IML program? If so, you can safely stop. Multiple seeds do not make your random variables any more "random." Only the first seed matters.
tags: Getting Started, Sampling and Simulation, SAS Programming
January 28, 2012
 

So many of us struggle with this mountain. In fact, 68.27% of us get within sight of reaching the summit (while 95.45% of us are at least on a perceivable slope). We run, walk, crawl and sometimes slide our way uphill (from one direction or the other) until we finally reach the top.

That is, the top of the "bell" curve.

I came across this t-shirt design over at shirt.woot.com, one of a number of entries that celebrate dubious honors. (Also worth a look: Least Noticed Person Ever and Duck, Duck, Goose Champion.)

The design inspired me. I thought, "I'm an average sort of SAS programmer; this is a summit that I can actually reach." And with a middling amount of effort, I met my objective.

And you can do it too. But please, don't knock yourself out. I sure-as-heck didn't. In fact, I reached this dubious peak by climbing over the backs of others, lifting the code for generating a normal distribution from Rick, and the code for vector plots from the book by Sanjay and Dan.

Here's my less-than-original SAS program that yields a less-than-original design:

ods graphics /width=500 height=500;
 
data normal;
  do x = -3 to 3 by 0.1;
    y = pdf("Normal", x);
    output;
  end;
  x0 = 0;
  y0 = .43;
  you="YOU'VE ARRIVED!";
  output;
run;
 
proc sgplot data=normal noautolegend;
  title "Congratulations!";
  title2 "You've reached...";
  footnote "MEDIOCRITY";
  series x=x y=y;
  vector x=x0 y=y0 /
    xorigin=x0 yorigin=.5  
    arrowdirection=out 
    lineattrs=(color=red thickness=1) 
    datalabel=you;
  xaxis grid display=(novalues);
  yaxis grid display=(novalues);
  refline 0 / axis=y;
run;
tags: SAS dummy, SAS programming, SGPLOT
January 23, 2012
 

Statistical programmers often need mathematical constants such as π (3.14159...) and e (2.71828...). Programmers of numerical algorithms often need to know machine-specific constants such as the machine precision constant (2.22E-16 on my Windows PC) or the largest representable double-precision value (1.798E308 on my Windows PC).

Some computer languages build these constants into the language as reserved (or semi-reserved) words. This can lead to problems. For example, statisticians use the symbol pi to represent mixture probabilities in mixture distributions. In some languages, you can assign the symbol pi to a vector, which overrides the built-in value and causes cos(pi) to have an unexpected value. In other languages, the symbol pi is a reserved keyword and is unavailable for assignment.

In SAS, constants are not built into the language. Instead, they are available through a call to the CONSTANT function. You can call the CONSTANT function from the DATA step, from SAS/IML programs, and from SAS procedures (such as PROC FCMP, PROC MCMC, and PROC NLIN) that enable you to write SAS statements inside the procedure.
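As a rough illustration of the DATA step case, the following sketch calls the CONSTANT function and uses the result in an ordinary expression. The variable names and the circle-area calculation are only examples, not part of any particular program:

```sas
data _null_;
   pi = constant("pi");
   e  = constant("e");
   area = pi * 2**2;          /* area of a circle with radius 2 */
   put pi= best16. e= best16. area=;   /* write the values to the log */
run;
```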

For example, the following SAS/IML statements get two mathematical and two machine-specific constants and assign them to program variables:

proc iml;
pi = constant("pi");
e = constant("e");
maceps = constant("maceps");
big = constant("big");
print pi e maceps big;

The constants are printed by using the default BEST9. format; you can use a format such as BEST16. to see that the values contain additional precision. For example, the value that is actually stored in the pi variable is 3.14159265358979.
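One way to request the wider format is the FORMAT= option of the SAS/IML PRINT statement. This is a small self-contained sketch rather than a continuation of the session above:

```sas
proc iml;
pi = constant("pi");
e  = constant("e");
/* override the default format for each printed matrix */
print pi[format=BEST16.] e[format=BEST16.];
quit;
```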

Most programmers understand the need for constants like "pi." However, I often use machine-specific constants in order to write robust numerical algorithms that avoid numerical overflow and underflow. For example, suppose that you have some data and you want to apply an exponential transformation to the data. (In other words, you want to compute exp(x) for each value of x.) The exponential function increases very quickly, so exp(x) cannot be stored in a double-precision value when x is moderately large. As a result, the following computation causes a numerical overflow error:

x = do(0, 1000, 200); /* {0, 200, 400, ..., 1000} */
expx = exp(x); /* ERROR: numerical overflow */
ERROR: (execution) Invalid argument to function.

 count     : number of occurrences is 2
 operation : EXP at line 2154 column 11
 operands  : x

<...more error information (omitted)..>

The largest value of x that can be exponentiated is found by calling the CONSTANT function with the "LOGBIG" argument. You can use the "LOGBIG" value to locate values of x that are too large to be transformed. You can then assign missing values to exp(x) for those "large" values of x, as shown in the following statements:

logbig = constant("logbig");
idx = loc(x<logbig);  /* locate values for which exp(x) does not overflow */
expx = j(1,ncol(x), .); /* allocate result vector of missing values */
expx[idx] = exp( x[idx] ); /* exp(x) only for "sufficiently small" x */
print logbig, expx;

In this way, you can write robust code that does not fail when there are large values in the data. Similarly, you can use this idea to protect against overflow when raising a value to a power by using the '**' operator in the DATA step or the '##' operator in SAS/IML.
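To make the power case concrete, here is a hedged DATA step sketch. It assumes a positive base, for which x**p equals exp(p*log(x)), so the power overflows when p*log(x) exceeds the "LOGBIG" value. The data set name POWER, the variable names, and the sample values are hypothetical:

```sas
data power(drop=logbig);      /* hypothetical data set name */
   logbig = constant("logbig");
   input x p;
   /* for x > 0, x**p = exp(p*log(x)); guard against overflow        */
   /* this sketch handles positive bases only and returns missing    */
   /* for everything else                                            */
   if x > 0 and p*log(x) < logbig then y = x**p;
   else y = .;
   datalines;
2 10
10 300
10 400
;
run;

proc print data=power;
run;
```

The first two rows are computed normally. The third row would overflow, so y is set to a missing value instead of generating an error.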

This blog post was inspired by a tweet from @RLangTip on Twitter: "Constants built into #rstats: letters, months and pi," which linked to documentation on constants that are built into the R language.

tags: Getting Started, SAS Programming, Tips and Techniques
January 19, 2012
 

In the past, getting your hands on SAS for learning purposes required one of two fortunate situations:

  • being a student enrolled in a college course (or high school!) where SAS is taught
  • working for an employer who is willing to sponsor your training, either in an official course or on-the-job.

Now, there is an affordable way for professionals to get hands-on access to SAS for a reasonable price: SAS OnDemand for Professionals: Enterprise Guide (in the USA and Canada only, for now).

This new SAS OnDemand offering complements the SAS OnDemand for Academics offering that has evolved over the past few years.  It's SAS, running on "the cloud", and you use a supplied version of SAS Enterprise Guide to access it (along with a good collection of sample data).  With SAS Enterprise Guide you can exercise most of the SAS features that you would need to practice for any career objective: learn SAS programming, hone your skills in business analytics, or use high-end statistical methods to analyze data.

CHOOSE YOUR PATH THROUGH THIS BLOG POST:

  • If you want to learn how to use SAS to query and transform data, calculate summary statistics, build graphs, create reports, and dabble in higher-end analytics -- but you don't want to have to write or understand programming code...continue on to the immediately following section, "Learning SAS without programming".
  • If you want to learn how to program in SAS, using DATA step, SAS macro language, SAS procedures and more, and you don't want to be held back by a point-and-click interface...skip this next section and go directly to read "Programming SAS with SAS Enterprise Guide".

Learning SAS without programming

SAS Enterprise Guide is often positioned as "the point-and-click interface to SAS".  Many of us were raised on the idea that "using SAS requires programming".  But it doesn't.  SAS Enterprise Guide has over 90 built-in tasks for accessing data, summarizing data, creating reports, and performing statistical analysis.

There are many popular books and training courses that show you how to get to the power of SAS without having to learn the syntax of SAS.  For example, we have books like SAS For Dummies, Little SAS Book for Enterprise Guide, and Basic Statistics using SAS Enterprise Guide: A Primer.  And there is a whole boatload of training courses on the topic.

If you've read some of the SAS Enterprise Guide tips that I've published on this blog and want to try them out, you can probably do it with SAS OnDemand.

NOTE: If you don't want to know anything about SAS programming, skip the next section and read "SAS OnDemand for Professionals: Learn what you want, how you want"

Programming SAS with SAS Enterprise Guide

So, you're a SAS programmer?  Or you want to be one?  Perhaps you're pursuing a SAS programming certification?  With SAS Enterprise Guide, you can skip the point-and-click stuff and jump right into programming with File->New->Program.  If you've read some of the SAS programming tips that I've published on this blog, you can try them for yourself using the SAS OnDemand environment.

For some experienced SAS programmers, SAS Enterprise Guide presents a different SAS environment than what you're accustomed to.  But you can accomplish most programming tasks here, and might even find yourself more productive with the super program editor and the process flow approach for organizing your code.

You can learn more about programmer productivity in SAS Enterprise Guide by watching this SAS Talks webinar.  Or if you really want a leg up, take the course.

SAS OnDemand for Professionals: Learn what you want, how you want

Regardless of your path -- point-and-click, SAS programming, or a mix of each -- SAS OnDemand for Professionals: Enterprise Guide provides a good learning environment to gain and practice your SAS skills.

But the learning resources don't stop there.  You can use these blogs, discussion forums, and the entire SAS community to supplement your knowledge as you learn.  SAS professionals love to share their knowledge.  And you'll be proud to share what you know too, when you join their ranks.

tags: SAS Enterprise Guide, SAS OnDemand, SAS programming
January 4, 2012
 

In the immortal words of Britney Spears: Oops! I did it again.

At least, I'm afraid that I did. I think I might have helped a SAS student with a homework assignment, or perhaps provided an answer in preparation for a SAS certification exam. Or maybe it was a legitimate work-related question; I'd like to think so, anyway.

This time, the question came to me via LinkedIn. (By the way, LinkedIn contains a rich network of SAS professionals; in her blog post, Tricia provides some helpful guidance for making use of that network.)

The question pertains to some confusing behavior of the LAG function. Within a DATA step, the LAG function is thought to provide a peek into the value of a variable within a previous observation. But in this program, the LAG function didn't seem to be doing its job:

data test;
  infile datalines dlm=',' dsd;
  input a b c;
  datalines;
4272451,17878,17878 
4272451,17878,17878 
4272451,17887,17887 
4272454,17878,17878 
4272454,17881,17881 
4272454,17893,17893 
4272455,17878,17878 
4272455,17878,18200 
run;
 
data testLags;
  retain e f ( 1 1);
  set test;
  if a=lag(a) and b>lag(b) then
    e=e+1;
  else if a^=lag(a) or lag(a)=. then
      e=1;
  if a^=lag(a) or lag(a)=. then
      f=1;
  else if a=lag(a) and b>lag(b) then
      f=f+1;
run;
 
proc print data=testLags;
run;

The questioner thought that the e and f variables should have the same values in each record of output, but they don't. The two variables are calculated using the exact same statements, but with the seemingly-exclusive IF/THEN conditions reversed. Here's the output:

Obs    e    f       a         b        c

 1     1    1    4272451    17878    17878
 2     1    1    4272451    17878    17878
 3     2    2    4272451    17887    17887
 4     1    1    4272454    17878    17878
 5     2    1    4272454    17881    17881
 6     3    2    4272454    17893    17893
 7     1    1    4272455    17878    17878
 8     1    1    4272455    17878    18200

There is a SAS note that warns of the effect of using the LAG function conditionally. But in this example, each set of LAG functions is used unconditionally (before the THEN clause). Or are they?

Let's review how the LAG function works. Each LAG call in a program maintains its own queue of previous values. Every time that call executes, it returns the value from its queue and stores the current value in its place. The trick here is that this program does not call the LAG function for both A and B with each iteration of the DATA step! Because the IF statements combine two conditions with an AND, if the first condition resolves to false, the second condition is not evaluated. After all, in logic-speak, FALSE AND (ANY value) is always FALSE, so the DATA step can save work by not bothering to evaluate the remainder of the expression.

  if a=lag(a) /* if false*/ and b>lag(b) /* then this is not evaluated*/

And then the next time around, when the LAG(b) function is called again, it's "behind" one on the queue for the value of b.
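The queue behavior is easier to see in a stripped-down case. The following sketch is my own illustration (the data set and variable names are hypothetical). It calls LAG once unconditionally and once only on even-numbered iterations:

```sas
data lagdemo;                 /* hypothetical data set name */
   input x @@;
   always = lag(x);           /* executes on every iteration */
   /* this LAG call has its own queue and only executes on even rows */
   if mod(_n_, 2) = 0 then evens = lag(x);
   datalines;
1 2 3 4 5 6
;
run;

proc print data=lagdemo;
run;
```

The ALWAYS variable holds the value of x from the previous observation. The EVENS variable does not: in observation 4 it holds 2, and in observation 6 it holds 4, because the conditional call only "remembers" the value of x from the last time that it executed.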

One way to solve the issue (and remove the logical ambiguity): set two temporary variables to LAG(a) and LAG(b) at the start of the DATA step, and use those variables in the subsequent comparisons. With the LAG function now being called with each iteration no matter what, the results are consistent. Here's an example of the modified program:

data testlags2(drop=laga lagb);
  retain e f ( 1 1);
  set test;
  laga = lag(a);
  lagb = lag(b);
  if a=laga and b>lagb then
    e=e+1;
  else if a^=laga or laga=. then
      e=1;
  if a^=laga or laga=. then
      f=1;
  else if a=laga and b>lagb then
      f=f+1;
run;

Here are the new results when printed:

Obs    e    f       a         b        c

 1     1    1    4272451    17878    17878
 2     1    1    4272451    17878    17878
 3     2    2    4272451    17887    17887
 4     1    1    4272454    17878    17878
 5     2    2    4272454    17881    17881
 6     3    3    4272454    17893    17893
 7     1    1    4272455    17878    17878
 8     1    1    4272455    17878    18200
tags: lag, LinkedIn, SAS programming
December 17, 2011
 

On the heels of the release of the popular SAS macro variable viewer from last month, I'm providing another custom task that I hope will prove just as useful. This one is a SAS options viewer, similar in concept to the OPTIONS window in SAS display manager.

You can download the new task from this location. (The download is a ZIP file with a DLL and a README.pdf that explains how to install it. In fact, it's the same download package as the macro viewer task; I've packaged them both in the same DLL, so if you install one, you get them both. You're welcome! Both tasks require SAS Enterprise Guide 4.3, with a SAS 9.2 or 9.3 environment.)

If you've tried the macro variable viewer, then the user interface for the Options viewer will look familiar. It shares many of the same features: the window "floats" as a toolbox window so you can keep working while it's visible, you can filter the results (important, given the hundreds of options!), you can view options as a straight list or grouped by category. In addition, there are important features specific to SAS options, such as the ability to see details about how each option was set and where it is valid. Here is the complete set of features:

Always-visible window: Once you open the task from the Tools menu, you can leave it open for your entire SAS Enterprise Guide session. The window uses a "modeless" display, so you can still interact with other SAS Enterprise Guide features while the window is visible. This makes it easy to switch between SAS programs and other SAS Enterprise Guide windows and the options viewer to see results.

Select active SAS server: If your SAS environment contains multiple SAS workspace connections, you can switch among the different servers to see options values on multiple systems.

One-click refresh: Refresh the list of option values by clicking on the Refresh button in the toolbar.

View by group or as a straight list: View the option values in their group categories (for example, MEMORY, GRAPHICS, EMAIL, etc.) or as a straight list, sorted by option name or current value. Click on the column headers to sort the list.

Set window transparency: You can make the window appear "see-through" so that it doesn't completely obscure your other windows as you work with it.

Filter results: Type a string of characters in the "Filter results" field, and the list of options will be instantly filtered to those that contain the sequence that you type. The filtered results will match on option names as well as values, and the search is case-insensitive. This is a convenient way to narrow the list to just a few options that you're interested in. To clear the filter, click on the X button next to the "Filter results" field, or "blank out" the text field.

Show option details: The "About" pane at the bottom of the window shows details about the currently selected option, including the current value, its value at session startup (SAS 9.3 only), where the option can be set (in OPTIONS statement or startup) and how the current value was set. You can show or hide this pane by clicking a toggle button at the top of the window.
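If you want to check the same kind of information from a program, the following sketch shows rough code equivalents. The MEMORY group and the MPRINT option are chosen only as examples; the task itself may gather its information differently:

```sas
/* list every option in one group */
proc options group=memory;
run;

/* show the description, current value, and how the value was set */
proc options option=mprint define value;
run;

/* read an option value inside macro code */
%put NOTE: MPRINT is currently %sysfunc(getoption(mprint));
```

The output from PROC OPTIONS and the %PUT statement is written to the SAS log.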

I can imagine several more features that might be useful, but I decided that these were enough for a first version. Try it out and leave feedback for me in the comments here. (That worked pretty well with the macro viewer task; in fact, this options viewer was one of your suggestions!)

tags: SAS custom tasks, SAS Enterprise Guide, SAS options, SAS programming
December 17, 2011
 
This week's SAS author's tip comes from Jack Shostak - manager of statistical programming at the Duke Clinical Research Institute. Despite his youthful appearance, Jack's been using SAS since 1985 and is the author of SAS Programming in the Pharmaceutical Industry and coauthor of Common Statistical Methods for Clinical Research with SAS Examples, [...]
December 16, 2011
 

A few colleagues and I were exchanging short snippets of SAS code that create Christmas trees and other holiday items by using the SAS DATA step to arrange ASCII characters. For example, the following DATA step (contributed by Udo Sglavo) creates a Christmas tree with ornaments and lights:

data _null_;
put @11 '@';
do i=0 to 9;
  b=substr(repeat('*0',i),1,2*i+1);
  put @(11-i) b;
end;
put @10 '| |';
run;
          @
          *
         *0*
        *0*0*
       *0*0*0*
      *0*0*0*0*
     *0*0*0*0*0*
    *0*0*0*0*0*0*
   *0*0*0*0*0*0*0*
  *0*0*0*0*0*0*0*0*
 *0*0*0*0*0*0*0*0*0*
         | |

If you delete some unnecessary spaces, you can produce the Christmas tree in 90 characters. As a result, the DATA step is short enough to post on Twitter! So tweet the following message to all of your SAS friends!

#SAS XMas Tree http://bit.ly/uQELwe  
DATA;put@9'@';do i=0 to 8;b=substr(repeat('*0',i),1,2*i+1);put@(9-i)b;end;
put@8'| |';RUN;

Can you do better? One of my colleagues created a 76-character DATA step that produces a plain Christmas tree (with a star (*) on top) made entirely of the '0' character. Use the comments to post your favorite SAS code that spreads holiday cheer!

tags: Just for Fun, SAS Programming
November 23, 2011
 

The SAS macro variable "inspector" is a custom task that plugs into SAS Enterprise Guide 4.3. You can use it to view the current values for all SAS macro variables that are defined within your SAS session. You can also evaluate "immediate" macro expressions in a convenient quick view window. If you develop or run SAS macro programs, this task can be a valuable debugging and learning tool.

UPDATE 28Nov2011: I've received several comments about this task, both here on the blog and in e-mail. Based on some of your very good suggestions, I've updated the task with additional features. If you downloaded the task before 28Nov2011, you might want to refresh your copy.

UPDATE 16Dec2011: The download package for this task now also includes a SAS Options viewer task, described on this separate blog post.

I've been working with SAS macros quite a bit lately, and I decided that a task like this could come in handy as a sort of "watch" window for macro values. I built the task using the custom task APIs and Microsoft .NET. I hope that you find the task useful. If you try it out and have suggestions for how to make it better, please share by adding a comment to the blog.

The custom task and an accompanying README.pdf file (containing description and detailed installation instructions) are available for download in this ZIP file. Installation is simple: copy the DLL to a designated folder on your PC, and SAS Enterprise Guide will detect the task automatically.

This add-in offers the following main features:

Always-visible window: Once you open the task from the Tools menu, you can leave it open for your entire SAS Enterprise Guide session. The window uses a "modeless" display, so you can still interact with other SAS Enterprise Guide features while the window is visible. This makes it easy to switch between SAS programs and other SAS Enterprise Guide windows and the macro variable viewer to see results.

Select active SAS server: If your SAS environment contains multiple SAS workspace connections, you can switch among the different servers to see macro values on multiple systems.

One-click refresh: Refresh the list of macro variables by clicking on the Refresh button in the toolbar.

View by scope or as a straight list: View the macro variables in their scope categories (for example, Global and Automatic) or as a straight list, sorted by variable name or current value. Click on the column headers to sort the list.

Filter results (NEW!): Type a string of characters in the "Filter results" field, and the list of macro variable results will be instantly filtered to those that contain the sequence that you type. The filtered results will match on variable names as well as values, and the search is case-insensitive. This is a convenient way to narrow the list to just a few variables that you're interested in. To clear the filter, click on the X button next to the "Filter results" field, or "blank out" the text field. (Note: this is in the 28Nov2011 update!)

Set window transparency: You can make the window appear "see-through" so that it doesn't completely obscure your other windows as you work with it.

Copy macro variables as %LET statements: Select one or more macro variables within the window, right-click and select Copy assignments. This generates a series of %LET statements -- one for each macro variable/value pair -- which you can then paste into a SAS program.

Macro expression "quick view": Have you ever wanted to test out a macro expression before using it in a longer program? This window allows you to get immediate feedback on a macro expression, whether a simple macro reference or a more complex expression with nested functions. If the expression generates a SAS warning or error, the feedback window shows that as well. Note: the expression can be any macro expression that is valid for the right-side of a macro variable assignment (%let statement).
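For readers who want to see the plain-code counterparts of these features, here is a small sketch. The REGION macro variable is hypothetical, and this is not the task's own implementation, just standard SAS statements that show similar information in the log or in a report:

```sas
%let region = West;           /* hypothetical macro variable */

/* evaluate expressions the way you might in the "quick view" window */
%put NOTE: region resolves to &region;
%put NOTE: today is %sysfunc(today(), date9.);

/* list all user-defined macro variables in the log */
%put _user_;

/* list macro variables with their scope as a report */
proc sql;
   select scope, name, value from dictionary.macros;
quit;
```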

More macro productivity in SAS Enterprise Guide is just a few clicks away! Download the task today and let me know what you think!

tags: .net, debugging, macro programming, SAS custom tasks, SAS Enterprise Guide, SAS programming
November 18, 2011
 

Rick posted a tip today about using abbreviations in the SAS program editor window (often referred to as the "enhanced editor"). Defining abbreviations is a great way to save keystrokes and re-use "templates" of code that you've squirreled away. (One of Rick's readers also picked up on the tip, and added it to his blog.)

If you use SAS Enterprise Guide 4.3, you can define those abbreviations even more easily, and you can call them up just like any other part of the SAS syntax, using the autocomplete features of the program editor. Here's how.

1. Open a SAS program window (by opening an existing program or selecting File->New->Program).

2. Select Program->Add Abbreviation Macro

3. In the Add Abbreviation Macro window, type a short name for your code snippet, and then add the code you want to substitute when the abbreviation is triggered. Bonus: the code field in this window also features the SAS program editor that helps you to complete the valid syntax.

4. To use the abbreviation, simply type the abbreviation within your program. The editor will automatically "suggest" the abbreviation as a possibility as you type, and you can press the spacebar to commit the selection, just as with any other suggested keyword. If you hover the mouse cursor over the suggestion, you can see a preview of the text that will be substituted in. Note that the abbreviation entry shows as a special item (green diamond instead of blue square in this case), as another hint that this element is different than the built-in syntax. (Were the autocomplete icons inspired by our bowl of Lucky Charms one morning? I'll never tell.)
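As an example of the kind of template you might store in step 3, here is a sketch of a PROC SQL skeleton. The abbreviation name "psql" and the table names are hypothetical; substitute whatever code you find yourself retyping:

```sas
/* text stored under a hypothetical abbreviation named "psql" */
proc sql;
   create table work.result as
   select *
   from sashelp.class
   where age > 12;
quit;
```

After the abbreviation is expanded, you would edit the table names and the WHERE clause to suit the program at hand.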

You don't have to rely on the autocomplete feature of the editor to get to your abbreviation. You can do as Rick suggests and assign a shortcut key by selecting Program->Enhanced Editor Keys. However, I'd stay away from Ctrl+I, as that currently maps to what I call the "indenter servant" -- the SAS code formatter. (But hey, you can change that key assignment too!)

tags: abbreviations, SAS Enterprise Guide, SAS program editor, SAS programming, SAS tips