sas programming

September 18, 2013
 

A couple of years ago I shared a method for copying any file within a SAS program. It was a simple approach, copying the file byte-by-byte from one fileref (SAS file reference) to another.

My colleague Bruno Müller, a SAS trainer in Switzerland, has since provided a much more robust method. Bruno's method has several advantages:

  • It's coded as a SAS macro, so it is simple to reuse -- similar to a function.
  • It copies the file content in chunks rather than byte-by-byte, so it's more efficient.
  • It provides good error checks and reports any errors and useful diagnostics to the SAS log.
  • It's an excellent example of a well-documented SAS program!

Bruno tells me that "copying files" within a SAS program -- especially from nontraditional file systems such as Web sites -- is a common need among his SAS students. I asked Bruno for his permission to share his solution here, and he agreed.

To use the macro, you simply define two filerefs: _bcin (source) and _bcout (target), then call the %binaryFileCopy() macro. Here is an example use that copies a file from my Dropbox account:

filename _bcin TEMP;
filename _bcout "C:\temp\streaming.sas7bdat";
proc http method="get" 
 url="https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat" 
 out=_bcin
;
run;
 
%binaryFileCopy()
%put NOTE: _bcrc=&_bcrc;
 
filename _bcin clear;
filename _bcout clear;

The following is partial log output from the program:

NOTE: BINARYFILECOPY start  17SEP2013:20:50:33
NOTE: BINARYFILECOPY infile=_bcin C:\SASTempFiles\_TD5888\#LN00066
NOTE: BINARYFILECOPY outfile=_bcout C:\temp\streaming.sas7bdat

NOTE: BINARYFILECOPY processed 525312 bytes
NOTE: DATA statement used (Total process time):
      real time           0.20 seconds
      cpu time            0.07 seconds    

NOTE: BINARYFILECOPY end  17SEP2013:20:50:34
NOTE: BINARYFILECOPY processtime 00:00:00.344

You can download the program -- which should work with SAS 9.2 and later -- from here: binaryfilecopy.sas

Update: using FCOPY in SAS 9.4

Updated: 18Sep2013
Within hours of my posting here, Vince DelGobbo reminded me about the new FCOPY function in SAS 9.4. With two filerefs assigned to binary-formatted files, you can use FCOPY to copy the content from one to the other. When I first tried it with my examples, I had problems because of the way FCOPY treats logical record lengths. However, Jason Secosky (the developer for FCOPY and tons of other SAS functions) told me that if I use RECFM=N on each FILENAME statement, the LRECL would not be a problem. And of course, he was correct.

Here's my example revisited:

filename _bcin TEMP recfm=n /* RECFM=N needed for a binary copy */;
filename _bcout "C:\temp\streaming.sas7bdat" recfm=n;
 
proc http method="get" 
 url="https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat" 
 out=_bcin
;
run;
 
data _null_;
   length msg $ 384;
   rc=fcopy('_bcin', '_bcout');
   if rc=0 then
      put 'Copied _bcin to _bcout.';
   else do;
      msg=sysmsg();
      put rc= msg=;
   end;
run;
 
filename _bcin clear;
filename _bcout clear;
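
If you want to reuse the FCOPY approach the way the macro above is reused, one option is to wrap the call in a small macro. This is only a sketch: the macro name fcopyBinary and its parameters are my own invention, and it assumes that both filerefs were already assigned with RECFM=N, as in the example above.

```sas
/* Hypothetical wrapper (not part of SAS): copy one fileref to another with FCOPY. */
/* Assumes SAS 9.4 and that both filerefs were assigned with RECFM=N.              */
%macro fcopyBinary(in=_bcin, out=_bcout);
   %local rc;
   %let rc = %sysfunc(fcopy(&in, &out));
   %if &rc = 0 %then %do;
      %put NOTE: Copied &in to &out..;
   %end;
   %else %do;
      /* SYSMSG returns the text of the most recent file-function error */
      %put ERROR: FCOPY returned rc=&rc.. %sysfunc(sysmsg());
   %end;
%mend fcopyBinary;

/* Example call, using the filerefs from the program above */
%fcopyBinary(in=_bcin, out=_bcout)
```

Call the macro before you clear the filerefs; FCOPY needs both of them to be assigned.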
tags: Copy Files, FCOPY, macro programming, SAS 9.4, SAS programming
September 5, 2013
 

Last week I presented two talks at the University of Wisconsin at Milwaukee, which has established a new Graduate Certificate in Applied Data Analysis Using SAS. While in Milwaukee, I ran into an old friend: the ODS LISTING destination.

One of my presentations was a hands-on workshop titled Getting Started with the SAS/IML Language. In the UW-M computer lab, the students used SAS/IML Studio to run the exercises. I noticed that the student output was produced in the ODS LISTING destination, whereas my laptop was generating output for the HTML destination. That is, in the following screen capture, I was generating the output on the right side, whereas the student output looked like the left side:

As I wandered through the lab, watching the students complete the exercises, I realized that I have grown accustomed to the HTML destination. HTML became the default ODS destination for the SAS Windowing environment in SAS 9.3. SAS/IML Studio made HTML the default destination in SAS 9.3m2, which shipped in August 2012. Thus I have been seeing HTML output exclusively for about a year.

I now prefer the HTML output, but when SAS 9.3 changed the default destination from LISTING to HTML, I had mixed feelings. The LISTING destination was an old friend, and I didn't want to see it move away. We had had good times together through the years.

However, I embraced the change. I did not override the new default when I installed SAS 9.3, and I gritted through the first few weeks of working with the HTML output. I discovered several benefits to the HTML destination, including the fact that HTML output is "infinitely wide," and is therefore valuable when working with large matrices or wide tables. No more worrying about matrices wrapping when the output is wider than the LINESIZE option!

As I looked at the student output in the computer lab, I realized that I have made a new friend: the HTML destination. I like having it around when I work. I enjoy its beautiful tables and its integrated and interlaced ODS graphics.

When I encountered my old friend, the LISTING destination, in Milwaukee, I got the same feeling that I get when I play a classic video game like Pong, Space Invaders, or Asteroids: I briefly enjoy the nostalgic experience, but I realize that newer technology makes for a more enjoyable overall experience.

What is your default ODS destination in SAS? Are you still clinging to the LISTING destination? Have you converted to using HTML output? Why or why not? Share your story in the comments.

tags: SAS Programming
August 26, 2013
 

Recently I wrote about how to determine the age of your SAS release. Experienced SAS programmers know that you can programmatically determine information about your SAS release by using certain automatic macro variables that SAS provides:

  • SYSVER: contains the major and minor version of the SAS release
  • SYSVLONG: contains the information in SYSVER, and information about the maintenance release
  • SYSVLONG4: contains the information in SYSVLONG, and the year of release

For example, the following DATA step displays information about the SAS release. The results shown are for the second maintenance release of SAS 9.3.

data _NULL_;
%put SYSVER = &SYSVER;
%put SYSVLONG = &SYSVLONG;
%put SYSVLONG4 = &SYSVLONG4;
run;
SYSVER = 9.3
SYSVLONG = 9.03.01M2D082312
SYSVLONG4 = 9.03.01M2D08232012

These macro variables are usually used in macro code to conditionally include code (see the %INCLUDE statement) or to control the flow of execution through a macro, such as in the following example:

%if %sysevalf(&sysver < 9) %then %do;
   %put SAS 9.0 or later is required.  Terminating.;
   %goto exit;
%end;
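
The fragment above relies on %GOTO, which is valid only inside a macro definition, and it assumes that a label named exit is defined somewhere. A minimal self-contained sketch might look like the following. The macro name requireSAS9 is made up for illustration.

```sas
/* Hypothetical macro that shows where the version check and the %exit label live */
%macro requireSAS9;
   %if %sysevalf(&sysver < 9) %then %do;
      %put SAS 9.0 or later is required.  Terminating.;
      %goto exit;
   %end;

   /* ... statements that need SAS 9 features would go here ... */
   %put NOTE: Running SAS release &sysver..;

%exit:
%mend requireSAS9;

%requireSAS9
```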

Recently I wrote a SAS/IML function that decomposes the SYSVLONG macro variable into its components. You can write similar code for the SAS DATA step. The following program uses the FIND function and the SUBSTR function to parse and extract relevant information about a SAS release. If you ever have the need to extract details from the SYSVLONG macro variable, you might find this function helpful.

proc iml;
/* Helper function that returns information about the current SAS system release.
   This function decomposes the SYSVLONG system macro variable and returns four 
   numbers that are associated with the version.
*/
start GetSASVersion( major, minor, iteration, maint );
   sysvlong = symget("SYSVLONG");                /* system macro variable */
   pos1 = find(sysvlong, ".");
   major = substr(sysvlong, 1, pos1-1);          /* major version */
   major = num(major);                           /* convert to numeric */
 
   pos2 = find(sysvlong, ".", 'i', pos1+1);
   minor = substr(sysvlong, pos1+1, pos2-pos1-1);/* minor version */
   minor = num(minor);
 
   pos3 = find(sysvlong, "M", 'i', pos2+1);
   iteration = substr(sysvlong, pos2+1, pos3-pos2-1);/* iteration version */
   iteration = num(iteration);
 
   pos4 = notdigit(sysvlong, pos3+1);
   maint = substr(sysvlong, pos3+1, pos4-pos3-1);   /* maintenance level */
   maint = num(maint);
finish;
 
/* test it by running code on SAS 9.3m2 (SAS/IML 12.1) */
run GetSASVersion( major, minor, iteration, maint );
v =  major || minor || iteration || maint;
print v[colname={"major" "minor" "iteration" "maint"} 
        label="Results for SAS 9.3m2"];
 
b = ( major<9 ) 
  | ( major=9 & minor<3 )
  | ( major=9 & minor=3 & iteration<1 )
  | ( major=9 & minor=3 & iteration=1 & maint<=2 );
if b then print "SAS 9.3m2 or earlier"; 
else      print "After SAS 9.3m2";
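
For readers who do not have SAS/IML, here is a rough DATA step sketch of the same parsing logic. It assumes that SYSVLONG has the form shown earlier (for example, 9.03.01M2D082312); I have not tested it against every release. The variable names mirror the SAS/IML function.

```sas
data _null_;
   length sysvlong $ 32;
   sysvlong = symget("SYSVLONG");                 /* system macro variable */

   pos1  = find(sysvlong, ".");
   major = input(substr(sysvlong, 1, pos1-1), 8.);            /* major version */

   pos2  = find(sysvlong, ".", pos1+1);           /* search starts after 1st dot */
   minor = input(substr(sysvlong, pos1+1, pos2-pos1-1), 8.);  /* minor version */

   pos3  = find(sysvlong, "M", pos2+1);
   iteration = input(substr(sysvlong, pos2+1, pos3-pos2-1), 8.);

   pos4  = notdigit(sysvlong, pos3+1);            /* first non-digit after "M" */
   maint = input(substr(sysvlong, pos3+1, pos4-pos3-1), 8.);  /* maintenance level */

   put major= minor= iteration= maint=;
run;
```

For the SYSVLONG value 9.03.01M2D082312, this logic yields major=9, minor=3, iteration=1, and maint=2.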
tags: Getting Started, SAS Programming
August 19, 2013
 

Even the best programmers make mistakes. For most errors, SAS software displays the nature and location of the error, returns control to the programmer, and awaits further instructions. However, there are a handful of insidious errors that cause SAS to think that a statement or program is not finished. For these errors, SAS doesn't display the error because it is waiting for the programmer to finish submitting the rest of the statement. Meanwhile, the programmer (who is unaware that an error has occurred) is waiting for SAS to respond. From the programmer's point of view, SAS is frozen. It has gone off into La-La Land, or maybe the Twilight Zone.

Fortunately, there is a simple "magic command" that fixes all of these common errors. The common errors that render SAS unresponsive are as follows:

  • The forgotten semicolon: If the last statement in a program does not contain a terminating semicolon, SAS thinks that the program is not finished. It waits to receive the rest of the statement. Without a terminating semicolon, SAS will wait, and wait, and wait....
    y = 1        /* No semicolon, so statement not complete */
  • The forgotten closing single quote: If your program starts a string but forgets to end it, SAS thinks you are in the process of defining a string. You can submit statements such as QUIT and ENDSAS, but SAS thinks these statements are just part of the string and does not execute them.
    c = 'My string;     /* No closing quote. Future stmts are part of string */
    run;                * Hey! SAS is frozen! ;
    endsas;             * Argh! Nothing works! ;
    As shown above, you can detect this error visually if you are using a program editor in which syntax is color-coded. For example, in the SAS enhanced editor, all characters after the equal sign are colored purple, which indicates that SAS thinks they are all part of a string. Also, after the character string exceeds 262 characters, SAS writes a helpful warning to the SAS Log:
    WARNING: The quoted string currently being processed has become
             more than 262 characters long.  You might have
             unbalanced quotation marks.
  • The forgotten closing double quote: Same issue as for the forgotten single quote.
  • The forgotten closing comment: You started a comment, but haven't closed it with */. No matter what text you submit, SAS thinks it is part of the comment.
    c = 'My string';    /* Program is complete
    run;                * Hey! SAS is frozen! ;
    endsas;             * Argh! Nothing works! ;
    Again, if you use a color-coded program editor, you ought to be able to detect this error visually. In the SAS enhanced editor, you will notice that your statements are green.

There is a "magic command" that you can submit that will recover from all four errors:

;*';*";*/;

If you have used SAS Enterprise Guide, you've probably seen this special statement (also called the "magic string" or the "quote killer") appended to the end of submitted programs. It is used by many client applications to ensure that the SAS server terminates and produces results such as ODS tables and graphics. I don't know who originally invented the magic command, but let's look at what it does:

  • If the submitted program is already properly terminated (none of the errors are present), the command issues a null statement (the first character) and a comment (the remaining characters).
  • If the submitted program forgot a concluding semicolon, the command terminates the previous statement (the first character) and issues a comment (the remaining characters).
  • If the submitted program forgot to close a single-quote string, the command terminates the string (the third character) and issues a comment (the remaining characters).
  • If the submitted program forgot to close a double-quote string, the command terminates the string (the sixth character) and issues a comment (the remaining characters).
  • If the submitted program is missing a closing comment symbol, the command closes the comment (the eighth and ninth characters) and issues a null statement (the last character).

In all cases, the magic command causes SAS to escape from La-La Land and returns control to the programmer.

A forgotten RUN or QUIT statement is another error that can cause SAS to be unresponsive. For most procedures, SAS parses the statements in a program, but does not execute them until it encounters a RUN or QUIT statement. (Exceptions include some interactive procedures such as the IML and SQL procedures.) This kind of programming error is obviously fixed by submitting a QUIT or RUN statement. (Some programmers use the RUN CANCEL statement to abort a submitted DATA step.) Consequently, some programmers might want to modify the magic string as follows:

;*';*";*/;quit;

Again, this version of the magic command is used by many SAS client applications, including EG. It looks mysterious the first time you see it, but after you dissect it, it makes perfect sense. If you have ever asked "what is the purpose of the statement at the end of SAS Enterprise Guide programs," now you know!

Do you have a debugging tip that you use to overcome an insidious error? What do you do to regain control when your SAS program contains an error that locks up your computer? Leave a comment.

tags: Getting Started, SAS Programming
August 16, 2013
 
Occasionally, people ask me what is the best thing about writing a book. Is it the notoriety you get from being a SAS Press author? Fame is always pleasant. Is it the money you make from the advance and the royalties?  Money is always useful. Is it displaying technical expertise [...]
August 15, 2013
 

In SAS 9.4, the SAS programming language continues to add new features by the truckload. I've already discussed PROC DELETE (which is actually an old feature, but like an 80s hit song it's now back with a better version).

In this SAS Tech Talk video from SAS Global Forum 2013, I talked with Rick Langston about the advancements in the SAS programming language. Rick has been with SAS for...well, a long time. He's considered to be the steward of the SAS programming language. In this session, Rick discusses the process that we use to add new syntax to the language and to ensure its integrity.

 
Rick also talks about three specific new features in 9.4, all of which were added because customers asked for them. (It's difficult to read Rick's syntax examples in the video, so I've included reference links below so that you can learn more.)

FILENAME ZIP access method

This brings the ability to read and write compressed ZIP files directly into the SAS language. For more information, see the FILENAME ZIP documentation. If you don't have SAS 9.4, you can still create ZIP files using ODS PACKAGE.
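
To give a feel for the syntax, here is a minimal sketch that writes a small text file into a ZIP archive and then reads it back. The archive path and the member name are placeholders that I made up; adjust them for your system.

```sas
/* Hypothetical path and member name -- change these for your environment */
filename outzip ZIP "C:\temp\example.zip" member="hello.txt";

data _null_;
   file outzip;                    /* write a text member into the archive */
   put "Hello from SAS 9.4";
run;

data _null_;
   infile outzip;                  /* read the same member back */
   input;
   put _infile_;                   /* echo each line to the SAS log */
run;

filename outzip clear;
```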

DOSUBL function

Rick calls this "submitting SAS code on the side", as it allows you to run a SAS step or statement from "inside" a currently running step. You can learn more from the DOSUBL function reference, or from this SAS Global Forum paper. I've also written a post with a specific example in SAS Enterprise Guide.
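
As a small illustration of the "on the side" idea, the following sketch calls PROC SQL from within a running DATA step and then picks up the result. The macro variable name nclass is my own choice. The sketch relies on the documented behavior that macro variables created by the submitted code are exported back to the calling environment.

```sas
data _null_;
   /* DOSUBL runs the PROC SQL step while this DATA step is still executing */
   rc = dosubl('proc sql noprint;
                   select count(*) into :nclass trimmed from sashelp.class;
                quit;');

   /* The macro variable that the side step created is visible here at run time */
   nclass = symgetn('nclass');
   put rc= nclass=;
run;
```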

LOCKDOWN system option and statement

This one will excite SAS administrators. You can set the LOCKDOWN system option in a batch SAS session or SAS Workspace server to limit some of the "dangerous" functions of SAS and, more importantly, limit the file areas in which the SAS session will operate. We don't currently have a documentation link for this, so I'll dive in a bit further in a future blog post.

That's just a small taste of what's new. Be sure to check out the complete What's New in SAS 9.4 document for even more goodies.

tags: SAS 9.4, SAS programming, sasgf13, Tech Talk
August 6, 2013
 
With the pervasiveness of mobile devices, being able to read while “on the go” has been easier than ever. How many times have you found yourself in a situation where you pass the time waiting by reading something on your phone/iPad/tablet etc? With eBooks on my iPad, I find that [...]
July 8, 2013
 

Every programming language has an IF-THEN statement that branches according to whether a Boolean expression is true or false. In SAS, the IF-THEN (or IF-THEN/ELSE) statement evaluates an expression and branches according to whether the expression is nonzero (true) or zero (false). The basic syntax is

if numeric-expression then
   do-computation;
else
   do-alternative-computation;

One of the interesting features of the SAS language is that it is designed to handle missing values. This brings up the question: What happens if SAS encounters a missing value in an IF-THEN expression? Does the IF-THEN expression treat the missing value as "true" and execute the THEN statement, or does it treat the missing value as "false" and execute the alternative ELSE statement (if it exists)?

The answer is fully documented, but let's run an example to demonstrate the SAS behavior:

data A;
input x @@;
if x then Expr="True "; 
     else Expr="False";
datalines;
1 0 .
;
 
proc print noobs; run;

Ah-ha! SAS interprets a missing value as "false." More correctly, here is an excerpt from the SAS documentation:

SAS evaluates the expression in an IF-THEN statement to produce a result that is either non-zero, zero, or missing. A non-zero and nonmissing result causes the expression to be true; a result of zero or missing causes the expression to be false.

This treatment of missing values is handled consistently by other SAS languages and in other conditional statements. For example, the CHOOSE function in the SAS/IML language is a vector alternative to the IF-THEN/ELSE statement, but it handles missing values by using the same rules:

proc iml;
x  = {1, 0, .};
Expr = choose(x,"True","False");
print x Expr;

The output is identical to the previous output from the DATA step and PROC PRINT.

If you do not want missing values to be treated as "false," then do not reference a variable directly, but instead use a Boolean expression in the IF-THEN statement. For example, in the following statement a missing value results in the THEN statement being executed, whereas all other numerical values continue to behave as expected:

if x^=0 then ...;
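
If you would rather handle missing values as their own case, one possibility is to test for them explicitly with the MISSING function before the usual IF-THEN/ELSE logic. The following sketch reuses the data from the earlier example. The data set name B and the label "Missing" are arbitrary choices.

```sas
data B;
input x @@;
if missing(x) then Expr="Missing";  /* first assignment sets the length of Expr to 7 */
else if x     then Expr="True";
else               Expr="False";
datalines;
1 0 .
;

proc print data=B noobs; run;
```

With this ordering, the missing value is labeled "Missing" instead of silently falling into the ELSE branch.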

Have you encountered places in SAS where missing values are handled in a surprising way? Post your favorite example in the comments.

tags: Getting Started, SAS Programming
July 3, 2013
 

When I work on SAS projects that create lots of files as results, it's often a requirement that those files be organized in a certain folder structure. The exact structure depends on the project, but here's an example:

/results
   |__ html
       |__ images
   |__ xls
   |__ data

Before you can have SAS populate these file folders, the folders have to actually exist. Traditionally, SAS programmers have handled this by doing one of the following:

  • Simply require that the folders exist before you run through the project. (This is the SEP method: Somebody Else's Problem.)
  • Use SAS statements and shell commands (via SYSTASK or other method) to create the folders as needed. The SAS-related archives are full of examples of this. It can get complex when you have to account for operating system differences, and whether operating system commands are even permitted (NOXCMD system option).

In SAS 9.3 there is a new system option that simplifies this: DLCREATEDIR. When this option is in effect, a LIBNAME statement that points to a non-existent folder will take matters into its own hands and create that folder.

Here's a simple example, along with the log messages:

options dlcreatedir;
libname newdir "/u/sascrh/brand_new_folder";

NOTE: Library NEWDIR was created.
NOTE: Libref NEWDIR was successfully assigned as follows: 
      Engine:        V9 
      Physical Name: /u/sascrh/brand_new_folder

You might be thinking, "Hey, SAS libraries are for data, not for other junk like ODS results." Listen: we've just tricked the LIBNAME statement into making a folder for you -- you can use it for whatever you want. I won't tell.

In order to create a series of nested folders, you'll have to create each folder level in top-down order. For example, if you need a "results" and a "results/images" folder, you can do this:

%let outdir=%sysfunc(getoption(work));
/* create a results folder in the WORK area, with images subfolder */
options dlcreatedir;
libname res "&outdir./results";
libname img "&outdir./results/images";
/* clear the librefs - don't need them */
libname res clear;
libname img clear;

Or (and this is a neat trick) you can use a single concatenated LIBNAME statement to do the job:

libname res ("&outdir./results", "&outdir./results/images");
libname res clear;

NOTE: Libref RES was successfully assigned as follows: 
      Levels:           2
      Engine(1):        V9 
      Physical Name(1): /saswork/SAS_workC1960000554D_gsf0/results
      Engine(2):        V9 
      Physical Name(2): /saswork/SAS_workC1960000554D_gsf0/results/images
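
If you don't want to leave DLCREATEDIR turned on for the rest of the session, you can save the current setting and restore it afterwards. Here is a sketch that builds the full example structure from the top of this post. The macro variable name saved_dlc is made up, and the sketch assumes that the option is not restricted at your site.

```sas
%let outdir    = %sysfunc(getoption(work));
/* GETOPTION returns either DLCREATEDIR or NODLCREATEDIR */
%let saved_dlc = %sysfunc(getoption(dlcreatedir));

options dlcreatedir;
/* list parent folders before their children, as described above */
libname res ("&outdir./results",
             "&outdir./results/html",
             "&outdir./results/html/images",
             "&outdir./results/xls",
             "&outdir./results/data");
libname res clear;

options &saved_dlc;    /* put the option back the way we found it */

/* FILEEXIST returns 1 if the folder is there */
%put NOTE: images folder exists = %sysfunc(fileexist(&outdir./results/html/images));
```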

If you feel that folder creation is best left to the card-carrying professionals, don't worry! It is possible for a SAS admin to restrict use of the DLCREATEDIR option. That means that an admin can set the option (perhaps to NODLCREATEDIR to prohibit willy-nilly folder-making) and prevent end users from changing it. Just let them try, and they'll see:

13         options dlcreatedir;
                   ___________
                   36
WARNING 36-12: SAS option DLCREATEDIR is restricted by your Site 
Administrator and cannot be updated.

That's right -- DENIED! Mordac the Preventer would be proud. Job security: achieved!

Exact documentation for how to establish Restricted Options can be a challenge to find. You'll find it within the Configuration Guides for each platform in the Install Center. Here are quick links for SAS 9.3: Windows x64, Windows x86, and UNIX.

tags: DLCREATEDIR, Restricted Options, sas administration, SAS libraries, SAS programming, SAS tips
June 19, 2013
 

I am not a big fan of the macro language, and I try to avoid it when I write SAS/IML programs. I find that the programs with many macros are hard to read and debug. Furthermore, the SAS/IML language supports loops and indexing, so many macro constructs can be replaced by standard SAS/IML syntax.

Nevertheless, many SAS customers use macro constructs as part of their daily SAS programming tasks, and that practice often continues when they write SAS/IML programs. A customer recently asked a question about the macro language that required knowledge of the way that macro variables are handled within a SAS/IML loop. This post shares my response.

Here's the crux of the customer's question. Run the following SAS/IML program and see if you can understand why it behaves as it does:

proc iml;
i = 7;
call symputx("j", i);    /* 1. Put value of i into macro variable j */
y1 = &j;                 /* 2. Assign y1 the value of &j            */
print y1;                /* success! */
 
y = j(1,4,.);
do i = 1 to ncol(y);     /* 3. Start processing the DO block of statements */
   call symputx("j", i); /* 4. Put value of i into macro variable j */
   y[i] = &j;            /* 5. Hmmmm, what does this do inside the loop? */
end;
print y;                 /* Not what you might expect? */

As you can see from the output, the first use of the macro variable (outside the DO loop) works as expected, but the second does not. The customer wanted to know why the elements of y are not set to 1, 2, 3, 4 within the loop.

The key point to remember about macro variables is that SAS code never sees them. Macro variables are evaluated by the macro preprocessor at parse time, not at run time. The SAS/IML code never sees &j, only the constant value that the preprocessor substitutes for &j.

It is also important to remember that PROC IML is an interactive procedure. (The "I" in IML stands for interactive!) Each statement or block of statements is parsed as it is encountered, as opposed to the DATA step, which parses the entire program before beginning execution.

Let's examine the program step-by-step to understand why the first construct works but the second does not. The following steps refer to the numbers in the program comments:

  1. The value of the SAS/IML scalar i is copied (as text) into the macro variable j.
  2. The statement is encountered. The value of the macro variable j is substituted by the macro preprocessor. Then the statement is executed. The SAS/IML variable y1 is assigned the value 7.
  3. A DO loop is encountered by the SAS/IML parser. The parser finds the matching END statement and proceeds to parse the entire body of the loop in order to check for syntax errors. This parsing phase occurs exactly one time. Because the block of statements contains a macro variable, the macro preprocessor substitutes the value of the macro variable j, which is 7.
  4. For each iteration, the value of the SAS/IML scalar i is copied (as text) into the macro variable j.
  5. For each iteration, the ith element of the y vector is assigned the value 7. In particular, this statement does not contain a reference to the macro variable j.

To the casual reader of the program, it looks like &j will have a different value during each step of the iteration. But it doesn't. The expression &j is resolved at parse time. SAS/IML parses the entire body of the DO loop once, before any execution occurs, and at parse time the expression &j is 7.
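
The same parse-time behavior can be seen in a DATA step, which might make the idea easier to isolate. In the following sketch (the starting value 7 and the new value 99 are arbitrary), the reference &j is replaced by text while the step is being compiled, before CALL SYMPUTX ever runs:

```sas
%let j = 7;

data _null_;
   call symputx("j", 99);   /* executes at run time                      */
   x = &j;                  /* already resolved to 7 when the step was compiled */
   put x=;                  /* writes x=7 to the log                     */
run;

%put NOTE: After the step, j=&j;   /* now the macro variable is 99 */
```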

There is a way to get what the customer wants. The SYMGET function retrieves the value of a macro variable at run time. Therefore the following statements fill the vector y with the values 1 through 4:

do i = 1 to ncol(y);
   call symputx("j", i);
   y[i] = num(symget("j"));  /* get macro value at run time */
end;
print y;                     /* Yes! This is what we want! */

For me, this blog post emphasizes three facts:

  • Always remember that macro substitution is done by a preprocessor, which operates at parse time.
  • The SAS/IML language parses an entire block of statements (between the DO and END statements) one time before executing the block.
  • Mixing macro code and SAS/IML statements can be confusing and hard to debug. When you have the option, use SAS/IML language features instead of relying on macro language constructs.
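
To illustrate the last point, here is a sketch of how the customer's loop might look with no macro variables at all. Whether this fits the customer's real program is an assumption on my part, since the original question only showed the simplified loop.

```sas
proc iml;
y = j(1,4,.);
do i = 1 to ncol(y);
   y[i] = i;            /* use the SAS/IML loop index directly */
end;
print y;

y2 = 1:ncol(y);         /* or skip the loop: the index operator builds the vector */
print y2;
```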
tags: SAS Programming