sas programming

1月 052013
 

Many of my blogging colleagues are taking this week to reflect on their top posts in 2012. Some are using the visitor statistics to rank the posts, but as Andy Ratcliffe points out, that gives short-shrift to the late-season articles.

This year, I'm going to use this space to recap a few of my favorite topics, grouped by theme. These are the topics that generated lots of interest, or perhaps they haven't yet generated the interest that I think they deserve -- so I'll plug them again.

New tools to improve your SAS Enterprise Guide life

I'll lead with my favorites - new custom tasks! They are easy to download, install, and use. Some of these might be The Missing Feature that you're looking for, so it's worth your time to check them out.

  • Copy Files task (for FTP-like file transfers). It supports wildcard notation and macro expressions, providing lots of flexibility.
  • Project Reviewer task - shows all of the tasks in your project, grouped by process flow. It summarizes the time elapsed by each task and each process flow, who last modified the tasks and when, and more. And you can automatically include this as a report within your project.
  • SAS Options Viewer - a SAS Enterprise Guide version of the OPTIONS window, showing the current values for all of your SAS system options.
  • SAS Macro Variable Viewer - a window to view all of your SAS macro variables. Also includes a tool to evaluate macro expressions instantly.
  • Search your project files with the EGP-Project-Search-inator.

64-bit SAS and 64-bit Windows

Many people are now using 64-bit versions of Windows, and they've updated their SAS installation to a 64-bit version to go with it. Was it all smooth sailing? Not quite -- here are the top 64-bit "gotchas".
Also see these related posts:

Fun for tinkerers: Windows PowerShell

I had a lot of fun playing with Windows PowerShell this year and figuring out how to integrate it with SAS. Here are a few of my favorite posts on the topic:

Cool SAS programming tips

In my job I write and maintain lots of SAS programs, so I'm always finding new ways to do things. That's a great thing about SAS: it's so huge that you can never know everything about it, so you'll always have something new to learn. Here are a few of the tips that I learned and/or shared over the past year:

With my new book completed and available, I hope to have time to learn and share even more in 2013. I've already got a couple of SAS Global Forum papers lined up; those are sure to provide me with some material very soon!

tags: 64-bit, PowerShell, SAS custom tasks, SAS programming
12月 182012
 

I used "Dropbox" in the title for this post, but these techniques can be used for other cloud-based file sharing services, such as GitHub and Google Drive.

Using PROC HTTP (added in SAS 9.2), you can easily access any "cloud-based" file as long as you have a public link to reach it. I'm a cloudy sort of guy, so I've got profiles on Dropbox, Google Drive and Github. I've shared a few sample data sources on these accounts so that you can try these techniques for yourself.

Here's one working example to get you started. (In all of my examples, I included a placeholder for PROXYHOST=, which you need only if your Internet connection is routed through a gateway proxy. That's common for corporate intranet connections, less common for schools and homes.)

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat" 
 out=_inbox
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;

There are a few additional considerations for accessing cloud-based data.

You need the direct download link for the file


When you click "Share Link" on a service like Dropbox, you are provided with a browser-friendly link that you can give to your friends or colleagues. When they visit that link, they are presented with a simple view of the file name and a Download button.

Here's the Share Link for my sample data set:

https://www.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat

I can't use that link in my SAS program because PROC HTTP isn't going to "click" on the Download button for me. Instead, I need the link that results from clicking on the Download button, which I can get by substituting the dl.dropbox.com domain instead of www.dropbox.com in the address:

https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat

Note: You should assume that URL schemes for the "direct download" links can be subject to change, and of course they vary based on the cloud service provider. Today, the "dl.dropbox.com" scheme works for Dropbox.

A cloud-based folder is NOT a cloud-based path for a SAS library

Dropbox allows you to share a link to a folder that contains multiple files. As tempting as it might be to try, you cannot treat a cloud folder as a "path" for a SAS library. Did you notice the FILENAME statement in my first code example? I'm providing a local file destination for the download operation. PROC HTTP is my method to copy the file that I need into my SAS session.

If I have multiple files to fetch, I'll need to repeat those steps for each file, using its direct-download URL.

Your file might require additional processing before you can use it

My earlier program example works by copying a SAS7BDAT file (a SAS data set) into my SAS WORK folder. Because it's a data set, my SAS session recognizes as a valid data member. I was lucky because the data set encoding was compatible with my SAS session.

Sometimes the data might not be quite ready for you to use directly. If a colleague shares a file in a CSV or Excel format, you will need to run a PROC IMPORT step before you can use it. For convenience, a colleague might use PROC CPORT to save multiple SAS data sets into a single transport file. You'll need to run it through PROC CIMPORT before you can use it.

Here's an example that uses a CPORT file that I created with the CARS and CLASS sample data sets:

filename _inbox TEMP;
 
proc http method="get" 
 url="https://dl.dropbox.com/s/5uehelsok9oslgv/sample.cpo" 
 out=_inbox
  /* proxyhost="http://yourproxy.company.com" */
;
run;
 
proc cimport file=_inbox lib=work;
run;
filename _inbox clear;

From the SAS log:

NOTE: PROCEDURE HTTP used (Total process time):
      real time           0.91 seconds
      cpu time            0.00 seconds

NOTE: PROC CIMPORT begins to create/update data set WORK.CARS
NOTE: Data set contains 15 variables and 428 observations. 
      Logical record length is 152

NOTE: PROC CIMPORT begins to create/update data set WORK.CLASS
NOTE: Data set contains 5 variables and 19 observations. 
      Logical record length is 40

You can %include SAS programs from the cloud

It might not be the wisest move to blindly access a cloud-based SAS program and execute it with your SAS session, but you can do it. And because SAS programs can contain data records (in the form of CARDS or DATALINES), this is another way to share data in a portable way.

This program downloads and runs Rick Wicklin's "Christmas Tree" challenge, which he posted on The DO Loop blog last week.

filename _inbox "%sysfunc(getoption(work))/ifsxmas.sas";
proc http method="get" 
 url="https://dl.dropbox.com/s/g1hat0woohud9yc/IFSChristmasTree.sas" 
 out=_inbox
  /* proxyhost="http://yourproxy.company.com" */
;
run;
%include _inbox;
filename _inbox clear;

Working with Google Drive and GitHub

These techniques also work with the other popular services, but the download links are different.

For files stored on GitHub, you need to find the direct link to the "raw" file in your repository. Here's a working example:

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://github.com/cjdinger/proc_http_example/raw/master/data/streaming.sas7bdat" 
 out=_inbox
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;

For Google Drive, you must first select to make the file or folder public, or at least reachable by "anyone who has the link". By default, your Google Drive content is private to your account.

The "Share Link" is more cryptic than some of the other services. The important part is the id= portion of the URL, which allows you to form the direct link like this:

https://docs.google.com/uc?export=download&id=<cryptic_id_from_Google_Drive>

Here's my SAS-enabled example. Note the use of the %str() function to avoid complaints about trying to resolve the ampersand portion of the URL:

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://docs.google.com/uc?export=download%str(&)id=0BwSh7LOTCPQ5MmlJZFdOOFJhOHc" 
 out=_inbox 
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;
tags: cloud, Dropbox, github, Google Drive, PROC HTTP, SAS programming
12月 112012
 
There’s a new video series launching in the SAS Education department unlike anything that’s ever been done before.  It’s called Stat Wars. Here’s how it works… two SAS instructors go head-to-head and battle it out over a statistics challenge. (So far no injuries have been reported.)  It’s educational and engaging, [...]
11月 122012
 

I was recently asked, "Does SAS support computing inverse hyperbolic trigonometric functions?"

I was pretty sure that I had used the inverse hyperbolic trig functions in SAS, so I was surprised when I read the next sentence: "I ask because I saw a Usage Note that says these functions are not supported."

Sure enough, I followed the link and found an old Usage Note (which references a SAS newsletter article from 1993!) that states that SAS does not support the arc-hyperbolic trig functions!

Well, the Usage Note was correct when it was written many years ago, but it is wrong now. SAS added support for the inverse hyperbolic functions in SAS 9.2. Until I can get someone to revise that Usage Note, here's a quick list of the inverse hyperbolic functions in SAS:

  • The ARCOSH Function computes the inverse hyperbolic cosine
  • The ARSINH Function computes the inverse hyperbolic sine
  • The ARTANH Function computes the inverse hyperbolic tangent
  • The inverse hyperbolic secant at x is computed as ARCOSH(1/x)
  • The inverse hyperbolic cosecant at x is computed as ARSINH(1/x)
  • The inverse hyperbolic cotangent at x is computed as ARTANH(1/x)

And remember that you can call any Base SAS function from the SAS/IML language and pass in a vector of values. So, for example, you can evaluate the arc-hyperbolic tangent at uniformly spaced points in it's domain by using the following statements:

proc iml;
x = do(-0.99, 0.99, 0.01);
y = artanh(x);

Notice that SAS does not have a function for the arc-hyperbolic secant, cosecant, or cotangent. Althoug, to be safe, I should say that it doesn't have those functions in SAs 9.3. In 10 years, who knows?

tags: Getting Started, SAS Programming
11月 122012
 

I was recently asked, "Does SAS support computing inverse hyperbolic trigonometric functions?"

I was pretty sure that I had used the inverse hyperbolic trig functions in SAS, so I was surprised when I read the next sentence: "I ask because I saw a Usage Note that says these functions are not supported." [Editor's Note 16NOV2012: SAS Technical support has updated the Usage Note. Hurray!]

Sure enough, I followed the link and found an old Usage Note (which references a SAS newsletter article from 1993!) that states that SAS does not support the arc-hyperbolic trig functions!

Well, the Usage Note was correct when it was written many years ago, but it is wrong now. SAS added support for the inverse hyperbolic functions in SAS 9.2. Until I can get someone to revise that Usage Note, here's a quick list of the inverse hyperbolic functions in SAS:

  • The ARCOSH Function computes the inverse hyperbolic cosine
  • The ARSINH Function computes the inverse hyperbolic sine
  • The ARTANH Function computes the inverse hyperbolic tangent
  • The inverse hyperbolic secant at x is computed as ARCOSH(1/x)
  • The inverse hyperbolic cosecant at x is computed as ARSINH(1/x)
  • The inverse hyperbolic cotangent at x is computed as ARTANH(1/x)

And remember that you can call any Base SAS function from the SAS/IML language and pass in a vector of values. So, for example, you can evaluate the arc-hyperbolic tangent at uniformly spaced points in it's domain by using the following statements:

proc iml;
x = do(-0.99, 0.99, 0.01);
y = artanh(x);

Notice that SAS does not have a function for the arc-hyperbolic secant, cosecant, or cotangent. Although, to be safe, I should say that it doesn't have those functions in SAs 9.3. In 10 years, who knows?

tags: Getting Started, SAS Programming
11月 102012
 

We are careening towards the holiday season, and this year more than ever before it's going to mean one thing: Online Shopping. As you enter your credit card number over and over to complete your many purchases, you might occasionally transpose or mistype your account number. Does it surprise you when the online form immediately informs you that you've entered an invalid number? Before you even click the Complete Purchase button? How does it know the number is invalid? Is it checking against a list of known good numbers? For those concerned about online privacy, that's a disturbing thought.

No, of course these sites don't have a database of "good" credit card numbers that they use to validate your number. Instead, they simply use math. Credit card numbers follow a standard that allows their "syntax" to be validated by the Luhn algorithm, which combines a checksum approach followed by a "modulus 10" operation to validate the sequence of digits. A couple of years ago, Heuristic Andrew shared a SAS DATA step approach for validating credit card numbers.

National Provider IDs (NPIs), which are used to identify health care providers in the USA, use a similar approach. NPIs are 10 digits in length, and you can use the Luhn algorithm to distinguish a good provider ID from a bad one (where we're qualifying the ID, not the actual provider). The 10-digit ID value isn't quite enough information for the validation; in order for the math to work, you must prefix the ID with this constant string of digits: '80840'.

I can tell from reading SAS-L that there are lots of SAS professionals who work with healthcare data. Validating the NPI value can be a good first-line approach for detecting certain data entry errors.

Adapting Andrew's implementation of the Luhn algorithm, I created a SAS function that can tell you whether a given NPI is valid. I named it isValidNpi(npi_value), and it returns a 0 (invalid) or 1 (valid). It's simple to use:

/* my FCMP function library */
options cmplib=work.myfuncs;
 
data test;
  input npi $ 1-10;
  valid_npi = isValidNpi(npi);
  /* 4 valid NPIs followed by 5 invalid IDs */
  datalines;
1003802901
1003864232
1013955343
1134108814
0000000000
1234567890
15280060
152
10038029O1
run;

Here's the result:

The remainder of this post is the complete FCMP source code for the function. I'm a bit of a newbie when it comes to understanding healthcare data, so I'd appreciate any feedback you might have about NPIs, validation, and this approach in general. Leave a note in the comments.

proc fcmp outlib=work.myfuncs.npi;
  /***************************************************************/
  /* Validates that a National Provider ID (NPI) value is valid  */
  /* Checks only that it is valid in terms of its form, not that */
  /* it corresponds to a real provider.                          */
  /* Uses the Luhn algorithm and a prefix of '80840'             */
  /* More:                                                       */
  /*   http://en.wikipedia.org/wiki/Luhn_algorithm               */
  /*   http://en.wikipedia.org/wiki/National_Provider_Identifier */
  /* RETURNS: 1 if valid, 0 if not valid                         */
  /***************************************************************/
  /* hat tip: Heuristic Andrew */
  function isValidNpi(npi $);
    length AddIt $2 ChkStr $ 15 ChkSum 8;
    /* check length for NPI */
    if length(trim(npi)) ^= 10 then
      do;
        /* put 'Not 10 digits ' npi=; */
        return(0);
      end;
    /* check that all are digits */
    else if (prxmatch('/\D/',trim(npi)) > 0) then
      do;
        /* put 'invalid characters ' npi=; */
        return(0);
      end;
    else
      do;
        /* Luhn's algorithm (also called modulus 10 or mod 10) */
        ChkSum = 0;
        ChkStr=reverse(cats('80840',npi));
        do pos=1 to length(ChkStr);
          if mod(pos,2) then /* odd positions */
            do;
              AddIt=substr(ChkStr,pos,1);
            end;
          else /* even positions: digit*2 */
          do;
            AddIt=put(2*input(substr(ChkStr,pos,1),2.),2.);
          end;
          /* add digits */
          do i=1 to length(AddIt);
            ChkSum+input(substr(AddIt,i,1),2.);
          end;
        end;
        /* Check if ID is valid or not (if ChkSum ends with Zero) */
        if mod(ChkSum,10)=0 then
          do;
            /* put 'This is a valid ID: ' npi= ChkSum=; */
            return (1);
          end;
        else if mod(ChkSum,10) ne 0 then
          do;
            /* put 'This is an invalid ID: ' npi= ChkSum=; */
            return (0);
          end;
      end;
  endsub;
run;
quit;
options cmplib=work.myfuncs;
tags: healthcare, luhn algorithm, NPI, SAS programming
10月 292012
 

I was looking at some SAS documentation when I saw a Base SAS function that I never knew existed. The NWKDOM function returns the date for the nth occurrence of a weekday for the specified month and year. I surely could have used that function last spring when I blogged about how to compute the onset and end of daylight saving time (DST) in SAS!

As any handyman will tell you, having the right tool makes a job so much easier. Instead of nine complicated lines of SAS DATA step code, my new algorithm consists of two calls to the NWKDOM function! In the US, DST ends on the first Sunday in November. Here's all you need to do to figure out that date in 2012: nwkdom(1, 1, 11, 2012). The first argument specifies the week (first=1). The second argument specifies the day of the week as a number 1–7 (Sunday=1). The third argument specifies the month as a number 1–12 (November=11). The last argument is the year.

Therefore, the following DATA step code computes the beginning and end of DST for several years:

/* compute the beginning and end of daylight saving time for several years */
data DST;
format dst_beg dst_end DATE5.;
do year=2012 to 2022;
   dst_beg = nwkdom(2, 1, 3, year);/*DST begins 2nd Sun in March */
   dst_end = nwkdom(1, 1,11, year);/*DST ends 1st Sun in Nov */
   output;
end;
run;
 
proc print noobs; 
var year dst_beg dst_end;
run;

With the right tools, every task is easier.

tags: SAS Programming