sas programming

7月 112011
 
SAS Enterprise Guide sets values for several useful SAS macro variables when it connects to a SAS session, including one macro variable, &_CLIENTPROJECTPATH, that contains the name and path of the current SAS Enterprise Guide project file.

(To learn about this and other macro variables that SAS Enterprise Guide assigns, look in Help->SAS Enterprise Guide help, and search the Index for "macro variables".)

If your SAS program knows the path of your project file, then you can use that information to make your project more "portable", and reference data resources using paths that are relative to the project location, instead of having to hard-code absolute paths into your program.

For example, I've been working with some data from DonorsChoose.org, and I've captured that work in a project file. After saving the project, I can easily get the name of the project file by checking the value of &_CLIENTPROJECTPATH:


15    %put &_clientprojectpath;
'C:\DataSources\DonorsChoose\DonorsChoose.egp'
 
If you can get the full path of the project file, then you can build that information into a SAS program so that as you move the project file and its "assets" (such as data), you don't need to make changes to the project to accommodate a new project folder "home". Here is the bit of code that takes the project file location and distills a path from it, and then uses it to assign a path-based SAS library.

/* a bit of code to detect the local project path                          */
/* NOTE: &_CLIENTPROJECTPATH is set only if you saved the project already! */
%let localProjectPath =
   %sysfunc(substr(%sysfunc(dequote(&_CLIENTPROJECTPATH)), 1,
   %sysfunc(findc(%sysfunc(dequote(&_CLIENTPROJECTPATH)), %str(/\), -255 ))));

libname DC "&localProjectPath";
 
Here's the output:

21  libname DC "&localProjectPath";
NOTE: Libref DC was successfully assigned as follows:
      Engine:        V9
      Physical Name: C:\DataSources\DonorsChoose
 
This allows me to keep the project together with the data that it consumes and creates. And I can easily share the project and data with a colleague, who can store it in a different folder on his/her machine, and it should work just the same without any changes.
7月 082011
 
The recent issue of InformationWeek features a Q&A session with Ken Thompson, one of the creators of the Unix operating system. (He collaborated with Dennis Ritchie, of C language fame. Since much of SAS is written in C, I daresay there are a few copies of K&R around here.)

One of the questions hit on the topic of code reviews:

Interviewer: Was there any concept of looking at each other's code or doing code reviews?

Thompson: [Shaking head] We were all pretty good coders.

The implication: only bad coders need code reviews.

This is an outdated attitude towards software development. Unfortunately, I've encountered this attitude among some software professionals that I've worked with (outside of SAS as well as within). Why do some programmers (especially those from the "good ol' days") place little value on code reviews?

Perhaps peer code reviews were less important 30 years ago. The constraints around running code were much tighter. There were fewer layers in the stack where bugs could lurk. Memory was scarce, instruction sets were limited. Computer applications were like my 1973 AMC Gremlin: fewer moving parts, so it was easier to see which parts weren't working correctly and predict when they would fail. (Most often, it was the carburetor -- another outmoded concept.)

Also, the discipline of programming was newer back then. Perhaps there was less variability among the approaches you could use to solve a problem with code. If you had the skills to code the solution, maybe your solution would not differ much from that of any other similarly skilled colleague.

Well, those days are gone. Now more than ever, we need code reviews to be a formal part of how we work as SAS professionals (and as software professionals in general). Mark Chu-Carroll spells out many of the reasons in his blog post, Things Everyone Should Do: Code Review. As Mark says, it's not about the problems that a reviewer will catch; it's more about the programmer who prepares to have his/her work reviewed. Knowing that you have to explain this to another person, that someone else will be looking at your work...that can help to keep you on your toes.

In addition to reasons that Mark cites, those of us who work with SAS have a special motivation: we need to pass on the knowledge of a rich legacy code base to our younger workers. And because the SAS language expands with each new release, old-time SAS programmers need exposure to people who can apply new techniques to old problems. Code reviews are one way to help make that happen: improve quality, ensure continuity, and keep it fresh. Who can argue against that?

6月 262011
 
About a year ago (wow, has it been that long?), I posted an example program that lets you report on the contents of a SAS information map. Using my example, you can see the data items, filters, and folder structure within a given information map.

Last week a reader posted a comment, wondering how to also learn the names of the SAS tables that contribute to the SAS information map's definition. I know that you can't get to that information using the information map dictionary tables, but I did learn that it is possible using PROC INFOMAPS and the LIST statement.

Here is an example that relates to the information map that I reported on earlier:


proc infomaps ;
 update infomap "Cars" mappath="/Shared Data/Maps";
 list datasources;
run;
 
The output goes to the SAS log:

Total datasources: 1
Data source: Sample Data.CARS_1993
ID: CARS_1993
Name: CARS_1993
Description:
 
Wow, I guess that was a pretty simple information map, with only one data source. I'm sure that you can come up with more complex examples.
6月 172011
 
SAS-based processes are critical to many organizations, but sometimes the trickiest part of your job falls into one or both of these activities:
  • Getting stuff from the outside world "into" SAS. (Once it's in SAS, as many of you know, the world is your oyster.)
  • Getting the output of your SAS process "out" to the non-SAS people who need to consume it.
Here's a handy DATA step program that can copy file content from one place to another. It copies the content byte-by-byte, so it's great for copying files from your SAS session to a place outside of SAS, or vice versa.

/* these IN and OUT filerefs can point to anything */
filename in "c:\dataIn\input.xlsx";
filename out "c:\dataOut\output.xlsx";

/* copy the file byte-for-byte  */
data _null_;
  length filein 8 fileid 8;
  filein = fopen('in','I',1,'B');
  fileid = fopen('out','O',1,'B');
  rec = '20'x;
  do while(fread(filein)=0);
     rc = fget(filein,rec,1);
     rc = fput(fileid, rec);
     rc =fwrite(fileid);
  end;
  rc = fclose(filein);
  rc = fclose(fileid);
run;

filename in clear;
filename out clear;
 
It's true that you can copy disk-based files from one place to another by using operating system shell commands (via SYSTASK, for example). But the cool thing about the above program is that it can copy files to/from other places as well -- any location that you can access with a FILENAME statement, including URLs. For example, imagine that there is a file on the Web that you want to bring into SAS for analysis. Simply use FILENAME URL to define the IN fileref. Here's an example that grabs an Excel file from the Web and imports it into SAS:

filename in url "http://www.LotsOfData.org/data/data.xlsx"
  /* PROXY= is important for going outside firewall, if you have one */
  /* proxy="http://yourProxy.company.com" */
  ;
filename out "c:\temp\data.xlsx";

data _null_;
 length filein 8 fileid 8;
 filein = fopen('in','I',1,'B');
 fileid = fopen('out','O',1,'B');
 rec = '20'x;
 do while(fread(filein)=0);
  rc = fget(filein,rec,1);
  rc = fput(fileid, rec);
  rc =fwrite(fileid);
 end;
 rc = fclose(filein);
 rc = fclose(fileid);
run;

/* Works on 32-bit Windows */
/* If using 64-bit SAS, you must use DBMS=EXCELCS */
PROC IMPORT OUT= WORK.test
  DATAFILE = out /* the downloaded copy */
  DBMS=EXCEL REPLACE;
  SHEET="FirstSheet";
  SCANTEXT=YES;
  USEDATE=YES;
  SCANTIME=YES;
  GETNAMES=YES; /* not supported for EXCELCS */
  MIXED=NO; /* not supported for EXCELCS */
RUN;

filename in clear;
filename out clear;
 
Or, going the other way, perhaps you have a SAS stored process that creates a file that you want to position as a "download" file when the user runs the stored process in a web browser. In that case, you can use the reserved fileref _WEBOUT instead of the OUT fileref. For a specific example of this in action, see this tip about working with the Stored Process Web Application.
5月 102011
 
It's a simple task to use SAS to compute the number of weekdays between two dates. You can use the INTCK function with the WEEKDAY interval to come up with that number.

diff = intkc('WEEKDAY', start_date, end_date);
 
If you want to compute the number of working days between two dates, and assuming that every weekday is also a workday, this is perfect. However, most cultures observe certain non-productive days that they call "holidays", and this use of the INTCK function does not consider these when discounting the total sum of days.

Microsoft Excel supplies a function called NETWORKDAYS that can count the weekdays and consider a range of holiday dates when computing the total. This type of function is useful for project planners when they want to tick down the number of remaining days available to complete some work task. While there is no direct analogy to that function in SAS, it's not difficult to write your own functions by using the FCMP procedure. Here is an example of this function:


proc fcmp  outlib=work.myfuncs.dates;
  function networkdays(d1,d2,holidayDataset $,dateColumn $);

    /* make sure the start date < end date */
    start_date = min(d1,d2);
    end_date = max(d1,d2);

    /* read holiday data into array */
    /* array will resize as necessary */
    array holidays[1] / nosymbols;
    if (not missing(holidayDataset) and exist(holidayDataset)) then
       rc = read_array(holidayDataset, holidays, dateColumn);
    else put "NOTE: networkdays(): No Holiday data considered";

    /* INTCK computes transitions from one day to the next */
    /* To include the start date, if it is a weekday, then */
    /*  make the start date one day earlier.               */
    if (1 < weekday(start_date)< 7) then start_date = start_date-1;
       diff = intck('WEEKDAY', start_date, end_date);
    do i = 1 to dim(holidays);
    if (1 < weekday(holidays[i])< 7) and
         (start_date <= holidays[i] <= end_date) then
           diff = diff - 1;
    end;
    return(diff);
  endsub;
run; quit;
 
This function can read a range of holiday dates from a data set. Assuming that you have a data set named USHOLIDAYS with a date column named HOLIDAYDATE, you could use the function like this:

options cmplib=work.myfuncs;
/* test with data set of values */
data test;
  length dates 8;
  format dates date9.;
  infile datalines dsd;
  input dates : date9.;
  workdaysSince = networkdays(dates, today(), "work.usholidays","holidaydate");
datalines;
01NOV2010
21NOV2010
01DEC2010
01APR2011
;
 
This shows an example of the result:

Example output using this custom function

If you want to see the complete example with sample holiday data, I've placed it on the support site here. (Special thanks to my colleague Jason, whose paper I referred to, for his help in refining this example.)

4月 132011
 
This morning I delivered a talk to visiting high school students at the SAS campus. The topic: using SAS to analyze Twitter content.

Being teenagers, high school students are well familiar with Twitter. But this batch of students was also very familiar with SAS, as they all have taken SAS programming as a course within their schools. (In fact, at least one of the students had earned a SAS certification!)

The organizers of today's program had asked me to keep it very interactive, so I left it to the students to help select what topic we would search for and then analyze from Twitter.

Before we began, I warned them. "This is the Internet," I said. "Participants on Twitter don't always use words that are acceptable in polite conversation. If any bad words appear on the screen during our exercise, I'm going to behave just like I do when it happens in front of my young daughters at home: I'm going to pretend that it's not there."

(However, knowing that I was scheduled to present this morning, I did take precautions and asked Twitter to watch its mouth.)

The students offered several great suggestions of trending topics on Twitter; in the end we settled on #Thor (as in, god of thunder), which should provide interesting content thanks to an imminent theatrical release. (Hence, my "hammer time" title on the blog. Get it?)

And that's when I did something I've never done before. I showed SAS program code to high school students. And they understood it.

After a brief description of some of the key statements and constructs, we ran the program to retrieve 1000 tweets that were tagged #Thor (presumably mostly about the movie, but who knows? Maybe Odin is out there promoting his brand.)

We ran a frequency chart (using the SGPLOT procedure) to show the distribution of tweets over the past 90 hours. Looks like there was a spike in activity around midnight EDT. One student guessed: A new trailer being promoted, perhaps?


Because Twitter content is unstructured text data, we can use what we know about Twitter "conventions" and parse out some extra information, such as how many of the tweets are retweets, and who was the original "tweeter"? We used DATA step and regular expressions to identify the retweets, and then ran a frequency analysis (FREQ) to identify the accounts who had the most retweeted content.


We noticed that @Marvel dominated the conversation, at least for our small sample. I guess it makes sense that they have a vested interest, since they "own" the Thor character. And @DrPepper had quite a lot to say. We guessed that maybe the soft drink is a sponsor for the film and we'll see a few cans of Dr. Pepper on the big screen? (I checked later and it looks like, yes indeed, Thor is a Pepper - wouldn't you like to be a Pepper too?)

I couldn't have been more pleased with how this session went. Together, the students and I used our SAS programming skills and critical thinking to investigate a topic, and we learned things that we didn't know before. We gathered raw data that we had never seen before, and we turned it into information. And that is a life skill that will never go out of style.

2月 042011
 
Millions of Americans will be gathering around the television this Sunday to watch Super Bowl XLV. They'll gather in bars and private homes, prepare billions of calories worth of snacks, and root for their favorite teams.

But if you're looking for an alternate form of entertainment, why not watch "New Goodies for the SAS Programmer in SAS Enterprise Guide"? It's educational and family friendly; I promise that it doesn't feature any racy advertisements.

The recording of this SAS Talks webinar is now available for anyone to view. During the live session last week, we received lots of questions via the Q&A chat panel. These have been answered and summarized on this follow-up page for the webinar, which includes links to many other related resources.

If, after watching it, you have more questions, you can cruise over to the SAS Enterprise Guide discussion forum and ask the community. I'm not sure how many of them will be signed in this Sunday evening...perhaps only those that are Buffalo Bills fans like me.

2月 022011
 
SAS Global Forum, with its long standing history, continues to be the hub and melting pot of brain-stimulating concoctions and eureka moments for SAS users across the globe! This year the magic continues in Las Vegas, at Caesars Palace from April 4 – 7. This series, Snapshot of the Best Papers of 2010, brings with it a tasting of some of those magical ingredients that go into making this conference such a huge success. It consists of a generous portion of insightful tidbits from our paper winners of 2010, a dash of some of the current tracks you’ll see this year, along with a sprinkling of those section descriptions.

All that, along with our users’ interactions, is what makes this conference so stimulating and flavorful through the years! Apart from the thought of appetizing food, the next thing I bet that’s likely to pique the interest of our SAS Professionals is a challenging code worth cracking. This week’s post is a delightful offering of just that, as I present Aaron Augustine’s paper SAS Code Validation: L.E.T.O. Method. It was the winning presentation under the Coders’ Corner section. This year we have Lois Levin, an independent consultant, and Peter Crawford of Crawford Software Consultancy Ltd. co-chairing this section. There are a lot of interesting topics to choose from this year. Whether it is: SAS and Excel, macro libraries, Web crawling for data, PRX genotype, SAS data on smart phones, documenting with wikis, optimizing joins without merging, creating functions with the PROTO PROC, Report and Tabulate reporting tricks, or SAS pitfalls to beware of, there’s plenty to go around the room.

While it might not be as sensational and dramatic as the Da Vinci code, Augustine’s paper offers practical ideas to tackling an analytics world where a lot of SAS code is written for ad hoc analysis. His paper prescribes the L.E.T.O method! I got in touch with Augustine,Director of Analytics Research and Development at Information Resources Inc., to further decode the L.E.T.O method for us.
Continue reading "Coders' Corner: SAS Code Validation: L.E.T.O. Method"
12月 242010
 
A customer phoned up SAS Technical Support the other day with a problem. (Yes, that happens occasionally...) When he ran a certain program through SAS Enterprise Guide, the program didn't seem to complete and no results were returned. Here is a snippet of the program (modified a bit to make it less proprietary) :
  1. proc sql noprint;
  2.   connect to teradata as tera2
  3.   (user=&user pw=&password server=&server
  4.   mode=teradata FASTEXPORT=YES);
  5.   create table buyers as
  6.     select * from connection to tera2
  7.      (select id,
  8.         trxn_dt,
  9.         sum (ln_itm_qty) as Items,
  10.         sum (ext_prc_amt as Sales
  11.         from store.linetrxn a
  12.           join cust.indv_xref b
  13.           on a.rid = b.rid
  14. /* remainder removed for brevity */
Can you spot the problem? Other than the use of a user ID and password, which can be avoided, what else is wrong with this program? Why does it fail to finish?

The SAS code formatter in SAS Enterprise Guide 4.3 can tell you. Click Ctrl+I, and the code formatter scans your program and attempts to indent it for proper readability. But for this program, it stops short of that with this message:

Formatting Error
Please check for an unmatched parenthesis on line 7.

Now you know: there is an unmatched parenthesis! And every SAS programmer knows that a missing parenthesis, unmatched quote, or missing semicolon can cause problems for the SAS language parser. (Rick Wicklin calls this a "parse-time error" in his post about finding and fixing programming errors.)

But where is the unmatched paren in this program? The message says "line 7", but line 7 looks okay. There is an open paren for the inner SELECT clause. But starting there, there must be a missing matching parenthesis within the clause. To find it, use the arrow keys in the program editor to navigate the program and visit every parenthesis character in the region. The program editor in SAS Enterprise Guide 4.3 highlights the matching parentheses pairs as you navigate:

See the matches

You can also use the Ctrl+[ (open bracket) key to move the cursor from one parenthesis to its match, if there is one. If no match can be found, the program editor emits a friendly beep to tell you, "sorry, can't find a match".

In this example, you don't have to look further than line 10, where the programmer has left off a closing parenthesis. The line should read: sum (ext_prc_amt) as Sales.

People are really starting to appeciate the new SAS program editor (including the formatter). Earlier this week I received an unsolicited phone call from a SAS programmer (who happens to work at SAS). She was calling just to say that this feature is "AWESOME" and that it has saved her countless hours as she maintains legacy SAS programs. Angela Hall also highlights it in her blog about real-world BI experiences.

I love to hear these stories. Have you started using SAS Enterprise Guide 4.3 yet? What do you like about it? And what could be improved? Let us know, either here in the comments or via SAS Technical Support.