sas programming

6月 262011
About a year ago (wow, has it been that long?), I posted an example program that lets you report on the contents of a SAS information map. Using my example, you can see the data items, filters, and folder structure within a given information map.

Last week a reader posted a comment, wondering how to also learn the names of the SAS tables that contribute to the SAS information map's definition. I know that you can't get to that information using the information map dictionary tables, but I did learn that it is possible using PROC INFOMAPS and the LIST statement.

Here is an example that relates to the information map that I reported on earlier:

proc infomaps ;
 update infomap "Cars" mappath="/Shared Data/Maps";
 list datasources;
The output goes to the SAS log:

Total datasources: 1
Data source: Sample Data.CARS_1993
ID: CARS_1993
Name: CARS_1993
Wow, I guess that was a pretty simple information map, with only one data source. I'm sure that you can come up with more complex examples.
6月 172011
SAS-based processes are critical to many organizations, but sometimes the trickiest part of your job falls into one or both of these activities:
  • Getting stuff from the outside world "into" SAS. (Once it's in SAS, as many of you know, the world is your oyster.)
  • Getting the output of your SAS process "out" to the non-SAS people who need to consume it.
Here's a handy DATA step program that can copy file content from one place to another. It copies the content byte-by-byte, so it's great for copying files from your SAS session to a place outside of SAS, or vice versa.

/* these IN and OUT filerefs can point to anything */
filename in "c:\dataIn\input.xlsx";
filename out "c:\dataOut\output.xlsx";

/* copy the file byte-for-byte  */
data _null_;
  length filein 8 fileid 8;
  filein = fopen('in','I',1,'B');
  fileid = fopen('out','O',1,'B');
  rec = '20'x;
  do while(fread(filein)=0);
     rc = fget(filein,rec,1);
     rc = fput(fileid, rec);
     rc =fwrite(fileid);
  rc = fclose(filein);
  rc = fclose(fileid);

filename in clear;
filename out clear;
It's true that you can copy disk-based files from one place to another by using operating system shell commands (via SYSTASK, for example). But the cool thing about the above program is that it can copy files to/from other places as well -- any location that you can access with a FILENAME statement, including URLs. For example, imagine that there is a file on the Web that you want to bring into SAS for analysis. Simply use FILENAME URL to define the IN fileref. Here's an example that grabs an Excel file from the Web and imports it into SAS:

filename in url ""
  /* PROXY= is important for going outside firewall, if you have one */
  /* proxy="" */
filename out "c:\temp\data.xlsx";

data _null_;
 length filein 8 fileid 8;
 filein = fopen('in','I',1,'B');
 fileid = fopen('out','O',1,'B');
 rec = '20'x;
 do while(fread(filein)=0);
  rc = fget(filein,rec,1);
  rc = fput(fileid, rec);
  rc =fwrite(fileid);
 rc = fclose(filein);
 rc = fclose(fileid);

/* Works on 32-bit Windows */
/* If using 64-bit SAS, you must use DBMS=EXCELCS */
  DATAFILE = out /* the downloaded copy */
  GETNAMES=YES; /* not supported for EXCELCS */
  MIXED=NO; /* not supported for EXCELCS */

filename in clear;
filename out clear;
Or, going the other way, perhaps you have a SAS stored process that creates a file that you want to position as a "download" file when the user runs the stored process in a web browser. In that case, you can use the reserved fileref _WEBOUT instead of the OUT fileref. For a specific example of this in action, see this tip about working with the Stored Process Web Application.
5月 102011
It's a simple task to use SAS to compute the number of weekdays between two dates. You can use the INTCK function with the WEEKDAY interval to come up with that number.

diff = intkc('WEEKDAY', start_date, end_date);
If you want to compute the number of working days between two dates, and assuming that every weekday is also a workday, this is perfect. However, most cultures observe certain non-productive days that they call "holidays", and this use of the INTCK function does not consider these when discounting the total sum of days.

Microsoft Excel supplies a function called NETWORKDAYS that can count the weekdays and consider a range of holiday dates when computing the total. This type of function is useful for project planners when they want to tick down the number of remaining days available to complete some work task. While there is no direct analogy to that function in SAS, it's not difficult to write your own functions by using the FCMP procedure. Here is an example of this function:

proc fcmp  outlib=work.myfuncs.dates;
  function networkdays(d1,d2,holidayDataset $,dateColumn $);

    /* make sure the start date < end date */
    start_date = min(d1,d2);
    end_date = max(d1,d2);

    /* read holiday data into array */
    /* array will resize as necessary */
    array holidays[1] / nosymbols;
    if (not missing(holidayDataset) and exist(holidayDataset)) then
       rc = read_array(holidayDataset, holidays, dateColumn);
    else put "NOTE: networkdays(): No Holiday data considered";

    /* INTCK computes transitions from one day to the next */
    /* To include the start date, if it is a weekday, then */
    /*  make the start date one day earlier.               */
    if (1 < weekday(start_date)< 7) then start_date = start_date-1;
       diff = intck('WEEKDAY', start_date, end_date);
    do i = 1 to dim(holidays);
    if (1 < weekday(holidays[i])< 7) and
         (start_date <= holidays[i] <= end_date) then
           diff = diff - 1;
run; quit;
This function can read a range of holiday dates from a data set. Assuming that you have a data set named USHOLIDAYS with a date column named HOLIDAYDATE, you could use the function like this:

options cmplib=work.myfuncs;
/* test with data set of values */
data test;
  length dates 8;
  format dates date9.;
  infile datalines dsd;
  input dates : date9.;
  workdaysSince = networkdays(dates, today(), "work.usholidays","holidaydate");
This shows an example of the result:

Example output using this custom function

If you want to see the complete example with sample holiday data, I've placed it on the support site here. (Special thanks to my colleague Jason, whose paper I referred to, for his help in refining this example.)

4月 132011
This morning I delivered a talk to visiting high school students at the SAS campus. The topic: using SAS to analyze Twitter content.

Being teenagers, high school students are well familiar with Twitter. But this batch of students was also very familiar with SAS, as they all have taken SAS programming as a course within their schools. (In fact, at least one of the students had earned a SAS certification!)

The organizers of today's program had asked me to keep it very interactive, so I left it to the students to help select what topic we would search for and then analyze from Twitter.

Before we began, I warned them. "This is the Internet," I said. "Participants on Twitter don't always use words that are acceptable in polite conversation. If any bad words appear on the screen during our exercise, I'm going to behave just like I do when it happens in front of my young daughters at home: I'm going to pretend that it's not there."

(However, knowing that I was scheduled to present this morning, I did take precautions and asked Twitter to watch its mouth.)

The students offered several great suggestions of trending topics on Twitter; in the end we settled on #Thor (as in, god of thunder), which should provide interesting content thanks to an imminent theatrical release. (Hence, my "hammer time" title on the blog. Get it?)

And that's when I did something I've never done before. I showed SAS program code to high school students. And they understood it.

After a brief description of some of the key statements and constructs, we ran the program to retrieve 1000 tweets that were tagged #Thor (presumably mostly about the movie, but who knows? Maybe Odin is out there promoting his brand.)

We ran a frequency chart (using the SGPLOT procedure) to show the distribution of tweets over the past 90 hours. Looks like there was a spike in activity around midnight EDT. One student guessed: A new trailer being promoted, perhaps?

Because Twitter content is unstructured text data, we can use what we know about Twitter "conventions" and parse out some extra information, such as how many of the tweets are retweets, and who was the original "tweeter"? We used DATA step and regular expressions to identify the retweets, and then ran a frequency analysis (FREQ) to identify the accounts who had the most retweeted content.

We noticed that @Marvel dominated the conversation, at least for our small sample. I guess it makes sense that they have a vested interest, since they "own" the Thor character. And @DrPepper had quite a lot to say. We guessed that maybe the soft drink is a sponsor for the film and we'll see a few cans of Dr. Pepper on the big screen? (I checked later and it looks like, yes indeed, Thor is a Pepper - wouldn't you like to be a Pepper too?)

I couldn't have been more pleased with how this session went. Together, the students and I used our SAS programming skills and critical thinking to investigate a topic, and we learned things that we didn't know before. We gathered raw data that we had never seen before, and we turned it into information. And that is a life skill that will never go out of style.

2月 042011
Millions of Americans will be gathering around the television this Sunday to watch Super Bowl XLV. They'll gather in bars and private homes, prepare billions of calories worth of snacks, and root for their favorite teams.

But if you're looking for an alternate form of entertainment, why not watch "New Goodies for the SAS Programmer in SAS Enterprise Guide"? It's educational and family friendly; I promise that it doesn't feature any racy advertisements.

The recording of this SAS Talks webinar is now available for anyone to view. During the live session last week, we received lots of questions via the Q&A chat panel. These have been answered and summarized on this follow-up page for the webinar, which includes links to many other related resources.

If, after watching it, you have more questions, you can cruise over to the SAS Enterprise Guide discussion forum and ask the community. I'm not sure how many of them will be signed in this Sunday evening...perhaps only those that are Buffalo Bills fans like me.

2月 022011
SAS Global Forum, with its long standing history, continues to be the hub and melting pot of brain-stimulating concoctions and eureka moments for SAS users across the globe! This year the magic continues in Las Vegas, at Caesars Palace from April 4 – 7. This series, Snapshot of the Best Papers of 2010, brings with it a tasting of some of those magical ingredients that go into making this conference such a huge success. It consists of a generous portion of insightful tidbits from our paper winners of 2010, a dash of some of the current tracks you’ll see this year, along with a sprinkling of those section descriptions.

All that, along with our users’ interactions, is what makes this conference so stimulating and flavorful through the years! Apart from the thought of appetizing food, the next thing I bet that’s likely to pique the interest of our SAS Professionals is a challenging code worth cracking. This week’s post is a delightful offering of just that, as I present Aaron Augustine’s paper SAS Code Validation: L.E.T.O. Method. It was the winning presentation under the Coders’ Corner section. This year we have Lois Levin, an independent consultant, and Peter Crawford of Crawford Software Consultancy Ltd. co-chairing this section. There are a lot of interesting topics to choose from this year. Whether it is: SAS and Excel, macro libraries, Web crawling for data, PRX genotype, SAS data on smart phones, documenting with wikis, optimizing joins without merging, creating functions with the PROTO PROC, Report and Tabulate reporting tricks, or SAS pitfalls to beware of, there’s plenty to go around the room.

While it might not be as sensational and dramatic as the Da Vinci code, Augustine’s paper offers practical ideas to tackling an analytics world where a lot of SAS code is written for ad hoc analysis. His paper prescribes the L.E.T.O method! I got in touch with Augustine,Director of Analytics Research and Development at Information Resources Inc., to further decode the L.E.T.O method for us.
Continue reading "Coders' Corner: SAS Code Validation: L.E.T.O. Method"
12月 242010
A customer phoned up SAS Technical Support the other day with a problem. (Yes, that happens occasionally...) When he ran a certain program through SAS Enterprise Guide, the program didn't seem to complete and no results were returned. Here is a snippet of the program (modified a bit to make it less proprietary) :
  1. proc sql noprint;
  2.   connect to teradata as tera2
  3.   (user=&user pw=&password server=&server
  4.   mode=teradata FASTEXPORT=YES);
  5.   create table buyers as
  6.     select * from connection to tera2
  7.      (select id,
  8.         trxn_dt,
  9.         sum (ln_itm_qty) as Items,
  10.         sum (ext_prc_amt as Sales
  11.         from store.linetrxn a
  12.           join cust.indv_xref b
  13.           on a.rid = b.rid
  14. /* remainder removed for brevity */
Can you spot the problem? Other than the use of a user ID and password, which can be avoided, what else is wrong with this program? Why does it fail to finish?

The SAS code formatter in SAS Enterprise Guide 4.3 can tell you. Click Ctrl+I, and the code formatter scans your program and attempts to indent it for proper readability. But for this program, it stops short of that with this message:

Formatting Error
Please check for an unmatched parenthesis on line 7.

Now you know: there is an unmatched parenthesis! And every SAS programmer knows that a missing parenthesis, unmatched quote, or missing semicolon can cause problems for the SAS language parser. (Rick Wicklin calls this a "parse-time error" in his post about finding and fixing programming errors.)

But where is the unmatched paren in this program? The message says "line 7", but line 7 looks okay. There is an open paren for the inner SELECT clause. But starting there, there must be a missing matching parenthesis within the clause. To find it, use the arrow keys in the program editor to navigate the program and visit every parenthesis character in the region. The program editor in SAS Enterprise Guide 4.3 highlights the matching parentheses pairs as you navigate:

See the matches

You can also use the Ctrl+[ (open bracket) key to move the cursor from one parenthesis to its match, if there is one. If no match can be found, the program editor emits a friendly beep to tell you, "sorry, can't find a match".

In this example, you don't have to look further than line 10, where the programmer has left off a closing parenthesis. The line should read: sum (ext_prc_amt) as Sales.

People are really starting to appeciate the new SAS program editor (including the formatter). Earlier this week I received an unsolicited phone call from a SAS programmer (who happens to work at SAS). She was calling just to say that this feature is "AWESOME" and that it has saved her countless hours as she maintains legacy SAS programs. Angela Hall also highlights it in her blog about real-world BI experiences.

I love to hear these stories. Have you started using SAS Enterprise Guide 4.3 yet? What do you like about it? And what could be improved? Let us know, either here in the comments or via SAS Technical Support.

12月 182010
You might be too young to remember Clara Peller. She was the star of a series of fast-food burger commercials in the 1980s, in which she demanded meatier meals by shouting "Where's the beef?" at the pickup counter or drive-through window. Alas, the competitor restaurant meals were afflicted with "Fluffy bun", meaning that it was difficult to find the all-beef patty because it was dwarfed by the bread in which it was served.

SAS Enterprise Guide can also serve up some meaty content in the SAS log when you run a SAS program. But in order get that content to you from your SAS session, SAS Enterprise Guide wraps your program in what we call the "ODS sandwich" -- SAS statements that open and close one or more ODS destinations around your program, so that your results can be packaged from SAS and delivered into your project view.

Sometimes, the ODS sandwich can obscure the content that you're really interested in. Consider this log:

1   ;*';*";*/;quit;run;
NOTE: Procedures may not support all options or statements for all devices. For details, see the documentation for each procedure.
12    ODS tagsets.sasreport12(ID=EGSR) FILE=EGSR STYLE=Analysis
12  ! STYLESHEET=(URL="file:///C:/Projects/f2ec43/winclient/Build/Debug/Styles/Analysis.css") NOGTITLE NOGFOOTNOTE
12  ! GPATH=&sasworklocation ENCODING=UTF8 options(rolap="on");
15   proc means;
16     class origin;
17   run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.16 seconds
      cpu time            0.04 seconds
25   ;*';*";*/;quit;run;
29   QUIT; RUN;

There are only a few lines of interest here, but lots of fluff. It's necessary fluff, mind you. Just like you wouldn't want the fast-food restaurant to hand over your burger (and all the fixins) without a bun, you need to use ODS to deliver your output. But it's not what you came for.

In SAS Enterprise Guide 4.2 and 4.3, you can change an option to hide the fluff in your log. Select Tools->Options and go to the "Results General" page. See the checkbox option that says "Show generated wrapper code in SAS log"? If you clear that checkbox, the ODS sandwich will be hidden from you when you run your programs. (If you're curious how this works, this is the technique that we use.)

Here's the previous example with this option cleared, sans the fluffy ODS bun:

1    ;*';*";*/;quit;run;
3    %_eg_hidenotesandsource;
19   proc means;
20     class origin;
21   run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.04 seconds
      cpu time            0.06 seconds
23   %_eg_hidenotesandsource;
37   QUIT; RUN;

Look carefully at the log. You'll notice that the line numbers are not consecutive -- it skips a few! That's because the ODS statements are still submitted and processed, but the log output for those statements is suppressed. The result is a cleaner log, with more meat than fluff.

We know that many programmers work in regulated industries, and aren't permitted to suppress portions of the SAS log like this. That's why the option is set to "show all" by default. But if you're in search of "just the beef", give this handy option a try.

Peek at your data using VBScript, OLE DB, and the SAS local data provider

 ole db, sas programming, SAS tips  Peek at your data using VBScript, OLE DB, and the SAS local data provider已关闭评论
11月 042010
Let's pretend for a moment that you don't have SAS installed on all of your office computers. But you've got some great content locked away inside SAS data sets. Is there a way to get to the content of that data, without having to push the data through a SAS tier?

There is a way! All that you need is the SAS OLE DB Local Data Provider and some programming know-how. You can download the provider from SAS. And the know-how? The basic recipes are in the SAS 9.2 Providers for OLE DB Cookbook. (There's a version for SAS 9.1 as well.)

Note: there are limitations when using the Local Data Provider. Because there is no SAS session in this mix, you don't get the benefit of SAS data services. For example, it can't handle your user-defined formats. You might sacrifice precision for some numeric values. You don't have the full cross-platform support that SAS provides.

Here is an example script that verifies that the provider is installed, opens a data set file, and reports on the row count and column names/types. It's a very simple example that doesn't get very fancy and doesn't include proper error checking, but I hope that it shows you the possibilities.

To run the example on your Windows machine:

  1. Copy the program into a local file (for example, c:\datatest\ShowColumns.vbs).
  2. Change the filename and path variables in the script to point to data files on your machine.
  3. Open a command prompt window.
  4. Run the example program with cscript.exe. For example:

    cscript.exe c:\datatest\ShowColumns.vbs
    Note that this can work with the 32-bit or 64-bit versions of the SAS OLE DB providers. If you are on a 64-bit machine and you want to use the 32-bit provider, be sure to run the 32-bit version of cscript.exe:

    %windir%\SysWow64\cscript.exe c:\datatest\ShowColumns.vbs
Here's the program:

' Change these to report on your data set
' Change path to the directory where your data set resides
' Change filename to the root name of the sas7bdat file (no extension)
path = "C:\Program Files\SAS\EnterpriseGuide\4.3\Sample\Data"
filename = "Candy_Sales_Summary"

' Check registry for SAS Local Provider
Set WSHShell = CreateObject("WScript.Shell")
clsID = WSHShell.RegRead("HKCR\SAS.LocalProvider\CLSID\")
WScript.Echo "DIAGNOSTICS: SAS.LocalProvider CLSID is " & clsID
inProcServer = WSHShell.RegRead("HKCR\CLSID\"  & clsID & "\InprocServer32\")
WScript.Echo "DIAGNOSTICS: Registered InprocServer32 DLL is " & inProcServer

' Constants for ADO calls
Const adOpenDynamic = 2
Const adLockOptimistic = 3
Const adCmdTableDirect = 512

' Instantiate the provider object
Set obConnection = CreateObject("ADODB.Connection")
Set obRecordset = CreateObject("ADODB.Recordset")

obConnection.Provider = "SAS.LocalProvider"
obConnection.Properties("Data Source") = path
obRecordset.Open filename, obConnection, adOpenDynamic, adLockOptimistic, adCmdTableDirect

 'Report on Fields in this data set
WScript.Echo ""
 WScript.Echo "Opened data " & filename & ", Record count: " & obRecordset.RecordCount
 For Each Field In obRecordset.Fields
   If Field.Type = 5 Then pType = "Numeric"
   If Field.Type = 200 Then pType = "Character"
   WScript.Echo Field.Name & " " & pType

Output from my example:

Microsoft (R) Windows Script Host Version 5.8
Copyright (C) Microsoft Corporation. All rights reserved.

DIAGNOSTICS: SAS.LocalProvider CLSID is {0221264D-F909-4872-B7E6-B108D3FC8E8B}
DIAGNOSTICS: Registered InprocServer32 DLL is C:\Program Files\SAS\SharedFiles\SASProvidersForOLEDB\sasafloc0902.dll

Opened data Candy_Sales_Summary, Record count: 15000
Name Character
Region Character
OrderID Numeric
ProdID Numeric
Customer Numeric
Type Character
Product Character
Category Character
Subcategory Character
Retail_Price Numeric
Units Numeric
Discount Character
Date Numeric
Fiscal_Year Character
Fiscal_Quarter Character
Fiscal_Month_Num Numeric
Sale_Amount Numeric
10月 232010
The question came up on the SAS Enterprise Guide discussion forum: which do you prefer, List Report Wizard (PROC REPORT) or Summary Tables (PROC TABULATE)? And as with most SAS-related questions, the proper response is: "it depends." If you put these two PROCs in the ring with a Google Fight, PROC REPORT is a clear winner. Cynthia@SAS delivered a very thoughtful answer. She first carefully separated out the "point-and-click" versus "code-it-yourself" concerns, highlighting the capabilities and limitations of the tasks within SAS Enterprise Guide. Then, for the people who want follow a programming approach, Cynthia offers these insights:
But for the beginner, I think the biggest issue that I've heard about (and it surprised me) was whether they wanted the box area in the upper left hand corner of the table. For some folks, it doesn't matter that the BOX is there -- but I've had some students switch to PROC REPORT just because they didn't like the big empty box -- seriously -- the BOX area was a deal breaker for them -- they wanted the look that they got with PROC REPORT. The second issue that I've heard about is changing the code -- some people find the PROC REPORT code easier to understand than the TABULATE code -- if and only if they have to change the code. The TABULATE code is more abbreviated and "algebra-like" with all the * and = in the TABLE statement. Even though the REPORT code is more verbose -- the existence of a COLUMN statement makes it clear which variable will come first on the report row, which second, which third, etc, etc. And then the explicit DEFINE statement makes it easy to see what statistic will get used for a numeric variable, the usage for a category variable (ORDER, GROUP or ACROSS).
So, as you can see, sometimes it actually does come down to a "BOXing" match. <groan...sorry>