sas programming

2月 042011
 
Millions of Americans will be gathering around the television this Sunday to watch Super Bowl XLV. They'll gather in bars and private homes, prepare billions of calories worth of snacks, and root for their favorite teams.

But if you're looking for an alternate form of entertainment, why not watch "New Goodies for the SAS Programmer in SAS Enterprise Guide"? It's educational and family friendly; I promise that it doesn't feature any racy advertisements.

The recording of this SAS Talks webinar is now available for anyone to view. During the live session last week, we received lots of questions via the Q&A chat panel. These have been answered and summarized on this follow-up page for the webinar, which includes links to many other related resources.

If, after watching it, you have more questions, you can cruise over to the SAS Enterprise Guide discussion forum and ask the community. I'm not sure how many of them will be signed in this Sunday evening...perhaps only those that are Buffalo Bills fans like me.

2月 022011
 
SAS Global Forum, with its long standing history, continues to be the hub and melting pot of brain-stimulating concoctions and eureka moments for SAS users across the globe! This year the magic continues in Las Vegas, at Caesars Palace from April 4 – 7. This series, Snapshot of the Best Papers of 2010, brings with it a tasting of some of those magical ingredients that go into making this conference such a huge success. It consists of a generous portion of insightful tidbits from our paper winners of 2010, a dash of some of the current tracks you’ll see this year, along with a sprinkling of those section descriptions.

All that, along with our users’ interactions, is what makes this conference so stimulating and flavorful through the years! Apart from the thought of appetizing food, the next thing I bet that’s likely to pique the interest of our SAS Professionals is a challenging code worth cracking. This week’s post is a delightful offering of just that, as I present Aaron Augustine’s paper SAS Code Validation: L.E.T.O. Method. It was the winning presentation under the Coders’ Corner section. This year we have Lois Levin, an independent consultant, and Peter Crawford of Crawford Software Consultancy Ltd. co-chairing this section. There are a lot of interesting topics to choose from this year. Whether it is: SAS and Excel, macro libraries, Web crawling for data, PRX genotype, SAS data on smart phones, documenting with wikis, optimizing joins without merging, creating functions with the PROTO PROC, Report and Tabulate reporting tricks, or SAS pitfalls to beware of, there’s plenty to go around the room.

While it might not be as sensational and dramatic as the Da Vinci code, Augustine’s paper offers practical ideas to tackling an analytics world where a lot of SAS code is written for ad hoc analysis. His paper prescribes the L.E.T.O method! I got in touch with Augustine,Director of Analytics Research and Development at Information Resources Inc., to further decode the L.E.T.O method for us.
Continue reading "Coders' Corner: SAS Code Validation: L.E.T.O. Method"
12月 242010
 
A customer phoned up SAS Technical Support the other day with a problem. (Yes, that happens occasionally...) When he ran a certain program through SAS Enterprise Guide, the program didn't seem to complete and no results were returned. Here is a snippet of the program (modified a bit to make it less proprietary) :
  1. proc sql noprint;
  2.   connect to teradata as tera2
  3.   (user=&user pw=&password server=&server
  4.   mode=teradata FASTEXPORT=YES);
  5.   create table buyers as
  6.     select * from connection to tera2
  7.      (select id,
  8.         trxn_dt,
  9.         sum (ln_itm_qty) as Items,
  10.         sum (ext_prc_amt as Sales
  11.         from store.linetrxn a
  12.           join cust.indv_xref b
  13.           on a.rid = b.rid
  14. /* remainder removed for brevity */
Can you spot the problem? Other than the use of a user ID and password, which can be avoided, what else is wrong with this program? Why does it fail to finish?

The SAS code formatter in SAS Enterprise Guide 4.3 can tell you. Click Ctrl+I, and the code formatter scans your program and attempts to indent it for proper readability. But for this program, it stops short of that with this message:

Formatting Error
Please check for an unmatched parenthesis on line 7.

Now you know: there is an unmatched parenthesis! And every SAS programmer knows that a missing parenthesis, unmatched quote, or missing semicolon can cause problems for the SAS language parser. (Rick Wicklin calls this a "parse-time error" in his post about finding and fixing programming errors.)

But where is the unmatched paren in this program? The message says "line 7", but line 7 looks okay. There is an open paren for the inner SELECT clause. But starting there, there must be a missing matching parenthesis within the clause. To find it, use the arrow keys in the program editor to navigate the program and visit every parenthesis character in the region. The program editor in SAS Enterprise Guide 4.3 highlights the matching parentheses pairs as you navigate:

See the matches

You can also use the Ctrl+[ (open bracket) key to move the cursor from one parenthesis to its match, if there is one. If no match can be found, the program editor emits a friendly beep to tell you, "sorry, can't find a match".

In this example, you don't have to look further than line 10, where the programmer has left off a closing parenthesis. The line should read: sum (ext_prc_amt) as Sales.

People are really starting to appeciate the new SAS program editor (including the formatter). Earlier this week I received an unsolicited phone call from a SAS programmer (who happens to work at SAS). She was calling just to say that this feature is "AWESOME" and that it has saved her countless hours as she maintains legacy SAS programs. Angela Hall also highlights it in her blog about real-world BI experiences.

I love to hear these stories. Have you started using SAS Enterprise Guide 4.3 yet? What do you like about it? And what could be improved? Let us know, either here in the comments or via SAS Technical Support.

12月 182010
 
You might be too young to remember Clara Peller. She was the star of a series of fast-food burger commercials in the 1980s, in which she demanded meatier meals by shouting "Where's the beef?" at the pickup counter or drive-through window. Alas, the competitor restaurant meals were afflicted with "Fluffy bun", meaning that it was difficult to find the all-beef patty because it was dwarfed by the bread in which it was served.

SAS Enterprise Guide can also serve up some meaty content in the SAS log when you run a SAS program. But in order get that content to you from your SAS session, SAS Enterprise Guide wraps your program in what we call the "ODS sandwich" -- SAS statements that open and close one or more ODS destinations around your program, so that your results can be packaged from SAS and delivered into your project view.

Sometimes, the ODS sandwich can obscure the content that you're really interested in. Consider this log:


1   ;*';*";*/;quit;run;
2   OPTIONS PAGENO=MIN;
3   %LET _CLIENTTASKLABEL='Program';
4   %LET _CLIENTPROJECTPATH='';
5   %LET _CLIENTPROJECTNAME='';
6   %LET _SASPROGRAMFILE=;
7  
8   ODS _ALL_ CLOSE;
9   OPTIONS DEV=ACTIVEX;
NOTE: Procedures may not support all options or statements for all devices. For details, see the documentation for each procedure.
10    GOPTIONS XPIXELS=0 YPIXELS=0;
11    FILENAME EGSR TEMP;
12    ODS tagsets.sasreport12(ID=EGSR) FILE=EGSR STYLE=Analysis
12  ! STYLESHEET=(URL="file:///C:/Projects/f2ec43/winclient/Build/Debug/Styles/Analysis.css") NOGTITLE NOGFOOTNOTE
12  ! GPATH=&sasworklocation ENCODING=UTF8 options(rolap="on");
NOTE: Writing TAGSETS.SASREPORT12(EGSR) Body file: EGSR
13  
14   GOPTIONS ACCESSIBLE;
15   proc means data=sashelp.cars;
16     class origin;
17   run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.16 seconds
      cpu time            0.04 seconds
18        
19   GOPTIONS NOACCESSIBLE;
20   %LET _CLIENTTASKLABEL=;
21   %LET _CLIENTPROJECTPATH=;
22   %LET _CLIENTPROJECTNAME=;
23   %LET _SASPROGRAMFILE=;
24  
25   ;*';*";*/;quit;run;
26   ODS _ALL_ CLOSE;
27  
28  
29   QUIT; RUN;
 

There are only a few lines of interest here, but lots of fluff. It's necessary fluff, mind you. Just like you wouldn't want the fast-food restaurant to hand over your burger (and all the fixins) without a bun, you need to use ODS to deliver your output. But it's not what you came for.

In SAS Enterprise Guide 4.2 and 4.3, you can change an option to hide the fluff in your log. Select Tools->Options and go to the "Results General" page. See the checkbox option that says "Show generated wrapper code in SAS log"? If you clear that checkbox, the ODS sandwich will be hidden from you when you run your programs. (If you're curious how this works, this is the technique that we use.)

Here's the previous example with this option cleared, sans the fluffy ODS bun:


1    ;*';*";*/;quit;run;
2    OPTIONS PAGENO=MIN;
3    %_eg_hidenotesandsource;
18  
19   proc means data=sashelp.cars;
20     class origin;
21   run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.04 seconds
      cpu time            0.06 seconds
22  
23   %_eg_hidenotesandsource;
35  
36  
37   QUIT; RUN;
 

Look carefully at the log. You'll notice that the line numbers are not consecutive -- it skips a few! That's because the ODS statements are still submitted and processed, but the log output for those statements is suppressed. The result is a cleaner log, with more meat than fluff.

We know that many programmers work in regulated industries, and aren't permitted to suppress portions of the SAS log like this. That's why the option is set to "show all" by default. But if you're in search of "just the beef", give this handy option a try.

Peek at your data using VBScript, OLE DB, and the SAS local data provider

 ole db, sas programming, SAS tips  Peek at your data using VBScript, OLE DB, and the SAS local data provider已关闭评论
11月 042010
 
Let's pretend for a moment that you don't have SAS installed on all of your office computers. But you've got some great content locked away inside SAS data sets. Is there a way to get to the content of that data, without having to push the data through a SAS tier?

There is a way! All that you need is the SAS OLE DB Local Data Provider and some programming know-how. You can download the provider from SAS. And the know-how? The basic recipes are in the SAS 9.2 Providers for OLE DB Cookbook. (There's a version for SAS 9.1 as well.)

Note: there are limitations when using the Local Data Provider. Because there is no SAS session in this mix, you don't get the benefit of SAS data services. For example, it can't handle your user-defined formats. You might sacrifice precision for some numeric values. You don't have the full cross-platform support that SAS provides.

Here is an example script that verifies that the provider is installed, opens a data set file, and reports on the row count and column names/types. It's a very simple example that doesn't get very fancy and doesn't include proper error checking, but I hope that it shows you the possibilities.

To run the example on your Windows machine:

  1. Copy the program into a local file (for example, c:\datatest\ShowColumns.vbs).
  2. Change the filename and path variables in the script to point to data files on your machine.
  3. Open a command prompt window.
  4. Run the example program with cscript.exe. For example:

    cscript.exe c:\datatest\ShowColumns.vbs
     
    Note that this can work with the 32-bit or 64-bit versions of the SAS OLE DB providers. If you are on a 64-bit machine and you want to use the 32-bit provider, be sure to run the 32-bit version of cscript.exe:

    %windir%\SysWow64\cscript.exe c:\datatest\ShowColumns.vbs
     
Here's the program:

' Change these to report on your data set
' Change path to the directory where your data set resides
' Change filename to the root name of the sas7bdat file (no extension)
path = "C:\Program Files\SAS\EnterpriseGuide\4.3\Sample\Data"
filename = "Candy_Sales_Summary"

' Check registry for SAS Local Provider
Set WSHShell = CreateObject("WScript.Shell")
clsID = WSHShell.RegRead("HKCR\SAS.LocalProvider\CLSID\")
WScript.Echo "DIAGNOSTICS: SAS.LocalProvider CLSID is " & clsID
inProcServer = WSHShell.RegRead("HKCR\CLSID\"  & clsID & "\InprocServer32\")
WScript.Echo "DIAGNOSTICS: Registered InprocServer32 DLL is " & inProcServer

' Constants for ADO calls
Const adOpenDynamic = 2
Const adLockOptimistic = 3
Const adCmdTableDirect = 512

' Instantiate the provider object
Set obConnection = CreateObject("ADODB.Connection")
Set obRecordset = CreateObject("ADODB.Recordset")

obConnection.Provider = "SAS.LocalProvider"
obConnection.Properties("Data Source") = path
obConnection.Open
obRecordset.Open filename, obConnection, adOpenDynamic, adLockOptimistic, adCmdTableDirect

 'Report on Fields in this data set
WScript.Echo ""
 WScript.Echo "Opened data " & filename & ", Record count: " & obRecordset.RecordCount
 For Each Field In obRecordset.Fields
   If Field.Type = 5 Then pType = "Numeric"
   If Field.Type = 200 Then pType = "Character"
   WScript.Echo Field.Name & " " & pType
 Next

obRecordset.Close
obConnection.Close
 
Output from my example:

Microsoft (R) Windows Script Host Version 5.8
Copyright (C) Microsoft Corporation. All rights reserved.

DIAGNOSTICS: SAS.LocalProvider CLSID is {0221264D-F909-4872-B7E6-B108D3FC8E8B}
DIAGNOSTICS: Registered InprocServer32 DLL is C:\Program Files\SAS\SharedFiles\SASProvidersForOLEDB\sasafloc0902.dll

Opened data Candy_Sales_Summary, Record count: 15000
Name Character
Region Character
OrderID Numeric
ProdID Numeric
Customer Numeric
Type Character
Product Character
Category Character
Subcategory Character
Retail_Price Numeric
Units Numeric
Discount Character
Date Numeric
Fiscal_Year Character
Fiscal_Quarter Character
Fiscal_Month_Num Numeric
Sale_Amount Numeric
 
10月 232010
 
The question came up on the SAS Enterprise Guide discussion forum: which do you prefer, List Report Wizard (PROC REPORT) or Summary Tables (PROC TABULATE)? And as with most SAS-related questions, the proper response is: "it depends." If you put these two PROCs in the ring with a Google Fight, PROC REPORT is a clear winner. Cynthia@SAS delivered a very thoughtful answer. She first carefully separated out the "point-and-click" versus "code-it-yourself" concerns, highlighting the capabilities and limitations of the tasks within SAS Enterprise Guide. Then, for the people who want follow a programming approach, Cynthia offers these insights:
But for the beginner, I think the biggest issue that I've heard about (and it surprised me) was whether they wanted the box area in the upper left hand corner of the table. For some folks, it doesn't matter that the BOX is there -- but I've had some students switch to PROC REPORT just because they didn't like the big empty box -- seriously -- the BOX area was a deal breaker for them -- they wanted the look that they got with PROC REPORT. The second issue that I've heard about is changing the code -- some people find the PROC REPORT code easier to understand than the TABULATE code -- if and only if they have to change the code. The TABULATE code is more abbreviated and "algebra-like" with all the * and = in the TABLE statement. Even though the REPORT code is more verbose -- the existence of a COLUMN statement makes it clear which variable will come first on the report row, which second, which third, etc, etc. And then the explicit DEFINE statement makes it easy to see what statistic will get used for a numeric variable, the usage for a category variable (ORDER, GROUP or ACROSS).
So, as you can see, sometimes it actually does come down to a "BOXing" match. <groan...sorry>
10月 092010
 
Yesterday, I was in the #raganSAS audience as David Pogue told me What's New and What's Next in the world of technology. David is a great presenter, and he really had the audience engaged as he talked about augmented reality, his world according to Twitter, and an iPhone app that comes pretty close to teaching the world to sing in perfect harmony (plus a cheater app that helps the world to sing like T. Pain).

On the world-harmony-for-profit theme, he shared information about web sites such as Kiva.org that facilitate microfinancing around the world. There are other microfinance sites that help people closer to home (for us in the USA), but as Pogue said, only Kiva.org can give you that "rosy glow" when you know you're helping people in developing countries.

Kiva.org opens financial doors for people who might not have another source of funding; but it also presents a platform rich in data for analysis and reporting. The folks at Kiva.org support web services that allow you to build applications that reference the data that they collect. They also offer "data snapshots": downloadable versions of all of the data they have on the loans, loan recipients, and the lenders who participate.

If you could get this data into SAS, what insights could you glean? What cool stats could you produce? What stories could you tell with charts and plots?

So, now we come to your homework assignment...if you choose to accept it. I've already done the grunt work of writing a SAS program that transforms the raw data (from its XML format) into SAS data sets. I've even written a sample step that produces a simple chart based on the current data.

My plot with SGPANEL

What can you do with this data using SAS? There are two data sets: lenders (over 400,000 records) and loans (over 165,000 records). They contain columns relating to geography (location of lenders and loan recipients), quantity (how many loans, what amounts), categories (loan purpose/industry, gender of recipient), and time (when the loan was granted/funded). You can read about the data on Kiva.org, and then create interesting reports using SAS.

Bonus assignment: can you improve my SAS program that pulls the data into SAS? I promise you: there is lots of room for optimization. (If I held off of this post until I perfected it, it would be ready for World Statistics Day 2011.) My implementation uses the XML libname engine, DATA step, and PROC SQL. It could be more automated (download the zip file with FILENAME URL, extract and process) and more efficient (faster appends, perhaps joining and summarizing for easier analysis). The program encounters a few errors when it runs, probably due to character encoding in the XML data. What would you do differently?

Here's how you can get started:

  • Download my SAS program and XML map files from this ZIP file here (small, just about 3K).
  • Extract the ZIP file to a new folder that your SAS session can access as the Kiva "root" folder (example: "C:\public\Kiva" or "/u/userid/Kiva").
  • Download the data snapshot from Kiva.org (big, about 150MB ZIP file). You need the XML format (not the JSON format).
  • Extract the data snapshot files into your Kiva "root" folder.
  • Modify my kivaProgram.sas file to set the Kiva data root folder, and set the number of loan XML files and lender XML files (as described in the comments in the program).

(By the way, I wrote this program entirely using SAS Enterprise Guide 4.3. So I know that you can run it from there, or within whatever SAS 9.2 environment you have access to.)

What better way to celebrate World Statistics Day than to compute some statistics for the world? Post your experiences back here in the comments, or use sasCommunity.org to share more details and post the link.

What to do IN BETWEEN dates

 data step, in operator, sas programming, SAS tips  What to do IN BETWEEN dates已关闭评论
9月 302010
 
I saw a suggestion arrive from a SAS customer who would like to see the IN operator extended to allow ranges of date values. For example, you can currently write a program that checks for values IN a collection or range of numbers:

data check;
  if x in (1:10) then result="match";
run;
 
This matches on the set of numbers 1 through 10, inclusive. The customer would like to see something like this supported, to match on the dates that fall within a given range.

data check;
   if x in ('01JAN2010'd : '01FEB2010'd) then result="match";
run;
 
It's a great suggestion. But in the meantime, you can satisfy this simple example by using the shorthand for the AND operator:

data check;
  if '01JAN2010'd <= x <='01FEB2010'd then result="match";
run;
 
There are many other ways to "skin this cat", including using PROC SQL and the BETWEEN condition. What's your favorite?