sas programming

4月 152013
 

Continuing with our stringed list theme, in this post I would like to talk about applying functions to those lists. So far we have developed quite a few tools that can manipulate lists in different ways, but what good are they if we can’t do something meaningful with the lists? We want to be able to do anything that we can normally do to one thing (be it a variable, table, number, or file) to each element in a list.

Here is my current version of this macro. It applies the specified function to each element in the list, collects the return values and returns them as a corresponding list.

%MACRO applyFunctionList(list, function);
   %LOCAL return i current_element current_functioncall list function;
   %LET return = %STR( );
   %LET i = 1;
   %DO %WHILE (%LENGTH(%QSCAN(&list, &i, %STR( ))) GT 0);
       %LET current_element = %stripp(%QSCAN(&list, &i, %STR( )));
       %LET current_functioncall = %SYSFUNC(TRANWRD(&function,
                                       %BQUOTE(||element||),
                                       %BQUOTE(&current_element) ));
       %LET return = &return &current_functioncall;
       %LET i = %EVAL(&i + 1);
   %END;
   %stripp(&return)
%MEND;

When passing a function in, it is important to use the %NRSTR() function so that nothing within it is resolved before its time. Also, because we are specifying a function in a general way, not specifically to any particular element, there is a convention that the macro assumes the following: the occurrence of the text “||element||” signals where the actual elements will be inserted prior to execution. In other words, wherever “||element||” occurs in the specified function, it will be replaced with the actual elements, one by one, and then run.

So for example, running:

%applyFunctionList(list=%STR(test test2 test3 test43),
          function=%NRSTR(%LENGTH(||element||)) );

Here is a variation of that macro. This one essentially does the same thing, the difference is that it does not collect return values. It is designed for use when only the side effects are desired, i.e. not expecting a return. It is useful in certain situations where %applyFunctionList() does not work because the resulting list is too big and violates the maximum macro variable length rule:

%MACRO applyFunctionList_noReturn(list, function);
   %LOCAL return i current_element current_functioncall list function;
   %LET return = %STR( );
   %LET i = 1;
   %DO %WHILE (%LENGTH(%QSCAN(&list, &i, %STR( ))) GT 0);
       %LET current_element = %stripp(%QSCAN(&list, &i, %STR( )));
       %SYSFUNC(TRANWRD(&function,
               %BQUOTE(||element||),
               %BQUOTE(&current_element) ))
       %LET i = %EVAL(&i + 1);
   %END;
%MEND;


Example 1: Adding missing variables to a data set

Suppose that we had a list of variables that are expected to be in a dataset. We might have information about them stored in some meta table (we’ll get more into that later on in the series), but for now let’s say we can just list them out and that they are all numeric.

Assume that we are getting this dataset from a third party that has simply omitted fields that do not apply. Unfortunately, we have a bunch of code that will not work unless all the expected fields are present.

One quick solution is to simply add those variables to the table with a null value. Depending on your situation this may or may not be a feasible solution, but it is a base from which you can build. You can add all the missing variables in one DATA step by putting the call to %applyFunctionList within it, using %differenceList() that I shared last time to determine the missing variables:

DATA buy_modified;
   SET sashelp.buy;
   %applyFunctionList(list = %differenceList(%STR(buy date amount type source currency), %LOWCASE(%varlist(sashelp.buy))),
                      function = %NRSTR(
                           ||element|| = .;
                       )
                      );
RUN; 

If your meta table had information such as data type, then we could pull that in to assign the appropriate null value for each variable. We’ll do more stuff like that later on.

Example 2: Recoding missing data
This is an example that you will do for yourself! So I guess it is more like an assignment, but it’s fun. I want you to modify all the variables that begin with “S” in the data set sashelp.Qtr111 so that missing data is recoded as 0. HINT: look at the %getFilteredList() macro I have provided.

Example 3: Importing a directory list into a data set

In this example we are going to make a tool that imports all the CSVs in a specified directory into the WORK library.

Listing the contents of a directory.
First I will share with you a function with two helper functions that lists the contents of a given directory. I’ve also used the regular expressions tools I shared last time. Note that we could use a list function to process the list rather than implementing a filtering feature in the %listDir macro. In many ways that would be a more modular approach, but there is a limitation with our current list framework - elements with spaces are not supported. All our functions interpret a space as a delimiter so an element with a space within it would be seen as two elements and not one. SAS doesn’t allow spaces in dataset names or variable names. However, Windows supports filenames and folder names with spaces so this can be a problem if you’re working with a lot of those types of filenames. One solution is to modify the tools to use a delimiter that will never be encountered within an element. You can also enforce a certain type of quoting scheme. For now I will continue with the space separated element approach to keep things simple:


/*  
  Returns whether or not a specified path is a directory.

  EXAMPLE:
    %PUT %isDir(%STR(C:\WINDOWS_DNE));
    %PUT %isDir(%STR(C:\WINDOWS));
    %PUT %isDir(%STR(c:\temp));
*/
%MACRO isDir(path);
  %LOCAL return fileref rc d_id;

  %LET return = 0;

  %IF %SYSFUNC(FILEEXIST(&path)) %THEN %DO;
    %LET rc = %SYSFUNC(FILENAME(fileref, &path));
    %LET d_id = %SYSFUNC(DOPEN(&fileref));
    %LET rc = %SYSFUNC(DCLOSE(&d_id));
    %IF &d_id > 0 %THEN %LET return = 1;
  %END;

  &return
%MEND;

/*  
  Returns whether or not a specified path is a file.

  EXAMPLE:
    %PUT %isFile(%STR(C:\WINDOWS_DNE));
    %PUT %isFile(%STR(C:\WINDOWS));
    %PUT %isFile(%STR(c:\temp));
*/
%MACRO isFile(path);
  %LOCAL return fileref;

  %LET return = 0;

  %IF %SYSFUNC(FILEEXIST(&path)) AND ^%isDir(&path) %THEN %LET return = 1;

  &return
%MEND;

/*  
  Lists the contents of a specified path. Shows contained files and folders by default.
  Can specify to show either all files (&filesOnly) or all directories (&directoriesOnly).

  EXAMPLE:
    %PUT %listDir(%STR(C:\WINDOWS_DOES_NOT_EXIST));
    %PUT %listDir(%STR(C:\WINDOWS));
    %PUT %listDir(%STR(C:\WINDOWS), filesOnly=1);
    %PUT %listDir(%STR(C:\WINDOWS), directoriesOnly=1);
    %PUT %listDir(%STR(C:\WINDOWS), filesOnly=1, directoriesOnly=1);
    %PUT %listDir(%STR(C:\WINDOWS), regex=%NRSTR(..exe$));
*/
%MACRO listDir(path, regex, filesOnly=0, directoriesOnly=0);
  %LOCAL return fileref rc d_id num_members member_name;

  %LET return = %STR();

  %IF &filesOnly AND &directoriesOnly %THEN
    %PUT ERROR: filesOnly and directoriesOnly cannot both be 1;
  %ELSE %IF ^%SYSFUNC(FILEEXIST(&path)) %THEN 
    %PUT ERROR: &path DOES NOT EXIST.;
  %ELSE %IF ^%isDir(&path) %THEN
    %PUT ERROR: &path IS NOT A DIRECTORY.;
  %ELSE %DO;
    %LET rc = %SYSFUNC(FILENAME(fileref, &path));
    %LET d_id = %SYSFUNC(DOPEN(&fileref));
      %LET num_members = %SYSFUNC(DNUM(&d_id));
      %DO i = 1 %TO &num_members;
      %LET member_name = %SYSFUNC(DREAD(&d_id, &i));
      /* Append to return list if member type matches specified filters. 
        (Longer processing time than if in sepearate %IFs but a lot cleaner).*/
      %IF (%PRXMatch(/&regex/, %STR(&member_name))) AND
        ((&filesOnly AND %isFile(&path/&member_name)) OR
        (&directoriesOnly AND %isDir(&path/&member_name)) OR
        (^&filesOnly AND ^&directoriesOnly))
      %THEN
        %LET return = &return &member_name;
    %END;
    %LET rc = %SYSFUNC(DCLOSE(&d_id));
  %END;

  &return
%MEND;


Importing the list into a dataset.
Back to the importing tool we are creating - we first think of what we would do if we just wanted to import one CSV. I am using a generic PROC IMPORT here just as an example. It works, but it is not perfect. We will develop a more robust import process later on:

      PROC IMPORT DATAFILE="C:\sampleCSVs\example.csv"
           OUT=WORK.example
           DBMS=CSV
           REPLACE;
           GETNAMES=YES;
       RUN;

Next we know that we want to call this from within %applyFunctionList() so we modify all the things specific to our example.csv to refer to ||element|| instead:

      PROC IMPORT DATAFILE="&path\||element||.csv"
           OUT=WORK.||element||
           DBMS=CSV
           REPLACE;
           GETNAMES=YES;
       RUN;

Extracting the filenames from the pathname. Then we can add in the code to extract the CSV file names from the given path. Here is the full macro:

%MACRO importAllCSVs(path);
   %LOCAL fullfilenames filenames i;

   %LET fullfilenames = %listDir(&path, regex=%NRSTR(..csv$), filesOnly=1);
   %LET filenames = %applyFunctionList(list=&fullfilenames,
                           function=%NRSTR(%SUBSTR(||element||, 1, %EVAL(%INDEX(||element||, %STR(.csv))-1))));

   /*    Walk through each CSV file. */
   %applyFunctionList_noReturn(list=&filenames,
                      function=%NRSTR(
                           %PUT Importing "&path\||element||.csv";

                           PROC IMPORT DATAFILE="&path\||element||.csv"
                               OUT=WORK.||element||
                               DBMS=CSV
                               REPLACE;
                               GETNAMES=YES;
                           RUN;
                      ));
%MEND;    

I’ve used another call to %applyFunctionList() to extract the name portion of the files (excluding the extension). I’ve included a folder with subfolders, each with sample CSVs, in the accompanying files for this post. You can run the macro on it and see how it works. Try refactoring the macro to use only one %applyFunctionList() call, rather than using two separate calls.

It is interesting to note that you can call this macro from within another invocation of %applyFunctionList():

%applyFunctionList_noReturn(list=%STR(1004 1100 1202),
                  function=%NRSTR(%importAllCSVs(C:\sampleCSVs\||element||));
                             );

This will import all the CSVs from each of the three folders: “C:\sampleCSVs\1004”, “C:\sampleCSVs\1100” and “C:\sampleCSVs\1202”.

Try it yourself!
You can find this post’s files at http://dontsasme.com/SASUserBlogSeries/Post3/ . I’ve included necessary tools from previous posts as well. Play around with these ideas in your SAS playground. I hope you are starting to see the real world applications of this approach.

Keep in mind that we are still in the foundation phase. We are laying down the framework from which we will build really neat things. As always, I encourage you to take in as many ideas as you can, see what they are doing objectively and blend the things that are useful to you so that you can create your own.

Additional resources
Here is some reference material on main system calls used in Example 3:
FILEEXIST
FILENAME
DOPEN

tags: Macro tutorial, sas programming
3月 262013
 

Are you a new SAS user who isn't sure where to begin? Have no fear, because you're not alone. Here at SAS, we often receive questions from people who need help getting started with the software.

Getting Started with SAS is the topic of a SAS Talks session that I will co-host with SAS instructor Stacey Syphus on April 11, 2013 at 1pm ET. Tune in if you can, but don't worry if you can't -- the session will be recorded and available later as part of the SAS Talks series.

The goals of this session are to familiarize you with the content of typical SAS programs, teach you to read SAS log output, and show you how to run and modify programs. You won't become a SAS programming expert by attending this session, but it should provide a good foundation for further study. If you want to find out more, dozens of SAS programming books and training courses are available that cover every aspect of the craft. (And yes, SAS programmers often do regard themselves as craftspeople, of a sort.)

In this session, Stacey and I will cover:

  • a high-level review of SAS, its foundational technologies (including Base SAS) and how it works.
  • the different methods for working with SAS, including the SAS windowing environment and SAS Enterprise Guide.
  • how to determine which SAS approach is best for you.

Also, you'll learn where you can find resources at SAS that are available specifically for SAS beginners (including my SAS for Dummies book, now available in its second edition).

I hope that you'll join Stacey and me for the session. Don't wait to begin getting the most out of SAS.

Related links:

Getting started with SAS Enterprise Guide
SAS Enterprise Guide for the Programmer: What's in It for Me?

tags: SAS programming, sas talks, sas training
2月 252013
 

The SAS/IML language has a curious syntax that enables you to specify a "repetition factor" when you initialize a vector of literal values. Essentially, the language enables you to specify the frequency of an element. For example, suppose you want to define the following vector:

proc iml;
x = {1 2 2 2 2 3 3 3 3 3 4 4 5 5 5 5 5 5};

The vector has one 1, followed by four 2s, followed by five 3s, two 4s, and six 5s. An alternative syntax is to specify the "repetition factor" for each element by using a positive integer enclosed in brackets, like so:

x = {1 [4]2 [5]3 [2]4 [6]5};

You can think of the repetition factor as the frequency or number of occurrences of the value that follows the closing bracket.

Admittedly, this simple example does not save a lot of typing, but if a value is repeated tens or hundreds of times, this syntax not only saves typing, but also is less prone to error and is clearer to read. For example, repetition factors make it easy to specify the genders of 100 subjects:

gender = {[42]"Female" [58]"Male"};

I find this syntax interesting because I am not aware of many other languages that support repetition factors like this. FORTRAN has repetition factors for the FORMAT and the DATA statement. This syntax is supported in the SAS DATA step, which obviously preceded and inspired the SAS/IML syntax:

data A;
array x(18) (1 4*2 5*3 2*4 6*5);
run;
Furthermore, the SAS/SCL language had repetition factors for defining arrays. Does anyone know of other languages that support a similar syntax for initializing arrays?

tags: Getting Started, SAS Programming
1月 052013
 

Many of my blogging colleagues are taking this week to reflect on their top posts in 2012. Some are using the visitor statistics to rank the posts, but as Andy Ratcliffe points out, that gives short-shrift to the late-season articles.

This year, I'm going to use this space to recap a few of my favorite topics, grouped by theme. These are the topics that generated lots of interest, or perhaps they haven't yet generated the interest that I think they deserve -- so I'll plug them again.

New tools to improve your SAS Enterprise Guide life

I'll lead with my favorites - new custom tasks! They are easy to download, install, and use. Some of these might be The Missing Feature that you're looking for, so it's worth your time to check them out.

  • Copy Files task (for FTP-like file transfers). It supports wildcard notation and macro expressions, providing lots of flexibility.
  • Project Reviewer task - shows all of the tasks in your project, grouped by process flow. It summarizes the time elapsed by each task and each process flow, who last modified the tasks and when, and more. And you can automatically include this as a report within your project.
  • SAS Options Viewer - a SAS Enterprise Guide version of the OPTIONS window, showing the current values for all of your SAS system options.
  • SAS Macro Variable Viewer - a window to view all of your SAS macro variables. Also includes a tool to evaluate macro expressions instantly.
  • Search your project files with the EGP-Project-Search-inator.

64-bit SAS and 64-bit Windows

Many people are now using 64-bit versions of Windows, and they've updated their SAS installation to a 64-bit version to go with it. Was it all smooth sailing? Not quite -- here are the top 64-bit "gotchas".
Also see these related posts:

Fun for tinkerers: Windows PowerShell

I had a lot of fun playing with Windows PowerShell this year and figuring out how to integrate it with SAS. Here are a few of my favorite posts on the topic:

Cool SAS programming tips

In my job I write and maintain lots of SAS programs, so I'm always finding new ways to do things. That's a great thing about SAS: it's so huge that you can never know everything about it, so you'll always have something new to learn. Here are a few of the tips that I learned and/or shared over the past year:

With my new book completed and available, I hope to have time to learn and share even more in 2013. I've already got a couple of SAS Global Forum papers lined up; those are sure to provide me with some material very soon!

tags: 64-bit, PowerShell, SAS custom tasks, SAS programming