sas programming

7月 032013
 

When I work on SAS projects that create lots of files as results, it's often a requirement that those files be organized in a certain folder structure. The exact structure depends on the project, but here's an example:

/results
   |__ html
       |__ images
   |__ xls
   |__ data

Before you can have SAS populate these file folders, the folders have to actually exist. Traditionally, SAS programmers have handled this by doing one of the following:

  • Simply require that the folders exist before you run through the project. (This is the SEP method: Somebody Else's Problem.)
  • Use SAS statements and shell commands (via SYSTASK or other method) to create the folders as needed. The SAS-related archives are full of examples of this. It can get complex when you have to account for operating system differences, and whether operating system commands are even permitted (NOXCMD system option).

In SAS 9.3 there is a new system option that simplifies this: DLCREATEDIR. When this option is in effect, a LIBNAME statement that points to a non-existent folder will take matters into its own hands and create that folder.

Here's a simple example, along with the log messages:

options dlcreatedir;
libname newdir "/u/sascrh/brand_new_folder";

NOTE: Library NEWDIR was created.
NOTE: Libref NEWDIR was successfully assigned as follows: 
      Engine:        V9 
      Physical Name: /u/sascrh/brand_new_folder

You might be thinking, "Hey, SAS libraries are for data, not for other junk like ODS results." Listen: we've just tricked the LIBNAME statement into making a folder for you -- you can use it for whatever you want. I won't tell.

In order to create a series of nested folders, you'll have to create each folder level in top-down order. For example, if you need a "results" and a "results/images" folder, you can do this:

%let outdir=%sysfunc(getoption(work));
/* create a results folder in the WORK area, with images subfolder */
options dlcreatedir;
libname res "&outdir./results";
libname img "&outdir./results/images";
/* clear the librefs - don't need them */
libname res clear;
libname img clear;

Or (and this is a neat trick) you can use a single concatenated LIBNAME statement to do the job:

libname res ("&outdir./results", "&outdir./results/images");
libname res clear;

NOTE: Libref RES was successfully assigned as follows: 
      Levels:           2
      Engine(1):        V9 
      Physical Name(1): /saswork/SAS_workC1960000554D_gsf0/results
      Engine(2):        V9 
      Physical Name(2): /saswork/SAS_workC1960000554D_gsf0/results/images

If you feel that folder creation is best left to the card-carrying professionals, don't worry! It is possible for a SAS admin to restrict use of the DLCREATEDIR option. That means that an admin can set the option (perhaps to NODLCREATEDIR to prohibit willy-nilly folder-making) and prevent end users from changing it. Just let them try, and they'll see:

13         options dlcreatedir;
                   ___________
                   36
WARNING 36-12: SAS option DLCREATEDIR is restricted by your Site 
Administrator and cannot be updated.

That's right -- DENIED! Mordac the Preventer would be proud. Job security: achieved!

Exact documentation for how to establish Restricted Options can be a challenge to find. You'll find it within the Configuration Guides for each platform in the Install Center. Here are quick links for SAS 9.3: Windows x64, Windows x86, and UNIX.

tags: DLCREATEDIR, Restricted Options, sas administration, SAS libraries, SAS programming, SAS tips
6月 192013
 

I am not a big fan of the macro language, and I try to avoid it when I write SAS/IML programs. I find that the programs with many macros are hard to read and debug. Furthermore, the SAS/IML language supports loops and indexing, so many macro constructs can be replaced by standard SAS/IML syntax.

Nevertheless, many SAS customers use macro constructs as part of their daily SAS programming tasks, and that practice often continues when they write SAS/IML programmers. A customer recently asked a question about the macro language that required knowledge of the way that macro variables are handled within a SAS/IML loop. This post shares my response.

Here's the crux of the customer's question. Run the following SAS/IML program and see if you can understand why it behaves as it does:

proc iml;
i = 7;
call symputx("j", i);    /* 1. Put value of i into macro variable j */
y1 = &j;                 /* 2. Assign y1 the value of &j            */
print y1;                /* success! */
 
y = j(1,4,.);
do i = 1 to ncol(y);     /* 3. Start processing the DO block of statements */
   call symputx("j", i); /* 4. Put value of i into macro variable j */
   y[i] = &j;            /* 5. Hmmmm, what does this do inside the loop? */
end;
print y;                 /* Not what you might expect? */

As you can see from the output, the first use of the macro variable (outside the DO loop), works as expected. But the second does not. The customer wanted to know why the elements of y are not set to 1, 2, 3, 4 within the loop.

The key point to remember about macro variables is that SAS code never sees them. Macro variables are evaluated by the macro preprocessor at parse time, not at run time. The SAS/IML code never sees &j, only the constant value that the preprocessor substitutes for &j.

It is also important to remember that PROC IML is an interactive procedure. (The "I" in IML stands for interactive!) Each statement or block of statements is parsed as it is encountered, as opposed to the DATA step, which parses the entire program before beginning execution.

Let's examine the program step-by-step to understand why the first construct works but the second does not. The following steps refer to the numbers in the program comments:

  1. The value of the SAS/IML scalar i is copied (as text) into the macro variable j.
  2. The statement is encountered. The value of the macro variable j is substituted by the macro preprocesser. Then the statement is executed. The SAS/IML variable y1 is assigned to the value 7.
  3. A DO loop is encountered by the SAS/IML parser. The parser finds the matching END statement and proceeds to parse the entire body of the loop in order to check for syntax errors. This parsing phase occurs exactly one time. Because the block of statements contain a macro variable, the macro preprocessor substitutes the value of the macro variable j, which is 7.
  4. For each iteration, the value of the SAS/IML scalar i is copied (as text) into the macro variable j.
  5. For each iteration, the ith element of the y vector is assigned the value 7. In particular, this statement does not contain a reference to the macro varible j.

To the casual reader of the program, it looks like &j will have a different value during each step of the iteration. But but it doesn't. The expression &j is resolved at parse time. SAS/IML parses the entire body of the DO loop once, before any execution occurs, and at parse time the expression &j is 7.

There is a way to get what the customer wants. The SYMGET function retrieves the value of a macro variable at run time. Therefore the following statements fill the vector y with the values 1 through 4:

do i = 1 to ncol(y);
   call symputx("j", i);
   y[i] = num(symget("j"));  /* get macro value at run time */
end;
print y;                     /* Yes! This is what we want! */

For me, this blog post emphasizes three facts:

  • Always remember that macro substitution is done by a preprocessor, which operates at parse time.
  • The SAS/IML language parses an entire block of statements (between the DO and END statements) one time before executing the block.
  • Mixing macro code and SAS/IML statements can be confusing and hard to debug. When you have the option, use SAS/IML language features instead of relying on macro language constructs.
tags: SAS Programming
6月 132013
 

If you've watched any of the demos for SAS Visual Analytics (or even tried it yourself!), you have probably seen this nifty exploration of multiple measures.

It's a way to look at how multiple measures are correlated with one another, using a diagonal heat map chart. The "stronger" the color you see in the matrix, the stronger the correlation.

You might have wondered (as I did): can I build a chart like this in Base SAS? The answer is Yes (of course). It won't match the speed and interactivity of SAS Visual Analytics, but you might still find this to be a useful way to explore your data.

The approach

There are four steps to achieving a similar visualization in the 9.3 version of Base SAS. (Remember that ODS Graphics procedures are part of Base SAS in SAS 9.3!)

  1. Use the CORR procedure to create a data set with a correlations matrix. Actually, several SAS procedures can create TYPE=CORR data sets, but I used PROC CORR with Pearson's correlation in my example.
  2. Use DATA step to rearrange the CORR data set to prepare it for rendering in a heat map.
  3. Define the graph "shell" using the Graph Template Language (GTL) and the HEATMAPPARM statement. You've got a lot of control over the graph appearance when you use GTL.
  4. Use the SGRENDER procedure to create the graph by applying the CORR data you prepared in the first two steps.

Here's an example of the result:

The program

I wrapped up the first two steps in a SAS macro. The macro first runs PROC CORR to create the matrix data, then uses DATA step to transform the result for the heat map.

Note: By default, the PROC CORR step will treat all of the numeric variables as measures to correlate. That's not always what you want, especially if your data contains categorical columns that just happen to be numbers. You can use DROP= or KEEP= data set options when using the macro to narrow the set of variables that are analyzed. The examples (near the end of this post) show how that's done.

/* Prepare the correlations coeff matrix: Pearson's r method */
%macro prepCorrData(in=,out=);
  /* Run corr matrix for input data, all numeric vars */
  proc corr data=&in. noprint
    pearson
    outp=work._tmpCorr
    vardef=df
  ;
  run;
 
  /* prep data for heat map */
data &out.;
  keep x y r;
  set work._tmpCorr(where=(_TYPE_="CORR"));
  array v{*} _numeric_;
  x = _NAME_;
  do i = dim(v) to 1 by -1;
    y = vname(v(i));
    r = v(i);
    /* creates a lower triangular matrix */
    if (i<_n_) then
      r=.;
    output;
  end;
run;
 
proc datasets lib=work nolist nowarn;
  delete _tmpcorr;
quit;
%mend;

You have to define the graph "shell" (or template) only once in your program. The template definition can then be reused in as many PROC SGRENDER steps as you want.

This heat map definition uses the fact that correlations are always between -1 and 1. Negative numbers show a negative correlation (ex: cars of higher weight will achieve a lower MPG). It's useful to select a range of colors that make it easier to discern the relationships. In my example, I went for "strong" contrasting colors on the ends with a muted color in the middle.

  /* Create a heat map implementation of a correlation matrix */
ods path work.mystore(update) sashelp.tmplmst(read);
 
proc template;
  define statgraph corrHeatmap;
   dynamic _Title;
    begingraph;
      entrytitle _Title;
      rangeattrmap name='map';
      /* select a series of colors that represent a "diverging"  */
      /* range of values: stronger on the ends, weaker in middle */
      /* Get ideas from http://colorbrewer.org                   */
      range -1 - 1 / rangecolormodel=(cxD8B365 cxF5F5F5 cx5AB4AC);
      endrangeattrmap;
      rangeattrvar var=r attrvar=r attrmap='map';
      layout overlay / 
        xaxisopts=(display=(line ticks tickvalues)) 
        yaxisopts=(display=(line ticks tickvalues));
        heatmapparm x = x y = y colorresponse = r / 
          xbinaxis=false ybinaxis=false
          name = "heatmap" display=all;
        continuouslegend "heatmap" / 
          orient = vertical location = outside title="Pearson Correlation";
      endlayout;
    endgraph;
  end;
run;

You can then use the macro and template together to produce each visualization. Here are some examples:

/* Build the graphs */
ods graphics /height=600 width=800 imagemap;
 
%prepCorrData(in=sashelp.cars,out=cars_r);
proc sgrender data=cars_r template=corrHeatmap;
   dynamic _title="Corr matrix for SASHELP.cars";
run;
 
%prepCorrData(in=sashelp.iris,out=iris_r);
proc sgrender data=iris_r template=corrHeatmap;
   dynamic _title= "Corr matrix for SASHELP.iris";
run;
 
/* example of dropping categorical numerics */
%prepCorrData(
  in=sashelp.pricedata(drop=region date product line),
  out=pricedata_r);
proc sgrender data=pricedata_r template=corrHeatmap;
  dynamic _title="Corr matrix for SASHELP.pricedata";
run;

Download complete program: corrmatrix_gtl.sas for SAS 9.3

Spoiler alert: These steps will only get easier in a future version of SAS 9.4, where similar built-in visualizations are planned for PROC CORR and elsewhere.

Related resources

You can apply a similar "heat-map-style" coloring to ODS tables by creating custom table templates.

If you haven't yet tried SAS Visual Analytics, it's worth a test-drive. Many of the visualizations are inspiring (as this blog post proves).

Finally, while I didn't dissect the GTL heat map definition in detail in this post, you can learn a lot more about GTL from Sanjay Matange and his team at the Graphically Speaking blog.

Acknowledgments

Big thanks to Rick Wicklin, who helped me quite a bit with this example. Rick validated my initial approach, and also provided valuable suggestions to improve the heat map and the statistical meaning of the example. He pointed me to http://colorbrewer.org, which provides examples of useful color ranges that you can apply in maps -- colors that are easy to read and don't distract from the meaning.

Rick told me that he is working on some related work coming up on his blog and within SAS 9.4, so you should watch his blog for additional insights.

tags: business analytics, GTL, ODS Graphics, SAS programming, Visual Analytics
6月 052013
 

One of the great things about SAS libraries is that you can write your programs to read and write data without having to worry about where the data lives. SAS data set on a file system? Oracle table in a database server? Hadoop data in Hive? For many SAS applications, the programmer can treat these very different data sources in exactly the same way.

...except for the cases where you can't treat them the same, because they are different. These different databases have different capabilities and behaviors, and sometimes you need to optimize your SAS programs to take advantage of those differences.

Recently, a colleague at SAS needed a reliable SAS macro-based method to detect the library engine for a given libref. Here's what we came up with:

%macro getEngine(libref);
 %global ENGINE;
 %let dsid=%sysfunc(open(sashelp.vlibnam(where=(libname="&libref.")),i));
 %if (&dsid ^= 0) %then %do;  
   %let engnum=%sysfunc(varnum(&dsid,ENGINE));
   %let rc=%sysfunc(fetch(&dsid));
   %let engine=%sysfunc(getvarc(&dsid,&engnum));
       %put &libref. ENGINE is &engine.;
   %let rc= %sysfunc(close(&dsid.));
 %end;
%mend;

This simple macro peeks inside SASHELP.VLIBNAM, a virtual table that maintains the vital characteristics of all of the active SAS libraries in a session. The macro queries the table for the engine name for the given library, and places the result in a macro variable named &ENGINE.

Here are some example uses and results. The first two calls are for built-in SAS libraries, which use the BASE engine (aliased to "V9"). The third call is for a MySQL library that I use for reporting on our WordPress database.

34         %getEngine(SASHELP);
SASHELP ENGINE is V9
35         %getEngine(WORK);
WORK ENGINE is V9
36         %getEngine(WPBLOGS);
WPBLOGS ENGINE is MYSQL
tags: data access, macro programming, SAS libraries, SAS programming
5月 252013
 
This week's SAS tip is from Ron Cody and his latest book Cody's Collection of Popular SAS Programming Tasks and How to Tackle Them. Learn more about this esteemed user and his many bestselling SAS books--as well as get additional bonus content on his author page. The following excerpt is from [...]
5月 132013
 

I've conducted a lot of univariate analyses in SAS, yet I'm always surprised when the best way to carry out the analysis uses a SAS regression procedure. I always think, "This is a univariate analysis! Why am I using a regression procedure? Doesn't a regression require at least two variables?"

Then it dawns on me. In the SAS regression procedures, a MODEL statement that does not contain explanatory variables simply performs a univariate analysis on the response variable. For example, when there are no explanatory variables, a classical regression analysis produces sample statistics, such as the mean, the variance, and the standard error of the mean, as shown in the following output:

/* estimates of mean, variance, and std err of mean */
ods select FitStatistics ParameterEstimates ;
proc reg data=sashelp.cars;
   model MPG_City= ;
quit;

The estimates in this PROC REG output are the same as are produced by the following call to PROC MEANS:

proc means data=sashelp.cars mean std stderr cv;
 var MPG_City;
run;

The ODS graphics that are produced by PROC REG also includes a histogram of the centered data and a normal Q-Q plot.

Here are some other instances in which a SAS regression procedure can be used to carry out a univariate analysis:

  • Robust estimates of scale and location. This univariate analysis is usually performed by using PROC UNIVARIATE with the ROBUSTSCALE option. However, you can also use the ROBUSTREG procedure to estimate robust statistics. The ROBUSTREG procedure provides four different estimation methods, which you can control by using the METHOD= option. For example, the following statements display robust estimates of location and scale:
    ods select ParameterEstimates;
    proc robustreg data=sashelp.cars method=M;
       model MPG_City= ;
    run;
  • Detection of univariate outliers. How can you detect univariate outliers in SAS? One way is to call the ROBUSTREG procedure! Again, there are four estimation methods that you can use.
    ods select DiagSummary;
    proc robustreg data=sashelp.cars method=LTS;
       model MPG_City= ;
       output out=out outlier=outliers;
    run;
     
    proc print data=out(where=(outliers=1));
       var make model type mpg_city;
    run;
  • Estimation of quantiles, with confidence intervals. Last week I showed how to use PROC UNIVARIATE to compute sample quantiles and confidence intervals. An alternative approach is to use the QUANTREG procedure, as follows:
    /* estimate univariate quantiles, CIs, and std errors */
    ods output ParameterEstimates=QntlCI;
    proc quantreg data=sashelp.cars;
       model MPG_City= / quantiles=0.025 0.2 0.8 0.975;
    run;
     
    proc print data=QntlCI; run;
  • Fit discrete parametric models to univariate data. I've previously shown how to use the GENMOD procedure to fit a Poisson model to data, and the same technique can be used to fit other discrete distributions, including the binomial, geometric, multinomial, negative binomial, and some zero-inflated distributions.

  • Fit parameters for a mixed density model to univariate data. I've previously demonstrated how to use the FMM procedure to fit a finite mixture distribution to data.

There are other examples, but I hope you see that the SAS regression procedures are useful for computing univariate statistics and analyses.

Do you have a favorite univariate analysis that can be accomplished by using a SAS regression procedure? Let me know about it by leaving a comment.

tags: Data Analysis, SAS Programming
4月 152013
 
We’ve gone mobile so you can too. Come see us at SAS Global Forum  and be one of the first to try our new e-Courses on the tablet!  SAS Programming 1 and SAS Programming 2, soon available on your tablet devices, also feature free access to the SAS Practice editor [...]
4月 152013
 

Continuing with our stringed list theme, in this post I would like to talk about applying functions to those lists. So far we have developed quite a few tools that can manipulate lists in different ways, but what good are they if we can’t do something meaningful with the lists? We want to be able to do anything that we can normally do to one thing (be it a variable, table, number, or file) to each element in a list.

Here is my current version of this macro. It applies the specified function to each element in the list, collects the return values and returns them as a corresponding list.

%MACRO applyFunctionList(list, function);
   %LOCAL return i current_element current_functioncall list function;
   %LET return = %STR( );
   %LET i = 1;
   %DO %WHILE (%LENGTH(%QSCAN(&list, &i, %STR( ))) GT 0);
       %LET current_element = %stripp(%QSCAN(&list, &i, %STR( )));
       %LET current_functioncall = %SYSFUNC(TRANWRD(&function,
                                       %BQUOTE(||element||),
                                       %BQUOTE(&current_element) ));
       %LET return = &return &current_functioncall;
       %LET i = %EVAL(&i + 1);
   %END;
   %stripp(&return)
%MEND;

When passing a function in, it is important to use the %NRSTR() function so that nothing within it is resolved before its time. Also, because we are specifying a function in a general way, not specifically to any particular element, there is a convention that the macro assumes the following: the occurrence of the text “||element||” signals where the actual elements will be inserted prior to execution. In other words, wherever “||element||” occurs in the specified function, it will be replaced with the actual elements, one by one, and then run.

So for example, running:

%applyFunctionList(list=%STR(test test2 test3 test43),
          function=%NRSTR(%LENGTH(||element||)) );

Here is a variation of that macro. This one essentially does the same thing, the difference is that it does not collect return values. It is designed for use when only the side effects are desired, i.e. not expecting a return. It is useful in certain situations where %applyFunctionList() does not work because the resulting list is too big and violates the maximum macro variable length rule:

%MACRO applyFunctionList_noReturn(list, function);
   %LOCAL return i current_element current_functioncall list function;
   %LET return = %STR( );
   %LET i = 1;
   %DO %WHILE (%LENGTH(%QSCAN(&list, &i, %STR( ))) GT 0);
       %LET current_element = %stripp(%QSCAN(&list, &i, %STR( )));
       %SYSFUNC(TRANWRD(&function,
               %BQUOTE(||element||),
               %BQUOTE(&current_element) ))
       %LET i = %EVAL(&i + 1);
   %END;
%MEND;


Example 1: Adding missing variables to a data set

Suppose that we had a list of variables that are expected to be in a dataset. We might have information about them stored in some meta table (we’ll get more into that later on in the series), but for now let’s say we can just list them out and that they are all numeric.

Assume that we are getting this dataset from a third party that has simply omitted fields that do not apply. Unfortunately, we have a bunch of code that will not work unless all the expected fields are present.

One quick solution is to simply add those variables to the table with a null value. Depending on your situation this may or may not be a feasible solution, but it is a base from which you can build. You can add all the missing variables in one DATA step by putting the call to %applyFunctionList within it, using %differenceList() that I shared last time to determine the missing variables:

DATA buy_modified;
   SET sashelp.buy;
   %applyFunctionList(list = %differenceList(%STR(buy date amount type source currency), %LOWCASE(%varlist(sashelp.buy))),
                      function = %NRSTR(
                           ||element|| = .;
                       )
                      );
RUN; 

If your meta table had information such as data type, then we could pull that in to assign the appropriate null value for each variable. We’ll do more stuff like that later on.

Example 2: Recoding missing data
This is an example that you will do for yourself! So I guess it is more like an assignment, but it’s fun. I want you to modify all the variables that begin with “S” in the data set sashelp.Qtr111 so that missing data is recoded as 0. HINT: look at the %getFilteredList() macro I have provided.

Example 3: Importing a directory list into a data set

In this example we are going to make a tool that imports all the CSVs in a specified directory into the WORK library.

Listing the contents of a directory.
First I will share with you a function with two helper functions that lists the contents of a given directory. I’ve also used the regular expressions tools I shared last time. Note that we could use a list function to process the list rather than implementing a filtering feature in the %listDir macro. In many ways that would be a more modular approach, but there is a limitation with our current list framework - elements with spaces are not supported. All our functions interpret a space as a delimiter so an element with a space within it would be seen as two elements and not one. SAS doesn’t allow spaces in dataset names or variable names. However, Windows supports filenames and folder names with spaces so this can be a problem if you’re working with a lot of those types of filenames. One solution is to modify the tools to use a delimiter that will never be encountered within an element. You can also enforce a certain type of quoting scheme. For now I will continue with the space separated element approach to keep things simple:


/*  
  Returns whether or not a specified path is a directory.

  EXAMPLE:
    %PUT %isDir(%STR(C:\WINDOWS_DNE));
    %PUT %isDir(%STR(C:\WINDOWS));
    %PUT %isDir(%STR(c:\temp));
*/
%MACRO isDir(path);
  %LOCAL return fileref rc d_id;

  %LET return = 0;

  %IF %SYSFUNC(FILEEXIST(&path)) %THEN %DO;
    %LET rc = %SYSFUNC(FILENAME(fileref, &path));
    %LET d_id = %SYSFUNC(DOPEN(&fileref));
    %LET rc = %SYSFUNC(DCLOSE(&d_id));
    %IF &d_id > 0 %THEN %LET return = 1;
  %END;

  &return
%MEND;

/*  
  Returns whether or not a specified path is a file.

  EXAMPLE:
    %PUT %isFile(%STR(C:\WINDOWS_DNE));
    %PUT %isFile(%STR(C:\WINDOWS));
    %PUT %isFile(%STR(c:\temp));
*/
%MACRO isFile(path);
  %LOCAL return fileref;

  %LET return = 0;

  %IF %SYSFUNC(FILEEXIST(&path)) AND ^%isDir(&path) %THEN %LET return = 1;

  &return
%MEND;

/*  
  Lists the contents of a specified path. Shows contained files and folders by default.
  Can specify to show either all files (&filesOnly) or all directories (&directoriesOnly).

  EXAMPLE:
    %PUT %listDir(%STR(C:\WINDOWS_DOES_NOT_EXIST));
    %PUT %listDir(%STR(C:\WINDOWS));
    %PUT %listDir(%STR(C:\WINDOWS), filesOnly=1);
    %PUT %listDir(%STR(C:\WINDOWS), directoriesOnly=1);
    %PUT %listDir(%STR(C:\WINDOWS), filesOnly=1, directoriesOnly=1);
    %PUT %listDir(%STR(C:\WINDOWS), regex=%NRSTR(..exe$));
*/
%MACRO listDir(path, regex, filesOnly=0, directoriesOnly=0);
  %LOCAL return fileref rc d_id num_members member_name;

  %LET return = %STR();

  %IF &filesOnly AND &directoriesOnly %THEN
    %PUT ERROR: filesOnly and directoriesOnly cannot both be 1;
  %ELSE %IF ^%SYSFUNC(FILEEXIST(&path)) %THEN 
    %PUT ERROR: &path DOES NOT EXIST.;
  %ELSE %IF ^%isDir(&path) %THEN
    %PUT ERROR: &path IS NOT A DIRECTORY.;
  %ELSE %DO;
    %LET rc = %SYSFUNC(FILENAME(fileref, &path));
    %LET d_id = %SYSFUNC(DOPEN(&fileref));
      %LET num_members = %SYSFUNC(DNUM(&d_id));
      %DO i = 1 %TO &num_members;
      %LET member_name = %SYSFUNC(DREAD(&d_id, &i));
      /* Append to return list if member type matches specified filters. 
        (Longer processing time than if in sepearate %IFs but a lot cleaner).*/
      %IF (%PRXMatch(/&regex/, %STR(&member_name))) AND
        ((&filesOnly AND %isFile(&path/&member_name)) OR
        (&directoriesOnly AND %isDir(&path/&member_name)) OR
        (^&filesOnly AND ^&directoriesOnly))
      %THEN
        %LET return = &return &member_name;
    %END;
    %LET rc = %SYSFUNC(DCLOSE(&d_id));
  %END;

  &return
%MEND;


Importing the list into a dataset.
Back to the importing tool we are creating - we first think of what we would do if we just wanted to import one CSV. I am using a generic PROC IMPORT here just as an example. It works, but it is not perfect. We will develop a more robust import process later on:

      PROC IMPORT DATAFILE="C:\sampleCSVs\example.csv"
           OUT=WORK.example
           DBMS=CSV
           REPLACE;
           GETNAMES=YES;
       RUN;

Next we know that we want to call this from within %applyFunctionList() so we modify all the things specific to our example.csv to refer to ||element|| instead:

      PROC IMPORT DATAFILE="&path\||element||.csv"
           OUT=WORK.||element||
           DBMS=CSV
           REPLACE;
           GETNAMES=YES;
       RUN;

Extracting the filenames from the pathname. Then we can add in the code to extract the CSV file names from the given path. Here is the full macro:

%MACRO importAllCSVs(path);
   %LOCAL fullfilenames filenames i;

   %LET fullfilenames = %listDir(&path, regex=%NRSTR(..csv$), filesOnly=1);
   %LET filenames = %applyFunctionList(list=&fullfilenames,
                           function=%NRSTR(%SUBSTR(||element||, 1, %EVAL(%INDEX(||element||, %STR(.csv))-1))));

   /*    Walk through each CSV file. */
   %applyFunctionList_noReturn(list=&filenames,
                      function=%NRSTR(
                           %PUT Importing "&path\||element||.csv";

                           PROC IMPORT DATAFILE="&path\||element||.csv"
                               OUT=WORK.||element||
                               DBMS=CSV
                               REPLACE;
                               GETNAMES=YES;
                           RUN;
                      ));
%MEND;    

I’ve used another call to %applyFunctionList() to extract the name portion of the files (excluding the extension). I’ve included a folder with subfolders, each with sample CSVs, in the accompanying files for this post. You can run the macro on it and see how it works. Try refactoring the macro to use only one %applyFunctionList() call, rather than using two separate calls.

It is interesting to note that you can call this macro from within another invocation of %applyFunctionList():

%applyFunctionList_noReturn(list=%STR(1004 1100 1202),
                  function=%NRSTR(%importAllCSVs(C:\sampleCSVs\||element||));
                             );

This will import all the CSVs from each of the three folders: “C:\sampleCSVs\1004”, “C:\sampleCSVs\1100” and “C:\sampleCSVs\1202”.

Try it yourself!
You can find this post’s files at http://dontsasme.com/SASUserBlogSeries/Post3/ . I’ve included necessary tools from previous posts as well. Play around with these ideas in your SAS playground. I hope you are starting to see the real world applications of this approach.

Keep in mind that we are still in the foundation phase. We are laying down the framework from which we will build really neat things. As always, I encourage you to take in as many ideas as you can, see what they are doing objectively and blend the things that are useful to you so that you can create your own.

Additional resources
Here is some reference material on main system calls used in Example 3:
FILEEXIST
FILENAME
DOPEN

tags: Macro tutorial, sas programming
3月 262013
 

Are you a new SAS user who isn't sure where to begin? Have no fear, because you're not alone. Here at SAS, we often receive questions from people who need help getting started with the software.

Getting Started with SAS is the topic of a SAS Talks session that I will co-host with SAS instructor Stacey Syphus on April 11, 2013 at 1pm ET. Tune in if you can, but don't worry if you can't -- the session will be recorded and available later as part of the SAS Talks series.

The goals of this session are to familiarize you with the content of typical SAS programs, teach you to read SAS log output, and show you how to run and modify programs. You won't become a SAS programming expert by attending this session, but it should provide a good foundation for further study. If you want to find out more, dozens of SAS programming books and training courses are available that cover every aspect of the craft. (And yes, SAS programmers often do regard themselves as craftspeople, of a sort.)

In this session, Stacey and I will cover:

  • a high-level review of SAS, its foundational technologies (including Base SAS) and how it works.
  • the different methods for working with SAS, including the SAS windowing environment and SAS Enterprise Guide.
  • how to determine which SAS approach is best for you.

Also, you'll learn where you can find resources at SAS that are available specifically for SAS beginners (including my SAS for Dummies book, now available in its second edition).

I hope that you'll join Stacey and me for the session. Don't wait to begin getting the most out of SAS.

Related links:

Getting started with SAS Enterprise Guide
SAS Enterprise Guide for the Programmer: What's in It for Me?

tags: SAS programming, sas talks, sas training