sas programming

8月 102018
 

Hash tables are a very powerful and flexible data structure. Most SAS applications of hash tables focus on just one of their many powerful facilities: table lookup. Hash tables are a fantastic table lookup tool and their use for that should never be diminished. However, hash tables can do so [...]

The post Five things you (probably) don’t know you can do with a hash table appeared first on SAS Learning Post.

7月 302018
 

A programmer recently asked a question on a SAS discussion forum about design matrices for categorical variables. He had generated a design matrix by using PROC GLMMOD and wanted to use the design columns in a subsequent procedure. However, the columns were named COL1, COL2, COL3,..., so he couldn't tell which dummy variables correspond to each categorical variable. The following example illustrates his situation for the Weight_Status and Smoking_Status variables in the Sashelp.Heart data set:

proc glmmod data=Sashelp.Heart outdesign=GLMDesign;
   class Weight_Status Smoking_Status;
   model Cholesterol = Weight_Status Smoking_Status;
   ods select Parameters;
run;
 
proc contents data=GLMDesign varnum short; run;
The association betwen columns in a design matrix and levels of the original categorical variables

The "Parameters" table shows the association between columns of the design matrix and levels of the categorical variables in the model. The output from PROC CONTENTS shows that the columns of the design matrix (as stored in the GLMDesign data set) are named COL1, COL2, and so forth. On the discussion forum, I showed how to save the "Parameters" table to a SAS data and use a DATA _NULL_ step to form macro variables that you can use to associate the design columns to the original variables.

In retrospect, I missed a golden opportunity to mention that the GLMMOD procedure (which always produces a singular design matrix) is not as friendly or powerful as other procedures in SAS that can generate design matrices. In particular, the GLMSELECT and TRANSREG procedures can create design matrices for many different parameterizations of the classification variables. Furthermore, these procedures automatically create macro variables that tell you the names of the columns in the design matrix.

As a reminder, a major reason to create a design matrix is to perform an analysis with a SAS procedure that does not support a CLASS statement. For example, the documentation of the MCMC procedure shows how to use PROC TRANSREG to create a design matrix as preparation for performing a Bayesian regression analysis. For details of the many ways to generate design matrices in SAS, see my previous article, "Four ways to create a design matrix in SAS."

GLMSELECT: An easier way to associate columns of a design matrix

The GLMSELECT procedure supports the OUTDESIGN= option, which enables you to output a design matrix for the variables in a regression model. The GLMSELECT procedure has the following advantages of the GLMMOD procedure:

  • The procedure supports the EFFECT statement, which you can use to define spline effects, collection effects, and more. The OUTDESIGN= option creates columns for any effect that you can define.
  • The procedure supports nonsingular parameterizations such as the 'effect' or 'reference' parameterizations (PARAM=EFFECT or PARAM=REF, respectively).
  • The dummy variables have meaningful names of the form VarName_Level, where VarName is the name of a categorical variable and Level is one of its values.
  • The procedure automatically creates a macro variable (called _GLSMod) that contains the names of the columns of the design matrix.

The following statements create a design matrix for the Weight_Status and Smoking_Status variables. The design matrix uses the 'effect coding' parameterization and is written to the GLSDesign data set. The value of the _GLSMod macro is displayed in the SAS log and can be used in a subsequent procedure call:

proc glmselect data=Sashelp.Heart outdesign(fullmodel)=GLSDesign noprint;
   class Weight_Status(ref='Normal')
         Smoking_Status(ref='Non-smoker') / param=effect;
   model Cholesterol = Weight_Status Smoking_Status / selection=none; /* include ALL dummy variables */
run;
 
ods select Position;
proc contents data=GLSDesign varnum; run;   /* display all variables in OUTDESIGN= data set */
 
%put &_GLSMod;                              /* names of columns in design matrix */
--- SAS Log ---
Weight_Status_Overweight Weight_Status_Underweight
Smoking_Status_Heavy__16_25_ Smoking_Status_Light__1_5_
Smoking_Status_Moderate__6_15_ Smoking_Status_Very_Heavy____25_

The output from PROC CONTENTS show the variables in the GLSDesign data set, which are INTERCEPT, CHOLESTEROL (the response variable in the model), and the dummy variables listed in the _GLSMod macro variable. By default, the dummy variables have the pattern VarName_Level, although you can use the OUTDESIGN(PREFIX=prefix) option to generate column names of the form prefix1, prefix2, and so forth, where prefix is a user-specified prefix. You can see that values such as "Heavy (16-25)" are transformed into valid names for SAS variables.

In this example, the _GLSMOD macro contains the names of ALL dummy variables because the SELECTION=NONE option is used. If you use an option (such as SELECTION=LASSO) to perform variable selection, the _GLSMOD variable will contain the names of the dummy variables that are selected in the final model. This means that you can use the _GLSMOD variable to perform additional analyses. For example, if you want to use POC REG to output collinearity diagnostics for the variables in the final model, you could execute the following:

/* use the variables in the final model in a procedure that does not support a CLASS statement */
proc reg data=GLSDesign plots=none; 
   model Cholesterol = &_GLSMod / collin;
run;

TRANSREG: Dummy variables and transformed variable

In a similar way, the TRANSREG procedure supports the DESIGN option, which adds dummy variables to the output data set, as shown in the following call. The syntax is different, but the dummy variables in the TRGDesign data set are formed by the same 'effect coding' as in the previous example. The procedure automatically creates a macro variable (called _TRGInd) that contains the names of the columns of the design matrix.

proc transreg data=Sashelp.Heart design;
   model class(Weight_Status Smoking_Status / EFFECTS 
          zero="Normal"      "Non-smoker");
   output out=TRGDesign;
run;
 
%put &_TRGInd;
Weight_StatusOverweight Weight_StatusUnderweight Smoking_StatusHeavy__16_25_ Smoking_StatusLight__1_5_
Smoking_StatusModerate__6_15_ Smoking_StatusVery_Heavy____25_

The dummy variables contain the same values as in the previous example, but the two procedures construct slightly different names. The GLMSELECT names contain an extra underscore. For example, a dummy variable in the GLSDesign data set is Weight_Status_Overweight whereas the corresponding variable in the TRGDesign data set is Weight_StatusOverweight.

It is worth noting the PROC TRANSREG also supports macro variables that specify the names of dependent variables and transformed variables. You can even use the MACRO option to specify the name of the macro variables that are created. For details, see the documentation for the PROC TRANSREG statement.

In summary, when you generate a design matrix by using the GLMSELECT or TRANSREG procedures, the procedures create dummy variables that have meaningful names of the form VarName_Level or VarNameLevel. The procedures each create a macro variable that contains the names of the dummy variables. The _GLSMOD macro that is created by PROC GLMSELECT contains the names of dummy variables in the final selected model, so use SELECTION=NONE if you want all names. These macros make it easy for a programmer to refer to columns of the design matrix.

The post Meaningful names for columns of a design matrix appeared first on The DO Loop.

7月 172018
 

Automation for SAS Administrators - deleting old filesAttention SAS administrators! When running SAS batch jobs on schedule (or manually), they usually produce date-stamped SAS logs which are essential for automated system maintenance and troubleshooting. Similar log files have been created by various SAS infrastructure services (Metadata server, Mid-tier servers, etc.) However, as time goes on, the relevance of such logs diminishes while clutter stockpiles. In some cases, this may even lead to disk space problems.

There are multiple ways to solve this problem, either by deleting older log files or by stashing them away for auditing purposes (zipping and archiving). One solution would be using Unix/Linux or Windows scripts run on schedule. The other is much "SAS-sier."

Let SAS clean up its "mess"

We are going to write a SAS code that you can run manually or on schedule, which for a specified directory (folder) deletes all .log files that are older than 30 days.
First, we need to capture the contents of that directory, then select those file names with extension .log, and finally, subset that file selection to a sub-list where Date Modified is less than Today's Date minus 30 days.

Perhaps the easiest way to get the contents of a directory is by using the X statement (submitting DOS’ DIR command from within SAS with a pipe (>) option, e.g.

x 'dir > dirlist.txt';

or using pipe option in the filename statement:

filename DIRLIST pipe 'dir "C:\Documents and Settings"';

However, SAS administrators know that in many organizations, due to cyber-security concerns IT department policies do not allow enabling the X statement by setting SAS XCMD system option to NOXCMD (XCMD system option for Unix). This is usually done system-wide for the whole SAS Enterprise client-server installation via SAS configuration. In this case, no operating system command can be executed from within SAS. Try running any X statement in your environment; if it is disabled you will get the following ERROR in the SAS log:

ERROR: Shell escape is not valid in this SAS session.

To avoid that potential roadblock, we’ll use a different technique of capturing the contents of a directory along with file date stamps.

Macro to delete old log files in a directory/folder

The following SAS macro cleans up a Unix directory or a Windows folder removing old .log files. I must admit that this statement is a little misleading. The macro is much more powerful. Not only it can delete old .log files, it can remove ANY file types specified by their extension.

%macro mr_clean(dirpath=,dayskeep=30,ext=.log);
   data _null_;
      length memname $256;
      deldate = today() - &dayskeep;
      rc = filename('indir',"&dirpath");
      did = dopen('indir');
      if did then
      do i=1 to dnum(did);
         memname = dread(did,i);
         if reverse(trim(memname)) ^=: reverse("&ext") then continue;
         rc = filename('inmem',"&dirpath/"!!memname);
         fid = fopen('inmem');
         if fid then 
         do;
            moddate = input(finfo(fid,'Last Modified'),date9.);
            rc = fclose(fid);
            if . < moddate <= deldate then rc = fdelete('inmem');
         end;
      end; 
      rc = dclose(did);
      rc = filename('inmem');
      rc = filename('indir');
   run;
%mend mr_clean;

This macro has 3 parameters:

  • dirpath - directory path (required);
  • dayskeep - days to keep (optional, default 30);
  • ext - file extension (optional, default .log).

This macro works in both Windows and Linux/Unix environments. Please note that dirpath and ext parameter values are case-sensitive.

Here are examples of the macro invocation:

1. Using defaults

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean)

With this macro call, all files with extension .log (default) which are older than 30 days (default) will be deleted from the specified directory.

2. Using default extension

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean,dayskeep=20)

With this macro call, all files with extension .log (default) which are older than 20 days will be deleted from the specified directory.

3. Using explicit parameters

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean,dayskeep=10,ext=.xls)

With this macro call, all files with extension .xls (Excel files) which are older than 10 days will be deleted from the specified directory.

Old file deletion SAS macro code explanation

The above SAS macro logic and actions are done within a single data _NULL_ step. First, we calculate the date from which file deletion starts (going back) deldate = today() - &dayskeep. Then we assign fileref indir to the specified directory &dirpath:

rc = filename('indir',"&dirpath");

Then we open that directory:

did = dopen('indir');

and if it opened successfully (did>0) we loop through its members which can be either files or directories:

do i=1 to dnum(did);

In that loop, first we grab the directory member name:

memname = dread(did,i);

and look for our candidates for deletion, i.e., determine if that name (memname) ends with "&ext". In order to do that we reverse both character strings and compare their first characters. If they don’t match (^=: operator) then we are not going to touch that member - the continue statement skips to the end of the loop. If they do match it means that the member name does end with "&ext" and it’s a candidate for deletion. We assign fileref inmem to that member:

rc = filename('inmem',"&dirpath/"!!memname);

Note that forward slash (/) Unix/Linux path separator in the above statement is also a valid path separator in Windows. Windows will convert it to back slash (\) for display purposes, but it interprets forward slash as a valid path separator along with back slash.
Then we open that file using fopen function:

fid = fopen('inmem');

If inmem is a directory, the opening will fail (fid=0) and we will skip the following do-group that is responsible for the file deletion. If it is file and is opened successfully (fid>0) then we go through the deletion do-group where we first grab the file Last Modified date as moddate, close the file, and if moddate <= deldate we delete that file:

rc = fdelete('inmem');

Then we close the directory and un-assign filerefs for the members and directory itself.

Deleting old files across multiple directories/folders

Macro %mr_clean is flexible enough to address various SAS administrators needs. You can use this macro to delete old files of various types across multiple directories/folders. First, let’s create a driver table as follows:

data delete_instructions;
   length days 8 extn $9 path $256;
   infile datalines truncover;
   input days 1-2 extn $ 4-12 path $ 14-270;
   datalines;
30 .log      C:\PROJECTS\Automatically deleting old files\Logs1
20 .log      C:\PROJECTS\Automatically deleting old files\Logs2
25 .txt      C:\PROJECTS\Automatically deleting old files\Texts
35 .xls      C:\PROJECTS\Automatically deleting old files\Excel
30 .sas7bdat C:\PROJECTS\Automatically deleting old files\SAS_Backups
;

This driver table specifies how many days to keep files of certain extensions in each directory. In this example, perhaps the most beneficial deletion applies to the SAS_Backups folder since it contains SAS data tables (extension .sas7bdat). Data files typically have much larger size than SAS log files, and therefore their deletion frees up much more of the valuable disk space.

Then we can use this driver table to loop through its observations and dynamically build macro invocations using CALL EXECUTE:

data _null_;
   set delete_instructions;
   s = cats('%nrstr(%mr_clean(dirpath=',path,',dayskeep=',days,',ext=',extn,'))');
   call execute(s);
run;

Alternatively, we can use DOSUBL() function to dynamically execute our macro at every iteration of the driver table:

data _null_;
   set delete_instructions;
   s = cats('%mr_clean(dirpath=',path,',dayskeep=',days,',ext=',extn,')');
   rc = dosubl(s);
run;

Put it on autopilot

When it comes to cleaning your old files (logs, backups, etc.), the best practice for SAS administrators is to schedule your cleaning job to automatically run on a regular basis. Then you can forget about this chore around your "SAS house" as %mr_clean macro will do it quietly for you without the noise and fuss of a Roomba.

Your turn, SAS administrators

Would you use this approach in your SAS environment? Any suggestions for improvement? How do you deal with old log files? Other old files? Please share below.

SAS administrators tip: Automatically deleting old SAS logs was published on SAS Users.

7月 172018
 

The Base SAS DATA step has been a powerful tool for many years for SAS programmers. But as data sets grow and programmers work with massively parallel processing (MPP) computing environments such as Teradata, Hadoop or the SAS High-Performance Analytics grid, the data step remains stubbornly single-threaded. Welcome DS2 – [...]

The post What DS2 can do for the DATA step appeared first on SAS Learning Post.

7月 062018
 

SAS programmers have long wanted the ability to control the flow of their SAS programs without having to resort to complex SAS macro programming. With SAS 9.4 Maintenance 5, it's now supported! You can now recently came to light on SAS Support Communities. (Thanks to Super User Tom for asking about it.)

Prior to this change, if you wanted to check a condition -- say, whether a data set exists -- before running a PROC, you had to code it within a macro routine. It would look something like this:

/* capture conditional logic in macro */
%macro PrintIfExists();
 %if %sysfunc(exist(work.result)) %then
  %do;
    proc means data=work.result;
    run;
  %end; 
%else
  %do;
    %PUT WARNING: Missing WORK.RESULT - report process skipped.;
  %end;
%mend;
 
/* call the macro */
%PrintIfExists();

Now you can simplify this code to remove the %MACRO/%MEND wrapper and the macro call:

/* If a file exists, take an action */
/* else fail gracefully */
%if %sysfunc(exist(work.result)) %then
  %do;
    proc means data=work.result;
    run;
  %end;
%else
  %do;
    %PUT WARNING: Missing WORK.RESULT - report process skipped.;
  %end;

Here are some additional ideas for how to use this feature. I'm sure you'll be able to think of many more!

Run "debug-level" code only when in debug mode

When developing your code, it's now easier to leave debugging statements in and turn them on with a simple flag.

/* Conditionally produce debugging information */
%let _DEBUG = 0; /* set to 1 for debugging */
%if &_DEBUG. %then
  %do;
    proc print data=sashelp.class(obs=10);
    run;
  %end;

If you have code that's under construction and should never be run while you work on other parts of your program, you can now "IF 0" out the entire block. As a longtime C and C++ programmer, this reminds me of the "#if 0 / #endif" preprocessor directives as an alternative for commenting out blocks of code. Glad to see this in SAS!

/* skip processing of blocks of code */
/* like #if 0 / #endif in C/C++      */
%if 0 %then
  %do;
    proc ToBeDetermined;
      READMYMIND = Yes;
    run;
  %end;

Run code only on a certain day of the week

I have batch jobs that run daily, but that send e-mail to people only one day per week. Now this is easier to express inline with conditional logic.

/*If it's Monday, send a weekly report by email */
%if %sysfunc(today(),weekday1.)=2 %then
  %do;
    options emailsys=smtp emailhost=myhost.company.com;
    filename output email
      subject = "Weekly report for &SYSDATE."
      from = "SAS Dummy <sasdummy@sas.com>"
      to = "knowledgethirster@curious.net"
      ct ='text/html';
 
  ods tagsets.msoffice2k(id=email) 
    file=OUTPUT(title="Important Report!")
    style=seaside;
   title "The Weekly Buzz";
   proc print data=amazing.data;
   run;
  ods tagsets.msoffice2k(id=email) close;
  %end;

Check a system environment variable before running code

For batch jobs especially, system environment variables can be a rich source of information about the conditions under which your code is running. You can glean user ID information, path settings, network settings, and so much more. If your SAS program needs to pick up cues from the running environment, this is a useful method to accomplish that.

/* Check for system environment vars before running code */
%if %sysfunc(sysexist(ORACLE_HOME)) %then
  %do;
    %put NOTE: ORACLE client is installed.;
    /* assign an Oracle library */
    libname ora oracle path=corp schema=alldata authdomain=oracle;
  %end;

Limitations of %IF/%THEN in open code

As awesome as this feature is, there are a few rules that apply to the use of the construct in open code. These are different from what's allowed within a %MACRO wrapper.

First rule: your %IF/%THEN must be followed by a %DO/%END block for the statements that you want to conditionally execute. The same is true for any statements that follow the optional %ELSE branch of the condition.

And second: no nesting of multiple %IF/%THEN constructs in open code. If you need that flexibility, you can do that within a %MACRO wrapper instead.

And remember, this works only in SAS 9.4 Maintenance 5 and later. That includes the most recent release of SAS University Edition, so if you don't have the latest SAS release in your workplace, this gives you a way to kick the tires on this feature if you can't wait to try it.

The post Using %IF-%THEN-%ELSE in SAS programs appeared first on The SAS Dummy.

6月 302018
 

The Geo Map Visualization has several built-in geographical units, including country and region names and codes, US state names and codes, and US zip codes. You can also define your own geographic units. This paper describes how to identify any geographic point of interest, or collection of points, on a map to create custom maps in SAS.

The post Custom Maps in SAS: My Neighborhood appeared first on SAS Learning Post.

6月 192018
 

CAS DATA StepCloud Analytic Services (CAS) is really exciting. It’s open. It’s multi-threaded. It’s distributed. And, best of all for SAS programmers, it’s SAS. It looks like SAS. It feels like SAS. In fact, you can even run DATA Step in CAS. But, how does DATA Step work in a multi-threaded, distributed context? What’s new? What’s different? If I’m a SAS programming wizard, am I automatically a CAS programming wizard?

While there are certain _n_ automatic variable as shown below:

DATA tableWithUniqueID;
SET tableWithOutUniqueID; 
 
        uniqueID = _n_;
 
run;

CAS DATA Step

Creating a unique ID in CAS DATA Step is a bit more complicated. Each thread maintains its own _n_. So, if we just use _n_, we’ll get duplicate IDs. Each thread will produce an uniqueID field value of 1, 2..and so on. …. When the thread output is combined, we’ll have a bunch of records with an uniqueID of 1 and a bunch with an uniqueID of 2…. This is not useful.

To produce a truly unique ID, you need to augment _n_ with something else. _threadID_ automatic variable can help us get our unique ID as shown below:

DATA tableWithUniqueID;
SET tableWithOutUniqueID;
 
        uniqueID = put(_threadid_,8.) || || '_' || Put(_n_,8.);
 
run;

While there are surely other ways of doing it, concatenating _threadID_ with _n_ ensures uniqueness because the _threadID_ uniquely identifies a single thread and _n_ uniquely identifies a single row output by that thread.

Aggregation with DATA Step

Now, let’s look at “whole table” aggregation (no BY Groups).

SAS DATA Step

Aggregating an entire table in SAS DATA Step usually looks something like below. We create an aggregator field (totSalesAmt) and then add the detail records’ amount field (SaleAmt) to it as we process each record. Finally, when there are no more records (eof), we output the single aggregate row.

DATA aggregatedTable ;
SET detailedTable end=eof;
 
      retain totSalesAmt 0;
      totSalesAmt = totSalesAmt + SaleAmt;
      keep totSalesAmt;
      if eof then output;
 
run;

CAS DATA Step

While the above code returns one row in single-engine SAS, the same code returns multiple rows in CAS — one per thread. When I ran this code against a table in my environment, I got 28 rows (because CAS used 28 threads in this example).

As with the unique ID logic, producing a total aggregate is just a little more complicated in CAS. To make it work in CAS, we need a post-process step to bring the results together. So, our code would look like this:

DATA aggregatedTable ;
SET detailedTable end=eof;
 
      retain threadSalesAmt 0;
      threadSalesAmt = threadSalesAmt + SaleAmt;
      keep threadSalesAmt;
      if eof then output;
 
run;
 
DATA aggregatedTable / single=yes;
SET aggregatedTable end=eof;
 
      retain totSalesAmt 0;
      totSalesAmt = totSalesAmt + threadSalesAmt;
      if eof then output;
 
run;

In the first data step in the above example, we ran basically the same code as in the SAS DATA Step example. In that step, we let CAS do its distributed, multi-threaded processing because our table is large. Spreading the work over multiple threads makes the aggregation much quicker. After this, we execute a second DATA Step but here we force CAS to use only one thread with the single=yes option. This ensures we only get one output row because CAS only uses one thread. Using a single thread in this case is optimal because we’ll only have a few input records (one per thread from the previous step).

BY-GROUP Aggregation

Individual threads are then assigned to individual BY-Groups. Since each BY-Group is processed by one and only one thread, when we aggregate, we won’t see multiple output rows for a BY-Group. So, there shouldn’t be a need to consolidate the thread results like there was with “whole table” aggregation above.

Consequently, BY-Group aggregation DATA Step code should look exactly the same in CAS and SAS (at least for the basic stuff).

Concluding Thoughts

Coding DATA Step in CAS is very similar to coding DATA Step in SAS. If you’re a wizard in one, you’re likely a wizard in the other. The major difference is accounting for CAS’ massively parallel processing capabilities (which manifest as threads). For more insight into data processing with CAS, check out the SAS Global Forum paper.

Threads and CAS DATA Step was published on SAS Users.

6月 192018
 

When making a new piece of code, I like to use the smallest font I can read. This lets me fit more text on the screen at once. When presenting code to others, especially in a classroom setting, I like to make the font large enough to see from the back of the room. Here’s how I change font size in SAS in our three programming interfaces.

The post Changing font size in SAS appeared first on SAS Learning Post.

6月 152018
 

My 2018 SAS Global Forum paper was about "how to use the random-number generators (RNGs) in SAS." You can read the paper for details, but I recently recorded a short video that summarizes the main ideas in the paper. In particular, the video gives an overview of the new RNGs in SAS, which include the following:

  • MTHYBRID, MT2002, and MT64: Variants of the Mersenne twister RNG. The MTHYBRID method is the default RNG in SAS, beginning with the SAS 9.4M3.
  • PCG: A 64-bit permuted congruential generator
  • RDRAND: A hardware-based RNG that generates random numbers from thermal noise in the chip
  • TF2 and TF4: Counter-based Threefry RNGs

If your browser does not support embedded video, you can go directly to the video on YouTube.

The following references provide more information about the random number generators in SAS:

The post Video: New random number generators in SAS appeared first on The DO Loop.

6月 112018
 

In SAS, the reserved keyword _NULL_ specifies a SAS data set that has no observations and no variables. When you specify _NULL_ as the name of an output data set, the output is not written. The _NULL_ data set is often used when you want to execute DATA step code that displays a result, defines a macro variable, writes a text file, or makes calls to the EXECUTE subroutine. In those cases, you are interested in the "side effect" of the DATA step and rarely want to write a data set to disk. This article presents six ways to use the _NULL_ data set. Because the _NULL_ keyword is used, no data set is created on disk.

#1. Use SAS as a giant calculator

You can compute a quantity a DATA _NULL_ step and then use the PUT statement to output the answer to the SAS log. For example, the following DATA step evaluates the normal density function at x-0.5 when μ=1 and σ=2. The computation is performed twice: first using the built-in PDF function and again by using the formula for the normal density function. The SAS log shows that the answer is 0.193 in both cases.

data _NULL_;
mu = 1; sigma = 2; x = 0.5; 
pdf = pdf("Normal", x, mu, sigma);
y = exp(-(x-mu)**2 / (2*sigma**2)) / sqrt(2*constant('pi')*sigma**2);
put (pdf y) (=5.3);
run;
pdf=0.193 y=0.193

#2. Display characteristics of a data set

You can use a null DATA step to display characteristics of a data set. For example, the following DATA step uses the PUT statement to display the number of numeric and character variables in the Sashelp.Class data set. No data set is created.

data _NULL_;
set Sashelp.Class;
array char[*} $ _CHAR_;
array num[*} _NUMERIC_;
nCharVar  = dim(char);
nNumerVar = dim(num);
put "Sashelp.Class: " nCharVar= nNumerVar= ;
stop;   /* stop processing after first observation */
run;
Sashelp.Class: nCharVar=2 nNumerVar=3

You can also store these values in a macro variable, as shown in the next section.

#3. Create a macro variable from a value in a data set

You can use the SYMPUT or SYMPUTX subroutines to create a SAS macro variable from a value in a SAS data set. For example, suppose you run a SAS procedure that computes some statistic in a table. Sometimes the procedure supports an option to create an output data that contains the statistic. Other times you might need to use the ODS OUTPUT statement to write the table to a SAS data set. Regardless of how the statistic gets in a data set, you can use a DATA _NULL_ step to read the data set and store the value as a macro variable.

The following statements illustrate this technique. PROC MEANS creates a table called Summary, which contains the means of all numerical variables in the Sashelp.Class data. The ODS OUTPUT statement writes the Summary table to a SAS data set called Means. The DATA _NULL_ step finds the row for the Height variable and creates a macro variable called MeanHeight that contains the statistic. You can use that macro variable in subsequent steps of your analysis.

proc means data=Sashelp.Class mean stackods;
   ods output Summary = Means;
run;
 
data _NULL_;
set Means;
/* use PROC CONTENTS to determine the columns are named Variable and Mean */
if Variable="Height" then             
   call symputx("MeanHeight", Mean);
run;
 
%put &=MeanHeight;
MEANHEIGHT=62.336842105

For a second example, see the article "What is a factoid in SAS," which shows how to perform the same technique with a factoid table.

#4. Create macro variable from a computational result

Sometimes there is no procedure that computes the quantity that you want, or you prefer to compute the quantity yourself. The following DATA _NULL_ step counts the number of complete cases for the numerical variables in the Sashelp.Heart data. It then displays the number of complete cases and the percent of complete cases in the data. You can obtain the same results if you use PROC MI and look at the MissPattern table.

data _NULL_;
set Sashelp.Heart end=eof nobs=nobs;
NumCompleteCases + (nmiss(of _NUMERIC_) = 0); /* increment if all variables are nonmissing */
if eof then do;                               /* when all observations have been read ... */
   PctComplete = NumCompleteCases / nobs;     /* ... find the percentage */
   put NumCompleteCases= PctComplete= PERCENT7.1;
end;
run;
NumCompleteCases=864 PctComplete=16.6%

#5. Edit a text file or ODS template "on the fly"

This is a favorite technique of Warren Kuhfeld, who is a master of writing a DATA _NULL_ step that modifies an ODS template. In fact, this technique is at the heart of the %MODSTYLE macro and the SAS macros that modify the Kaplan-Meier survival plot.

Although I am not as proficient as Warren, I wrote a blog post that introduces this template modification technique. The DATA _NULL_ step is used to modify an ODS template. It then uses CALL EXECUTE to run PROC TEMPLATE to compile the modified template.

#6. A debugging tool

All the previous tips use _NULL_ as the name of a data set that is not written to disk. It is a curious fact that you can use the _NULL_ data set in almost every SAS statement that expects a data set name!

For example, you can read from the _NULL_ data set. Although reading zero observations is not always useful, one application is to check the syntax of your SAS code. Another application is to check whether a procedure is installed on your system. For example, you can run the statements PROC ARIMA data=_NULL_; quit; to check whether you have access to the ARIMA procedure.

A third application is to use _NULL_ to suppress debugging output. During the development and debugging phase of your development, you might want to use PROC PRINT, PROC CONTENTS, and PROC MEANS to ensure that your program is working as intended. However, too much output can be a distraction, so sometimes I direct the debugging output to the _NULL_ data set where, of course, it magically vanishes! For example, the following DATA step subsets the Sashelp.Cars data. I might be unsure as to whether I created the subset correctly. If so, I can use PROC CONTENTS and PROC MEANS to display information about the subset, as follows:

data Cars;
set Sashelp.Cars(keep=Type _NUMERIC_);
if Type in ('Sedan', 'Sports', 'SUV', 'Truck'); /* subsetting IF statement */
run;
 
/* FOR DEBUGGING ONLY */
%let DebugName = Cars;  /* use _NULL_ to turn off debugging output */
proc contents data=&DebugName short;
run;
proc means data=&DebugName N Min Max;
run;

If I don't want to this output (but I want the option to see it again later), I can modify the DebugName macro (%let DebugName = _NULL_;) so that the CONTENTS and MEANS procedures do not produce any output. If I do that and rerun the program, the program does not create any debugging output. However, I can easily restore the debugging output whenever I want.

Summary

In summary, the _NULL_ data set name is a valuable tool for SAS programmers. You can perform computations, create macro variables, and manipulate text files without creating a data set on disk. Although I didn't cover it in this article, you can use DATA _NULL_ in conjunction with ODS for creating customized tables and reports.

What is your favorite application of using the _NULL_ data set? Leave a comment.

The post 6 ways to use the _NULL_ data set in SAS appeared first on The DO Loop.