Leonid Batkhan

March 6, 2019
 

Conditionally terminating a SAS batch flow process in UNIX/Linux

In automated production (or business operations) environments, we often run SAS job flows in batch mode and on schedule. A SAS job flow is a collection of several inter-dependent SAS programs executed as a single process.

In my earlier posts, Running SAS programs in batch under Unix/Linux and Let SAS write batch scripts for you, I described how you can run SAS programs in batch mode by creating UNIX/Linux scripts that in turn incorporate other scripts invocations.

In this scenario you can run multiple SAS programs sequentially or in parallel, all while having a single root script kicked off on schedule. The whole SAS processing flow runs like a chain reaction.

Why and when to stop SAS batch flow process

However, sometimes we need to automatically stop and terminate that chain of job executions if certain criteria are met (or not met) in a program within the process flow.
Let’s say our first job in a batch flow is a data preparation step (ETL) where we extract data tables from a database and prepare them for further processing. The rest of the batch process is dependent on successful completion of that critical first job. The process is kicked off at 3:00 a.m. daily, however, sometimes we run into a situation when the database connection is unavailable, or the database itself is not finished refreshing, or something else happens resulting in the ETL program completing with ERRORs.

This failure means that our data has not updated properly and there is no reason to continue running the remainder of the job flow process as it might lead to undesired or even disastrous consequences. In this situation we want to automatically terminate the flow execution and send an e-mail notification to the process owners and/or SAS administrators informing them about the mishap.

How to stop SAS batch flow process in UNIX/Linux

Suppose, we run the following main.sh script on UNIX/Linux:

#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
#2 run the rest of processing flow
/sas/code/processing/tail.sh

The etl.sh script runs the SAS ETL process as follows:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/etl/etl.sas"
logname="/sas/code/etl/etl_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

We want to run the tail.sh shell script (which itself runs multiple other scripts) only if etl.sas completes successfully, that is, only if the SAS ETL process run by etl.sh finishes with no ERRORs or WARNINGs. Otherwise, we want to terminate the main.sh script and not run the rest of the processing flow.

To do this, we re-write our main.sh script as:

 
#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
exitcode=$?
echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
 
if [ $exitcode -eq 0 ]
   then
      #2 run the rest of processing flow
      /sas/code/processing/tail.sh
fi

In this code, we use a special shell script variable ($? for the Bourne and Korn shells, $STATUS for the C shell) to capture the exit status code of the previously executed OS command, /sas/code/etl/etl.sh:

exitcode=$?

Then the optional echo command just prints the captured value of that status for our information.

Every UNIX/Linux command executed by the shell script or user has an exit status represented by an integer number in the range of 0-255. The exit code of 0 means the command executed successfully without any errors; a non-zero value means the command was a failure.
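
You can observe this interactively at the shell prompt; here is a quick illustration (the exact error message and non-zero value vary by command and system):

$ ls /etc/hosts
/etc/hosts
$ echo $?
0
$ ls /no/such/file
ls: cannot access '/no/such/file': No such file or directory
$ echo $?
2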

The SAS System plays nicely with the UNIX/Linux operating system. According to the SAS documentation, SAS returns an exit status code to the shell ($? for the Bourne and Korn shells, $STATUS for the C shell). A value of 0 indicates successful termination. For additional flexibility, SAS provides a whole range of exit status codes:

  • All steps terminated normally: 0
  • SAS issued WARNINGs: 1
  • SAS issued ERRORs: 2
  • User issued ABORT statement: 3
  • User issued ABORT RETURN statement: 4
  • User issued ABORT ABEND statement: 5
  • SAS could not initialize because of a severe error: 6
  • User issued ABORT RETURN - n statement: n
  • User issued ABORT ABEND - n statement: n
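
For example, the ABORT RETURN statement lets a SAS program set its own custom exit code. Here is a minimal sketch, assuming we fail the job when a source table is missing (the DBLIB.SALES name and the code value 9 are hypothetical):

data _null_;
   /* hypothetical check: if the source table is missing, end SAS with exit code 9 */
   if not exist('DBLIB.SALES') then abort return 9;
run;

The calling shell script can then branch on that specific value of $? if needed.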

Since our etl.sh script executes SAS code etl.sas, the exit status code is passed by the SAS System to etl.sh and consequently to our main.sh shell script.

Then, in the main.sh script, we check whether that exit code equals 0, and then and only then run the remaining flow by executing the tail.sh shell script. Otherwise, we skip tail.sh and exit from the main.sh script by reaching its end.

Alternatively, the main.sh script can be implemented with an explicit exit as follows:

#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
exitcode=$?
echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
 
if [ $exitcode -ne 0 ]
   then exit
fi
 
#2 run the rest of processing flow
/sas/code/processing/tail.sh

In this shell script code example, we check the exit code value, and if it is NOT equal to 0, we explicitly terminate the main.sh shell script using the exit command, which gets us out of the script immediately without executing any subsequent commands. In this case, our #2 command invoking the tail.sh script never gets executed, which effectively stops the batch flow process.

If you also need to automatically send an e-mail notification to the designated people about the failed batch flow process, you can do it in a separate script (or SAS job) that runs right before the exit command. Then the if-statement will look something like this:

 
if [ $exitcode -ne 0 ]
   then
      # send an email and exit
      /sas/code/etl/email_etl_failure.sh
      exit
fi

That is, immediately after the email is sent, the shell script and the whole batch flow process get terminated by the exit command; no shell script commands beyond that if-statement will be executed.
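
The notification script itself can be a one-liner. Here is a minimal sketch, assuming the mailx utility is available on your system (the recipient address is a placeholder):

#!/bin/sh
# hypothetical /sas/code/etl/email_etl_failure.sh
echo "SAS ETL job failed on $(hostname) at $(date)" | \
mailx -s "SAS batch flow terminated: ETL failure" sas.admins@example.com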

A word of caution

Be extra careful if you use the special script variable $? directly in a script's logical expression without first assigning it to an interim variable. For example, you could use the following script command sequence:

/sas/code/etl/etl.sh
if [ $? -ne 0 ]
. . .

However, let’s say you insert another script command between them, for example:

/sas/code/etl/etl.sh
echo "Status=$? (0=SUCCESS,1=WARNING,2=ERROR)"
if [ $? -ne 0 ]
. . .

Then the $? variable in the if [ $? -ne 0 ] statement will hold the exit status of the preceding echo command, not that of the /sas/code/etl/etl.sh command, as you might intend.

Hence, I suggest capturing the $? value in an interim variable (e.g. exitcode=$?) right after the command whose exit code you are going to inspect, and then referencing that interim variable (as $exitcode) in your subsequent script statements. That will save you from the trouble of inadvertently referring to a wrong exit code when you insert additional commands during your script development.

Your thoughts

What do you think about this approach? Did you find this blog post useful? Did you ever need to terminate your batch job flow? How did you go about it? Please share with us.

How to conditionally terminate a SAS batch flow process in UNIX/Linux was published on SAS Users.

February 6, 2019
 

Splitting external text data files into multiple files

Recently, I worked on a cybersecurity project that entailed processing a staggering number of raw text files about web traffic. Millions of rows had to be read and parsed to extract variable values.

The problem was complicated by the varying record composition. Each external raw file was a collection of records with different structures that required different parsing logic. Besides, those heterogeneous records could not possibly belong to the same rectangular data tables with fixed sets of columns.

Solving the problem

To solve the problem, I decided to employ a "divide and conquer" strategy: to split the external file into many files, each with a homogeneous structure, then parse them separately to create as many output SAS data sets.

My plan was to use a SAS DATA Step for looping through the rows (records) of the external file, read each row, identify its type, and based on that, write it to a corresponding output file.

It is much like how we would split a data set into many:

 
data CARS_ASIA CARS_EUROPE CARS_USA;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   output CARS_ASIA;
      when('Europe') output CARS_EUROPE;
      when('USA')    output CARS_USA;
   end;   
run;

But how do you switch between the output files? The idea came from SAS' Chris Hemedinger, who suggested using multiple FILE statements to redirect output to different external files.

Splitting an external raw file into many

As you know, one can use the PUT statement in a SAS DATA step to output a character string, or a combination of character strings and variable values, to an external file. That external file (a destination) is defined by a FILE statement.

 
filename inf  'c:\temp\input_file.txt';
filename out1 'c:\temp\traffic.txt';
filename out2 'c:\temp\system.txt';
filename out3 'c:\temp\threat.txt';
filename out4 'c:\temp\other.txt';
 
data _null_;
   infile inf;
   input REC_TYPE $10. @;
   input;
   select(REC_TYPE);
      when('TRAFFIC') file out1;
      when('SYSTEM')  file out2;
      when('THREAT')  file out3;
      otherwise       file out4;
   end;
   put _infile_;
run;

In this code, the first INPUT statement retrieves the value of REC_TYPE. The trailing @ line-hold specifier ensures that the input record is held for the execution of the next INPUT statement within the same iteration of the DATA step. The statement may not be usable exactly as written, but the point is that you need to capture the field(s) of interest and stay on the same row.

The second INPUT statement reads the whole raw file record into the _infile_ DATA Step automatic variable.

Depending on the value of the REC_TYPE variable assigned in the first INPUT statement, the SELECT block toggles the FILE definition between the four filerefs: out1, out2, out3, or out4.

Then the PUT statement outputs the _infile_ automatic variable value to the output file defined in the SELECT block.

Splitting a data set into several external files

A similar technique can be used to split a data table into several external raw files. Let's combine the two code samples above to demonstrate how you can split a data set into several external raw files:

 
filename outasi 'c:\temp\cars_asia.txt';
filename outeur 'c:\temp\cars_europe.txt';
filename outusa 'c:\temp\cars_usa.txt';
 
data _null_;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   file outasi;
      when('Europe') file outeur;
      when('USA')    file outusa;
   end;
   put _all_; 
run;

This code reads observations of the SASHELP.CARS data table and, depending on the value of the ORIGIN variable, PUT _ALL_ outputs all the variables (including the automatic variables _ERROR_ and _N_) as named values (VARIABLE_NAME=VARIABLE_VALUE pairs) to one of the three external raw files specified by their respective file references (outasi, outeur, or outusa).

You can modify this code to produce delimited files with full control over which variables and in what order to output. For example, the following code sample produces 3 files with comma-separated values:

 
data _null_;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   file outasi dlm=',';
      when('Europe') file outeur dlm=',';
      when('USA')    file outusa dlm=',';
   end;
   put make model type origin msrp invoice; 
run;

You may use different delimiters for your output files. In addition, rather than using the mutually exclusive SELECT block, you may use different logic for redirecting your output to different external files, as in the sketch below.
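
For instance, here is a minimal sketch that uses overlapping IF conditions instead of SELECT, so that a single record may be written to more than one file (the outlux fileref and the $40,000 threshold are made up for illustration):

filename outlux 'c:\temp\cars_luxury.txt';
filename outusa 'c:\temp\cars_usa.txt';
 
data _null_;
   set SASHELP.CARS;
   /* overlapping conditions: a car may land in both files */
   if msrp > 40000 then
   do;
      file outlux dlm=',';
      put make model msrp;
   end;
   if origin = 'USA' then
   do;
      file outusa dlm=',';
      put make model msrp;
   end;
run;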

Bonus: How to zip your output files as you create them

For those readers who are patient enough to read to this point, here is another tip. As described in this series of blog posts by Chris Hemedinger, in SAS you can read your external raw files directly from zipped files without unzipping them first, as well as write your output raw files directly into zipped files. You just need to specify that in your filename statement. For example:

UNIX/Linux

 
filename outusa ZIP '/sas/data/temp/cars_usa.txt.gz' GZIP;

Windows

 
filename outusa ZIP 'c:\temp\cars.zip' member='cars_usa.txt';
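
The FILENAME ZIP access method works for reading as well. Here is a minimal sketch that reads the zipped member back and echoes its records to the SAS log, just to confirm the round trip (it assumes the cars.zip file created above):

filename inzip ZIP 'c:\temp\cars.zip' member='cars_usa.txt';
 
data _null_;
   infile inzip;
   input;
   put _infile_;   /* echo each record to the SAS log */
run;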

Your turn

What is your experience with creating multiple external raw files? Could you please share with the rest of us?

How to split a raw file or a data set into many external raw files was published on SAS Users.

January 10, 2019
 


My New Year's resolution: "Unclutter your life." I hope this post will help you do the same.

Here I share with you a data preparation approach and SAS coding technique that will significantly simplify, unclutter and streamline your SAS programming life by using data templates.

Dictionary.com defines template as “anything that determines or serves as a pattern; a model.” However, I was flabbergasted when my “prior art research” for the topic of this blog post ended rather abruptly: “No results found for data template.”

What do you mean “no results?!” (Yes, sometimes I talk to the Internet. Do you?) We have templates for everything in the world: MS Word templates, C++ templates, Photoshop templates, website templates, holiday templates, we even have our own PROC TEMPLATE. But no templates for data?

At that point I paused, struggling to accept reality, but then felt compelled to come up with my own definition:

A data template is a well-defined data structure containing a data descriptor but no data.

Therefore, a SAS data template is a SAS data set (data table) containing the descriptor portion, with all the necessary attributes defined (variable types, labels, lengths, formats, and informats), and an empty (zero observations) data portion.

Less clutter = greater efficiency

When you construct SAS data tables using SAS code or data management tools, a data template lets you define all the variable attributes once, in a single place, and reuse them consistently across programs; you can even use data design documentation as a feed to a code-generating SAS program that builds the template for you.

Unfortunately, despite all these benefits the data template concept is not explicitly and consistently employed and is noticeably absent from data development methodologies and practices.

Let’s try to change that!

How to create SAS data templates from scratch

It is very easy to create a SAS data template. Here is an example:

 
libname PARMSDL 'c:\projects\datatemplates';
data PARMSDL.MYTEMPLATE;
   label
      newvar1 = 'Label for new variable 1'
      newvar2 = 'Label for new variable 2'
      /* ... */
      newvarN = 'Label for new variable N'
      ;
  length
      newvar1 newvar2 $40
      newvarN 8
      ;
   format newvarN mmddyy10.;
   informat newvarN date9.;
   stop;
run;

First, you need to assign a permanent library (e.g. PARMSDL) where you are going to store your SAS dataset template. I usually do not store data templates in the same library as data. Nor do I store it in the same directory/folder where you store your SAS code. Ordinarily, I store data templates in a so-called parameter data library (that is why I use PARMSDL as a libref), along with other data defining SAS code structure.

In the data step, the very first statement, LABEL, defines all the variables' labels, as well as the variable positions determined by the order in which the variables are listed.

Statement LENGTH defines variables’ types (numeric or character) and their length in bytes. Here you may group variables of the same length to shorten your code or define them individually to be more explicit.
Statement FORMAT defines variables’ formats as needed. You don’t have to define formats for all the variables; define them only if necessary.

Statement INFORMAT (also optional) defines informats, which come in handy if you use this data template for creating SAS data sets by reading external raw files. With informats defined on the data template, you won't have to specify informats in your INPUT statement while reading an external file, as the informats will be inherently associated with the variable names. That is why SAS data sets have an informat attribute for their variables in the first place (if you ever wondered why).

Finally, don’t forget the STOP statement at the end of your data step, just before the RUN statement. Otherwise, instead of zero observations, you will end up with a data table that has a single observation with all missing variable values. Not what we want.
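
To see why, try this quick experiment: a data step with no input data and no STOP statement runs a single iteration and outputs one all-missing row:

data NO_STOP;
   length x 8 y $10;
run;
 
/* SAS log: NOTE: The data set WORK.NO_STOP has 1 observations and 2 variables. */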

It is worth noting that the OBS=0 system option will not work in place of the STOP statement, as it applies only to data being read, but we read no data here. For the same reason, the (obs=0) data set option will not work either. Try it, and the SAS log will dispel your doubts:

 
data PARMSDL.MYTEMPLATE (obs=0);
                        ---
                        70
WARNING 70-63: The option OBS is not valid in this context.  Option ignored.

How to create SAS data templates by inheritance

If you already have some data table with well-defined variable attributes, you may easily create a data template out of that data table by inheriting its descriptor portion:

 
data PARMSDL.MYTEMPLATE;
   set SASDL.MYDATA (obs=0);
run;

The (obs=0) option does work here, as it applies to the data set being read, and therefore the STOP statement is not necessary.

You can also combine inheritance with defining new variables, as in the following example:

 
data MYTEMPLATE;
   set SASDL.MYDATA (obs=0); *<-- inherited template;
   * variables definition: ;
   label
      newvar1 = 'Label for new variable 1'
      newvarN = 'Label for new variable N'
      oldvar  = 'New Label for OLD variable'
      ;
   length
      newvar1 $40
      newvarN 8
      oldvar  $100 /* careful here, see notes below */
      ;
   format newvarN mmddyy10.;
   informat newvarN date9.;
run;

A word of warning

Be careful when your new variable definition's type and length contradict the inherited definition.
You can overwrite/re-define inherited variable attributes such as labels, formats, and informats with no problem, but you cannot overwrite the type and, in some cases, the length. If you do need a different type for a specific variable name on your data template, you should first drop that variable on the SET statement and then re-define it in the data step.

With the length attribute the picture is a bit different. If you try defining a different length for some variable, SAS will produce the following WARNING in the LOG:

WARNING: Length of character variable  has already been set.
Use the LENGTH statement as the very first statement in the DATA STEP to declare the length of a character variable.

You can either follow the advice of the WARNING message and place the LENGTH statement as the very first statement, or at least place it before the SET statement. In this case, you will find that you can increase the length without a problem, but if you try to reduce the length relative to the one in the parent data set, SAS will produce the following WARNING in the LOG:

WARNING: Multiple lengths were specified for the variable  by input data set(s). This can cause truncation of data.

In this case, a cleaner way is also to drop that variable on the SET statement and redefine it with a LENGTH statement in the data step.

Keep in mind that when you drop these variables from the parent data set, besides losing their type and length attributes, you will obviously lose the rest of the attributes too. Therefore, you will need to re-define all the attributes (type, length, label, format, and informat) for the variables you drop. At least, this technique will allow you to selectively inherit some variables from a parent data set and explicitly define others.
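
Putting these pieces of advice together, a sketch of such selective inheritance might look like this (the variable names are hypothetical; var_wider is assumed to be a character variable inherited from the parent data set, and var_newtype is a variable whose type we change from character to numeric):

data PARMSDL.MYTEMPLATE;
   length var_wider $100;  /* to lengthen an inherited variable, declare it BEFORE the SET statement */
   set SASDL.MYDATA (obs=0 drop=var_newtype);
   /* re-define the dropped variable from scratch, with all its attributes */
   length var_newtype 8;
   label  var_newtype = 'Now numeric';
   format var_newtype date9.;
run;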

How to use SAS data templates

One way to apply your data template to a newly created data set is to: 1) copy your data template into that new data set; 2) append your data table to that new data set. Here is an example:

 
/* copying data template into dataset */
data SASDL.MYNEWDATA;
   set PARMSDL.MYTEMPLATE;
run;
 
/* append data to your dataset with descriptor */
proc append base=SASDL.MYNEWDATA data=WORK.MYDATA;
run;

Your variable types and lengths should be the same on the BASE= and DATA= tables; labels, formats and informats will be carried over from the BASE= dataset/template.

It is simple, but it can be simplified even more, reducing your code to just a single data step:

 data SASDL.MYNEWDATA;
   if 0 then set PARMSDL.MYTEMPLATE;
   set WORK.MYDATA;
   /* other statements */
run;

Even though the set PARMSDL.MYTEMPLATE; statement never executes because of the explicitly FALSE condition (0 means FALSE) in the IF statement, the resulting data set SASDL.MYNEWDATA gets all its variable attributes carried over from the PARMSDL.MYTEMPLATE data template during data step compilation.

The same coding technique can be used to implicitly apply variable attributes from any well-defined data set by inheritance, even though that data set is not technically a data template (it has more than 0 observations). Run the following code to make sure the MYDATA table has all the variables and attributes of the SASHELP.CARS data table while the data values come from the ABC data set:

 
data ABC;
  make='Toyota';
run;
 
data MYDATA;
   if 0 then set SASHELP.CARS;
   set ABC;
run;

Perhaps the benefits of SAS data templates are best demonstrated when you read external data into a SAS data table. Here is an example for you to run (of course, in real life MYTEMPLATE should be a permanent data set, and instead of datalines you would read an external file):

 
data MYTEMPLATE;
   label
      fdate = 'Flight Date'
      count = 'Flight Count'
      fdesc = 'Flight Description'
      reven = 'Revenue, $';
   length fdate count reven 8 fdesc $22;
   format fdate date9. count comma12. reven dollar12.2;
   informat fdate mmddyy10. count comma8. fdesc $22. reven comma10.;
   stop;
run;
 
data FLIGHTS;
   if 0 then set MYTEMPLATE;
   input fdate count fdesc & reven;
   datalines;
12/05/2018 500   Flight from DCA to BOS  120,034
10/01/2018 1,200 Flight from BOS to DCA  90,534
09/15/2018 2,234 Flight from DCA to MCO  1,350
;

Here is how the output data set looks:

Notice how simple the last data step is. No labels, no lengths, no formats, no informats – no clutter. Yet, the raw data is read in nicely, with proper informats applied, and the resulting data set has all the proper labels and variable formatting. And when you repeat this process for another sample of similar data you can still use the same data template, and your read-in data step stays the same – simple and concise.

Your thoughts

Do you find SAS data templates useful? Do you use them in any shape or form in your SAS data development projects? Please share your thoughts.

Simplify data preparation using SAS data templates was published on SAS Users.

July 17, 2018
 

Automation for SAS Administrators - deleting old files

Attention SAS administrators! When running SAS batch jobs on schedule (or manually), they usually produce date-stamped SAS logs, which are essential for automated system maintenance and troubleshooting. Similar log files are created by various SAS infrastructure services (metadata server, mid-tier servers, etc.). However, as time goes on, the relevance of such logs diminishes while the clutter stockpiles. In some cases, this may even lead to disk space problems.

There are multiple ways to solve this problem, either by deleting older log files or by stashing them away for auditing purposes (zipping and archiving). One solution would be using Unix/Linux or Windows scripts run on schedule. The other is much "SAS-sier."

Let SAS clean up its "mess"

We are going to write SAS code that you can run manually or on schedule and that, for a specified directory (folder), deletes all .log files older than 30 days.
First, we need to capture the contents of that directory, then select the file names with the .log extension, and finally subset that selection to a sub-list where Date Modified is earlier than today's date minus 30 days.

Perhaps the easiest way to get the contents of a directory is by using the X statement, submitting the DOS DIR command from within SAS and redirecting its output to a file with the > operator, e.g.

x 'dir > dirlist.txt';

or using pipe option in the filename statement:

filename DIRLIST pipe 'dir "C:\Documents and Settings"';

However, SAS administrators know that in many organizations, due to cyber-security concerns, IT department policies do not allow the X statement; it is disabled by setting the XCMD system option to NOXCMD. This is usually done system-wide for the whole SAS Enterprise client-server installation via the SAS configuration. In this case, no operating system command can be executed from within SAS. Try running any X statement in your environment; if it is disabled, you will get the following ERROR in the SAS log:

ERROR: Shell escape is not valid in this SAS session.

To avoid that potential roadblock, we’ll use a different technique of capturing the contents of a directory along with file date stamps.

Macro to delete old log files in a directory/folder

The following SAS macro cleans up a Unix directory or a Windows folder by removing old .log files. I must admit that this statement is a little misleading: the macro is much more powerful. Not only can it delete old .log files, it can remove ANY file type specified by its extension.

%macro mr_clean(dirpath=,dayskeep=30,ext=.log);
   data _null_;
      length memname $256;
      deldate = today() - &dayskeep;
      rc = filename('indir',"&dirpath");
      did = dopen('indir');
      if did then
      do i=1 to dnum(did);
         memname = dread(did,i);
         if reverse(trim(memname)) ^=: reverse("&ext") then continue;
         rc = filename('inmem',"&dirpath/"!!memname);
         fid = fopen('inmem');
         if fid then 
         do;
            moddate = input(finfo(fid,'Last Modified'),date9.);
            rc = fclose(fid);
            if . < moddate <= deldate then rc = fdelete('inmem');
         end;
      end; 
      rc = dclose(did);
      rc = filename('inmem');
      rc = filename('indir');
   run;
%mend mr_clean;

This macro has 3 parameters:

  • dirpath - directory path (required);
  • dayskeep - days to keep (optional, default 30);
  • ext - file extension (optional, default .log).

This macro works in both Windows and Linux/Unix environments. Please note that dirpath and ext parameter values are case-sensitive.

Here are examples of the macro invocation:

1. Using defaults

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean)

With this macro call, all files with extension .log (default) which are older than 30 days (default) will be deleted from the specified directory.

2. Using default extension

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean,dayskeep=20)

With this macro call, all files with extension .log (default) which are older than 20 days will be deleted from the specified directory.

3. Using explicit parameters

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean,dayskeep=10,ext=.xls)

With this macro call, all files with extension .xls (Excel files) which are older than 10 days will be deleted from the specified directory.

Old file deletion SAS macro code explanation

The above SAS macro logic and actions are done within a single DATA _NULL_ step. First, we calculate the date from which file deletion starts (going back): deldate = today() - &dayskeep. Then we assign the fileref indir to the specified directory &dirpath:

rc = filename('indir',"&dirpath");

Then we open that directory:

did = dopen('indir');

and if it opened successfully (did>0), we loop through its members, which can be either files or directories:

do i=1 to dnum(did);

In that loop, first we grab the directory member name:

memname = dread(did,i);

and look for our candidates for deletion, i.e., determine whether that name (memname) ends with "&ext". In order to do that, we reverse both character strings and compare their beginnings. If they don't match (the ^=: operator), then we are not going to touch that member; the CONTINUE statement skips to the end of the loop. If they do match, it means that the member name does end with "&ext", and it's a candidate for deletion. We assign the fileref inmem to that member:

rc = filename('inmem',"&dirpath/"!!memname);

Note that forward slash (/) Unix/Linux path separator in the above statement is also a valid path separator in Windows. Windows will convert it to back slash (\) for display purposes, but it interprets forward slash as a valid path separator along with back slash.
Then we open that file using fopen function:

fid = fopen('inmem');

If inmem is a directory, the open will fail (fid=0), and we will skip the following do-group that is responsible for the file deletion. If it is a file and it opens successfully (fid>0), then we go through the deletion do-group, where we first grab the file's Last Modified date as moddate, close the file, and, if moddate <= deldate, delete that file:

rc = fdelete('inmem');

Then we close the directory and un-assign the filerefs for the member and the directory itself.

Deleting old files across multiple directories/folders

The %mr_clean macro is flexible enough to address various SAS administrators' needs. You can use it to delete old files of various types across multiple directories/folders. First, let's create a driver table as follows:

data delete_instructions;
   length days 8 extn $9 path $256;
   infile datalines truncover;
   input days 1-2 extn $ 4-12 path $ 14-270;
   datalines;
30 .log      C:\PROJECTS\Automatically deleting old files\Logs1
20 .log      C:\PROJECTS\Automatically deleting old files\Logs2
25 .txt      C:\PROJECTS\Automatically deleting old files\Texts
35 .xls      C:\PROJECTS\Automatically deleting old files\Excel
30 .sas7bdat C:\PROJECTS\Automatically deleting old files\SAS_Backups
;

This driver table specifies how many days to keep files of certain extensions in each directory. In this example, perhaps the most beneficial deletion applies to the SAS_Backups folder, since it contains SAS data tables (extension .sas7bdat). Data files are typically much larger than SAS log files, and therefore their deletion frees up much more of the valuable disk space.

Then we can use this driver table to loop through its observations and dynamically build macro invocations using CALL EXECUTE:

data _null_;
   set delete_instructions;
   s = cats('%nrstr(%mr_clean(dirpath=',path,',dayskeep=',days,',ext=',extn,'))');
   call execute(s);
run;

Alternatively, we can use DOSUBL() function to dynamically execute our macro at every iteration of the driver table:

data _null_;
   set delete_instructions;
   s = cats('%mr_clean(dirpath=',path,',dayskeep=',days,',ext=',extn,')');
   rc = dosubl(s);
run;

Put it on autopilot

When it comes to cleaning up your old files (logs, backups, etc.), the best practice for SAS administrators is to schedule the cleaning job to run automatically on a regular basis. Then you can forget about this chore around your "SAS house," as the %mr_clean macro will do it quietly for you, without the noise and fuss of a Roomba.
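
For example, on Unix/Linux you might schedule a driver program that invokes %mr_clean via cron. Here is a sketch of a crontab entry (all paths are placeholders; note that % characters must be escaped in crontab):

# run the cleanup driver at 2:30 a.m. every Sunday
30 2 * * 0 /sas/SASHome/SASFoundation/9.4/sas /sas/code/admin/clean_driver.sas -log /sas/logs/clean_$(date +\%Y.\%m.\%d).log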

Your turn, SAS administrators

Would you use this approach in your SAS environment? Any suggestions for improvement? How do you deal with old log files? Other old files? Please share below.

SAS administrators tip: Automatically deleting old SAS logs was published on SAS Users.

May 30, 2018
 

Developing foolproof solutions

Like oil and water, hardware and software don't mix, but rather work hand-in-hand to deliver value to us, their creators. But sometimes we make mistakes, behave erratically, or deal with others who might make mistakes, behave erratically, or even take advantage of our technologies.

Therefore, it is imperative for developers, whether hardware or software engineers, to foresee unintended (probable or improbable) system usages and implement features that will make their creations foolproof, that is protected from misuse.

In this post I won’t lecture you about various techniques of developing foolproof solutions, nor will I present even a single snippet of code. Its purpose is to stimulate your multidimensional view of problems, to unleash your creativity and to empower you to become better at solving problems, whether you develop or test software or hardware, market or sell it, write about it, or just use it.

You May Also Like: Are you solving the wrong problem?

The anecdote I’m about to tell you originated in Russia, but since there was no way to translate this fictitious story exactly without losing its meaning, I attempted to preserve its essence while adapting it to the “English ear” with some help from Sir Arthur Conan Doyle. Well, sort of. Here goes.

The Art of Deduction

Mr. Sherlock Holmes and Dr. Watson were traveling in an automobile in northern Russia. After many miles alone on the road, they saw a truck behind them. Soon enough, the truck pulled ahead, and after making some coughing noises, suddenly stopped right in front of them. Sherlock Holmes stopped their car as well.

Dr. Watson: What happened? Has it broken?

Holmes: I don’t think so. Obviously, it ran out of gas.

The truck driver got out of his cabin, grabbed a bucket hanging under the back of the truck and ran towards a ditch on the road shoulder. He filled the bucket with standing water from the ditch and ran back to his truck. Then, without hesitation, he carefully poured the bucketful of water into the gas tank. Obviously in full confidence of what he’s doing, he returned to the truck, started the engine, and drove away.

Dr. Watson (in astonishment): What just happened? Are Russian ditches filled with gasoline?

Holmes: Relax, dear Watson, it was ordinary ditch water. But I wouldn’t suggest drinking it.

Dr. Watson (still in disbelief): What, do their truck engines work on water, then?

Holmes: Of course not, it’s a regular Diesel engine.

Dr. Watson: Then how is that possible? If the truck was out of gas, how was it able to start back up after water was added to the tank?!

Who knew Sherlock Holmes had such engineering acumen!

Holmes: “Elementary, my dear Watson. The fuel intake pipe is raised a couple inches above the bottom of the gas tank. That produces the effect of seemingly running out of gas when the fuel falls below the pipe, even though there is still some gas left in the tank. Remember, oil and water don't mix.  When the truck driver poured a bucketful of water into the gas tank, that water – having a higher density than the Diesel fuel – settled in the bottom, pushing the fuel above the intake opening thus making it possible to pump it to the engine.”

After a long pause – longer than it usually takes to come to grips with reality – Dr. Watson whispered in bewilderment.

Dr. Watson: Я не понимаю, I don’t understand!

Then, still shaken, he asked the only logical question a normal person could possibly ask under the circumstances.

Dr. Watson: Why would they raise the fuel intake pipe from the tank bottom in the first place?

Holmes: Ah, Watson, it must be to make it foolproof. What if some fool decides to pour a bucket of water in the gas tank!


Are you developing foolproof solutions? was published on SAS Users.

May 8, 2018
 

SAS reporting tools for GDPR and other privacy protection laws

The European Union's General Data Protection Regulation (GDPR), taking effect on 25 May 2018, pertains not only to organizations located within the EU; it applies to all companies processing and holding the personal data of data subjects residing in the European Union, regardless of the company's location.

If the GDPR acronym does not mean much to you, think of the one that does – HIPAA, FERPA, COPPA, CIPSEA, or any other that is relevant to your jurisdiction – this blog post is equally applicable to all of them.

The GDPR prohibits personal data processing revealing such individual characteristics as race or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, as well as the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health, and data concerning a natural person’s sex life or sexual orientation. It also has special rules for data relating to criminal convictions or offenses and the processing of children’s personal data.

Whenever SAS users produce reports on demographic data, there is always a risk of inadvertently revealing personal data protected by law, especially when reports are generated automatically or interactively via dynamic data queries. Even for aggregate reports there is a high potential for such exposure.

Suppose you produce an aggregate cross-tabulation report on a small demographic group, representing a count distribution by students’ grade and race. It is highly probable that you can get the count of 1 for some cells in the report, which will unequivocally identify persons and thus disclose their education record (grade) by race. Even if the count is not equal to 1, but is equal to some other small number, there is still a risk of possible deducing or disaggregating of Personally Identifiable Information (PII) from surrounding data (other cells, row and column totals) or related reports on that small demographic group.

The following are the four selected SAS tools that allow you to take care of protecting personal data in SAS reports by suppressing counts in small demographic group reports.

1. Automatic data suppression in SAS reports

This blog post explains the fundamental concepts of data suppression algorithms. It takes you behind the scenes of the iterative process of complementary data suppression and walks you through SAS code implementing a primary and secondary complementary suppression algorithm. The suppression code uses BASE SAS – DATA STEPs, SAS macros, PROC FORMAT, PROC MEANS, and PROC REPORT.

2. Implementing Privacy Protection-Compliant SAS® Aggregate Reports

This SAS Global Forum 2018 paper solidifies and expands on the above blog post. It walks you through the intricate logic of an enhanced complementary suppression process, and demonstrates SAS coding techniques to implement and automatically generate aggregate tabular reports compliant with privacy protection law. The result is a set of SAS macros ready for use in any reporting organization responsible for compliance with privacy protection.

3. In SAS Visual Analytics you can create derived data items that are aggregated measures. SAS Visual Analytics 8.2 on SAS Viya introduces a new type for these aggregated-measure derived data items, called Data Suppression. Here is an excerpt from the documentation on the Data Suppression type:

“Obscures aggregated data if individual values could easily be inferred. Data suppression replaces all values for the measure on which it is based with asterisk characters (*) unless a value represents the aggregation of a specified minimum number of values. You specify the minimum in the Suppress data if count less than parameter. The values are hidden from view, but they are still present in the data query. The calculation of totals and subtotals is not affected.

Some additional values might be suppressed when a single value would be suppressed from a subgroup. In this case, an additional value is suppressed so that the suppressed value cannot be inferred from totals or subtotals.

A common use of suppressed data is to protect the identity of individuals in aggregated data when some crossings are sparse. For example, if your data contains testing scores for a school district by demographics, but one of the demographic categories is represented only by a single student, then data suppression hides the test score for that demographic category.

When you use suppressed data, be sure to follow these best practices:

  • Never use the unsuppressed version of the data item in your report, even in filters and ranks. Consider hiding the unsuppressed version in the Data pane.
  • Avoid using suppressed data in any object that is the source or target of a filter action. Filter actions can sometimes make it possible to infer the values of suppressed data.
  • Avoid assigning hierarchies to objects that contain suppressed data. Expanding or drilling down on a hierarchy can make it possible to infer the values of suppressed data.”

This Data Suppression type functionality is significant as it represents the first such functionality embedded directly into a SAS product.

4. Is it sensitive? Mask it with data suppression

This blog post provides an example of using the above Data Suppression type aggregated measures derived data items in SAS Visual Analytics.

We need your feedback!

We want to hear from you.  Is this blog post useful? How do you comply with GDPR (or other Privacy Law of your jurisdiction) in your organization? What SAS privacy protection features would you like to see in future SAS releases?

SAS tools for GDPR privacy compliant reporting was published on SAS Users.


April 1, 2018
 

BREAKING NEWS. Today, shortly after midnight on the U.S. East Coast, Cary, NC-based SAS Institute successfully completed its first space exploration mission.

This interplanetary expedition was conducted on a SAS-designed manned spacecraft powered by our state-of-the-art atomic Collider Acceleration System (CAS) engine. A crew of three SAS volunteers took part in that undertaking. These brave souls were:

  • spacecraft commander
  • spacecraft pilot/engineer
  • spacecraft mission specialist

All three specialize in terrestrial communications (i.e. social media) and received special training on extra-terrestrial travel and communications. Their mission was to study the Venus space area up close to find out what causes the gravitational field anomaly that has recently been observed there.

How it all started

Those of us who attended last year's SAS Global Forum in Orlando must remember the inspiring speech by Canadian astronaut Chris Hadfield. The main idea I took away from his speech was that success is not a good teacher, as it teaches us nothing; failure, on the other hand, is a very good teacher, at least for those of us who are willing to learn the lesson. But, as we all know, there is a time to fail/learn, and there is a time to succeed.

I can’t speak for all of you, but, man, were we inspired by that speech! We at SAS knew right then that we, too, wanted to explore the “final frontier.” As a data analytics software company, all our studies start with data explorations. Our public relations group worked tirelessly with the major stakeholder Government agencies and private companies (NASA, SpaceX, ROSCOSMOS, etc.) to get ahold of the data. When our analysts finally did get access to the data, they were overwhelmed by its size. That was really BIG data (literally of cosmic proportions) – data about every little pocket of spacetime in our Solar system, collected over multiple years of astronomical observations and Space exploration programs.

Predictive Modeling

Using our flagship SAS Viya Analytics software, we mined these vast data archives, employing various predictive modeling, computational, and heuristic techniques such as automatic machine deep space learning, 3D artificial intelligence simulation, and, most importantly, natural, coffee-stimulated human intelligence.

What caught our attention was the space area around Venus. Planet Venus is notorious for being an outlier. First of all, it spins slower than any other planet in the Solar system, even slower than it revolves around the Sun. In fact it spins about 243 times slower than Earth. That means that a day there lasts approximately 243 Earth-days, making it longer than a Venusian year, which is only about 225 Earth-days long.

Second of all, it spins backwards, in the opposite direction from most other planets, including Earth, so that on Venus the sun rises in the west.

Third, it has the highest mean surface temperature of all the Solar System planets, reaching up to 726 K (452 °C or 846 °F), which makes it about 1.6 times hotter than Mercury, the closest planet to the Sun. This is because of Venus' thick atmosphere, composed mostly of greenhouse gases (carbon dioxide and sulfur dioxide), which trap a good portion of the Sun's heat.

However, the most unusual thing that we discovered was an aberration in Venus’ gravitational field, suggesting a significant mass (possibly large enough to be a planet) hidden behind it.

The following bubble plot uses a logarithmic scale for x-axis (distance from the Sun) and visualizes our finding:

SAS VA bubble plot showing planets period of revolution around the Sun

Now we can see it clearly. Not only does it show an unknown small planet behind Venus, it also explains why it is not visible from Earth: its period of revolution around the Sun is exactly the same as that of Venus, which is why it has always been obstructed by Venus, and not visible from Earth.

Due to its very close proximity to Venus, there is a good chance that even a slight tangential nudge experienced by planet “2-?” might break gravitational equilibrium, causing it to start orbiting Venus as a moon rather than the Sun.  We will be observing this situation carefully.

Another interesting finding is that this unknown planet is a much more hospitable place than Venus, as it has an oxygen-dominated atmosphere and a relaxing surface temperature only slightly higher than that of Earth, as it is shown in the following chart:

SAS VA bubble plot - Mean surface temperature of the Solar System planets

At this point we had had enough modeling and needed some hard proof.

Space exploration mission

We leveraged our best human intelligence resources around the world to progress rapidly through all the required phases of spacecraft design and construction, crew selection and training, and finally launching and completing the space mission.

All in all, it took just under a year (11 months to be precise) to bring this project to completion. The data collection and exploration phase took around one month, which is in line with the capacity of the SAS Viya analytical environment. The design and build phase took about five months, and it was conducted in parallel with crew selection and training; finally, the fly phase also took 5 months including launch, travel to the Venus space area with a flyby of Venus and the planet X, and a return to Earth. As you can see from the picture at the very top, the unknown planet X hidden behind Venus does indeed exist. However, further analysis and study will be necessary to determine the nature of the observed surface irregularities.

Your participation and input are requested

If you would like you to engage in the fascinating field of space exploration, you are welcome to use the following SAS-generated summary data table:

Data about planets of the Solar System

If you are a detail-oriented type, it will be obvious to you that the circumference of the new planet oddly equals 200π² (km), which defies all the canons of geometry. You are welcome to prove or disprove the possibility of such an unusual occurrence.

So far, we have referred to this new planet as “2-?” and “planet X”, but it’s about time to give it a name, a real name as all other 8 planets of our Solar System have.

Our first inclination was to name it after our newest SAS analytical software environment, Viya. But do we really want to dilute the brand by applying it to two different though prominent objects?!

That is why we decided to reach out to you, our readers, to solicit ideas for the name of the new planet.  Please provide a brief justification for your name suggestion. We also welcome any insights, hypotheses and data stories you might come up with based on the collected data. We greatly appreciate your input.

Disclosure

Please click here for full disclosure.

SAS discovers a new planet in the Solar System was published on SAS Users.

January 26, 2018
 

Batch Scripts in SAS

If Necessity is the mother of Invention, then, perhaps, the father of Automation is Laziness. Automation is all about convenience, comfort, and productivity. Why do it yourself if you can devise something to do it for you!

In my previous post, Running SAS programs in batch under Unix/Linux, we learned that in order to automate SAS job submissions, they are often run in batch mode. We also learned that we usually create batch scripts as a convenient way to run SAS programs in batch. To create a unique SAS log file with each batch submission, a typical batch script may look like this:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

It will allow you to submit your SAS program /sas/code/project1/program1.sas in batch, and it will also capture the SAS log file, with a convenient date-time suffix, in the same directory.

SAS program to write batch scripts

But what if we are deploying multiple SAS programs? Well, then we would need to create a batch script for each of them. They will all look similar to each other, and that is when most human errors usually occur: when we do something similar, monotonously, over and over again. Besides, I have found that working with the Unix visual editor (vi) is not quite a 21st-century experience.

What would a normal SAS programmer do in such a situation? That’s right – we would write a SAS program to write a batch script file! Let’s do it. Let’s automate the automation.

In its simplest form, to replicate the above batch script example our SAS program would look like this:

filename b '/sas/code/project1/program1.sh';
 
data _null_;
file b;
input;
put _infile_;
datalines;
#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname
;

Setting up batch file permissions

As we already know from my previous post, we need to assign certain permissions to our batch file in order to make it executable. For example, if you want to give yourself (Owner) and your Group execution permissions, then your script file permissions can be:

-rwxr-x---, or 750 in octal representation.

In order to do that, you can add the following X statement to your SAS code:

options noxwait;
x 'chmod 750 /sas/code/project1/program1.sh';

Alternatively, you can use the %SYSEXEC macro statement (no quoting for the OS command), the SYSTASK statement, or the CALL SYSTEM routine (used within a data step).

When you create a batch script by running the above code in SAS Enterprise Guide (EG), you don’t have to leave the comfort of your SAS environment or even touch Unix vi editor. Moreover, you can even submit your SAS job in batch mode right from your SAS EG Program Editor.

However, all of these will work fine unless the XCMD system option is disabled (NOXCMD).

Assigning batch file permissions when XCMD System Option is disabled

ERROR: Shell escape is not valid in this SAS session.

Bummer! Have you ever seen this error message in SAS Enterprise Guide while trying to run SAS code with the X statement? It indicates that executing OS commands in the SAS environment is not allowed.

In many organizations, IT department policies do not allow enabling the SAS XCMD system option due to cyber-security concerns. This is usually done system-wide for the whole SAS Enterprise client-server installation via SAS configuration.  In this case, no operating system command is allowed to be executed from within SAS.

Of course, this substantially limits SAS’ automation power, but that is the goal and the price to pay for enhanced security.

Still, even without OS command execution at our disposal, we can set Unix script file permissions using the FILENAME statement’s PERMISSION= option. Then our filename statement from above will look like this:

filename b '/sas/code/project1/program1.sh'
   permission='A::u::rwx,A::g::r-x,A::o::---';

However, it is important to realize that your ability to fully control file permissions via the FILENAME statement’s PERMISSION= option is still restricted by the Unix umask value set by your IT system administrator. (For example, a umask of 027 leaves the requested rwxr-x--- intact, while a stricter 077 would strip all Group permissions.) Usually, though, it is not overly restrictive, at least for the purpose of creating executable files in the environments I have worked with.

The double benefit of the FILENAME statement’s PERMISSION= option is that it can be used for setting up file permissions in any SAS installation whether the XCMD system option is enabled or disabled.
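
Once the script file is written, you can verify the resulting permissions from a shell prompt. The output will look similar to this (the owner, group, size and timestamp below are just placeholders):

ls -l /sas/code/project1/program1.sh
-rwxr-x--- 1 userid sasgroup 212 Jan 26 10:15 /sas/code/project1/program1.sh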

SAS macro to create batch script files

Let’s wrap all the above SAS code pieces into a SAS macro that writes batch scripts. Here is the macro code definition:

%macro write_shell(code);
   /* directory portion of the full path, up to and including the last '/' */
   %let fdir = %substr(&code,1,%sysfunc(findc(&code,/,b)));
 
   /* create the directory if it does not yet exist */
   options dlcreatedir;
   libname _flib "&fdir";
   libname _flib;
 
   /* full path name without the .sas extension */
   %let core = %substr(&code,1,%eval(%length(&code)-4));
 
   /* write the script file with rwxr-x--- (750) permissions */
   filename _fout "&core..sh" permission='A::u::rwx,A::g::r-x,A::o::---';
   data _null_;
      file _fout;
      put
         '#!/bin/sh' //
         'now=$(date +%Y.%m.%d_%H.%M.%S)' /
         "pgmname=""&code""" /
         "logname=""&core._$now.log""" /
         'sas $pgmname -log $logname'
         ;
   run;
   filename _fout;
%mend write_shell;

The single macro parameter (code) represents the full path name of your SAS program. And here is a macro invocation example:

 
%write_shell(/sas/code/project1/program1.sas)

The assumption here is that the script file gets created in the same directory as the relevant SAS program, which is also where the SAS logs for each of the batch runs will be written. The script is assigned the same name as your SAS program, only with the .sh extension. As you can see, we do some string parsing to derive the directory name, script file name and SAS log file name from the single macro parameter representing the full path name of your SAS program. As an added bonus, if the specified directory (/sas/code/project1/) does not yet exist, the macro creates it. The DLCREATEDIR system option (along with the two subsequent LIBNAME statements) is responsible for the directory creation.
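
For the invocation example above, the macro writes the following /sas/code/project1/program1.sh (reconstructed here from the PUT statement):

#!/bin/sh

now=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$now.log"
sas $pgmname -log $logname

Note that this generated version calls sas without a full path, so it relies on the SAS executable being on the PATH of the account that runs the script.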

If you want to create script files for multiple SAS programs, you just invoke the macro as many times as needed. You can even go totally data-driven for mass script file creation, as shown below.
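
Here is a minimal data-driven sketch; the programs dataset and its code_path variable are made up for illustration:

data programs;
   infile datalines truncover;
   input code_path $100.;
   datalines;
/sas/code/project1/program1.sas
/sas/code/project1/program2.sas
/sas/code/project2/etl.sas
;

data _null_;
   set programs;
   /* queue one %write_shell invocation per SAS program; %nrstr defers */
   /* macro execution until after this data step completes             */
   call execute(cats('%nrstr(%write_shell(', code_path, '))'));
run;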

Do you find this useful?

Please let me know in the comments section below if you find this blog post useful. Thank you for reading! I also invite you to share your ideas and experiences on the topic.

Let SAS write batch scripts for you was published on SAS Users.

12月 202017
 

Running SAS programs in batchWhile SAS program development is usually done in an interactive SAS environment (SAS Enterprise Guide, SAS Display Manager, SAS Studio, etc.), when it comes to running SAS programs in a production or operations environment, it is routinely done in batch mode.

Why run SAS programs in batch mode?

First and foremost, this is done for automation, as the batch process does not require human participation at run time. It can be scheduled to run (using the operating system scheduler or other scheduling software) while we sleep, at any time of the day or at any time interval between two consecutive runs.

Running SAS programs in batch mode streamlines SAS processing by eliminating the possibility of human error and by submitting multiple SAS jobs (programs) all at once or in a sequence that secures program and/or data dependencies.

SAS batch processing also takes care of self-documenting, as it automatically generates and stores SAS logs and outputs.

Imagine the following scenario. Every night, a SAS batch process “wakes up” at 3 a.m. and runs an ETL process on a SAS Application server that extracts multiple tables from a database, transforms, combines, and loads them into a SAS datamart; then moves some data tables across the network and loads them into SAS LASR server, so when you are back to work in the morning your SAS Visual Analytics application has all its data refreshed and ready to roll. Of course, the process schedule can be custom-tailored to your particular needs; your batch jobs may run every 15 minutes, once a week, every first Friday of the month – you name it.

What is a batch script file?

To submit a single SAS program in batch mode manually, you could issue an OS command that looks something like the following:

Unix/Linux

sas /sas/code/proj1/job1.sas -log /sas/code/proj1/job1.log

DOS/Windows

"C:\Program Files\SASHome\SASFoundation\9.4\Sas.exe" -SYSIN c:\proj1\job1.sas -NOSPLASH -ICON -LOG c:\proj1\job1.log

However, submitting an OS command manually has too many drawbacks: it’s too much typing, it only submits one SAS program at a time, and most importantly – it is manual, which means it is prone to human error.

Usually, these OS commands are packaged into so-called batch files (shell scripts in Unix) that allow for sequential, parallel, as well as conditional execution of multiple OS commands. They can be run either manually, or automatically – on schedule, or called by other batch scripts.

In a Windows/DOS Operating System, these script files are called batch files and have .bat filename extensions. In Unix-like operating systems, such as Linux, these script files are called shell scripts and have .sh filename extensions.

Since Windows batch files are similar to, but slightly different from, Unix (and its open-source cousin Linux) shell scripts, in the examples below we are going to use Unix/Linux shell scripts only, in order to avoid any confusion. We are also going to use the terms Unix and Linux interchangeably.

Here is the typical content of a Linux shell script file to run a single SAS program:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

Note that the shell script syntax allows for some basic programming features like a current datetime function, formatting, and variables. It also provides conditional processing similar to if-then-else logic. For detailed information on the shell scripting language you may refer to this BASH shell script tutorial or any other source covering the many dialects or flavors of shell scripting (C Shell, Korn Shell, etc.).
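
Here is a tiny self-contained illustration of those features (the Monday check is just a made-up example):

#!/bin/sh
today=$(date +%A)            # command substitution: assign command output to a variable
if [ "$today" = "Monday" ]   # basic if-then-else conditional logic
then
   echo "Start-of-week run"
else
   echo "Regular run"
fi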

Now, let’s save our shell script (the one that runs program1.sas) as the following file:
/sas/code/project1/program1.sh

How to submit a SAS program via Unix script

In order to run this shell script we would submit the following Linux command:
/sas/code/project1/program1.sh

Or, if we navigate to the directory first:
cd /sas/code/project1

then we can submit an abbreviated Linux command
./program1.sh

When run, this shell script not only executes a SAS program (program1.sas), but for every run it also creates and saves a uniquely named SAS log file. You may create the SAS log file in the same directory where the SAS code is stored, as specified in the shell script above, or specify another directory of your choice.

For example, it creates the following SAS log file:
/sas/code/project1/program1_2017.12.06_09.15.20.log

The file name uniqueness is achieved by adding a date/time stamp suffix between the SAS program name and .log file name extension, in this particular case indicating that this SAS log file was created on December 6, 2017, at 09:15:20 (hours:minutes:seconds).

Unix script for submitting multiple SAS programs

Unix scripts may contain not only OS commands, but also calls to other Unix scripts. You can mix and match OS commands and script calls.

When scripts are created for each individual SAS program that you intend to run in a batch, you can easily combine them into a program flow by creating a flow script containing those single program scripts. For example, let’s create a script file /sas/code/project1/flow1.sh with the following contents:

/sas/code/project1/program1.sh
/sas/code/project1/program2.sh
/sas/code/project1/program3.sh

When submitted as

/sas/code/project1/flow1.sh

it will sequentially execute three scripts - program1.sh, program2.sh, and program3.sh, each of which will execute the corresponding SAS program - program1.sas, program2.sas, and program3.sas, and produce three SAS logs - program1.log, program2.log, and program3.log.
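
By the way, if some of the programs are independent of one another, the same flow script can also run them in parallel. Here is a minimal sketch, assuming (hypothetically) that program1 and program2 do not depend on each other:

#!/bin/sh
# run the two independent programs concurrently, in the background
/sas/code/project1/program1.sh &
/sas/code/project1/program2.sh &

# wait until both background jobs have finished
wait

# program3 depends on the first two, so it runs last
/sas/code/project1/program3.sh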

Unix script file permissions

In order to be executable, UNIX script files must have certain permissions. If you create the script file and want to execute it yourself only, the file permissions can be as follows:

-rwxr-----, or 740 in octal representation.

This means that you (the Owner of the script file) have Read (r), Write (w) and Execute (x) permissions; the Group owning the script file has only the Read (r) permission; Others have no permissions to the script file at all.
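
Broken down position by position, the -rwxr----- string reads:

-  rwx  r--  ---
|   |    |    |
|   |    |    +--- Others: no permissions
|   |    +-------- Group:  read only
|   +------------- Owner:  read, write, execute
+----------------- file type (a leading - means a regular file)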

If you want to give yourself (Owner) and Group execution permissions, then your script file permissions can be set as:

-rwxr-x---, or 750 in octal representation.

In this case, your Group has Read (r) and Execute (x) permissions.

In Unix, file permissions are assigned using the chmod Unix command.

Note that in both examples above we do not give Others any permissions at all. Remember that file permissions are a security feature, and you should assign them at the minimum level necessary.
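
For example, either of these equivalent chmod commands sets the 750 (-rwxr-x---) permissions shown above:

chmod 750 /sas/code/project1/program1.sh

# the same permissions in symbolic form
chmod u=rwx,g=rx,o= /sas/code/project1/program1.sh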

Conditional execution of scripts and SAS programs

Here is an example of a Unix script file that allows running multiple SAS programs and OS commands at different times.

#!/bin/sh

#1 extract data from a database
/sas/code/etl/etl.sh

#2 copy data to the Visual Analytics autoload directory
scp -B userid@sasAPPservername:/sas/data/*.sas7bdat userid@sasVAservername:/sas/config/.../AutoLoad

#3 run weekly, every Monday
dow=$(date +%w)
if [ $dow -eq 1 ]
then
   /sas/code/alerts_generation.sh
fi

#4 run monthly, first Friday of every month
dom=$(date +%d)
if [ $dow -eq 5 -a $dom -le 7 ]
then
   /sas/code/update_history.sh
   /sas/code/update_transactions.sh
fi

In this script, the following logical operators are used: -eq (equal), -le (less or equal), -a (logical and).

As you can see, the script logic takes care of branching to execute different SAS programs when certain timing conditions are met. With such an approach, you would need to schedule only this single script to run at a specified time/interval, say daily at 3 a.m.

In this case, the script will “wake up” every morning at 3 a.m. and execute its component scripts either unconditionally, or conditionally.

If one of the included programs needs to run at a different, lesser frequency (e.g. every Monday, or monthly on first Friday of every month) the script logic will trigger those executions at the appropriate times.

In the above script example steps #1 and #2 will execute every time (unconditionally) the script runs (daily). Step #1 runs ETL program to extract data from a database, step #2 copies the extracted data across the network from SAS Application server to the SAS LASR Analytic server’s drop zone from where they are automatically loaded (autoloaded) into the LASR.

Step #3 will run conditionally every Monday ($dow -eq 1). Step #4 will run conditionally every first Friday of a month ($dow -eq 5 -a $dom -le 7).

For more information on how to format dates for use in shell scripts, please refer to this post.
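
As a quick reference, these are the date format specifiers used in the scripts throughout this post:

date +%w                  # day of week as a number (0..6, Sunday is 0)
date +%d                  # day of month (01..31)
date +%Y.%m.%d_%H.%M.%S   # timestamp like 2017.12.06_09.15.20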

Do you run your SAS programs in batch?

Please share your batch experiences in the comment section below. I am sure the rest of us will really appreciate it!

Running SAS programs in batch under Unix/Linux was published on SAS Users.