In automated production (or business operations) environments, we often run SAS job flows in batch mode and on a schedule. A SAS job flow is a collection of several interdependent SAS programs executed as a single process.
In my earlier posts, Running SAS programs in batch under Unix/Linux and Let SAS write batch scripts for you, I described how you can run SAS programs in batch mode by creating UNIX/Linux scripts that in turn invoke other scripts.
In this scenario you can run multiple SAS programs sequentially or in parallel, all while having a single root script kicked off on schedule. The whole SAS processing flow runs like a chain reaction.
Why and when to stop SAS batch flow process
However, sometimes we need to automatically stop and terminate that chain job flow execution if certain criteria are met (or not met) in a program of that process flow.
Let’s say our first job in a batch flow is a data preparation step (ETL) where we extract data tables from a database and prepare them for further processing. The rest of the batch process depends on the successful completion of that critical first job. The process is kicked off at 3:00 a.m. daily. However, sometimes the database connection is unavailable, or the database itself has not finished refreshing, or something else happens that causes the ETL program to complete with ERRORs.
This failure means that our data has not updated properly and there is no reason to continue running the remainder of the job flow process as it might lead to undesired or even disastrous consequences. In this situation we want to automatically terminate the flow execution and send an e-mail notification to the process owners and/or SAS administrators informing them about the mishap.
How to stop SAS batch flow process in UNIX/Linux
Suppose we run the following main.sh script on UNIX/Linux:
#!/bin/sh

#1 extract data from a database
/sas/code/etl/etl.sh

#2 run the rest of processing flow
/sas/code/processing/tail.sh
The etl.sh script runs the SAS ETL process as follows:
#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/etl/etl.sas"
logname="/sas/code/etl/etl_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname
We want to run the tail.sh shell script (which itself runs multiple other scripts) only if the etl.sas program completes successfully, that is, if the SAS ETL process etl.sas run by etl.sh completes with no ERRORs or WARNINGs. Otherwise, we want to terminate the main.sh script and not run the rest of the processing flow.
To do this, we re-write our main.sh script as:
#!/bin/sh

#1 extract data from a database
/sas/code/etl/etl.sh

exitcode=$?
echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"

if [ $exitcode -eq 0 ]
then
    #2 run the rest of processing flow
    /sas/code/processing/tail.sh
fi
In this code, we use a special shell script variable ($? for the Bourne and Korn shells, $STATUS for the C shell) to capture the exit status code of the previously executed OS command, /sas/code/etl/etl.sh, with the assignment exitcode=$?.
Then the optional echo command just prints the captured value of that status for our information.
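If you prefer a readable label in the log instead of a bare number, the status values listed in the echo message above can be mapped with a case statement. This is only a minimal sketch: describe_status is a hypothetical helper name, not part of SAS or of the scripts in this post, and it covers only the three values named in the echo message.

```shell
#!/bin/sh
# describe_status: hypothetical helper that turns a SAS exit status
# into the labels used in the echo message (0=SUCCESS,1=WARNING,2=ERROR).
describe_status() {
    case "$1" in
        0) echo "SUCCESS" ;;
        1) echo "WARNING" ;;
        2) echo "ERROR" ;;
        *) echo "OTHER ($1)" ;;   # any status not named in the echo message
    esac
}

describe_status 0   # prints SUCCESS
describe_status 1   # prints WARNING
describe_status 2   # prints ERROR
```

In main.sh it could be used as echo "Status=$exitcode ($(describe_status $exitcode))".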
Every UNIX/Linux command executed by a shell script or a user has an exit status represented by an integer in the range of 0-255. An exit code of 0 means the command executed successfully without any errors; a non-zero value means the command failed.
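You can see exit statuses in action with a few standard commands. The sketch below uses true, false, and a child shell that picks its own status; the variable names ok, failed, and custom are just illustrative.

```shell
#!/bin/sh
true
ok=$?                 # true always succeeds, so this is 0

false
failed=$?             # false always fails, so this is non-zero

sh -c 'exit 3'
custom=$?             # the child shell chose status 3

echo "ok=$ok failed=$failed custom=$custom"
```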
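The SAS System plays nicely with the UNIX/Linux operating system. According to the SAS documentation, the exit status code for the completion of a SAS job is returned in $? for the Bourne and Korn shells, and in $STATUS for the C shell. A value of 0 indicates successful termination. As the echo message in our script indicates, a value of 1 means SAS issued WARNINGs and a value of 2 means SAS issued ERRORs.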
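Since the SAS invocation is the last command in our etl.sh script, and a shell script returns the exit status of the last command it executed, the exit status code set by the SAS System becomes the exit status of etl.sh and is in turn available to our main.sh shell script.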
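This hand-off works only as long as the SAS invocation remains the last command in etl.sh. The sketch below illustrates the point with child shells: the inner sh -c 'exit 2' is a hypothetical stand-in for the sas command line finishing with ERRORs, and the trailing echo in the second case stands for any command someone might later append to etl.sh.

```shell
#!/bin/sh
# Case 1: the stand-in is the last command of the child script,
# so its status becomes the status of the whole child script.
sh -c "sh -c 'exit 2'"
last_status=$?

# Case 2: a trailing echo succeeds and masks the failure.
sh -c "sh -c 'exit 2'; echo 'ETL step done'"
masked_status=$?

echo "last=$last_status masked=$masked_status"   # prints last=2 masked=0
```

If etl.sh ever needs commands after the SAS call, one option is to save $? right after the call and end the script with an explicit exit using that saved value.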
Then, in the main.sh script, we check whether that exit code equals 0, and only then run the remaining flow by executing the tail.sh shell script. Otherwise, we skip tail.sh, and main.sh simply reaches its end and exits.
Alternatively, the main.sh script can be implemented with an explicit exit as follows:
#!/bin/sh

#1 extract data from a database
/sas/code/etl/etl.sh

exitcode=$?
echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"

if [ $exitcode -ne 0 ]
then
    exit
fi

#2 run the rest of processing flow
/sas/code/processing/tail.sh
In this shell script code example, we check the exit return code value, and if it is NOT equal to 0, we explicitly terminate the main.sh shell script using the exit command, which gets us out of the script immediately without executing the subsequent commands. In this case, our #2 command invoking the tail.sh script never gets executed, which effectively stops the batch flow process.
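One detail to keep in mind: exit without an argument returns the status of the last command executed, which here is the successful [ ... ] test, so main.sh itself would end with status 0. If a scheduler or a parent script inspects the status of main.sh, you may prefer to pass the captured code along with exit $exitcode. The sketch below is a self-contained illustration under stated assumptions: etl_step and tail_step are hypothetical stand-ins for etl.sh and tail.sh, and the parentheses run the flow in a subshell that plays the role of main.sh.

```shell
#!/bin/sh
# Hypothetical stand-ins so the sketch runs anywhere.
etl_step()  { return 2; }          # pretend SAS finished with ERRORs
tail_step() { echo "tail ran"; }   # the rest of the processing flow

# The subshell plays the role of main.sh.
(
    etl_step
    exitcode=$?
    if [ "$exitcode" -ne 0 ]
    then
        exit "$exitcode"           # stop here and report the ETL status
    fi
    tail_step
)
flow_status=$?

echo "flow ended with status $flow_status"   # prints: flow ended with status 2
```

Here tail_step never runs, and the caller still sees the original status 2 rather than 0.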
If you also need to automatically send an e-mail notification to the designated people about the failed batch flow process, you can do it in a separate SAS job that runs right before the exit command. Then the if-statement will look something like this:
if [ $exitcode -ne 0 ]
then
    # send an email and exit
    /sas/code/etl/email_etl_failure.sh
    exit
fi
That is, immediately after the email is sent, the shell script and the whole batch flow process get terminated by the exit command; no shell script commands beyond that if-statement will be executed.
A word of caution
Be extra careful if you use the special script variable $? directly in a script's logical expression, without assigning it to an interim variable. For example, you could use the following script command sequence:
/sas/code/etl/etl.sh
if [ $? -ne 0 ]
. . .
However, let’s say you insert another script command between them, for example:
/sas/code/etl/etl.sh
echo "Status=$? (0=SUCCESS,1=WARNING,2=ERROR)"
if [ $? -ne 0 ]
. . .
Then the $? variable in the if [ $? -ne 0 ] statement will hold the exit status of the preceding echo command, not of the /sas/code/etl/etl.sh command as you might expect.
Hence, I suggest capturing the $? value in an interim variable (e.g. exitcode=$?) right after the command whose exit code you are going to inspect, and then referencing that interim variable (as $exitcode) in your subsequent script statements. That will save you from the trouble of inadvertently referring to the wrong exit code when you insert additional commands during script development.
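The difference is easy to reproduce. In this minimal sketch, failing_step is a hypothetical stand-in for etl.sh ending with ERRORs, and late and safe are illustrative variable names.

```shell
#!/bin/sh
failing_step() { return 2; }   # hypothetical stand-in for a failed etl.sh

# Risky: echo runs between the command and the check.
failing_step
echo "Status=$?"               # prints Status=2, but echo itself succeeds
late=$?                        # 0: the status of echo, not of failing_step

# Safe: capture the status right away, then use the variable.
failing_step
exitcode=$?
echo "Status=$exitcode"        # prints Status=2
safe=$exitcode                 # still 2

echo "late=$late safe=$safe"   # prints late=0 safe=2
```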
What do you think about this approach? Did you find this blog post useful? Did you ever need to terminate your batch job flow? How did you go about it? Please share with us.