sas programming

3月 112019
 

A SAS programmer posted an interesting question on a SAS discussion forum. The programmer wanted to iterate over hundreds of SAS data sets, read in all the character variables, and then do some analysis. However, not every data set contains character variables, and SAS complains when you ask it to read the character variables in a data set that contains only numeric variables.

The programmer wanted to use PROC IML to solve the problem, but the issue also occurs in the SAS DATA step. The following program creates three data sets. Two of them (AllChar and Mixed) contain at least one character variable. The third data set (AllNum) does not contain any character variables. For the third data set, an error occurs if you try to use the KEEP=_CHARACTER_ data set option, as shown in the following example:

data AllNum;
   x=1; y=2; z=3;
run;
data AllChar;
   A='ABC'; B='XYZW';
run;
data Mixed;
   name='Joe'; sex='M'; Height=1.8; Weight=81; treatment='Placebo'; 
run;
 
/* try to use DROP=_CHARACTER_ to exclude numeric variables */
data KeepTheChar;
   set AllNum(keep=_CHARACTER_); /* ERROR when no character variables in the data set */
run;
   ERROR: The variable _CHARACTER_ in the DROP, KEEP, or RENAME list has never been referenced.

The same problem occurs in PROC IML if you try to read character variables when none exist:

proc iml;
use AllNum;
   read all var _CHAR_ into X; /* ERROR when no character variables in the data set */
close;
   ERROR: No character variables in the data set.

There are at least two ways to handle this situation:

  1. In both Base SAS and SAS/IML, you can use dictionary tables to determine in advance which data sets contain at least one character variable. You can then read only those data set.
  2. In SAS/IML, you can read all variables into a table, then extract the character variables into a matrix for further processing.

Of course, the same ideas apply if you want to read only numeric variables and you encounter a data set that does not contain any numeric variables.

Use DICTIONARY tables to find information about your SAS session

If you have ever been to a SAS conference, you know that DICTIONARY tables are a favorite topic for SAS programmers. DICTIONARY tables are read-only tables that provide information about the state of the SAS session, including libraries, data sets, variables, and system options. You can access them directly by using PROC SQL. If you want to access the information in the DATA steps or other procedures (like PROC IML), you can use special data views in SASHELP. In particular, the Sashelp.VColumn view provides information about variables in SAS data set and is often used to find data sets that contain certain variable names. (See the references at the end of this article for more information about DICTIONARY tables.)

The following SAS/IML program uses the Sashelp.VColumn to find out which data sets contain at least one character variable:

proc iml;
/* Solution 1: Use dictionary table sashelp.vcolumn */
/* Find data sets in WORK that have AT LEAST ONE character variable */
use sashelp.vcolumn(where=(libname="WORK" & memtype='DATA' & type='char'); /* read only CHAR variables */
   read all var {memname name};  /* memname=data set name; name=name of character variable */
close;
 
/* loop over data sets. If a set contains at least one character variable, process it */
dsName = {'AllNum' 'AllChar' 'Mixed'};       /* names of potential data sets */
do i = 1 to ncol(dsName);
   idx = loc(memname = upcase(dsName[i]));   /* is data set on the has-character-variable list? */
   /* for demo, print whether data set has character variables */
   msg = "The data set " + (dsName[i]) + " contains " + 
          char(ncol(idx)) + " character variables.";
   print msg;
 
   if ncol(idx)>0 then do;            /* the data set exists and has character vars */
      charVars = name[idx];           /* get the names of the character vars */
      use (dsName[i]);                /* open the data set for reading */
      read all var charVars into X;   /* read character variables (always succeeds) */
      close;
      /* process the data */
   end;
end;

The output shows that you can use the DICTIONARY tables to determine which data sets have at least one character variable. You can then use the USE/READ statements in PROC IML to read the character variables and process the data however you wish. As mentioned previously, this technique can also be used in PROC SQL and the DATA step.

Use SAS/IML tables to find character variables

The previous section is very efficient because only character variables are ever read into SAS/IML matrices. However, there might be situations when you want to process character variables (if they exist) and then later process numerical variables (if they exist). Although a SAS/IML matrix contains only one data type (either all numeric or all character), you can read mixed-type data into a SAS/IML table, which supports both numeric and character variables. You can then use the TableIsVarNumeric function to generate a binary indicator variable that tells you which variables in the data are numeric and which are character, as follows:

/* Solution 2: Read all data into a table. Use the TableIsVarNumeric function to determine
   which variables are numeric and which are character. */
dsName = {'AllNum' 'AllChar' 'Mixed'};             /* names of potential data sets */
do i = 1 to ncol(dsName);                          /* for each data set... */
   T = TableCreateFromDataset("WORK", dsName[i]);  /* read all variables into a table */
   numerInd = TableIsVarNumeric(T);                /* binary indicator vector for numeric vars */
   charInd = ^numerInd;                            /* binary indicator vector for character vars */
   numCharVars = sum(charInd);                     /* count of character variables in this data set */
   msg = "The data set " + (dsName[i]) + " contains " + 
         char(numCharVars) + " character variables.";
   print msg;
   if numCharVars > 0 then do;
      X = TableGetVarData(T, loc(charInd));        /* extract the character variables into X */
      /* process the data */
   end;
   /* optionally process the numeric data */
   numNumerVars = sum(numerInd);                   /* count of numeric variables in this data set */
   /* etc */
end;

The output is identical to the output in the previous section.

Summary

In summary, this article discusses a programmer who wants to iterate over many SAS data sets and process only character variables. However, some of the data sets do not have any character variables! This article shows two methods for dealing with this situation: DICTIONARY tables (available through Sashelp views) or SAS/IML tables. The first method is also available in Base SAS.

Of course, you can also use this trick to read all numeric variables when some of the data sets might not have any numeric variable. I've previously written about how to read all numeric variables into a SAS/IML matrix by using the _ALL_ keyword. If the data set contains both numeric and character variables, then only the numeric variables are read.

References

The following resources provide more information about DICTIONARY tables in SAS:

The post How to detect SAS data sets that contain (or do not contain) character variables appeared first on The DO Loop.

3月 062019
 

conditionally terminating a SAS batch flow process in UNIX/LinuxIn automated production (or business operations) environments, we often run SAS job flows in batch mode and on schedule. SAS job flow is a collection of several inter-dependent SAS programs executed as a single process.

In my earlier posts, Running SAS programs in batch under Unix/Linux and Let SAS write batch scripts for you, I described how you can run SAS programs in batch mode by creating UNIX/Linux scripts that in turn incorporate other scripts invocations.

In this scenario you can run multiple SAS programs sequentially or in parallel, all while having a single root script kicked off on schedule. The whole SAS processing flow runs like a chain reaction.

Why and when to stop SAS batch flow process

However, sometimes we need to automatically stop and terminate that chain job flow execution if certain criteria are met (or not met) in a program of that process flow.
Let’s say our first job in a batch flow is a data preparation step (ETL) where we extract data tables from a database and prepare them for further processing. The rest of the batch process is dependent on successful completion of that critical first job. The process is kicked off at 3:00 a.m. daily, however, sometimes we run into a situation when the database connection is unavailable, or the database itself is not finished refreshing, or something else happens resulting in the ETL program completing with ERRORs.

This failure means that our data has not updated properly and there is no reason to continue running the remainder of the job flow process as it might lead to undesired or even disastrous consequences. In this situation we want to automatically terminate the flow execution and send an e-mail notification to the process owners and/or SAS administrators informing them about the mishap.

How to stop SAS batch flow process in UNIX/Linux

Suppose, we run the following main.sh script on UNIX/Linux:

#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
#2 run the rest of processing flow
/sas/code/processing/tail.sh

The etl.sh script runs the SAS ETL process as follows:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/etl/etl.sas"
logname="/sas/code/etl/etl_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

We want to run tail.sh shell script (which itself runs multiple other scripts) only if etl.sas program completes successfully, that is if SAS ETL process etl.sas that is run by etl.sh completes with no ERRORs or WARNINGs. Otherwise, we want to terminate the main.sh script and do not run the rest of the processing flow.

To do this, we re-write our main.sh script as:

 
#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
exitcode=$?
echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
 
if [ $exitcode -eq 0 ]
   then
      #2 run the rest of processing flow
      /sas/code/processing/tail.sh
fi

In this code, we use a special shell script variable ($? for the Bourne and Korn shells, $STATUS for the C shell) to capture the exit status code of the previously executed OS command, /sas/code/etl/etl.sh:

exitcode=$?

Then the optional echo command just prints the captured value of that status for our information.

Every UNIX/Linux command executed by the shell script or user has an exit status represented by an integer number in the range of 0-255. The exit code of 0 means the command executed successfully without any errors; a non-zero value means the command was a failure.

SAS System plays nicely with the UNIX/Linux Operating System. According to the SAS documentation $? for the Bourne and Korn shells, and $STATUS for the C shell.) A value of 0 indicates successful termination. For additional flexibility, SAS’ Condition Exit Status Code All steps terminated normally 0 SAS issued WARNINGs 1 SAS issued ERRORs 2 User issued ABORT statement 3 User issued ABORT RETURN statement 4 User issued ABORT ABEND statement 5 SAS could not initialize because of a severe error 6 User issued ABORT RETURN - n statement n User issued ABORT ABEND - n statement n

Since our etl.sh script executes SAS code etl.sas, the exit status code is passed by the SAS System to etl.sh and consequently to our main.sh shell script.

Then, in the main.sh script we check if that exit code equals to 0 and then and only then run the remaining flow by executing the tail.sh shell script. Otherwise, we skip tail.sh and exit from the main.sh script reaching its end.

Alternatively, the main.sh script can be implemented with an explicit exit as follows:

#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
exitcode=$?
echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
 
if [ $exitcode -ne 0 ]
   then exit
fi
 
#2 run the rest of processing flow
/sas/code/processing/tail.sh

In this shell script code example, we check the exit return code value, and if it is NOT equal to 0, then we explicitly terminate the main.sh shell script using exit command which gets us out of the script immediately without executing the subsequent commands. In this case, our #2 command invoking tail.sh script never gets executed that effectively stops the batch flow process.

If you also need to automatically send an e-mail notification to the designated people about the failed batch flow process, you can do it in a separate SAS job that runs right before exit command. Then the if-statement will look something like this:

 
if [ $exitcode -ne 0 ]
   then
      # send an email and exit
      /sas/code/etl/email_etl_failure.sh
      exit
fi

That is immediately after the email is sent, the shell script and the whole batch flow process gets terminated by the exit command; no shell script commands beyond that if-statement will be executed.

A word of caution

Be extra careful if you use the special script variable $? directly in a script's logical expression, without assigning it to an interim variable. For example, you could use the following script command sequence:

/sas/code/etl/etl.sh
if [ $? -ne 0 ]
. . .

However, let’s say you insert another script command between them, for example:

/sas/code/etl/etl.sh
echo "Status=$? (0=SUCCESS,1=WARNING,2=ERROR)"
if [ $? -ne 0 ]
. . .

Then the $? variable in the if [ $? -ne 0 ] statement will have the value of the previous echo command, not the /stas/code/etl/etl.sh command as you might imply.

Hence, I suggest capturing the $? value in an interim variable (e.g. exitcode=$?) right after the command, exit code of which you are going to inspect, and then reference that interim variable (as $exitcode) in your subsequent script statements. That will save you from trouble of inadvertently referring to a wrong exit code when you insert some additional commands during your script development.

Your thoughts

What do you think about this approach? Did you find this blog post useful? Did you ever need to terminate your batch job flow? How did you go about it? Please share with us.

How to conditionally terminate a SAS batch flow process in UNIX/Linux was published on SAS Users.

2月 132019
 

SAS has worked with our exam delivery partners to integrate a live lab into an exam, which can be delivered anywhere, anytime, on-demand.

The post New Performance-Based Certification: Write SAS Code During Your Exam appeared first on SAS Learning Post.

2月 082019
 

Since we added the new "Recommended by SAS" widget in the SAS Support Communities, I often find myself diverted to topics that I probably would not have found otherwise. This is how I landed on this question and solution from 2011 -- "How to convert 5ft. 4in. (Char) into inches (Num)". While the question was deftly answered by my friend (and SAS partner) Jan K. from the Netherlands, the topic inspired me to take it a step further here.

Jan began his response by throwing a little shade on the USA:

Short of moving to a country that has a decent metric system in place, I suggest using a regular expression.

On behalf of my nation I just want say that, for the record, we tried. But we did not get very far with our metrication, so we are stuck with the imperial system for most non-scientific endeavors.

Matching patterns with a regular expression

Regular expressions are a powerful method for finding specific patterns in text. The syntax of regular expressions can be a challenge for those getting started, but once you've solved a few pattern-recognition problems by using them, you'll never go back to your old methods.

Beginning with the solution offered by Jan, I extended this program to read in a "ft. in." measurement, convert to the component number values, express the total value in inches, and then convert the measurement to centimeters. I know that even with my changes, we can think of patterns that might not be matched. But allow me to describe the updates:

Here's my program, followed by the result:

data measure;
 length 
     original $ 25
     feet 8 inches 8 
     total_inches 8 total_cm 8;
 /* constant regex is parsed just once */
 re = prxparse('/(\d*)ft.(\s*)((\d?\.?\d?)in.)?/'); 
 input;
 original = _infile_;
 if prxmatch(re, original) then do;
  feet =   input ( prxposn(re, 1, original), best12.);
  inches = input ( prxposn(re, 4, original), best12.);
  if missing(inches) and not missing(feet) then inches=0;
 end;
 else 
   original = "[NO MATCH] " || original;
 total_inches = (feet*12) + inches;
 total_cm = total_inches * 2.54;
 drop re;
cards;
5ft. 4in.
4ft 0in.
6ft. 10in.
3ft.2in.
4ft.
6ft.     1.5in.
20ft. 11in.
25ft. 6.5in.
Soooooo big
;

Other tools to help with regular expressions

The Internet offers a plethora of online tools to help developers build and test regular expression syntax. Here's a screenshot from RegExr.com, which I used to test some aspects of my program.

Tools like these provide wonderful insight into the capture groups and regex directives that will influence the pattern matching. They are part tutorial, part workbench for regex crafters at all levels.

Many of these tools also include community contributions for matching common patterns. For example, many of us often need to match/parse data with e-mail addresses, phone numbers, and other tokens as data fields. Sites like RegExr.com include the syntax for many of these common patterns that you can copy and use in your SAS programs.

See Also

The post Convert a text-based measurement to a number in SAS appeared first on The SAS Dummy.

1月 182019
 

It seems that everyone knows about GitHub -- the service that hosts many popular open source code projects. The underpinnings of GitHub are based on Git, which is itself an open-source implementation of a source management system. Git was originally built to help developers collaborate on Linux (yet another famous open source project) -- but now we all use it for all types of projects.

There are other free and for-pay services that use Git, like Bitbucket and GitLab. And there are countless products that embed Git for its versioning and collaboration features. In 2014, SAS developers added built-in Git support for SAS Enterprise Guide.

Since then, Git (and GitHub) have grown to play an even larger role in data science operations and DevOps in general. Automation is a key component for production work -- including check-in, check-out, commit, and rollback. In response, SAS has added Git integration to more SAS products, including:

  • the Base SAS programming language, via a collection of SAS functions.
  • SAS Data Integration Studio, via a new source control plugin
  • SAS Studio (experimental in v3.8)

You can use this Git integration with any service that supports Git (GitHub, GitLab, etc.), or with your own private Git servers and even just local Git repositories.

SAS functions for Git

Git infrastructure and functions were added to SAS 9.4 Maintenance 6. The new SAS functions all have the helpful prefix of "GITFN_" (signifying "Git fun!", I assume). Here's a partial list:

GITFN_CLONE  Clones a Git repository (for example, from GitHub) into a directory on the SAS server.
GITFN_COMMIT  Commits staged files to the local repository
GITFN_DIFF Returns the number of diffs between two commits in the local repository and creates a diff record object for the local repository.
GITFN_PUSH  Pushes the committed files in the local repository to the remote repository.
GITFN_NEW_BRANCH  Creates a Git branch

 

The function names make sense if you're familiar with Git lingo. If you're new to Git, you'll need to learn the terms that go with the commands: clone, repo, commit, stage, blame, and more. This handbook provided by GitHub is friendly and easy to read. (Or you can start with this xkcd comic.)

You can

data _null_;
 version = gitfn_version();
 put version=;             
 
 rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
   "c:\Projects\sas-dummy-blog");
 put rc=;
run;

In one line, this function fetches an entire collection of code files from your source control system. Here's a more concrete example that fetches the code to a work space, then runs a program from that repository. (This is safe for you to try -- here's the code that will be pulled/run. It even works from SAS University Edition.)

options dlcreatedir;
%let repoPath = %sysfunc(getoption(WORK))/sas-dummy-blog;
libname repo "&repoPath.";
libname repo clear;
 
/* Fetch latest code from GitHub */
data _null_;
 rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
   "&repoPath.");
 put rc=;
run;
 
/* run the code in this session */
%include "&repoPath./rng_example_thanos.sas";

You could use the other GITFN functions to stage and commit the output from your SAS jobs, including log files, data sets, ODS results -- whatever you need to keep and version.

Using Git in SAS Data Integration Studio

SAS Data Integration Studio has supported source control integration for many years, but only for CVS and Subversion (still in wide use, but they aren't media darlings like GitHub). By popular request, the latest version of SAS Data Integration Studio adds support for a Git plug-in.

Example of Git in SAS DI Studio

See the documentation for details:

Read more about setup and use in the available here as part of our "Custom Tasks Tuesday" series.

Using Git in SAS Enterprise Guide

This isn't new, but I'll include it for completeness. SAS Enterprise Guide supports built-in Git repository support for SAS programs that are stored in your project file. You can use this feature without having to set up any external Git servers or repositories. Also, SAS Enterprise Guide can recognize when you reference programs that are managed in an external Git repository. This integration enables features like program history, compare differences, commit, and more. Read more and see a demo of this in action here.

program history

If you use SAS Enterprise Guide to edit and run SAS programs that are managed in an external Git repository, here's an important tip. Change your project file properties to "Use paths relative to the project for programs and importable files." You'll find this checkbox in File->Project Properties.

With this enabled, you can store the project file (EGP) and any SAS programs together in Git, organized into subfolders if you want. As long as these are cloned into a similar structure on any system you use, the file paths will resolve automatically.

The post Using built-in Git operations in SAS appeared first on The SAS Dummy.

1月 022019
 

Last year, I wrote more than 100 posts for The DO Loop blog. Of these, the most popular articles were about data visualization, SAS programming tips, and statistical data analysis. Here are the most popular articles from 2018 in each category.

Data Visualization

General SAS programming techniques

Statistics and Data Analysis

I write this blog because I love to learn new things and share what I know with others. If you want to learn something new, read (or re-read!) these popular articles from 2018. Then share this page with one of your colleagues. Happy New Year! I hope we both have many opportunities to learn and share in 2019!

The post Top posts from <em>The DO Loop</em> in 2018 appeared first on The DO Loop.