sas programming

4月 112019
 

This blog post could be subtitled "To Catch a Thief" or maybe "Go ahead. Steal this blog. I dare you."* That's because I've used this technique several times to catch and report other web sites who lift the blog content from blogs.sas.com and present it as their own.

Syndicating blog content is an honorable practice, made possible by RSS feeds that virtually every blog platform supports. With syndicated feeds, your blog content can appear on someone else's site, but this always includes attribution and a link back to the original source. However, if you copy content from another website and publish it natively on your own web property, without attribution or citations...well, that's called plagiarism. And the Digital Millennium Copyright Act (DCMA) provides authors with recourse to have stolen content removed from infringing sites -- if you can establish that you're the rightful copyright owner.

Establishing ownership is a tedious task, especially when someone steals dozens or hundreds of articles. You must provide links to each example of infringing content, along with links to the original authorized content. Fortunately, as I've discussed before, I have ready access to the data about all 17,000+ blog posts that we've published at SAS (see How SAS uses SAS to Analyze SAS Blogs). In this article, I'll show you how I gathered that same information from the infringing websites so that I could file the DCMA "paperwork" almost automatically.

The complete programs from this article are available on GitHub.

Read a JSON feed using the JSON engine

In my experience, the people who steal our blog content don't splurge on fancy custom web sites. They tend to use free or low-cost web site platforms, and the most popular of these include WordPress (operated by Automattic) and Blogspot (operated by Google). Both of these platforms support API-like syndication using feeds.

Blogspot sites can generate article feeds in either XML or JSON. I prefer JSON when it's available, as I find that the JSON libname engine in SAS requires fewer "clues" in order to generate useful tables. (See documentation for the JSON engine.) While you can supply a JSON map file that tells SAS how to assemble your tables and data types, I find it just as easy to read the data as-is and post-process it to join the fields I need and convert data fields. (For an example that uses a JSON map, see Reading data with the SAS JSON libname engine.)

Since I don't want to draw attention to the specific infringing sites, I'll use an example of a popular (legitimate!) Blogspot site named "Maps Mania". If you're into data and maps (who isn't?) you might like their content. In this code I use PROC HTTP to fetch the RSS feed, using "alt=json" to request JSON format and "max-results=100" to retrieve a larger-than-default batch of published posts.

/* Read JSON feed into a local file. */
/* Use Blogspot parameters to get 100 posts at a time */
filename resp temp;
proc http
 url='https://googlemapsmania.blogspot.com/feeds/posts/default?alt=json&max-results=100'
 method="get"
 out=resp;
run;
 
libname rss json fileref=resp;

This JSON libname breaks the data into a series of tables that relate to each other via common keys.

RSS feed tables

With a little bit of exploration in SAS Enterprise Guide and the Query Builder, I was able to design a PROC SQL step to assemble just the fields and records I needed: post title and post URL.

/* Join the relevant feed entry items to make a single table */
/* with post titles and URLs */
proc sql;
   create table work.blogspot as 
   select t2._t as rss_title,
          t1.href as rss_href          
      from rss.entry_link t1
           inner join rss.entry_title t2 on (t1.ordinal_entry = t2.ordinal_entry)
      where t1.type = 'text/html' and t1.rel = 'alternate';
quit;
 
libname rss clear;

RSS output from blogspot

Read an XML feed using the XMLv2 engine

WordPress sites generate XML-based feeds by default. Site owners can install a WordPress plugin to generate JSON feeds as well, but most sites don't bother with that. Like the JSON feeds, the XML feed can contain many fields that relate to each other. I find that with XML, the best approach is to use the SAS XML Mapper application to explore the XML and "design" the final data tables that you need. You use SAS XML Mapper to create a map file, which you can then feed into the SAS XMLv2 engine to instruct SAS how to read the data. (See documentation for the XMLv2 engine.)

SAS XML Mapper is available as a free download from support.sas.com. Download it as a ZIP file (on Windows), and extract the ZIP file to a temporary folder. Then run setup.exe in the root of that folder to install the app on your system.

To design the map, I use an example of the XML feed from the blog that I want to examine. Once again, I'll choose a popular WordPress blog instead of the actual infringing sites. In this case, let's look at the Star Wars News site. I point my browser at the feed address is https://www.starwars.com/news/feed and save as an XML file. Then, I use SAS XML Mapper to Open XML (File menu), and examine the result.

I found everything that I needed in "item" subset of the feed. I dragged that group over to the right pane to include in the map. That creates a data set container named "item." Then dragged just the title, link, and pubDate fields into that data set to include in the final result.

The SAS XML Mapper generates a SAS program that you can include to define the map, and that's what I've done with the following code. It uses DATA step to create the map file just as I need it.

filename rssmap temp;
data _null_;
 infile datalines;
 file rssmap;
 input;
 put _infile_;
 datalines;
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP name="RSSMAP" version="2.1">
  <NAMESPACES count="0"/>
  <!-- ############################################################ -->
  <TABLE name="item">
    <TABLE-PATH syntax="XPath">/rss/channel/item</TABLE-PATH>
    <COLUMN name="title">
      <PATH syntax="XPath">/rss/channel/item/title</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>250</LENGTH>
    </COLUMN>
    <COLUMN name="link">
      <PATH syntax="XPath">/rss/channel/item/link</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>200</LENGTH>
    </COLUMN>
    <COLUMN name="pubDate">
      <PATH syntax="XPath">/rss/channel/item/pubDate</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>40</LENGTH>
    </COLUMN>
  </TABLE>
</SXLEMAP>
;
run;

Because WordPress feeds return just most recent 25 items by default, I need to use the "pageid=" directive to go deeper into the archive and return older items. I used a simple SAS macro loop to iterate through 5 pages (125 items) in this example. Note how I specified the XMLv2 libname with the XMLMAP= option to include my custom map. That ensures that SAS will read the XML and build the table as I've designed it.

My final DATA step in this part is to recast the pubDate field (a text field by default) into a proper SAS date.

/* WordPress feeds return data in pages, 25 entries at a time        */
/* So using a short macro to loop through past 5 pages, or 125 items */
%macro getItems;
  %do i = 1 %to 5;
  filename feed temp;
  proc http
   method="get"
   url="https://www.starwars.com/news/feed?paged=&i."
   out=feed;
  run;
 
  libname result XMLv2 xmlfileref=feed xmlmap=rssmap;
 
  data posts_&i.;
   set result.item;
  run;
  %end;
%mend;
 
%getItems;
 
/* Assemble all pages of entries                       */
/* Cast the date field into a proper SAS date          */
/* Have to strip out the default day name abbreviation */
/* "Wed, 10 Apr 2019 17:36:27 +0000" -> 10APR2019      */
data allPosts ;
 set posts_:;
 length sasPubdate 8;
 sasPubdate = input( substr(pubDate,4),anydtdtm.);
 format sasPubdate dtdate9.;
 drop pubDate;
run;

Reporting the results

After gathering the data I need from RSS feeds, I use SAS to match that with the WordPress data that I have about our blogs. I can then generate a table that I can easily submit in a DCMA form.

Usually, matching by "article title" is the easiest method. However, sometimes the infringing site will alter the titles a little bit or even make small adjustments to the body of the article. (This reminds me of my college days in computer science, when struggling students would resort to cheating by copying someone else's program, but change just the variable names. It's a weak effort.) With the data in SAS, I've used several other techniques to detect the "distance" of a potentially infringing post from the original work.

Maybe you want to see that code. But you can't expect me to reveal all of my secrets, can you?


* props to Robert "I dare you to knock it off" Conrad.

The post Read RSS feeds with SAS using XML or JSON appeared first on The SAS Dummy.

4月 012019
 

dividing by zero with SAS

Whether you are a strong believer in the power of dividing by zero, agnostic, undecided, a supporter, denier or anything in between and beyond, this blog post will bring all to a common denominator.

History of injustice

For how many years have you been told that you cannot divide by zero, that dividing by zero is not possible, not allowed, prohibited? Let me guess: it’s your age minus 7 (± 2).

But have you ever been bothered by that unfair restriction? Think about it: all other numbers get to be divisors. All of them, including positive, negative, rational, even irrational and imaginary. Why such an injustice and inequality before the Law of Math?

We have our favorites like π, and prime members (I mean numbers), but zero is the bottom of the barrel, the lowest of the low, a pariah, an outcast, an untouchable when it comes to dividing by. It does not even have a sign in front of it. Well, it’s legal to have, but it’s meaningless.

And that’s not all. Besides not being allowed in a denominator, zeros are literally discriminated against beyond belief. How else could you characterize the fact that zeros are declared as pathological liars as their innocent value is equated to FALSE in logical expressions, while all other more privileged numbers represent TRUE, even the negative and irrational ones!

Extraordinary qualities of zeros

Despite their literal zero value, their informational value and qualities are not less than, and in many cases significantly surpass those of their siblings. In a sense, zero is a proverbial center of the universe, as all the other numbers dislocated around it as planets around the sun. It is not coincidental that zeros are denoted as circles, which makes them forerunners and likely ancestors of the glorified π.

Speaking of π, what is all the buzz around it? It’s irrational. It’s inferior to 0: it takes 2 π’s to just draw a single zero (remember O=2πR?). Besides, zeros are not just well rounded, they are perfectly rounded.

Privacy protection experts and GDPR enthusiasts love zeros. While other small numbers are required to be suppressed in published demographical reports, zeros may be shown prominently and proudly as they disclose no one’s personally identifiable information (PII).

No number rivals zero. Zeros are perfect numerators and equalizers. If you divide zero by any non-zero member of the digital community, the result will always be zero. Always, regardless of the status of that member. And yes, zeros are perfect common denominators, despite being prohibited from that role for centuries.

Zeros are the most digitally neutral and infinitely tolerant creatures. What other number has tolerated for so long such abuse and discrimination!

Enough is enough!

Dividing by zero opens new horizons

Can you imagine what new opportunities will arise if we break that centuries-old tradition and allow dividing by zero? What new horizons will open! What new breakthroughs and discoveries can be made!

With no more prejudice and prohibition of the division by zero, we can prove virtually anything we wish. For example, here is a short 5-step mathematical proof of “4=5”:

1)   4 – 4 = 10 – 10
2)   22 – 22 = 5·(2 – 2)
3)   (2 + 2)·(2 – 2) = 5·(2 – 2) /* here we divide both parts by (2 – 2), that is by 0 */
4)   (2 + 2) = 5
5)   4 = 5

Let’s make the next logical step. If dividing by zero can make any wish a reality, then producing a number of our choosing by dividing a given number by zero scientifically proves that division by zero is not only legitimate, but also feasible and practical.

As you will see below, division by zero is not that easy, but with the power of SAS, the power to know and the powers of curiosity, imagination and perseverance nothing is impossible.

Division by zero - SAS implementation

Consider the following use case. Say you think of a “secret” number, write it on a piece of paper and put in a “secret” box. Now, you take any number and divide it by zero. If the produced result – the quotient – is equal to your secret number, wouldn’t it effectively demonstrate the practicality and magic power of dividing by zero?

Here is how you can do it in SAS. A relatively “simple”, yet powerful SAS macro %DIV_BY_0 takes a single number as a numerator parameter, divides it by zero and returns the result equal to the one that is “hidden” in your “secret” box. It is the ultimate, pure artificial intelligence, beyond your wildest imagination.

All you need to do is to run this code:

 
data MY_SECRET_BOX;        /* you can use any dataset name here */
   MY_SECRET_NUMBER = 777; /* you can use any variable name here and assign any number to it */
run;
 
%macro DIV_BY_0(numerator);
 
   %if %sysevalf(&numerator=0) %then %do; %put 0:0=1; %return; %end;
   %else %let putn=&sysmacroname; 
   %let %sysfunc(putn(%substr(&putn,%length(&putn)),words.))=
   %sysevalf((&numerator/%sysfunc(constant(pi)))**&sysrc);  
   %let a=com; %let null=; %let nu11=%length(null); 
   %let com=*= This is going to be an awesome blast! ;
   %let %substr(&a,&zero,&zero)=*Close your eyes and open your mind, then;
   %let imagine = "large number like 71698486658278467069846772 Bytes divided by 0";
   %let O=%scan(%quote(&c),&zero+&nu11); 
   %let l=%scan(%quote(&c),&zero);
   %let _=%substr(%scan(&imagine,&zero+&nu11),&zero,&nu11);
   %let %substr(&a,&zero,&zero)%scan(&&&a,&nu11+&nu11-&zero)=%scan(&&&a,-&zero,!b)_;
   %do i=&zero %to %length(%scan(&imagine,&nu11)) %by &zero+&zero;
   %let null=&null%sysfunc(&_(%substr(%scan(&imagine,&nu11),&i,&zero+&zero))); %end;
   %if &zero %then %let _0=%scan(&null,&zero+&zero); %else;
   %if &nu11 %then %let _O=%scan(&null,&zero);
   %if %qsysfunc(&O(_&can)) %then %if %sysfunc(&_0(&zero)) %then %put; %else %put;
   %put &numerator:0=%sysfunc(&_O(&zero,&zero));
   %if %sysfunc(&l(&zero)) %then;
 
%mend DIV_BY_0;
 
%DIV_BY_0(55); /* parameter may be of any numeric value */

When you run this code, it will produce in the SAS LOG your secret number:

55:0=777

How is that possible without the magic of dividing by zero? Note that the %DIV_BY_0 macro has no knowledge of your dataset name, nor the variable name holding your secret number value to say nothing about your secret number itself.

That essentially proves that dividing by zero can practically solve any imaginary problem and make any wish or dream come true. Don’t you agree?

There is one limitation though. We had to make this sacrifice for the sake of numeric social justice. If you invoke the macro with the parameter of 0 value, it will return 0:0=1 – not your secret number - to make it consistent with the rest of non-zero numbers (no more exceptions!): “any number, except zero, divided by itself is 1”.

Challenge

Can you crack this code and explain how it does it? I encourage you to check it out and make sure it works as intended. Please share your thoughts and emotions in the Comments section below.

Disclosure

This SAS code contains no cookies, no artificial sweeteners, no saturated fats, no psychotropic drugs, no illicit substances or other ingredients detrimental to your health and integrity, and no political or religious statements. It does not collect, distribute or sell your personal data, in full compliance with FERPA, HIPPA, GDPR and other privacy laws and regulations. It is provided “as is” without warranty and is free to use on any legal SAS installation. The whole purpose of this blog post and the accompanied SAS programming implementation is to entertain you while highlighting the power of SAS and human intelligence, and to fool around in the spirit of the date of this publication.

Dividing by zero with SAS was published on SAS Users.

3月 112019
 

A SAS programmer posted an interesting question on a SAS discussion forum. The programmer wanted to iterate over hundreds of SAS data sets, read in all the character variables, and then do some analysis. However, not every data set contains character variables, and SAS complains when you ask it to read the character variables in a data set that contains only numeric variables.

The programmer wanted to use PROC IML to solve the problem, but the issue also occurs in the SAS DATA step. The following program creates three data sets. Two of them (AllChar and Mixed) contain at least one character variable. The third data set (AllNum) does not contain any character variables. For the third data set, an error occurs if you try to use the KEEP=_CHARACTER_ data set option, as shown in the following example:

data AllNum;
   x=1; y=2; z=3;
run;
data AllChar;
   A='ABC'; B='XYZW';
run;
data Mixed;
   name='Joe'; sex='M'; Height=1.8; Weight=81; treatment='Placebo'; 
run;
 
/* try to use DROP=_CHARACTER_ to exclude numeric variables */
data KeepTheChar;
   set AllNum(keep=_CHARACTER_); /* ERROR when no character variables in the data set */
run;
   ERROR: The variable _CHARACTER_ in the DROP, KEEP, or RENAME list has never been referenced.

The same problem occurs in PROC IML if you try to read character variables when none exist:

proc iml;
use AllNum;
   read all var _CHAR_ into X; /* ERROR when no character variables in the data set */
close;
   ERROR: No character variables in the data set.

There are at least two ways to handle this situation:

  1. In both Base SAS and SAS/IML, you can use dictionary tables to determine in advance which data sets contain at least one character variable. You can then read only those data set.
  2. In SAS/IML, you can read all variables into a table, then extract the character variables into a matrix for further processing.

Of course, the same ideas apply if you want to read only numeric variables and you encounter a data set that does not contain any numeric variables.

Use DICTIONARY tables to find information about your SAS session

If you have ever been to a SAS conference, you know that DICTIONARY tables are a favorite topic for SAS programmers. DICTIONARY tables are read-only tables that provide information about the state of the SAS session, including libraries, data sets, variables, and system options. You can access them directly by using PROC SQL. If you want to access the information in the DATA steps or other procedures (like PROC IML), you can use special data views in SASHELP. In particular, the Sashelp.VColumn view provides information about variables in SAS data set and is often used to find data sets that contain certain variable names. (See the references at the end of this article for more information about DICTIONARY tables.)

The following SAS/IML program uses the Sashelp.VColumn to find out which data sets contain at least one character variable:

proc iml;
/* Solution 1: Use dictionary table sashelp.vcolumn */
/* Find data sets in WORK that have AT LEAST ONE character variable */
use sashelp.vcolumn(where=(libname="WORK" & memtype='DATA' & type='char'); /* read only CHAR variables */
   read all var {memname name};  /* memname=data set name; name=name of character variable */
close;
 
/* loop over data sets. If a set contains at least one character variable, process it */
dsName = {'AllNum' 'AllChar' 'Mixed'};       /* names of potential data sets */
do i = 1 to ncol(dsName);
   idx = loc(memname = upcase(dsName[i]));   /* is data set on the has-character-variable list? */
   /* for demo, print whether data set has character variables */
   msg = "The data set " + (dsName[i]) + " contains " + 
          char(ncol(idx)) + " character variables.";
   print msg;
 
   if ncol(idx)>0 then do;            /* the data set exists and has character vars */
      charVars = name[idx];           /* get the names of the character vars */
      use (dsName[i]);                /* open the data set for reading */
      read all var charVars into X;   /* read character variables (always succeeds) */
      close;
      /* process the data */
   end;
end;

The output shows that you can use the DICTIONARY tables to determine which data sets have at least one character variable. You can then use the USE/READ statements in PROC IML to read the character variables and process the data however you wish. As mentioned previously, this technique can also be used in PROC SQL and the DATA step.

Use SAS/IML tables to find character variables

The previous section is very efficient because only character variables are ever read into SAS/IML matrices. However, there might be situations when you want to process character variables (if they exist) and then later process numerical variables (if they exist). Although a SAS/IML matrix contains only one data type (either all numeric or all character), you can read mixed-type data into a SAS/IML table, which supports both numeric and character variables. You can then use the TableIsVarNumeric function to generate a binary indicator variable that tells you which variables in the data are numeric and which are character, as follows:

/* Solution 2: Read all data into a table. Use the TableIsVarNumeric function to determine
   which variables are numeric and which are character. */
dsName = {'AllNum' 'AllChar' 'Mixed'};             /* names of potential data sets */
do i = 1 to ncol(dsName);                          /* for each data set... */
   T = TableCreateFromDataset("WORK", dsName[i]);  /* read all variables into a table */
   numerInd = TableIsVarNumeric(T);                /* binary indicator vector for numeric vars */
   charInd = ^numerInd;                            /* binary indicator vector for character vars */
   numCharVars = sum(charInd);                     /* count of character variables in this data set */
   msg = "The data set " + (dsName[i]) + " contains " + 
         char(numCharVars) + " character variables.";
   print msg;
   if numCharVars > 0 then do;
      X = TableGetVarData(T, loc(charInd));        /* extract the character variables into X */
      /* process the data */
   end;
   /* optionally process the numeric data */
   numNumerVars = sum(numerInd);                   /* count of numeric variables in this data set */
   /* etc */
end;

The output is identical to the output in the previous section.

Summary

In summary, this article discusses a programmer who wants to iterate over many SAS data sets and process only character variables. However, some of the data sets do not have any character variables! This article shows two methods for dealing with this situation: DICTIONARY tables (available through Sashelp views) or SAS/IML tables. The first method is also available in Base SAS.

Of course, you can also use this trick to read all numeric variables when some of the data sets might not have any numeric variable. I've previously written about how to read all numeric variables into a SAS/IML matrix by using the _ALL_ keyword. If the data set contains both numeric and character variables, then only the numeric variables are read.

References

The following resources provide more information about DICTIONARY tables in SAS:

The post How to detect SAS data sets that contain (or do not contain) character variables appeared first on The DO Loop.

3月 062019
 

conditionally terminating a SAS batch flow process in UNIX/LinuxIn automated production (or business operations) environments, we often run SAS job flows in batch mode and on schedule. SAS job flow is a collection of several inter-dependent SAS programs executed as a single process.

In my earlier posts, Running SAS programs in batch under Unix/Linux and Let SAS write batch scripts for you, I described how you can run SAS programs in batch mode by creating UNIX/Linux scripts that in turn incorporate other scripts invocations.

In this scenario you can run multiple SAS programs sequentially or in parallel, all while having a single root script kicked off on schedule. The whole SAS processing flow runs like a chain reaction.

Why and when to stop SAS batch flow process

However, sometimes we need to automatically stop and terminate that chain job flow execution if certain criteria are met (or not met) in a program of that process flow.
Let’s say our first job in a batch flow is a data preparation step (ETL) where we extract data tables from a database and prepare them for further processing. The rest of the batch process is dependent on successful completion of that critical first job. The process is kicked off at 3:00 a.m. daily, however, sometimes we run into a situation when the database connection is unavailable, or the database itself is not finished refreshing, or something else happens resulting in the ETL program completing with ERRORs.

This failure means that our data has not updated properly and there is no reason to continue running the remainder of the job flow process as it might lead to undesired or even disastrous consequences. In this situation we want to automatically terminate the flow execution and send an e-mail notification to the process owners and/or SAS administrators informing them about the mishap.

How to stop SAS batch flow process in UNIX/Linux

Suppose, we run the following main.sh script on UNIX/Linux:

#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
#2 run the rest of processing flow
/sas/code/processing/tail.sh

The etl.sh script runs the SAS ETL process as follows:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/etl/etl.sas"
logname="/sas/code/etl/etl_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

We want to run tail.sh shell script (which itself runs multiple other scripts) only if etl.sas program completes successfully, that is if SAS ETL process etl.sas that is run by etl.sh completes with no ERRORs or WARNINGs. Otherwise, we want to terminate the main.sh script and do not run the rest of the processing flow.

To do this, we re-write our main.sh script as:

 
#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
exitcode=$?
echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
 
if [ $exitcode -eq 0 ]
   then
      #2 run the rest of processing flow
      /sas/code/processing/tail.sh
fi

In this code, we use a special shell script variable ($? for the Bourne and Korn shells, $STATUS for the C shell) to capture the exit status code of the previously executed OS command, /sas/code/etl/etl.sh:

exitcode=$?

Then the optional echo command just prints the captured value of that status for our information.

Every UNIX/Linux command executed by the shell script or user has an exit status represented by an integer number in the range of 0-255. The exit code of 0 means the command executed successfully without any errors; a non-zero value means the command was a failure.

SAS System plays nicely with the UNIX/Linux Operating System. According to the SAS documentation $? for the Bourne and Korn shells, and $STATUS for the C shell.) A value of 0 indicates successful termination. For additional flexibility, SAS’ Condition Exit Status Code All steps terminated normally 0 SAS issued WARNINGs 1 SAS issued ERRORs 2 User issued ABORT statement 3 User issued ABORT RETURN statement 4 User issued ABORT ABEND statement 5 SAS could not initialize because of a severe error 6 User issued ABORT RETURN - n statement n User issued ABORT ABEND - n statement n

Since our etl.sh script executes SAS code etl.sas, the exit status code is passed by the SAS System to etl.sh and consequently to our main.sh shell script.

Then, in the main.sh script we check if that exit code equals to 0 and then and only then run the remaining flow by executing the tail.sh shell script. Otherwise, we skip tail.sh and exit from the main.sh script reaching its end.

Alternatively, the main.sh script can be implemented with an explicit exit as follows:

#!/bin/sh
 
#1 extract data from a database
/sas/code/etl/etl.sh
 
exitcode=$?
echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
 
if [ $exitcode -ne 0 ]
   then exit
fi
 
#2 run the rest of processing flow
/sas/code/processing/tail.sh

In this shell script code example, we check the exit return code value, and if it is NOT equal to 0, then we explicitly terminate the main.sh shell script using exit command which gets us out of the script immediately without executing the subsequent commands. In this case, our #2 command invoking tail.sh script never gets executed that effectively stops the batch flow process.

If you also need to automatically send an e-mail notification to the designated people about the failed batch flow process, you can do it in a separate SAS job that runs right before exit command. Then the if-statement will look something like this:

 
if [ $exitcode -ne 0 ]
   then
      # send an email and exit
      /sas/code/etl/email_etl_failure.sh
      exit
fi

That is immediately after the email is sent, the shell script and the whole batch flow process gets terminated by the exit command; no shell script commands beyond that if-statement will be executed.

A word of caution

Be extra careful if you use the special script variable $? directly in a script's logical expression, without assigning it to an interim variable. For example, you could use the following script command sequence:

/sas/code/etl/etl.sh
if [ $? -ne 0 ]
. . .

However, let’s say you insert another script command between them, for example:

/sas/code/etl/etl.sh
echo "Status=$? (0=SUCCESS,1=WARNING,2=ERROR)"
if [ $? -ne 0 ]
. . .

Then the $? variable in the if [ $? -ne 0 ] statement will have the value of the previous echo command, not the /stas/code/etl/etl.sh command as you might imply.

Hence, I suggest capturing the $? value in an interim variable (e.g. exitcode=$?) right after the command, exit code of which you are going to inspect, and then reference that interim variable (as $exitcode) in your subsequent script statements. That will save you from trouble of inadvertently referring to a wrong exit code when you insert some additional commands during your script development.

Your thoughts

What do you think about this approach? Did you find this blog post useful? Did you ever need to terminate your batch job flow? How did you go about it? Please share with us.

How to conditionally terminate a SAS batch flow process in UNIX/Linux was published on SAS Users.

2月 132019
 

SAS has worked with our exam delivery partners to integrate a live lab into an exam, which can be delivered anywhere, anytime, on-demand.

The post New Performance-Based Certification: Write SAS Code During Your Exam appeared first on SAS Learning Post.

2月 082019
 

Since we added the new "Recommended by SAS" widget in the SAS Support Communities, I often find myself diverted to topics that I probably would not have found otherwise. This is how I landed on this question and solution from 2011 -- "How to convert 5ft. 4in. (Char) into inches (Num)". While the question was deftly answered by my friend (and SAS partner) Jan K. from the Netherlands, the topic inspired me to take it a step further here.

Jan began his response by throwing a little shade on the USA:

Short of moving to a country that has a decent metric system in place, I suggest using a regular expression.

On behalf of my nation I just want say that, for the record, we tried. But we did not get very far with our metrication, so we are stuck with the imperial system for most non-scientific endeavors.

Matching patterns with a regular expression

Regular expressions are a powerful method for finding specific patterns in text. The syntax of regular expressions can be a challenge for those getting started, but once you've solved a few pattern-recognition problems by using them, you'll never go back to your old methods.

Beginning with the solution offered by Jan, I extended this program to read in a "ft. in." measurement, convert to the component number values, express the total value in inches, and then convert the measurement to centimeters. I know that even with my changes, we can think of patterns that might not be matched. But allow me to describe the updates:

Here's my program, followed by the result:

data measure;
 length 
     original $ 25
     feet 8 inches 8 
     total_inches 8 total_cm 8;
 /* constant regex is parsed just once */
 re = prxparse('/(\d*)ft.(\s*)((\d?\.?\d?)in.)?/'); 
 input;
 original = _infile_;
 if prxmatch(re, original) then do;
  feet =   input ( prxposn(re, 1, original), best12.);
  inches = input ( prxposn(re, 4, original), best12.);
  if missing(inches) and not missing(feet) then inches=0;
 end;
 else 
   original = "[NO MATCH] " || original;
 total_inches = (feet*12) + inches;
 total_cm = total_inches * 2.54;
 drop re;
cards;
5ft. 4in.
4ft 0in.
6ft. 10in.
3ft.2in.
4ft.
6ft.     1.5in.
20ft. 11in.
25ft. 6.5in.
Soooooo big
;

Other tools to help with regular expressions

The Internet offers a plethora of online tools to help developers build and test regular expression syntax. Here's a screenshot from RegExr.com, which I used to test some aspects of my program.

Tools like these provide wonderful insight into the capture groups and regex directives that will influence the pattern matching. They are part tutorial, part workbench for regex crafters at all levels.

Many of these tools also include community contributions for matching common patterns. For example, many of us often need to match/parse data with e-mail addresses, phone numbers, and other tokens as data fields. Sites like RegExr.com include the syntax for many of these common patterns that you can copy and use in your SAS programs.

See Also

The post Convert a text-based measurement to a number in SAS appeared first on The SAS Dummy.

1月 182019
 

It seems that everyone knows about GitHub -- the service that hosts many popular open source code projects. The underpinnings of GitHub are based on Git, which is itself an open-source implementation of a source management system. Git was originally built to help developers collaborate on Linux (yet another famous open source project) -- but now we all use it for all types of projects.

There are other free and for-pay services that use Git, like Bitbucket and GitLab. And there are countless products that embed Git for its versioning and collaboration features. In 2014, SAS developers added built-in Git support for SAS Enterprise Guide.

Since then, Git (and GitHub) have grown to play an even larger role in data science operations and DevOps in general. Automation is a key component for production work -- including check-in, check-out, commit, and rollback. In response, SAS has added Git integration to more SAS products, including:

  • the Base SAS programming language, via a collection of SAS functions.
  • SAS Data Integration Studio, via a new source control plugin
  • SAS Studio (experimental in v3.8)

You can use this Git integration with any service that supports Git (GitHub, GitLab, etc.), or with your own private Git servers and even just local Git repositories.

SAS functions for Git

Git infrastructure and functions were added to SAS 9.4 Maintenance 6. The new SAS functions all have the helpful prefix of "GITFN_" (signifying "Git fun!", I assume). Here's a partial list:

GITFN_CLONE  Clones a Git repository (for example, from GitHub) into a directory on the SAS server.
GITFN_COMMIT  Commits staged files to the local repository
GITFN_DIFF Returns the number of diffs between two commits in the local repository and creates a diff record object for the local repository.
GITFN_PUSH  Pushes the committed files in the local repository to the remote repository.
GITFN_NEW_BRANCH  Creates a Git branch

 

The function names make sense if you're familiar with Git lingo. If you're new to Git, you'll need to learn the terms that go with the commands: clone, repo, commit, stage, blame, and more. This handbook provided by GitHub is friendly and easy to read. (Or you can start with this xkcd comic.)

You can

data _null_;
 version = gitfn_version();
 put version=;             
 
 rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
   "c:\Projects\sas-dummy-blog");
 put rc=;
run;

In one line, this function fetches an entire collection of code files from your source control system. Here's a more concrete example that fetches the code to a work space, then runs a program from that repository. (This is safe for you to try -- here's the code that will be pulled/run. It even works from SAS University Edition.)

options dlcreatedir;
%let repoPath = %sysfunc(getoption(WORK))/sas-dummy-blog;
libname repo "&repoPath.";
libname repo clear;
 
/* Fetch latest code from GitHub */
data _null_;
 rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
   "&repoPath.");
 put rc=;
run;
 
/* run the code in this session */
%include "&repoPath./rng_example_thanos.sas";

You could use the other GITFN functions to stage and commit the output from your SAS jobs, including log files, data sets, ODS results -- whatever you need to keep and version.

Using Git in SAS Data Integration Studio

SAS Data Integration Studio has supported source control integration for many years, but only for CVS and Subversion (still in wide use, but they aren't media darlings like GitHub). By popular request, the latest version of SAS Data Integration Studio adds support for a Git plug-in.

Example of Git in SAS DI Studio

See the documentation for details:

Read more about setup and use in the available here as part of our "Custom Tasks Tuesday" series.

Using Git in SAS Enterprise Guide

This isn't new, but I'll include it for completeness. SAS Enterprise Guide supports built-in Git repository support for SAS programs that are stored in your project file. You can use this feature without having to set up any external Git servers or repositories. Also, SAS Enterprise Guide can recognize when you reference programs that are managed in an external Git repository. This integration enables features like program history, compare differences, commit, and more. Read more and see a demo of this in action here.

program history

If you use SAS Enterprise Guide to edit and run SAS programs that are managed in an external Git repository, here's an important tip. Change your project file properties to "Use paths relative to the project for programs and importable files." You'll find this checkbox in File->Project Properties.

With this enabled, you can store the project file (EGP) and any SAS programs together in Git, organized into subfolders if you want. As long as these are cloned into a similar structure on any system you use, the file paths will resolve automatically.

The post Using built-in Git operations in SAS appeared first on The SAS Dummy.