sas programming

7月 172019
 

Problem solving: thinking inside the box

Have you ever tried to pass comma-delimited values to SAS macro or to a SAS macro function? How can SAS distinguish commas separating parameters or arguments from commas separating parts of the values?

Passing comma-delimited value as an argument to a SAS macro function

Let’s say you want to extract the first word from the following string of characters (these words represent column names in the SASHELP.CARS data table):

make, model, type, origin

If you run the following code:

%let firstvar = %scan(make, model, type, origin, 1);

you get is the following ERROR in your SAS log:

ERROR: Macro function %SCAN has too many arguments.

That is because %scan macro function sees and treats those make, model, type and origin as arguments since commas between them are interpreted as argument separators.

Even if you “hide” your comma-delimited value within a macro variable, it still won’t do any good since the macro variable gets resolved during macro compilation before being passed on to a macro or macro function for execution.

%let mylist = make, model, type, origin;
%let firstvar = %scan(&mylist, 1);

You will still get the same ERROR:

ERROR: Macro function %SCAN has too many arguments.

Passing comma-delimited value as a parameter to a SAS macro

Try submitting the following code that passes your macro variable value to a SAS macro as a parameter:

%let mylist = make, model, type, origin;
%macro subset(dsname=, varlist=);
   proc sql;
      select &varlist
      from &dsname;
   quit;
%mend subset;
%subset(dsname=SASHELP.CARS, varlist=&mylist)

You will get another version of the SAS log ERROR:

ERROR: All positional parameters must precede keyword parameters.
NOTE: Line generated by the macro variable "MYLIST".
1                 type, origin
                  ----
                  180
ERROR 180-322: Statement is not valid or it is used out of proper order.

In this case, macro %subset gets as confused as the %scan function above because your macro variable will get resolved during macro compilation, and SAS macro processor will see the macro invocation as:

%subset(dsname=SASHELP.CARS, varlist=make, model, type, origin)

treating each comma as a parameter separator.

All this confusion happens because SAS functions’ arguments and SAS macros’ parameters use commas as their separators, while resolved macro variables introduce their own values’ comma delimiters into the functions/macros constructs’ picture, thus wreaking havoc on your SAS program.

It’s time for a vacation

But don’t panic! To fight that chaos, you need to take a vacation. Not a stay-home, do-nothing vacation, but some serious vacation, with faraway destination and travel arrangements. While real vacation is preferable, an imaginary one would do it too. I mean to start fighting the mess with comma-separated values, pick your destination, book your hotel and flight, and start packing your stuff.

Do you have a “vacation items list”? In my family, we have an individual vacation list for every family member. How many items do you usually take with you? Ten, twenty, a hundred?

Regardless, you don’t show up at the airport checkpoint with a pile of your vacation items. That would’ve been too messy. I don’t think you would be even allowed boarding with an unpacked heap of your stuff. You come to an airport neatly rolling a single item that is called a suitcase. Well, I suppose that some of you may have two of them, but I can’t imagine more than that.

You only started your fantasy vacation, you haven’t even checked in to your flight, but you have already have a solution in your sight, a perfect combine-and-conquer solution for passing comma-delimited values. Even if you have not yet realized that it’s in your plain view.

Thinking inside the box

Forget about “thinking outside the box” metaphor. You can’t solve all your problems with a single strategy. Sometimes, you need to turn your thinking on its head to solve, or even to see the problem.

As for your airport check-in, instead of thinking outside the box, you thought “inside the box” and brought your many items “boxed” as a single item – a suitcase. A container, in a broader sense.

That is exactly how we are going to approach our comma-delimited lists problem. We are going to check them in to a macro or a macro function as a single, boxed item. Just like this:
Passing a comma-separated value to SAS macro or SAS macro function
Or like this:
passing SAS macro variable with comma-separated value to SAS macro or SAS macro function

Not surprisingly, SAS macro language provides a variety of these wonder boxes for many special occasions collectively known as macro quoting functions. Personally, I would prefer calling them “macro masking functions,” as they have nothing to do with “quoting” per se and have everything to do with masking various characters during macro compilation or macro processing. But that is what “macro quoting” means – masking, boxing, - similar to “quoting” a character string to make it a single entity.

Different macro quoting functions mask different special characters (+ - , / ; = etc.) and mnemonics (AND OR GT EQ etc.) so that the macro facility interprets them as text instead of as language symbols.

Here are all 7 SAS macro quoting functions, two of which work at macro compilation - %STR() and %NRSTR(), while other 5 work at macro execution - %QUOTE() and %NRQUOTE(), %BQUOTE() and %NRBQUOTE(), and %SUPERQ().

You may look up what symbols they mask and the timing they apply (macro compilation vs. macro execution) in this macro quoting functions summary. You may also want to look at the following cheat sheet: Deciding When to Use a Macro Quoting Function and Which Function to Use.

As general rule of thumb, use macro quoting functions at compilation time when you mask text constants - (make, model, type, origin); use macro quoting functions at execution time when you mask macro or macro variable references containing & or % - (&mylist).

NOTE: There are many other SAS macro functions that besides their main role also perform macro quoting, e.g. %QSCAN(), %QSUBSTR() and others; they all start with %Q.

Masking commas within a comma-delimited value passed as an argument or a parameter

It turns out that to mask (or to “box”) comma-separated values in a macro function or a SAS macro, any macro quoting function will work. In this case I would suggest using the simplest (and shortest) %STR(). %STR() applies during macro compilation and serves as a perfect “box” for our comma-delimited values to hide (mask) commas to receiving macro function or a macro does not confuse them with its own commas separating arguments / parameters.

With it we can re-write our above examples as:

%let firstvar = %scan(%str(make, model, type, origin), 1);
%put &=firstvar;

SAS log will produce exactly what we expected:

FIRSTVAR=make

Similarly, we can call the above SAS macro as:

%subset(dsname=SASHELP.CARS, varlist=%str(make, model, type, origin) )

It will run without ERRORs and produce a print of the SASHELP.CARS data table with 4 columns specified by the varlist parameter value:

SAS output table as a result of macro run

Masking commas within a macro variable value passed as an argument or parameter

When you assign a comma-delimited list as a value to a macro variable, we want to mask commas within the resolved value during execution. Any of the execution time macro quoting functions will mask comma.

Again, in case of multiple possibilities I would use the shortest one - %QUOTE().

With it we can re-write our above examples as:

%let mylist = make, model, type, origin;
 
%let firstvar = %scan(%quote(&mylist), 1);
 
%subset(dsname=SASHELP.CARS, varlist=%quote(&mylist))

But just keep in mind that the remaining 4 execution time macro quoting functions - %NRQUOTE(), %BQUOTE(), %NRBQUOTE() and %SUPERQ() - will work too.

NOTE: The syntax of the %SUPERQ() function is quite different from the rest of the pack. The %SUPERQ() macro function takes as its argument either a macro variable name without an ampersand or a macro text expression that yields a macro variable name without an ampersand.

Get it going

I realize that macro quoting is not a trivial matter. That is why I attempted to explain its concept on a very simple yet powerful use case. Hope you will expand on this to empower your SAS coding skills.

Passing comma-delimited values into SAS macros and macro functions was published on SAS Users.

7月 102019
 

Posted on behalf of SAS Press author Derek Morgan.


I was sitting in a model railroad club meeting when one of our more enthusiastic young members said, "Wouldn't it be cool if we could make a computer simulation, with trains going between stations and all. We could have cars and engines assigned to each train and timetables and…"

So, I thought to myself, “Timetables… I bet SAS can do that easily… sounds like something fun for Mr. Dates and Times."

As it turns out, the only easy part of creating a timetable is calculating the time. SAS handles the concept of elapsed time smoothly. It’s still addition and subtraction, which is the basis of how dates and times work in SAS. If a train starts at 6:00 PM (64,800 seconds of the day) and arrives at its destination 12 hours (43,200 seconds) later, it arrives at 6:00 AM the next day. The math is start time+duration=end time (108,000 seconds,) which is 6:00 AM the next day. It doesn’t matter which day, that train is always scheduled to arrive at 6:00 AM, 12 hours from when it left.

It got a lot more complicated when the group grabbed onto the idea. One of the things they wanted to do was to add or delete trains and adjust the timing so multiple trains don’t run over the same track at the same time. This wouldn’t be that difficult in SAS; just create an interactive application, but… I’m the only one who has SAS. So how do I communicate my SAS database with the “outside world”? The answer was Microsoft Excel, and this is where it gets thorny.

It’s easy enough to send SAS data to Excel using ODS EXCEL and PROC REPORT, but how could I get Excel to allow the club to manipulate the data I sent?
I used the COMPUTE block in PROC REPORT to display a formula for every visible train column. I duplicated the original columns (with corresponding names to keep it all straight) and hid them in the same spreadsheet. The EXCEL formula code is in line 8.

Compute Block Code:

I also added three rows to the dataset at the top. The first contains the original start time for each train, the second contains an offset, which is always zero in the beginning, while the third row was blank (and formatted with a black background) to separate it from the actual schedule.


Figure 1: Schedule Adjustment File

The users can change the offset to change the starting time of a train (Column C, Figure 2.) The formula in the visible columns adds the offset to the value in each cell of the corresponding hidden column (as long as it isn’t blank.) You can’t simply add the offset to the value of the visible cell, because that would be a circular reference.

The next problem was moving a train to an earlier starting time, because Excel has no concept of negative time (or date—a date prior to the Excel reference date of January 1, 1900 will be a character value in Excel and cause your entire column to be imported into SAS as character data.) Similarly, you can’t enter -1:00 as an offset to move the starting time of our 5:35 AM train to 4:35 AM. Excel will translate “-1:00” as a character value and that will cause a calculation error in Excel. In order to move that train to 4:35 AM, you have to add 23 hours to the original starting time (Column D, Figure 2.)


Figure 2: Adjusting Train Schedules

After the users adjust the schedules, it’s time to return our Excel data to SAS, which creates more challenges. In the screenshot above, T534_LOC is the identifier of a specific train, and the timetable is kept in SAS time values. Unfortunately, PROC IMPORT using DBMS=XLSX brings the train columns into SAS as character data. T534_LOC also imports as the actual Excel value, time as a fraction of a day.


Figure 3: How the Schedule Adjustment File Imports to SAS

While I can fix that by converting the character data to numeric and multiplying by 86,400, I still need the original column name of T534_LOC for the simulation, so I would have to rename each character column and output the converted data to the original column name. There are currently 146 trains spread across 12 files, and that is a lot of work for something that was supposed to be easy! Needless to say, this “little” side project, like most model railroads, is still in progress. However, this exercise in moving time data between Microsoft Excel and SAS gave me even more appreciation for the way SAS handles date and time data.

Figure 4 is a partial sample of the finished timetable file, generated as an RTF file using SAS. The data for trains 534 and 536 are from the spreadsheet in Figure 1.


Figure 4: Partial Sample Timetable

Want to learn more about how to use and manipulate dates, times, and datetimes in SAS? You'll find the answers to these questions and much more in my book The Essential Guide to SAS Dates and Times, Second Edition. Updated for SAS 9.4, with additional functions, formats, and capabilities, the Second Edition has a new chapter dedicated to the ISO 8601 standard and the formats and functions that are new to SAS, including how SAS works with Universal Coordinated Time (UTC). Chapter 1 is available as a free preview here.

For updates on new SAS Press books and great discounts subscribe to the SAS Press New Book Newsletter.

SAS Press Author Derek Morgan on Timetables and Model Trains was published on SAS Users.

7月 032019
 

An informat helps you read data into a SAS data set. SAS supports more than 100 informats. The most common informats are related to dates and times and make it easy to read an input string such as 28JAN2001 and convert it to a SAS date such as 15003. Yet no matter how many built-in informats SAS provides, programmers will inevitably encounter data that are difficult to read. That is why PROC FORMAT in SAS enables you to create a custom informat. A custom informat enables you to process input strings in whatever way is most convenient for your application. This article shows how to create a custom informat in SAS in three easy steps:

  1. Write DATA step code that parses an input string and converts it into a form that is more convenient to work with.
  2. Use PROC FCMP to define a function that implements the same string-to-value conversion. Often you can cut and paste from your DATA step program into PROC FCMP.
  3. Use the INVALUE statement in PROC FORMAT to define a custom informat.

Technically, you can skip Step 1, but I find it much easier to develop and debug a program in the DATA step because I can use the PUT statement or PROC PRINT to look at intermediate results.

A motivating example: Convert durations into seconds

While writing a recent article about how to visualize race times, I researched ways to read times into SAS. For time-of-day data, you can use the TIMEw. or ANYDTTME. informats to read a string in the form HH:MM:SS and convert it to a SAS time value. For example, the time "1:23:45" (AM) is converted to the number of seconds since midnight, which is 5025.

Race times are slightly different because they represents durations of time. Short races such as a 5K are typically recorded in minutes, and sprints are recorded in seconds. Both times contain fractions of seconds, so the data is typically represented as SS.fff or MM:SS.ff. You can use the HHMMSS. informat to read these data and store them as seconds. The following DATA step reads a sprint time (12.345 s) and a 5K time (18:33.24) into a SAS time variable. You can use the HOUR, MINUTE, and SECOND functions to make sure the data are read correctly:

/* You can use the HHMMSS informat to read durations into SAS */
data RaceTimes;
input Time HHMMSS.;               /* read duration = length of time */
format Time TIME10.3;             /* there is no HHMMSS. format, so use TIME. format */
rawTime = Time;
hours = hour(Time);               /* check the results by print hour, min, and sec */
minutes = minute(Time);
seconds = second(Time);
datalines;
12.345
18:33.24 
19:23
;
 
proc print noobs; run;
Use HHMMSS informat to read race times into SAS

The only problem with the HHMMSS. informat is that it reads the string 19:23 as "19 hours, 23 minutes," whereas I prefer "19 minutes, 23 seconds." (It's an ambiguous case. For clarity, the string could be entered as 0:19:23 or 19.23.0.) If a SAS informat does not interpret strings the way you prefer, you can write your own SAS informat. The main challenge is writing DATA step code that parses strings such as "12.345" and "18:33.24" and computes the number of seconds that each string represents. The next section presents a DATA step program that converts these strings to seconds. You can use the program to define a new informat that parses race-time data.

Step 1: Write a DATA step that converts a string into a number

This article is about how to create an informat, not about how to parse strings in SAS, so I'll let the comments in the program speak for themselves. Basically, the following program reads from the right side of a string until it encounters a colon (or the beginning of the string). It extracts the string following the colon and converts it to a number, which is the number of seconds. It repeats the process to find the number of minutes (if any) and the number of hours (if any). It then converts the sum of the hours, minutes, and seconds to seconds and stores it in a variable named Time.

/* Parse strings. Obtain time, in seconds, for durations written in the following forms:
          SS.fff
       MM:SS.fff
   HHH:MM:SS.fff
*/
data RaceTimes(drop= i j z);
length str $13;                     /* HHH.MM.SS.fff */
input str;                          /* read raw string */
s=0; m=0; h=0;
/* read seconds :SS.fff */
i = find(str, ':', -length(str)+1); /* find colon from right */
z = substr(str, i+1);               /* extract seconds */
s = input(z, best6.);               /* convert to number */
/* read minutes :MM: */
if i > 0 then do;
   j = find(str, ':', -(i-1));      /* find colon from right */
   z = substr(str, j+1, i-j-1);     /* extract minutes */
   m = input(z, best6.);            /* convert to number */
   /* read hours HHH:, if any */
   if j > 0 then do;
      z = substr(str, 1, j-1);      /* extract hours */
      h = input(z, best6.);         /* convert to number */
   end;
end;
Time = 3600*h + 60*m + s;           /* convert to seconds */
datalines;
   11.345
 0:11.345
 2:39.66 
18:33.24 
19:23
49:02.89 
72:02.89 
1:12:02.89
100:71:00
;
 
proc print data=RaceTimes noobs; 
   format time 10.3;
run;
Convert race times into seconds with SAS

The DATA step parses each string and converts it to a time, in seconds. The output shows the results. This table is the same as you would get by using the HHMMSS. informat to read the data, except that the string 19:23 is now interpreted as "19 minutes, 23 seconds."

Notice that you can see the intermediate expressions such as s, m, and h. This is useful when you are trying to debug the program.

Step 2: Use PROC FCMP to define a function

You can use PROC FCMP to define a custom function in Base SAS. This example defines a function named Duration, which takes a string as an input argument. Copy and paste the DATA step code and return the Time variable. You can use the OPTIONS CMPLIB= system option to tell SAS where to find the Duration function. You can then call the new function from the DATA step, as follows:

proc fcmp outlib=work.functions.MathFuncs; 
function Duration(str $); 
   s=0; m=0; h=0;
   /* read seconds :SS.fff */
   i = find(str, ':', -length(str)+1); /* find colon from right */
   z = substr(str, i+1);               /* extract seconds */
   s = input(z, best6.);               /* convert to number */
   /* read minutes :MM: */
   if i > 0 then do;
      j = find(str, ':', -(i-1));      /* find colon from right */
      z = substr(str, j+1, i-j-1);     /* extract minutes */
      m = input(z, best6.);            /* convert to number */
      /* read hours, if any */
      if j > 0 then do;
         z = substr(str, 1, j-1);      /* extract hours */
         h = input(z, best6.);         /* convert to number */
      end;
   end;
   Time = 3600*h + 60*m + s;           /* convert to seconds */
   return( Time );
endsub; 
quit; 
 
options cmplib=work.functions; /* where to find Duration() function */
data RaceTimes2;
length str $13;
input str;
/* call Duration function to convert string to a time value */
Time = Duration(str);          
datalines;
   11.345
 0:11.345
 2:39.66 
18:33.24 
19:23
49:02.89 
72:02.89 
1:12:02.89
100:71:00 
;
Convert race times seconds with SAS

Step 3: Use PROC FORMAT to define a custom informat

I have written several articles about how to use PROC FORMAT to create your own custom formats in SAS. The process of creating an informat is similar, but because parsing strings is complicated it is often best to write the code into an FCMP function and then define the informat in terms of the function. This is done in the following call to PROC FORMAT, which uses the INVALUE statement to define an informat named DURATION.:

/* define the DURATION. informat */
proc format; 
   invalue duration (default=13) other=[Duration()]; 
run; 
 
/* use new DURATION informat to read duration times into SAS */
data RaceTimes3;
input Time : DURATION.;
datalines;
   11.345
 0:11.345
 2:39.66 
18:33.24 
19:23
49:02.89 
72:02.89
1:12:02.89
100:71:00 
;
 
proc print data=RaceTimes3; 
   format Time 10.3;
run;
Use custom informat to read race times into SAS

Success! The program uses the DURATION. informat to read strings of the form "SS.fff", "MM:SS.fff", and "HHH:MM:SS.fff". It converts the strings to seconds. You can use the statement FORMAT Time TIME10.3 in PROC PRINT if you want to format the seconds as hours, minutes, and seconds.

In summary, you can use the INVALUE statement in PROC FORMAT to create a custom informat in SAS. An informat enables you to read a string or pattern and convert it to a value that is easier to work with. Although SAS supports the HHMMSS. informat for reading strings that represent durations of time, I chose to create my own informat that converts race times to seconds. You can use PROC FCMP to define a function that parses a string and returns a numerical value. You can then use PROC FORMAT to define the informat, which calls the FCMP function.

The post Create a custom informat in SAS appeared first on The DO Loop.

7月 032019
 

One of my favorite parts of summer is a relaxing weekend by the pool. Summer is the time I get to finally catch up on my reading list, which has been building over the year. So, if expanding your knowledge is a goal of yours this summer, SAS Press has a shelf full of new titles for you to explore. To help navigate your selection we asked some of our authors what SAS books were on their reading lists for this summer?

Teresa Jade


Teresa Jade, co-author of SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, has already started The DS2 Procedure: SAS Programming Methods at Work by Peter Eberhardt. Teresa reports that the book “is a concise, well-written book with good examples. If you know a little bit about the SAS DATA step, then you can leverage what you know to more quickly get up to speed with DS2 and understand the differences and benefits.”
 
 
 

Derek Morgan

Derek Morgan, author of The Essential Guide to SAS® Dates and Times, Second Edition, tells us his go-to books this summer are Art Carpenter’s Complete Guide to the SAS® REPORT Procedure and Kirk Lafler's PROC SQL: Beyond the Basics Using SAS®, Third Edition. He also notes that he “learned how to use hash objects from Don Henderson’s Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study.”
 

Chris Holland

Chris Holland co-author of Implementing CDISC Using SAS®: An End-to-End Guide, Revised Second Edition, recommends Richard Zink’s JMP and SAS book, Risk-Based Monitoring and Fraud Detection in Clinical Trials Using JMP® and SAS®, which describes how to improve efficiency while reducing costs in trials with centralized monitoring techniques.
 
 
 
 
 

And our recommendations this summer?

Download our two new free e-books which illustrate the features and capabilities of SAS® Viya®, and SAS® Visual Analytics on SAS® Viya®.

Want to be notified when new books become available? Sign up to receive information about new books delivered right to your inbox.

Summer reading – Book recommendations from SAS Press authors was published on SAS Users.

6月 262019
 

"There's a way to do it better - find it." - Thomas A. Edison

Finding a better SAS code

When it comes to SAS coding, this quote by Thomas A. Edison is my best advisor. Time permitting, I love finding better ways of implementing SAS code.

But what code feature means “better” – brevity, clarity or efficiency? It all depends on the purpose of your code. When code is to illustrate a coding concept or technique, clarity is a paramount. However, when processing large data volumes in near real-time, code efficiency becomes critical, not just a luxury or convenience. And brevity won’t hurt in either case. Ideally, your code should be a combination of all three features - brevity, clarity and efficiency.

Parsing a character string

In this blog post we will solve a problem of parsing a character string to find a position of n-th occurrence of a group of characters (substring) in that string.

The closest out-of-box solution to this problem is SAS’ FIND() function. Except this function searches only for a single/first instance of specified substring of characters within a character string. Close enough, and with some do-looping we can easily construct what we want.

After some internet and soul searching to find the Nth occurrence of a substring within a string, I came up with the following DATA STEP code snippet:

   p = 0;
   do i=1 to n until(p=0); 
      p = find(s, x, p+1);
   end;

Here, s is a text string (character variable) to be parsed; x is a character variable holding a group of characters that we are searching for within s; p is a position of x value found within s; n is an instance number.

If there is no n-th instance of x within s found, then the code returns p=0.

In this code, each do-loop iteration searches for x within s starting from position p+1 where p is position found in prior iteration: p = find(s,x,p+1);.

Notice, if there is no prior-to n instance of x within s, the do-loop ends prematurely, based on until(p=0) condition, thus cutting the number of loops to the minimal necessary.

Reverse string search

Since find() function allows for a string search in a reverse direction (from right to left) by making the third augment negative, the above code snippet can be easily modified to do just that: find Nth instance (from right to left) of a group of characters within a string. Here is how you can do that:

   p = length(s) + 1;
   do i=1 to n until(p=0); 
      p = find(s, x, -p+1);
   end;

The difference here is that we start from position length(s)+1 instead of 0, and each iteration searches substring x within string s starting from position –(p-1)=-p+1 from right to left.

Testing SAS code

You can run the following SAS code to test and see how these searches work:

data a;
   s='AB bhdf +BA s Ab fs ABC Nfm AB ';
   x='AB';
   n=3;
 
   /* from left to right */
   p = 0;
   do i=1 to n until(p=0); 
      p = find(s, x, p+1);
   end;
   put p=;
 
   /* from right to left */
   p = length(s) + 1;
   do i=1 to n until(p=0); 
      p = find(s, x, -p+1);
   end;
   put p=;
run;

FINDNTH() function

We can also combine the above left-to-right and right-to-left searches into a single user-defined SAS function by means of SAS Function Compiler (PROC FCMP) procedure:

proc fcmp outlib=sasuser.functions.findnth;
   function findnth(str $, sub $, n);
      p = ifn(n>=0,0,length(str)+1);
      do i=1 to abs(n) until(p=0);
         p = find(str,sub,sign(n)*p+1);
      end;
      return (p);
   endsub;
run;

We conveniently named it findnth() to match the Tableau FINDNTH(string, substring, occurrence) function that returns the position of the nth occurrence of substring within the specified string, where the occurrence argument defines n.

Except our findnth() function allows for both, positive (for left-to-right searches) as well as negative (for right-to-left searches) third argument while Tableau’s function only allows for left-to-right searches.

Here is an example of the findnth() function usage:

options cmplib=sasuser.functions;
data a;
   s='AB bhdf +BA s Ab fs ABC Nfm AB ';
   x='AB';
   n=3;
 
   /* from left to right */
   p=findnth(s,x,n);
   put p=;
 
   /* from right to left */
   p=findnth(s,x,-n);
   put p=;
run;

Using Perl regular expression

As an alternative solution I also implemented SAS code for finding n-th occurrence of a substring within a string using Perl regular expression (regex or prx):

data a;
   s='AB bhdf +BA s Ab fs ABC Nfm AB ';
   x='AB';
   n=3;
 
   /* using regex */
   xid = prxparse('/'||x||'/o');
   p = 0;
   do i=1 to n until(p=0);
      from = p + 1;
      call prxnext(xid, p + 1, length(s), s, p, len);
   end;
   put p=;
run;

However, efficiency benchmarking tests demonstrated that the above solutions using FIND() function or FINDNTH() SAS user-written function run roughly twice faster than this regex solution.

Challenge

Can you come up with an even better solution to the problem of finding Nth instance of a sub-string within a string? Please share your thoughts and solutions with us. Thomas A. Edison would have been proud of you!

Finding n-th instance of a substring within a string was published on SAS Users.

6月 062019
 

Want to learn SAS programming but worried about taking the plunge? Over at SAS Press, we are excited about an upcoming publication that introduces newbies to SAS in a peer-review instruction format we have found popular for the classroom. Professors Jim Blum and Jonathan Duggins have written Fundamentals of Programming in SAS using a spiral curriculum that builds upon topics introduced earlier and at its core, uses large-scale projects presented as case studies. To get ready for release, we interviewed our new authors on how their title will enhance our SAS Books collection and which of the existing SAS titles has had an impact on their lives!

What does your book bring to the SAS collection? Why is it needed?

Blum & Duggins: The book is probably unique in the sense that it is designed to serve as a classroom textbook, though it can also be used as a self-study guide. That also points to why we feel it is needed; there is no book designed for what we (and others) do in the classroom. As SAS programming is a broad topic, the goal of this text is to give a complete introduction of effective programming in Base SAS – covering topics such as working with data in various forms and data manipulation, creating a variety of tabular and visual summaries of data, and data validation and good programming practices.

The book pursues these learning objectives using large-scale projects presented as case studies. The intent of coupling the case-study approach with the introductory programming topics is to create a path for a SAS programming neophyte to evolve into an adept programmer by showing them how programmers use SAS, in practice, in a variety of contexts. The reader will gain the ability to author original code, debug pre-existing code, and evaluate the relative efficiencies of various approaches to the same problem using methods and examples supported by pedagogical theory. This makes the text an excellent companion to any SAS programming course.

What is your intended audience for the book?

Blum & Duggins: This text is intended for use in both undergraduate and graduate courses, without the need for previous exposure to SAS. However, we expect the book to be useful for anyone with an aptitude for programming and a desire to work with data, as a self-study guide to work through on their own. This includes individuals looking to learn SAS from scratch or experienced SAS programmers looking to brush up on some of the fundamental concepts. Very minimal statistical knowledge, such as elementary summary statistics (e.g. means and medians), would be needed. Additional knowledge (e.g. hypothesis testing or confidence intervals) could be beneficial but is not expected.

What SAS book(s) changed your life? How? And why?

Blum: I don’t know if this qualifies, but the SAS Programming I and SAS Programming II course notes fit this best. With those, and the courses, I actually became a SAS programmer instead of someone who just dabbled (and dabbled ineffectively). From there, many doors were opened for me professionally and, more importantly, I was able to start passing that knowledge along to students and open some doors for them. That experience also served as the basis for building future knowledge and passing it along, as well.

Duggins: I think the two SAS books that most changed my outlook on programming (which I guess has become most of my life, for better or worse) would either be The Essential PROC SQL Handbook for SAS Users by Katherine Prairie or Jane Eslinger's The SAS Programmer's PROC REPORT Handbook, because I read them at different times in my SAS programming career. Katherine's SQL book changed my outlook on programming because, until then, I had never done enough SQL to consistently consider it as a viable alternative to the DATA step. I had taken a course that taught a fair amount of SQL, but since I had much more experience with the DATA step and since that is what was emphasized in my doctoral program, I didn't use SQL all that often. However, after working through her book, I definitely added SQL to my programming arsenal. I think learning it, and then having to constantly evaluate whether the DATA step or SQL was better suited to my task, made me a better all-around programmer.

As for Jane's book - I read it much later after having used some PROC REPORT during my time as a biostatistician, but I really wasn't aware of how much could be done with it. I've also had the good fortune to meet Jane, and I think her personality comes through clearly - which makes that book even more enjoyable now than it was during my first read!

Read more

We at SAS Press are really excited to add this new release to our collection and will continue releasing teasers until its publication. For almost 30 years SAS Press has published books by SAS users for SAS users. Here is a free excerpt located on our Duggins' author page to whet your appetite. (Please know that this excerpt is an unedited draft and not the final content). Look out for news on this new publication, you will not want to miss it!

Want to find out more about SAS Press? For more about our books, subscribe to our newsletter. You’ll get all the latest news and exclusive newsletter discounts. Also, check out all our new SAS books at our online bookstore.

Interview with new SAS Press authors: Jim Blum and Jonathan Duggins was published on SAS Users.

6月 042019
 


Two sayings I’ve heard countless times throughout my life are “Work smarter, not harder,” and “Use the best tool for the job.” If you need to drive a nail, you pick up a hammer, not a wrench or a screwdriver. In the programming world, this could mean using an existing function library instead of writing your own or using an entirely different language because it’s more applicable to your problem. While that sounds good in practice, in the workplace you don’t always have that freedom.

So, what do you do when you’re given a hammer and told to fasten a screw? Or, like in the title of this article’s case, what do you do when you have Python functions you want to use in SAS?

Recently I was tasked with documenting an exciting new feature for SAS — the ability to call Python functions from within SAS. In this article I will highlight everything I’ve learned along the way to bring you up to speed on this powerful new tool.

PROC FCMP Python Objects

Starting with May 2019 release of SAS 9.4M6, the PROC FCMP procedure added support for submitting and executing functions written in Python from within a SAS session using the new Python object. If you’re unfamiliar with PROC FCMP, I’d suggest reading the documentation. In short, FCMP, or the SAS Function Compiler, enables users to write their own functions and subroutines that can then be called from just about anywhere a SAS function can be used in SAS. Users are not restricted to using Python only inside a PROC FCMP statement. You can create an FCMP function that calls Python code, and then call that FCMP function from the DATA step. You can also use one of the products or solutions that support Python objects including SAS High Performance Risk and SAS Model Implementation Platform.

The Why and How

So, what made SAS want to include this feature in our product? The scenario in mind we imagined when creating this feature was a customer who already had resources invested in Python modeling libraries but now wanted to integrate those libraries into their SAS environment. As much fun as it sounds to convert and validate thousands of lines of Python code into SAS code, wouldn’t it be nice if you could simply call Python functions from SAS? Whether you’re in the scenario above with massive amounts of Python code, or you’re simply more comfortable coding in Python, PROC FCMP is here to help you. Your Python code is submitted to a Python interpreter of your choice. Results are packaged into a Python tuple and brought back inside SAS for you to continue programming.

Programming in Two Languages at Once

So how do you program in SAS and Python at the same time? Depending on your installation of SAS, you may be ready to start, or there could be some additional environment setup you need to complete first. In either case, I recommend pulling up the Using PROC FCMP Python Objects documentation before we continue. The documentation outlines the addition of an output string that must be made to your Python code before it can be submitted from SAS. When you call a Python function from SAS, the return value(s) is stored in a SAS dictionary. If you’re unfamiliar with SAS dictionaries, you can read more about them here Dictionaries: Referencing a New PROC FCMP Data Type.

Getting Started

There are multiple methods to load your Python code into the Python object. In the code example below, I’ll use the SUBMIT INTO statement to create an embedded Python block and show you the basic framework needed to execute Python functions in SAS.

/* A basic example of using PROC FCMP to execute a Python function */
proc fcmp;
 
/* Declare Python object */
declare object py(python);
 
/* Create an embedded Python block to write your Python function */
submit into py;
def MyPythonFunction(arg1, arg2):
	"Output: ResultKey"
	Python_Out = arg1 * arg2
	return Python_Out
endsubmit;
 
/* Publish the code to the Python interpreter */
rc = py.publish();
 
/* Call the Python function from SAS */
rc = py.call("MyPythonFunction", 5, 10);
 
/* Store the result in a SAS variable and examine the value */
SAS_Out = py.results["ResultKey"];
put SAS_Out=;
run;

You can gather from this example that there are essentially five parts to using PROC FCMP Python objects in SAS:

  1. Declaring your Python object.
  2. Loading your Python code.
  3. Publishing your Python code to the interpreter.
  4. Executing your Python Code.
  5. Retrieving your results in SAS.

From the SAS side, those are all the pieces you need to get started importing your Python code. Now what about more complicated functions? What if you have working models made using thousands of lines and a variety of Python packages? You still use the same program structure as before. This time I’ll be using the INFILE method to import my Python function library by specifying the file path to the library. You can follow along in by copying my Python code into a .py file. The file, blackscholes.py, contains this code:

def internal_black_scholes_call(stockPrice, strikePrice, timeRemaining, volatility, rate):
    import numpy
    from scipy import stats
    import math
    if ((strikePrice != 0) and (volatility != 0)):
        d1 = (math.log(stockPrice/strikePrice) + (rate + (volatility**2)\
                       /  2) * timeRemaining) / (volatility*math.sqrt(timeRemaining))
        d2 = d1 - (volatility * math.sqrt(timeRemaining))
        callPrice = (stockPrice * stats.norm.cdf(d1)) - \
        (strikePrice * math.exp( (-rate) * timeRemaining) * stats.norm.cdf(d2))
    else:
        callPrice=0
    return (callPrice)
 
def black_scholes_call(stockPrice, strikePrice, timeRemaining, volatility, rate):
    "Output: optprice"
    import numpy
    from scipy import stats
    import math
    optPrice = internal_black_scholes_call(stockPrice, strikePrice,\
                                           timeRemaining, volatility, rate)
    callPrice = float(optPrice)
    return (callPrice,)

My example isn’t quite 1000 lines, but you can see the potential of having complex functions all callable inside SAS. In the next figure, I’ll call these Python functions from SAS.

(/*Using PROC FCMP to execute Python functions from a file */
proc fcmp;
 
/* Declare Python object */
declare object py(python);
 
/* Use the INFILE method to import Python code from a file */
rc = py.infile("C:\Users\PythonFiles\blackscholes.py");
 
/* Publish the code to the Python interpreter */
rc = py.publish();
 
/* Call the Python function from SAS */
rc = py.call("black_scholes_call", 132.58, 137, 0.041095, .2882, .0222);
 
/* Store the result in a SAS variable and examine the value */
SAS_Out = py.results["optprice"];
put SAS_Out=;
run;

Calling Python Functions from the DATA step

You can take this a step further and make it useable in the DATA step-outside of a PROC FCMP statement. We can use our program from the previous example as a starting point. From there, we just need to wrap the inner Python function call in an outer FCMP function. This function within a function design may be giving you flashbacks of Inception, but I promise you this exercise won’t leave you confused and questioning reality. Even if you’ve never used FCMP before, creating the outer function is straightforward.

/* Creating a PROC FCMP function that calls a Python function  */
proc fcmp outlib=work.myfuncs.pyfuncs;
 
/* Create the outer FCMP function */
/* These arguments are passed to the inner Python function */
function FCMP_blackscholescall(stockprice, strikeprice, timeremaining, volatility, rate);
 
/* Create the inner Python function call */
/* Declare Python object */
declare object py(python);
 
/* Use the INFILE method to import Python code from a file */
rc = py.infile("C:\Users\PythonFiles\blackscholes.py");
 
/* Publish the code to the Python interpreter */
rc = py.publish();
 
/* Call the Python function from SAS */
/* Since this the inner function, instead of values in the call           */
/* you will pass the outer FCMP function arguments to the Python function */
rc = py.call("black_scholes_call", stockprice, strikeprice, timeremaining, volatility, rate);
 
/* Store the inner function Python output in a SAS variable                              */
FCMP_out = py.results["optprice"];
 
/* Return the Python output as the output for outer FCMP function                        */
return(FCMP_out);
 
/* End the FCMP function                                                                 */
endsub;
run;
 
/* Specify the function library you want to call from                                    */
options cmplib=work.myfuncs;
 
/*Use the DATA step to call your FCMP function and examine the result                    */
data _null_;
   result = FCMP_blackscholescall(132.58, 137, 0.041095, .2882, .0222);
   put result=;
run;

With your Python function neatly tucked away inside your FCMP function, you can call it from the DATA step. You also effectively reduced the statements needed for future calls to the Python function from five to one by having an FCMP function ready to call.

Looking Forward

So now that you can use Python functions in SAS just like SAS functions, how are you going to explore using these two languages together? The PROC FCMP Python object expands the capabilities of SAS and by result improves you too as a SAS user. Depending on your experience level, completing a task in Python might be easier for you than completing that same task in SAS. Or you could be in the scenario I mentioned before where you have a major investment in Python and converting to SAS is non-trivial. In either case, PROC FCMP now has the capability to help you bridge that gap.

SAS or Python? Why not use both? Using Python functions inside SAS programs was published on SAS Users.

4月 112019
 

This blog post could be subtitled "To Catch a Thief" or maybe "Go ahead. Steal this blog. I dare you."* That's because I've used this technique several times to catch and report other web sites who lift the blog content from blogs.sas.com and present it as their own.

Syndicating blog content is an honorable practice, made possible by RSS feeds that virtually every blog platform supports. With syndicated feeds, your blog content can appear on someone else's site, but this always includes attribution and a link back to the original source. However, if you copy content from another website and publish it natively on your own web property, without attribution or citations...well, that's called plagiarism. And the Digital Millennium Copyright Act (DCMA) provides authors with recourse to have stolen content removed from infringing sites -- if you can establish that you're the rightful copyright owner.

Establishing ownership is a tedious task, especially when someone steals dozens or hundreds of articles. You must provide links to each example of infringing content, along with links to the original authorized content. Fortunately, as I've discussed before, I have ready access to the data about all 17,000+ blog posts that we've published at SAS (see How SAS uses SAS to Analyze SAS Blogs). In this article, I'll show you how I gathered that same information from the infringing websites so that I could file the DCMA "paperwork" almost automatically.

The complete programs from this article are available on GitHub.

Read a JSON feed using the JSON engine

In my experience, the people who steal our blog content don't splurge on fancy custom web sites. They tend to use free or low-cost web site platforms, and the most popular of these include WordPress (operated by Automattic) and Blogspot (operated by Google). Both of these platforms support API-like syndication using feeds.

Blogspot sites can generate article feeds in either XML or JSON. I prefer JSON when it's available, as I find that the JSON libname engine in SAS requires fewer "clues" in order to generate useful tables. (See documentation for the JSON engine.) While you can supply a JSON map file that tells SAS how to assemble your tables and data types, I find it just as easy to read the data as-is and post-process it to join the fields I need and convert data fields. (For an example that uses a JSON map, see Reading data with the SAS JSON libname engine.)

Since I don't want to draw attention to the specific infringing sites, I'll use an example of a popular (legitimate!) Blogspot site named "Maps Mania". If you're into data and maps (who isn't?) you might like their content. In this code I use PROC HTTP to fetch the RSS feed, using "alt=json" to request JSON format and "max-results=100" to retrieve a larger-than-default batch of published posts.

/* Read JSON feed into a local file. */
/* Use Blogspot parameters to get 100 posts at a time */
filename resp temp;
proc http
 url='https://googlemapsmania.blogspot.com/feeds/posts/default?alt=json&max-results=100'
 method="get"
 out=resp;
run;
 
libname rss json fileref=resp;

This JSON libname breaks the data into a series of tables that relate to each other via common keys.

RSS feed tables

With a little bit of exploration in SAS Enterprise Guide and the Query Builder, I was able to design a PROC SQL step to assemble just the fields and records I needed: post title and post URL.

/* Join the relevant feed entry items to make a single table */
/* with post titles and URLs */
proc sql;
   create table work.blogspot as 
   select t2._t as rss_title,
          t1.href as rss_href          
      from rss.entry_link t1
           inner join rss.entry_title t2 on (t1.ordinal_entry = t2.ordinal_entry)
      where t1.type = 'text/html' and t1.rel = 'alternate';
quit;
 
libname rss clear;

RSS output from blogspot

Read an XML feed using the XMLv2 engine

WordPress sites generate XML-based feeds by default. Site owners can install a WordPress plugin to generate JSON feeds as well, but most sites don't bother with that. Like the JSON feeds, the XML feed can contain many fields that relate to each other. I find that with XML, the best approach is to use the SAS XML Mapper application to explore the XML and "design" the final data tables that you need. You use SAS XML Mapper to create a map file, which you can then feed into the SAS XMLv2 engine to instruct SAS how to read the data. (See documentation for the XMLv2 engine.)

SAS XML Mapper is available as a free download from support.sas.com. Download it as a ZIP file (on Windows), and extract the ZIP file to a temporary folder. Then run setup.exe in the root of that folder to install the app on your system.

To design the map, I use an example of the XML feed from the blog that I want to examine. Once again, I'll choose a popular WordPress blog instead of the actual infringing sites. In this case, let's look at the Star Wars News site. I point my browser at the feed address is https://www.starwars.com/news/feed and save as an XML file. Then, I use SAS XML Mapper to Open XML (File menu), and examine the result.

I found everything that I needed in "item" subset of the feed. I dragged that group over to the right pane to include in the map. That creates a data set container named "item." Then dragged just the title, link, and pubDate fields into that data set to include in the final result.

The SAS XML Mapper generates a SAS program that you can include to define the map, and that's what I've done with the following code. It uses DATA step to create the map file just as I need it.

filename rssmap temp;
data _null_;
 infile datalines;
 file rssmap;
 input;
 put _infile_;
 datalines;
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP name="RSSMAP" version="2.1">
  <NAMESPACES count="0"/>
  <!-- ############################################################ -->
  <TABLE name="item">
    <TABLE-PATH syntax="XPath">/rss/channel/item</TABLE-PATH>
    <COLUMN name="title">
      <PATH syntax="XPath">/rss/channel/item/title</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>250</LENGTH>
    </COLUMN>
    <COLUMN name="link">
      <PATH syntax="XPath">/rss/channel/item/link</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>200</LENGTH>
    </COLUMN>
    <COLUMN name="pubDate">
      <PATH syntax="XPath">/rss/channel/item/pubDate</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>40</LENGTH>
    </COLUMN>
  </TABLE>
</SXLEMAP>
;
run;

Because WordPress feeds return just most recent 25 items by default, I need to use the "pageid=" directive to go deeper into the archive and return older items. I used a simple SAS macro loop to iterate through 5 pages (125 items) in this example. Note how I specified the XMLv2 libname with the XMLMAP= option to include my custom map. That ensures that SAS will read the XML and build the table as I've designed it.

My final DATA step in this part is to recast the pubDate field (a text field by default) into a proper SAS date.

/* WordPress feeds return data in pages, 25 entries at a time        */
/* So using a short macro to loop through past 5 pages, or 125 items */
%macro getItems;
  %do i = 1 %to 5;
  filename feed temp;
  proc http
   method="get"
   url="https://www.starwars.com/news/feed?paged=&i."
   out=feed;
  run;
 
  libname result XMLv2 xmlfileref=feed xmlmap=rssmap;
 
  data posts_&i.;
   set result.item;
  run;
  %end;
%mend;
 
%getItems;
 
/* Assemble all pages of entries                       */
/* Cast the date field into a proper SAS date          */
/* Have to strip out the default day name abbreviation */
/* "Wed, 10 Apr 2019 17:36:27 +0000" -> 10APR2019      */
data allPosts ;
 set posts_:;
 length sasPubdate 8;
 sasPubdate = input( substr(pubDate,4),anydtdtm.);
 format sasPubdate dtdate9.;
 drop pubDate;
run;

Reporting the results

After gathering the data I need from RSS feeds, I use SAS to match that with the WordPress data that I have about our blogs. I can then generate a table that I can easily submit in a DCMA form.

Usually, matching by "article title" is the easiest method. However, sometimes the infringing site will alter the titles a little bit or even make small adjustments to the body of the article. (This reminds me of my college days in computer science, when struggling students would resort to cheating by copying someone else's program, but change just the variable names. It's a weak effort.) With the data in SAS, I've used several other techniques to detect the "distance" of a potentially infringing post from the original work.

Maybe you want to see that code. But you can't expect me to reveal all of my secrets, can you?


* props to Robert "I dare you to knock it off" Conrad.

The post Read RSS feeds with SAS using XML or JSON appeared first on The SAS Dummy.

4月 012019
 

dividing by zero with SAS

Whether you are a strong believer in the power of dividing by zero, agnostic, undecided, a supporter, denier or anything in between and beyond, this blog post will bring all to a common denominator.

History of injustice

For how many years have you been told that you cannot divide by zero, that dividing by zero is not possible, not allowed, prohibited? Let me guess: it’s your age minus 7 (± 2).

But have you ever been bothered by that unfair restriction? Think about it: all other numbers get to be divisors. All of them, including positive, negative, rational, even irrational and imaginary. Why such an injustice and inequality before the Law of Math?

We have our favorites like π, and prime members (I mean numbers), but zero is the bottom of the barrel, the lowest of the low, a pariah, an outcast, an untouchable when it comes to dividing by. It does not even have a sign in front of it. Well, it’s legal to have, but it’s meaningless.

And that’s not all. Besides not being allowed in a denominator, zeros are literally discriminated against beyond belief. How else could you characterize the fact that zeros are declared as pathological liars as their innocent value is equated to FALSE in logical expressions, while all other more privileged numbers represent TRUE, even the negative and irrational ones!

Extraordinary qualities of zeros

Despite their literal zero value, their informational value and qualities are not less than, and in many cases significantly surpass those of their siblings. In a sense, zero is a proverbial center of the universe, as all the other numbers dislocated around it as planets around the sun. It is not coincidental that zeros are denoted as circles, which makes them forerunners and likely ancestors of the glorified π.

Speaking of π, what is all the buzz around it? It’s irrational. It’s inferior to 0: it takes 2 π’s to just draw a single zero (remember O=2πR?). Besides, zeros are not just well rounded, they are perfectly rounded.

Privacy protection experts and GDPR enthusiasts love zeros. While other small numbers are required to be suppressed in published demographical reports, zeros may be shown prominently and proudly as they disclose no one’s personally identifiable information (PII).

No number rivals zero. Zeros are perfect numerators and equalizers. If you divide zero by any non-zero member of the digital community, the result will always be zero. Always, regardless of the status of that member. And yes, zeros are perfect common denominators, despite being prohibited from that role for centuries.

Zeros are the most digitally neutral and infinitely tolerant creatures. What other number has tolerated for so long such abuse and discrimination!

Enough is enough!

Dividing by zero opens new horizons

Can you imagine what new opportunities will arise if we break that centuries-old tradition and allow dividing by zero? What new horizons will open! What new breakthroughs and discoveries can be made!

With no more prejudice and prohibition of the division by zero, we can prove virtually anything we wish. For example, here is a short 5-step mathematical proof of “4=5”:

1)   4 – 4 = 10 – 10
2)   22 – 22 = 5·(2 – 2)
3)   (2 + 2)·(2 – 2) = 5·(2 – 2) /* here we divide both parts by (2 – 2), that is by 0 */
4)   (2 + 2) = 5
5)   4 = 5

Let’s make the next logical step. If dividing by zero can make any wish a reality, then producing a number of our choosing by dividing a given number by zero scientifically proves that division by zero is not only legitimate, but also feasible and practical.

As you will see below, division by zero is not that easy, but with the power of SAS, the power to know and the powers of curiosity, imagination and perseverance nothing is impossible.

Division by zero - SAS implementation

Consider the following use case. Say you think of a “secret” number, write it on a piece of paper and put in a “secret” box. Now, you take any number and divide it by zero. If the produced result – the quotient – is equal to your secret number, wouldn’t it effectively demonstrate the practicality and magic power of dividing by zero?

Here is how you can do it in SAS. A relatively “simple”, yet powerful SAS macro %DIV_BY_0 takes a single number as a numerator parameter, divides it by zero and returns the result equal to the one that is “hidden” in your “secret” box. It is the ultimate, pure artificial intelligence, beyond your wildest imagination.

All you need to do is to run this code:

 
data MY_SECRET_BOX;        /* you can use any dataset name here */
   MY_SECRET_NUMBER = 777; /* you can use any variable name here and assign any number to it */
run;
 
%macro DIV_BY_0(numerator);
 
   %if %sysevalf(&numerator=0) %then %do; %put 0:0=1; %return; %end;
   %else %let putn=&sysmacroname; 
   %let %sysfunc(putn(%substr(&putn,%length(&putn)),words.))=
   %sysevalf((&numerator/%sysfunc(constant(pi)))**&sysrc);  
   %let a=com; %let null=; %let nu11=%length(null); 
   %let com=*= This is going to be an awesome blast! ;
   %let %substr(&a,&zero,&zero)=*Close your eyes and open your mind, then;
   %let imagine = "large number like 71698486658278467069846772 Bytes divided by 0";
   %let O=%scan(%quote(&c),&zero+&nu11); 
   %let l=%scan(%quote(&c),&zero);
   %let _=%substr(%scan(&imagine,&zero+&nu11),&zero,&nu11);
   %let %substr(&a,&zero,&zero)%scan(&&&a,&nu11+&nu11-&zero)=%scan(&&&a,-&zero,!b)_;
   %do i=&zero %to %length(%scan(&imagine,&nu11)) %by &zero+&zero;
   %let null=&null%sysfunc(&_(%substr(%scan(&imagine,&nu11),&i,&zero+&zero))); %end;
   %if &zero %then %let _0=%scan(&null,&zero+&zero); %else;
   %if &nu11 %then %let _O=%scan(&null,&zero);
   %if %qsysfunc(&O(_&can)) %then %if %sysfunc(&_0(&zero)) %then %put; %else %put;
   %put &numerator:0=%sysfunc(&_O(&zero,&zero));
   %if %sysfunc(&l(&zero)) %then;
 
%mend DIV_BY_0;
 
%DIV_BY_0(55); /* parameter may be of any numeric value */

When you run this code, it will produce in the SAS LOG your secret number:

55:0=777

How is that possible without the magic of dividing by zero? Note that the %DIV_BY_0 macro has no knowledge of your dataset name, nor the variable name holding your secret number value to say nothing about your secret number itself.

That essentially proves that dividing by zero can practically solve any imaginary problem and make any wish or dream come true. Don’t you agree?

There is one limitation though. We had to make this sacrifice for the sake of numeric social justice. If you invoke the macro with the parameter of 0 value, it will return 0:0=1 – not your secret number - to make it consistent with the rest of non-zero numbers (no more exceptions!): “any number, except zero, divided by itself is 1”.

Challenge

Can you crack this code and explain how it does it? I encourage you to check it out and make sure it works as intended. Please share your thoughts and emotions in the Comments section below.

Disclosure

This SAS code contains no cookies, no artificial sweeteners, no saturated fats, no psychotropic drugs, no illicit substances or other ingredients detrimental to your health and integrity, and no political or religious statements. It does not collect, distribute or sell your personal data, in full compliance with FERPA, HIPPA, GDPR and other privacy laws and regulations. It is provided “as is” without warranty and is free to use on any legal SAS installation. The whole purpose of this blog post and the accompanied SAS programming implementation is to entertain you while highlighting the power of SAS and human intelligence, and to fool around in the spirit of the date of this publication.

Dividing by zero with SAS was published on SAS Users.

3月 112019
 

A SAS programmer posted an interesting question on a SAS discussion forum. The programmer wanted to iterate over hundreds of SAS data sets, read in all the character variables, and then do some analysis. However, not every data set contains character variables, and SAS complains when you ask it to read the character variables in a data set that contains only numeric variables.

The programmer wanted to use PROC IML to solve the problem, but the issue also occurs in the SAS DATA step. The following program creates three data sets. Two of them (AllChar and Mixed) contain at least one character variable. The third data set (AllNum) does not contain any character variables. For the third data set, an error occurs if you try to use the KEEP=_CHARACTER_ data set option, as shown in the following example:

data AllNum;
   x=1; y=2; z=3;
run;
data AllChar;
   A='ABC'; B='XYZW';
run;
data Mixed;
   name='Joe'; sex='M'; Height=1.8; Weight=81; treatment='Placebo'; 
run;
 
/* try to use DROP=_CHARACTER_ to exclude numeric variables */
data KeepTheChar;
   set AllNum(keep=_CHARACTER_); /* ERROR when no character variables in the data set */
run;
   ERROR: The variable _CHARACTER_ in the DROP, KEEP, or RENAME list has never been referenced.

The same problem occurs in PROC IML if you try to read character variables when none exist:

proc iml;
use AllNum;
   read all var _CHAR_ into X; /* ERROR when no character variables in the data set */
close;
   ERROR: No character variables in the data set.

There are at least two ways to handle this situation:

  1. In both Base SAS and SAS/IML, you can use dictionary tables to determine in advance which data sets contain at least one character variable. You can then read only those data set.
  2. In SAS/IML, you can read all variables into a table, then extract the character variables into a matrix for further processing.

Of course, the same ideas apply if you want to read only numeric variables and you encounter a data set that does not contain any numeric variables.

Use DICTIONARY tables to find information about your SAS session

If you have ever been to a SAS conference, you know that DICTIONARY tables are a favorite topic for SAS programmers. DICTIONARY tables are read-only tables that provide information about the state of the SAS session, including libraries, data sets, variables, and system options. You can access them directly by using PROC SQL. If you want to access the information in the DATA steps or other procedures (like PROC IML), you can use special data views in SASHELP. In particular, the Sashelp.VColumn view provides information about variables in SAS data set and is often used to find data sets that contain certain variable names. (See the references at the end of this article for more information about DICTIONARY tables.)

The following SAS/IML program uses the Sashelp.VColumn to find out which data sets contain at least one character variable:

proc iml;
/* Solution 1: Use dictionary table sashelp.vcolumn */
/* Find data sets in WORK that have AT LEAST ONE character variable */
use sashelp.vcolumn(where=(libname="WORK" & memtype='DATA' & type='char'); /* read only CHAR variables */
   read all var {memname name};  /* memname=data set name; name=name of character variable */
close;
 
/* loop over data sets. If a set contains at least one character variable, process it */
dsName = {'AllNum' 'AllChar' 'Mixed'};       /* names of potential data sets */
do i = 1 to ncol(dsName);
   idx = loc(memname = upcase(dsName[i]));   /* is data set on the has-character-variable list? */
   /* for demo, print whether data set has character variables */
   msg = "The data set " + (dsName[i]) + " contains " + 
          char(ncol(idx)) + " character variables.";
   print msg;
 
   if ncol(idx)>0 then do;            /* the data set exists and has character vars */
      charVars = name[idx];           /* get the names of the character vars */
      use (dsName[i]);                /* open the data set for reading */
      read all var charVars into X;   /* read character variables (always succeeds) */
      close;
      /* process the data */
   end;
end;

The output shows that you can use the DICTIONARY tables to determine which data sets have at least one character variable. You can then use the USE/READ statements in PROC IML to read the character variables and process the data however you wish. As mentioned previously, this technique can also be used in PROC SQL and the DATA step.

Use SAS/IML tables to find character variables

The previous section is very efficient because only character variables are ever read into SAS/IML matrices. However, there might be situations when you want to process character variables (if they exist) and then later process numerical variables (if they exist). Although a SAS/IML matrix contains only one data type (either all numeric or all character), you can read mixed-type data into a SAS/IML table, which supports both numeric and character variables. You can then use the TableIsVarNumeric function to generate a binary indicator variable that tells you which variables in the data are numeric and which are character, as follows:

/* Solution 2: Read all data into a table. Use the TableIsVarNumeric function to determine
   which variables are numeric and which are character. */
dsName = {'AllNum' 'AllChar' 'Mixed'};             /* names of potential data sets */
do i = 1 to ncol(dsName);                          /* for each data set... */
   T = TableCreateFromDataset("WORK", dsName[i]);  /* read all variables into a table */
   numerInd = TableIsVarNumeric(T);                /* binary indicator vector for numeric vars */
   charInd = ^numerInd;                            /* binary indicator vector for character vars */
   numCharVars = sum(charInd);                     /* count of character variables in this data set */
   msg = "The data set " + (dsName[i]) + " contains " + 
         char(numCharVars) + " character variables.";
   print msg;
   if numCharVars > 0 then do;
      X = TableGetVarData(T, loc(charInd));        /* extract the character variables into X */
      /* process the data */
   end;
   /* optionally process the numeric data */
   numNumerVars = sum(numerInd);                   /* count of numeric variables in this data set */
   /* etc */
end;

The output is identical to the output in the previous section.

Summary

In summary, this article discusses a programmer who wants to iterate over many SAS data sets and process only character variables. However, some of the data sets do not have any character variables! This article shows two methods for dealing with this situation: DICTIONARY tables (available through Sashelp views) or SAS/IML tables. The first method is also available in Base SAS.

Of course, you can also use this trick to read all numeric variables when some of the data sets might not have any numeric variable. I've previously written about how to read all numeric variables into a SAS/IML matrix by using the _ALL_ keyword. If the data set contains both numeric and character variables, then only the numeric variables are read.

References

The following resources provide more information about DICTIONARY tables in SAS:

The post How to detect SAS data sets that contain (or do not contain) character variables appeared first on The DO Loop.