October 31, 2019
 

Buy my costume, Georgie!

While growing up in the 1980s, I watched The Golden Girls on TV with my Grandma Betty. Now, when my sister visits, we binge-watch reruns on TV Land. I was excited to see that you could buy Golden Girls costumes this Halloween! Too bad they sold out right away, making them one of this year's most popular costumes.

For the record, I wasn't planning to dress up tonight as a Golden Girl, but the news got me thinking about how Halloween costumes have changed over the years. What was popular when? Hence, this post. I explain how to use SAS REST APIs to append this year's most popular costumes (including the Golden Girls and Pennywise from It) to a table containing historic costume data. Although this example uses costume data, you can treat the append steps as a template for updating any table in SAS Viya using REST APIs.

The data

There I am, in the middle

I created a data set containing the most popular Halloween costumes from the last 50 years (1968-2018). I compiled the data from several sources that couldn't seem to agree on the best-selling costume for a given year, so I combined the lists. Many years have two entries. The data here isn't as important as the append table procedure. What fun to review the costumes list! It was not hard to tell in what year certain movies (and their sequels) were released. Only one costume I wore made the list – my 1979 Ace Frehley outfit!

The process

The procedures in this example run on SAS Viya, utilizing the Cloud Analytics Services (CAS) REST APIs. CAS REST APIs use CAS actions to perform statistical methods across a variety of SAS products. You can learn more about CAS REST APIs and CAS Actions in the Using SAS Cloud Analytics Service REST APIs to run CAS Actions post or on developer.sas.com.

Below, I'll detail each REST call along with sample code. I originally used Postman to organize my calls. This allowed me to use pre- and post-call scripting to handle responses and create variables. You can find the entire Postman collection here on GitHub. For ease of display in this post, I'll use equivalent cURL commands.

Pre-requisites

I registered my client, obtained an access token, and added it as an environment variable, ACCESSTOKEN. For more information on registering a client or getting an access token, see my earlier post Authentication to SAS Viya: a couple of approaches.

Create a CAS session

Before running any CAS actions, we need to establish a connection to the SAS Viya server.

curl -X POST https://sasserver:8777/cas/sessions \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/vnd.sas.cas.session+json'

The result of this call is a session id in the form of a089ce2b-8116-7a40-b3e3-6e39b7b5566d. This id is used in all subsequent REST calls. You could easily store it in another variable for further use. In the examples below, I substitute the actual session id with <session-id>. You will need to replace this placeholder when attempting the steps on your own.

Create a global Caslib HALLOWEEN

Data in CAS is stored in a Caslib. In the step below, I create a Caslib called HALLOWEEN and link it to a physical server path (/home/sasdemo/halloween), where the table is stored.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.addCaslib \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"name":"HALLOWEEN","path":"/home/sasdemo/halloween","description":"HALLOWEEN","subDirectories":"false","permission":"PUBLICWRITE","session":"false","dataSource":{"srcType":"path"},"createDirectory":"true","hidden":"false","transient":"false"}'

Note that I created the directory ~/halloween and set permissions as needed. Further, since the Caslib is global, other users have access to the data. This step and the next are one-time requests. If you repeat this process, you do not need to create the Caslib or upload the data again.

Copy data set costumesByYear into HALLOWEEN's path

Now that we have a Caslib and a path, we load the data table to the server. In this instance, I copy the costumesByYear.xlsx file into /home/sasdemo/halloween. There are multiple ways to upload data to the server. You can read more about the various methods in the SAS documentation.

Create a temporary Caslib TEMP

While our data lives in the HALLOWEEN Caslib, we want to create a temporary Caslib to run the append step. We will then save the appended table back into HALLOWEEN. The following code creates a new Caslib called TEMP.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.addCaslib \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"name":"TEMP","path":"/home/sasdemo/temp","description":"TEMP","subDirectories":"false","permission":"PUBLICWRITE","session":"false","dataSource":{"srcType":"path"},"createDirectory":"true","hidden":"false","transient":"false"}'

Now we're ready to load the data into memory and append the table.

Load costumesByYear into memory

First, we load costumesByYear into memory in the TEMP Caslib.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.loadTable \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"path":"costumesByYear.xlsx","caslib":"HALLOWEEN","importOptions":{"fileType":"EXCEL"},"casOut":{"caslib":"TEMP","name":"costumesByYear","promote":"true"}}'

Create a temporary table data2019 containing the append data

Next, we create a new table called data2019 with costume data for 2019 in TEMP.

curl -X PUT https://sasserver:8777/cas/sessions/<session-id>/actions/upload \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: text/plain' \
  -H 'JSON-Parameters: {"casOut":{"caslib":"TEMP","name":"data2019","promote":"true"},"importOptions":{"fileType":"CSV"}}' \
  --data-binary $'Year,Costume\n2019,The Golden Girls\n2019,Pennywise'

Run data step to append data2019 to costumesByYear table

Finally, we run data step code to append table data2019 to table costumesByYear.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/runCode \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"code": "data temp.costumesbyyear(append=force) ; set temp.data2019;run;"}'
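For readability, the DATA step code that the payload passes as a single string can be laid out on separate lines. This is the same code, only reformatted; it assumes that it runs in a CAS session where the TEMP Caslib holds both tables, as set up in the previous steps.

```sas
/* Same DATA step as in the runCode payload, laid out on separate lines. */
/* Assumes the TEMP Caslib contains costumesByYear and data2019.         */
data temp.costumesbyyear(append=force);   /* add rows to the existing table */
   set temp.data2019;                     /* rows to append (2019 costumes) */
run;
```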

Save the costumesByYear table back to the HALLOWEEN CASlib

Now that we have successfully appended the costumesByYear table in the TEMP Caslib, we are ready to save back to the HALLOWEEN Caslib.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.save \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"table":{"name":"costumesbyyear","caslib":"TEMP","singlePass":"false"},"name":"costumesbyyear","replace":"true","compress":"false","caslib":"HALLOWEEN","exportOptions":{"fileType":"BASESAS"}}'

Delete TEMP Caslib

The TEMP Caslib is just that, temporary. With the code below we drop the Caslib and all its data.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.dropCaslib \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"caslib":"TEMP"}'

Delete the CAS session

The final step is to close our connection to the CAS server.

curl -X DELETE https://sasserver:8777/cas/sessions/<session-id> \
  -H "Authorization: Bearer $ACCESSTOKEN"

Wrapping it up

There you have it. With a few simple commands we were able to load, append, and save a table. This example is fairly simple in scope, but it translates to more complex use cases. The steps for my 2 x 50 table are the same as they would be for a 5GB table with hundreds of columns and millions of rows.

I have asked my mother to send the Polaroid photo of me as Ace in 1979. She just has to dig it out of a photo album. Check back in a week so you can gain fodder and poke fun at me.

Additional Resources

developer.sas.com - developers site for SAS
GitHub resources - GitHub repository for code used in this post

Append tables in SAS® Viya® with REST APIs – a treat, no tricks was published on SAS Users.

October 30, 2019
 

Every year at Halloween, I post an article that shows a SAS trick that is a real treat. This article shows how to use the INTNX function to find dates that are related to a specified date. The INTNX function is a sweet treat, indeed.

I previously wrote an article about how to use the INTCK and INTNX functions to compute with dates. These Base SAS functions are very powerful and deserve to be known more widely. In particular, the INTNX function enables you to compute the next or previous date that is a certain number of time units away from a known date. The "time unit" is usually a day, a week, a month, or a year, but the function supports many options.

Recently, a SAS programmer asked how to get "the first and last days of the previous month." The programmer added that the expression needs to be used in a WHERE clause. Because the expression needs to appear in a WHERE clause, it should be concise and not require several statements or temporary variables.

The first day of a previous (or subsequent) time interval

Finding the first day of the previous month is an ideal situation for using the INTNX function. The basic syntax of the INTNX function is
      INTNX(timeUnit, startDate, numberOfUnits)
This form of the INTNX function returns the first day of the specified time unit. For example, the following statements give dates relative to the bombing of Pearl Harbor on 07DEC1941. The time interval is 'month'. Notice that you can ask for dates after the given date (a positive number of time units) or before the given date (a negative number of units). If you specify 0 for the third argument, you get the first day of the current month.

data Months;
date = '07DEC1941'd;
FirstDayCurrMonth = intnx('month', Date,  0);   /*  0 = current month */
FirstDayPrevMonth = intnx('month', Date, -1);   /* -1 = previous month */
FirstDayNextMonth = intnx('month', Date,  1);   /*  1 = next month */
FirstDay6Months   = intnx('month', Date,  6);   /*  6 = six months later */
format _ALL_ date9.;
run;
 
proc print data=Months noobs;
run;

Because the time unit is 'month' for this example, the calculated dates are the first day of the months relative to 07DEC1941. If you change 'month' to 'year', all the calculated dates will be 01JAN of some year relative to 1941.
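As a minimal sketch of that last remark, the following statements repeat the computation with 'year' as the time unit. The data set name Years and the variable names are hypothetical, chosen only for this illustration.

```sas
/* Same idea as the Months example, but with 'year' as the time unit. */
/* The data set and variable names are illustrative only.             */
data Years;
   Date = '07DEC1941'd;
   FirstDayCurrYear = intnx('year', Date,  0);   /* 01JAN1941 */
   FirstDayPrevYear = intnx('year', Date, -1);   /* 01JAN1940 */
   FirstDayNextYear = intnx('year', Date,  1);   /* 01JAN1942 */
   format _ALL_ date9.;
run;

proc print data=Years noobs;
run;
```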

The last (or same) day of a previous (or subsequent) time interval

A cool fact about the INTNX function is that it supports an optional fourth argument that enables you to specify whether you want the calculated date to be at the beginning, the middle, or the end of the specified time interval. You can even specify that you want the "same" characteristics as the source date, which is useful for finding anniversaries of an event. For example, the following statements vary the fourth argument, which can be one of four values:

data FirstLastMiddle;
date = '07DEC1941'd;
FirstDayPrevMonth = intnx('month', Date, -1, 'B');   /* B = beginning */
LastDayPrevMonth  = intnx('month', Date, -1, 'E');   /* E = end */
MiddlePrevMonth   = intnx('month', Date, -1, 'M');   /* M = middle */
FirstAnniv        = intnx('year',  Date,  1, 'S');   /* S = same */
format _ALL_ date9.;
run;
 
proc print data=FirstLastMiddle noobs;
run;

The program shows that you can find the first day of the previous month, the last day of the previous month, the middle of the previous month, or an anniversary of the specified date. In particular, the program answers the programmer's question by showing a concise "one-liner" that you can use to get the first and last days of the previous month.
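Because the original question was about a WHERE clause, here is a minimal sketch of how the two expressions might be used there. The data set Daily, its Date variable, and the macro variable refDate are hypothetical; in practice you might use the TODAY function in place of the fixed reference date.

```sas
/* Hypothetical data: one observation per day of 2019 */
data Daily;
   do Date = '01JAN2019'd to '31DEC2019'd;
      output;
   end;
   format Date date9.;
run;

%let refDate = '31OCT2019'd;   /* assumed reference date for the sketch */

/* Keep only the observations that fall in the month before refDate */
proc print data=Daily noobs;
   where Date between intnx('month', &refDate, -1, 'B')
                  and intnx('month', &refDate, -1, 'E');
run;
```

For the reference date 31OCT2019, the WHERE clause selects the 30 days from 01SEP2019 through 30SEP2019.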

In summary, the INTNX function is a powerful tool for working with dates. It enables you to find dates that are related to a specified date. You can use the first argument to specify the time unit (day, week, month, year,...) and the third argument to specify the number of time units before or after the specified date. The optional fourth argument determines whether you want the first, last, middle, or "same" portion of the time interval.

Whether you work with dates in SAS every day or whether you work with them occasionally, the INTNX function is a sweet treat to remember. No tricks required.

The post Compute the first or last day of a month or year appeared first on The DO Loop.

October 30, 2019
 

Check mark
When I'm about to make a major purchase, I appreciate being able to compare product features at a glance, side by side. I am sure you have seen these ubiquitous comparison tables with check marks showing which features are characteristic of different products and which are not.

These data visualizations, sometimes called comparison matrixes, are also commonly known as checklist tables or checklist table charts. Such charts are extremely useful, persuasive visuals as they allow us to quickly identify differences as well as commonalities between comparable products or solutions and quickly decide which one of them is more desirable or suitable for our needs.

For example, here is such a table that I created in MS Word:
Checklist table example created in Word

SAS code to create checklist table chart

Thanks to SAS’ ability to use Unicode characters in formatted data, it’s very easy to create such a checklist table in SAS. Just imagine that each cell with a visible check mark is assigned a value of 1 and each cell with no check mark is assigned a value of 0. That is exactly the data table that lies behind this data visualization. To print this data table with proper formatting, we will format the number 1 as a more visually appealing check mark, and 0 as a “silent” blank. Here is the SAS code to accomplish this:

data CHECKLIST;
   length FEATURE $10;
   input FEATURE $1-10 A B C;
   label
      FEATURE = 'Feature'
      A = 'Product A'
      B = 'Product B'
      C = 'Product C';
   datalines;
Feature 1 1 1 1
Feature 2 1 0 1
Feature 3 0 1 1
Feature N 1 0 1
;
 
proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      other = ' ';
   value chcolor
      1 = green;
run;
 
ods html path='c:\temp' file='checklist1.html' style=HTMLBlue;
 
proc print data=CHECKLIST label noobs;
   var FEATURE / style={fontweight=bold};
   var A B C / style={color=chcolor. just=center fontweight=bold};
   format A B C chmark.;
run;
 
ods html close;

If you run this SAS code, your output will look much like the one above created in MS Word:
Checklist table created in SAS
The key elements of the SAS code that produce this checklist table are the user-defined formats in PROC FORMAT. In the user-defined format chmark, the value 1 is formatted as Unicode 2714, which corresponds to the check mark character ✔. In the user-defined format chcolor, the value 1 is formatted as the color green. The syntax for using Unicode symbols in user-defined formats is this:

value chmark
1 = '(*ESC*){unicode "2714"x}'

NOTE: ESC here must be upper-case; x at the end stands for “hexadecimal.”

Unicode characters for checklist tables

Unicode is an international encoding standard by which each letter, digit or symbol is assigned a unique numeric value that applies across different platforms and programs. The Unicode Transformation Formats (UTF) define how those numeric values are stored as bytes. The Unicode standard is supported by many operating systems and all modern browsers.

It is implemented in HTML, XML, Java, JavaScript, E-mail, ASP, PHP, etc. The most commonly used Unicode encodings are UTF-8 and UTF-16. HTML 5 supports both UTF-8 and UTF-16.

You can use this HTML Unicode (UTF-8) Reference to look up and choose symbols you can embed in your report using SAS user-defined formats. They are grouped by categories to make it easier to find the ones you need.

Here is just a small random sample of the Unicode symbols that can be used to spice up your checklist tables to get their different flavors:

Unicode characters and codes
You can also apply colors to all these symbols the way we did it in the SAS code example above.

Different flavors of checklist tables

By just changing user-defined formats for the symbol shapes and colors we can get quite a variety of different checklist tables.

For example, we can format 0 as ✘ instead of a blank and also make it red to explicitly visualize the exclusion of a feature from a product (in addition to explicit inclusion). All we need to do is modify our PROC FORMAT to look like this:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      0 = '(*ESC*){unicode "2718"x}';
   value chcolor
      1 = green
      0 = red;
run;

The SAS output comparison matrix will look a bit more dramatic and persuasive:
SAS-generated checklist table
Or, if you’d like, you can use the following format definition:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2611"x}'
      0 = '(*ESC*){unicode "2612"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following SAS-generated ballot-like checklist table:
Ballot-like checklist table created in SAS
Here is another one:

proc format;
   value chmark
      1 = '(*ESC*){unicode "1F5F9"x}'
      0 = '(*ESC*){unicode "20E0"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following variation of the checklist table:
Another SAS-generated checklist table
As you can see, the possibilities are endless.
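One further variation, offered only as a sketch and not taken from the examples above, is to color the cell background instead of the symbol. The data set CHECKLIST_BG and the format names chmarkbg and chbg are hypothetical, and the color codes are arbitrary choices. The approach reuses the same mechanism as the chcolor format: a user-defined format supplies the value of a style attribute, here BACKGROUND= rather than COLOR=.

```sas
/* Small copy of the checklist data so the sketch runs on its own */
data CHECKLIST_BG;
   length FEATURE $10;
   input FEATURE $1-10 A B C;
   datalines;
Feature 1 1 1 1
Feature 2 1 0 1
Feature 3 0 1 1
;

proc format;
   value chmarkbg                      /* hypothetical format name */
      1 = '(*ESC*){unicode "2714"x}'
      other = ' ';
   value chbg                          /* hypothetical format name */
      1 = 'cxCCFFCC'                   /* pale green background */
      0 = 'cxFFCCCC';                  /* pale red background */
run;

/* Output goes to whichever ODS destination is currently open */
proc print data=CHECKLIST_BG noobs;
   var FEATURE / style={fontweight=bold};
   var A B C / style(data)={background=chbg. just=center fontweight=bold};
   format A B C chmarkbg.;
run;
```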

Your thoughts?

Do you find these comparison matrixes or checklist tables useful? Do you envision SAS producing them for your presentation, documentation, data story or marketing materials? What Unicode symbols do you like? Can you come up with some creative uses of symbols and colors? For example, table cell background colors...

How to create checklist tables in SAS® was published on SAS Users.

October 30, 2019
 

I suffer from arthritis. You can tell just by watching me walk: Depending on the day, I have a slight limp, which varies in severity based on a number of factors such as the time of day and recent physical activity. Years of treatment for my condition have shown me [...]

I applied AI to my arthritis assessment. Here’s what happened. was published on SAS Voices by Mark Wolff

October 29, 2019
 

Thank you to Lora Delwiche and Susan Slaughter for providing the following information:

Six editions is a lot! If you had told us back when we wrote the first edition of The Little SAS Book that someday we would write a sixth, we would have wondered how we could possibly find that much to say. After all, it is supposed to be The Little SAS Book, isn’t it? But the developers at SAS are constantly hard at work inventing new and better ways of analyzing and visualizing data. And some of those ways turn out to be so fundamental that they belong even in a little book about SAS.

Interface independence

One of the biggest changes to SAS software in recent years is the proliferation of interfaces. SAS programmers have more choices than ever before. Previous editions contained some sections specific to the SAS windowing environment (also called Display Manager). We wrote this edition for all SAS programmers whether you use SAS Studio, SAS Enterprise Guide, the SAS windowing environment, or run in batch. That sounds easy, but it wasn’t. There are differences in how SAS behaves with different interfaces, and these differences can be very fundamental. In particular, the system option that sets the rules for names of variables varies depending on how you run SAS. So old sections had to be rewritten, and we added a whole new section showing how to use variable names containing blanks and special characters.

New ways to read and write Microsoft Excel files

Previous editions already covered how to read and write Microsoft Excel files, but SAS developers have created new ways that are even better. This edition contains new sections about the XLSX LIBNAME engine and the ODS EXCEL destination.

More PROC SQL

From the very first edition, The Little SAS Book has always covered PROC SQL. But it was in an appendix, and over time we noticed that most people ignore appendices. So for this edition, we removed the appendix and added new sections on using PROC SQL to:

• Subset your data
• Join data sets
• Add summary statistics to a data set
• Create macro variables with the INTO clause

For people who are new to SQL, these sections provide a good introduction; for people who already know SQL, they provide a model of how to leverage SQL in your SAS programs.

Updates and additions throughout the book

Almost every section in this edition has been changed in some way. We added new options, made sure everything is up-to-date, and ran every example in every SAS interface noting any differences. For example, PROC SGPLOT has some new options, the default ODS style for PDF has changed, and the LISTING destination behaves differently in different interfaces. Here’s a short list, in no particular order, of new or expanded topics in the sixth edition:

• More examples with permanent SAS data sets, CSV files, or tab-delimited files
• More log notes throughout the book showing what to look for
• LIKE or sounds-like (=*) operators in WHERE statements
• CROSSLIST, NOCUM, and NOPRINT options in PROC FREQ
• Grouping data with a user-defined format and the PUT function
• Iterative DO groups
• DO WHILE and DO UNTIL statements
• %DO statements

Even though we have added a lot to this edition, it is still a little book. In fact, this edition is shorter than the last, by 12 pages! We think this is the best edition yet. For a sneak preview, check out the free book excerpt. You can also learn more about SAS Press and check out the up-and-coming titles. To receive exclusive discounts, make sure to subscribe to the newsletter.

The Little SAS Book 6.0: The best-selling SAS book gets even better was published on SAS Users.

October 28, 2019
 

If you're looking for advice on developing an analytics strategy, there's no shortage of resources, including this from SAS: Building your data and analytics strategy. If, on the other hand, you're looking for advice on how to apply analytics to strategic planning, your search is likely to come up wanting. [...]

8 ways analytics can support strategic planning and decision making was published on SAS Voices by Leo Sadovy

October 28, 2019
 

A common task in SAS programming is to specify a list of variables that satisfy some pattern. You can specify lists for the KEEP= or DROP= data set options, and you can use lists of variables on many SAS statements such as the VAR and MODEL statements. Although SAS has built-in support for some patterns (like variables that start with the same prefix), you might want to match variable names to less-common patterns. In those situations, you can use regular expressions to match variable names by using the PRXMATCH function in Base SAS. The PRXMATCH function is one of several functions in SAS that support Perl regular expressions (PRX).

Built-in support for specifying variables in SAS

In a previous article, I discussed six different ways to create a list of variable names in SAS. Of these, the most common are

  • The colon operator for specifying names that have a common prefix. For example, AGE: specifies variables that begin with the prefix "age" (recall that SAS variable names are case insensitive).
  • The dash operator for specifying a sequence of variable names that have the same prefix and a numerical suffix. For example, X1-X5 matches the variables X1, X2, X3, X4, and X5.

For example, the following call to PROC CONTENTS lists the variables in the Sashelp.Heart data set:

proc contents data=Sashelp.Heart short varnum; run;

Several variables in the data set begin with the prefix 'Age'. By using the colon operator (Age:), you can select all variables that have that prefix, as in the following example:

title "Colon (Prefix) Variable Names";
data PrefixVars;
   set Sashelp.Heart(keep=Age:);
run;
proc contents short varnum; run;

Specifying variables that have a common suffix

Several of the variables in the Sashelp.Heart data end with the suffix 'Status'. Unfortunately, there is no built-in operator in SAS to match a suffix such as 'Status'. However, Bruno Mueller and Mark Jordan have provided a SAS macro that uses regular expressions to select variables that match any pattern. The list of variables is returned as a text string, so you can use it in the usual places, such as the KEEP= options or a VAR statement. The following example assumes you have downloaded and run the definition of the %varListPattern macro.

title "Suffix Variable Names";
data SuffixVars;
   set Sashelp.Heart(keep=%varListPattern(sashelp.Heart,*status));
run;
proc contents short varnum; run;

This macro fills a need, and I like it a lot. In addition to matching variables that share a common suffix, you can also use the macro to find variables that match other patterns. For example, you can use the pattern "*at*" to match variables that have the string "at" anywhere in their name. In addition to the "status" variables found above, that pattern also matches the variables DeathCause, AgeAtStart, and AgeAtDeath.

Some programmers don't realize that all SAS procedures support data set options (such as KEEP= and DROP=) when you specify the name of a data set in a procedure. That means that you can use the built-in pattern matching and the %varListPattern macro throughout SAS. For example, here is how you would read a set of prefix-variables and a set of suffix-variables into SAS/IML matrices:

proc iml;
use Sashelp.Heart(keep=Age:);       /* use the colon operator to match prefixes */
   read all var _ALL_ into X[colname=prefixVars];
close;
print prefixVars;
 
use Sashelp.Heart(keep=%varListPattern(sashelp.Heart,*status)); /* use macro to match suffixes */
   read all var _ALL_ into C[colname=suffixVars];
close;
print suffixVars;

Extract a vector of strings that match a pattern

At its heart, the %varListPattern macro calls the PRXMATCH function. You can use the PRXMATCH function to determine whether a variable name (in fact, any string!) matches a pattern that is specified by a regular expression. You can use the PRXMATCH function when the names of variables are in a data set. You can then use PROC SQL to create a macro variable that contains a space-separated list of all variables that match the pattern.
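The PROC SQL approach described in this paragraph can be sketched as follows. This is only an illustration of the idea and not the implementation of the %varListPattern macro; the macro variable name statusVars is hypothetical. The sketch reads variable names from the DICTIONARY.COLUMNS table and keeps the names that end with 'status'.

```sas
/* Build a space-separated list of the variables in Sashelp.Heart */
/* whose names end with 'status' (case insensitive).              */
proc sql noprint;
   select name
      into :statusVars separated by ' '    /* hypothetical macro variable */
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'HEART'
            and prxmatch('/status$/i', strip(name)) > 0;
quit;

%put &=statusVars;     /* write the list to the log */

/* Use the list anywhere a variable list is accepted */
proc contents data=Sashelp.Heart(keep=&statusVars) short varnum;
run;
```

The STRIP function removes trailing blanks from the name so that the $ anchor in the regular expression matches the end of the name itself.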

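I needed this functionality recently when I was writing a SAS/IML function and needed to select all strings that had a common suffix. The next example relies on a few SAS/IML programming features: you can call a Base SAS function such as PRXMATCH on a vector of strings, the LOC function returns the indices of the nonzero elements of its argument, and a function can return an empty matrix when nothing matches.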

The following SAS/IML functions construct patterns like /^PREFIX/i and /.*SUFFIX$/i and use the PRXMATCH function to find strings that match the patterns. To demonstrate the function, the program uses the variable names in the Sashelp.Heart data.

proc iml;
/* Use PRXMATCH to find strings with a common prefix */
start MatchPrefix(prefix, str);
   re = '/^' + strip(prefix) + '/i';    /* beginning of word, case insensitive */
   idx = loc(prxmatch(re, str));
   if ncol(idx)=0 then return( {} );    /* return empty matrix */
   return( str[idx] );
finish;
 
/* Use PRXMATCH to find strings with a common suffix */
start MatchSuffix(suffix, str);
   re = '/.*' + strip(suffix) + '$/i';  /* end of word, case insensitive */
   idx = loc(prxmatch(re, right(str))); /* shift right so no blank spaces at end */
   if ncol(idx)=0 then return( {} );    /* return empty matrix */
   return( str[idx] );
finish;
 
Variable = contents( "Sashelp", "Heart" );   /* get variable names in data order */
prefixVars = MatchPrefix("Age", Variable);
suffixVars = MatchSuffix("status", Variable);
print prefixVars, suffixVars;

I emphasize that although this example uses variable names as the strings, you can use the PRXMATCH function to search for patterns in arbitrary strings. In a similar way, you can construct other functions that find strings that match any regular expression that you can construct.

In summary, regular expressions in SAS are a powerful feature for matching strings that contain some pattern of characters. The examples in this article use simple regular expressions to find all variables that contain a common prefix (same as the built-in colon operator) or a common suffix. For many SAS procedures that require a list of variable names, you can use the %varListPattern macro to generate the variable list. For other applications, you can call the PRXMATCH function directly.

For an introduction to regular expressions in SAS, see "An Introduction to Perl Regular Expressions in SAS 9" (Cody, 2004) and "The Basics of the PRX Functions" (Cassell, 2007).

The post Use regular expressions to specify variable names in SAS appeared first on The DO Loop.

October 24, 2019
 

“Analytics Can Save Higher Education. Really.” is a call to action for the higher education community to leverage data and analytics for better decision making at colleges and universities. It stresses the importance of using data and analytics to improve student outcomes, campus operations and much more. Oklahoma State University [...]

Establishing an analytics culture: An interview with Oklahoma State University was published on SAS Voices by Georgia Mariani