Leonid Batkhan

5月 282020
 

SAS toolbox: macro functions
Did you know you could have a single universal function that can replace all the functions in the world? All those sin(x), log(x), … whatever(x) can all be replaced by a single super function f(x). Don’t believe me? Just make those functions names – sin, log, … whatever to be another argument to that all-purpose function f, just like that: f(x, sin), f(x, log), … f(x, whatever). Now, we only must deal with a single function instead of many, and its second argument will define what transformation needs to be done with the first argument in order to arrive at this almighty function’s value.

How many functions there are in SAS

Last time I counted there were more than 600 SAS functions, and that is excluding call routines and macro functions. But even that huge number grossly under-represents the actual number of functions available in SAS. That is because there are some functions that are built like the universal multi-purpose super function described above. For example, look at the following functions:

finance() function represents several dozen various financial functions;

finfo() function represents multiple functions returning various information items about files (file size, date created, date modified, access permission, etc.);

dinfo() function returns similar information items about directories;

attrn() function returns numeric attributes of a data set (number of observations, number of variables, etc.)

attrc() function returns character attributes of a data set (engine name, encoding name, character set, etc.)

Each of these functions represents not a single function, but a group of functions, and one of their arguments stipulates specific functionality (an information item or an attribute) that is being requested. You can think of this argument as a function modifier.

%sysfunc SAS macro function

%sysfunc() is a super macro function that brings a wealth of SAS functions into SAS macro language. With very few exceptions, most SAS functions are available in SAS macro language thanks to the %sysfunc().

Moreover, we can build our own user-defined macro functions using SAS-supplied macro functions (such as %eval, %length, %quote, %scan, etc.), as well as hundreds of the SAS non-macro functions wrapped into the %sysfunc() super macro function.

Building a super macro function to retrieve information about data sets

Armed with such a powerful arsenal, let’s build a multi-purpose macro function that taps into the data tables’ metadata and extracts various information items about those tables.

Let’s make this macro function return any of the following most frequently used values:

  • Number of observations
  • Number of variables
  • Variables list (positional, separated by spaces)
  • Variables list (positional, separated by commas)

Obviously, we can create much more of these information items and attributes, but here I am just showing how to do this so that you can create your own list depending on your needs.

In my earlier blog post, How to create and use SAS macro functions, we had already built a macro function for getting the number of observations; let’s expand on that.

Here is the SAS Macro code that handles extraction of all four specified metadata items:

%macro dsinfo(dset,info);
/* dset - data set name                             */
/* info - modifier (NOBS, NVARS, VARLIST, VARLISTC) */      
   %local dsid result infocaps i;
   %let infocaps = %upcase(&info);
   %let dsid = %sysfunc(open(&dset));
   %if &dsid %then
   %do;
      %if &infocaps=NOBS %then %let result = %sysfunc(attrn(&dsid,nlobs));
      %else %if &infocaps=NVARS %then %let result = %sysfunc(attrn(&dsid,nvars));
      %else %if &infocaps=VARLIST %then
      %do i=1 %to %sysfunc(attrn(&dsid,nvars));
         %let result = &result %sysfunc(varname(&dsid,&i));
      %end;
      %else %if &infocaps=VARLISTC %then
      %do i=1 %to %sysfunc(attrn(&dsid,nvars));
         %if &i eq 1 %then %let result = %sysfunc(varname(&dsid,&i));
         %else %let result = &result,%sysfunc(varname(&dsid,&i));
      %end;
      %let dsid = %sysfunc(close(&dsid));
   %end;
   %else %put %sysfunc(sysmsg());
   &result
%mend dsinfo;

The SAS log will show:

%put NOBS=***%dsinfo(SASHELP.CARS,NOBS)***;
NOBS=***428***
%put NVARS=***%dsinfo(SASHELP.CARS,NVARS)***;
NVARS=***15***
%put VARLIST=***%dsinfo(SASHELP.CARS,VARLIST)***;
VARLIST=***Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length***
%put VARLISTC=***%dsinfo(SASHELP.CARS,VARLISTC)***;
VARLISTC=***Make,Model,Type,Origin,DriveTrain,MSRP,Invoice,EngineSize,Cylinders,Horsepower,MPG_City,MPG_Highway,Weight,Wheelbase,Length***

Macro function code highlights

We used the following statement to make our macro function case-insensitive regarding the info argument:

%let infocaps = %upcase(&info);

Then depending on the up-cased second argument of our macro function (modifier) we used the attrn(), varnum() and varname() functions within %sysfunc() to retrieve and construct our result macro variable.

We stick that result macro variable value, &result, right before the %mend statement so that the value is returned to the calling environment.

While info=VARLIST (space-separated variable list) is useful in DATA steps, info=VARLISTC (comma-separated variable list) is useful in PROC SQL.

Usage example

Having this %dsinfo macro function at hands, we can use it in multiple programming scenarios. For example:

/* ending SAS session if no observations to process */
%if %dsinfo(SASHELP.CARS,NOBS)=0 %then %do; endsas; %end;
 
/* further processing */
data MYNEWDATA (keep=%dsinfo(SASHELP.CARS,VARLIST));
   retain %dsinfo(SASHELP.CARS,VARLIST);
   set SASHELP.CARS;
   if _n_=1 then put %dsinfo(SASHELP.CARS,VARLIST);
   /* ... */
run;

Here we first check if there is at least one observation in a data set. If not (0 observations) then we stop the SAS session and don’t do any further processing. Otherwise, when there are some observations to process, we continue.

If SAS code needs multiple calls to the same macro function with the same argument, we can shorten the code by first assigning that macro function’s result to a macro variable and then reference that macro variable instead of repeating macro function invocation. Here is an example:

/* further processing */
%let vlist = %dsinfo(SASHELP.CARS,VARLIST);
data MYNEWDATA (keep=&vlist);
   retain &vlist;
   set SASHELP.CARS;
   if _n_=1 then put &vlist;
   /* ... */
run;

Additional resources

Your thoughts?

Do you see the benefits of these multi-purpose SAS macro functions? Can you suggest other scenarios of their usage? Please share your thoughts in the comments section below.

Multi-purpose macro function for getting information about data sets was published on SAS Users.

4月 232020
 

SAS macro function

SAS® Macro Language immensely empowers SAS programmers with versatility and efficiency of their code development. It allows SAS users to modularize programming code with “once written – many times used” components, and in many cases automatically generate data-driven SAS code.

Macro language and macro processor

Generally, SAS software processes your SAS program step by step, first scanning it for macro language objects - macro variables referenced as &somename, and macros referenced as %somename. If found, SAS software activates macro processor which resolves and substitutes those macro references according to the macro language syntax before SAS compiles and executes your programming steps.

SAS macro language vs. SAS programming language

A SAS program usually consists of two, often interwoven layers – macro layer and non-macro layer, each with its own syntax and its own timing of compilation and execution. In other words, SAS code is a combination of two distinct languages:

  • SAS programming language (comprised of DATA steps, PROC steps and global statements such as LIBNAME, OPTIONS, TITLE etc.)
  • SAS macro language (comprised of %LET, %IF, %DO, macro functions etc.) which is processed separately from and before SAS compiler executes SAS programming language code.

The difference between them is like a difference between cooking a meal and eating the meal. In this analogy meal=code, cooking=SAS macro language, eating=SAS programming language. Clear understanding of this difference is the key to becoming a successful SAS programmer.

Two types of SAS macros

There are two distinct types of SAS macros:

  1. Macros that generate some SAS programming language code which can span across SAS statements or steps;
  2. Macros that generate some string values which can be used as part of SAS programming language code or data values, but they are not complete SAS statements or steps. This type does not generate any SAS executable code, just a value.

What is a SAS macro function?

SAS macro function is a SAS macro that generates a value. In other words, it is the type 2 macro described above. As any SAS macros, SAS macro functions can have any number (zero or more) of positional or/and named parameters (arguments). SAS users may define their own macro functions, but in doing so you may not utilize any SAS language syntax; only SAS macro language syntax is allowed. You can use existing macro functions in your own macro function definition. Among others, one of the most powerful is %SYSFUNC macro function which brings a wealth of SAS language functions into SAS macro language.

Sources of SAS macro functions

SAS macro functions may come from the following three sources.

1. Pre-built macro functions

Pre-built macro functions that are part of the macro processor. These are such macro functions as %eval, %length, %quote, %scan, %str, %sysfunc, %upcase, etc. Here is a complete list of the pre-built SAS macro functions.

2. Auto-call macro functions

Auto-call macros, some of them are type 1 (macros), and some – type 2 (macro functions) such as %cmpres, %left, %lowcase, %trim, %verify, etc. These macro functions supplement the pre-built macro functions. The main difference from the pre-built macro functions is that the auto-call macro functions are program samples of the user-defined macro functions that are written in SAS macro language and made available to you without having to define or include them in your programs. The auto-call macro functions come with your SAS software installation and usually pre-configured for you by setting MAUTOSOURCE and SASAUTOS= macro system options. They may include several macro libraries depending on the SAS products licensed at your site. For example, for my SAS BASE installation the auto-call macro library is in the following folder:

C:\Program Files\SASHome\SASFoundation\9.4\core\sasmacro

Here is a selected list of auto-call macros provided with SAS software.

From the usage standpoint, you will not notice any difference between the pre-built and the auto-call macro functions. For example, macro function %upcase() is pre-built, while macro function %lowcase() is auto-call macro function. They belong to entirely different families, but we use them as if they are complementary siblings.

3. User-defined macro functions

Finally, there are user-defined macro functions that do not come with SAS installation. These are the macro functions that you define on your own. Usually, they are kept separately from the auto-call macros mainly in order to distinguish them from SAS-supplied ones.

To enable access to your own SAS macro library in addition to the auto-call macro library (or libraries), you can use the INSERT= system option:

options insert=(sasautos="path_to_your_own_macro_library_folder");

Instead of replacing the SASAUTOS value, this option inserts an additional value into the existing SASAUTOS option as the first value, thus allowing you to tap into your own macro library first, and then also into pre-set SAS auto-call libraries.

Creating user-defined macro function

Let’s consider the following example. Suppose, we want to create a macro function that takes a data set name as an argument and returns a value equal to the number of observations in that data set.

We know that the following code calculates the number of observations in a data set:

data _null_;
   set SASHELP.CARS (obs=0) nobs=n;
   call symputx('NOBS',n);
run;
 
%put &=NOBS;
NOBS=428

Can we create a SAS macro function by enclosing this code into macro? Something like this:

%macro nobs(dset=,result=);
   data _null_;
      set &dset (obs=0) nobs=n;
      call symputx("&result",n);
   run;
%mend nobs;

The answer is “No”. Yes, we created a valid macro; we can invoke this macro to produce the result:

%nobs(dset=SASHELP.CARS, result=NOBS);
%put &=NOBS;
NOBS=428

But this is not a macro function. Remember type 2 macro that does not generate any SAS programming language code, just a value? But this macro does generate SAS code which assigns a value to the macro variable specified as the second argument (result=NOBS).

In order to create a valid macro function, our macro should not have any SAS language code in it – neither a DATA step, nor a PROC step. It may only be comprised of the SAS macro language code. Here it is:

%macro nobs(dset);
   %local dsid n rc;
   %let dsid = %sysfunc(open(&dset));
   %if &dsid %then
   %do;
      %let n = %sysfunc(attrn(&dsid,nlobs));
      %let dsid = %sysfunc(close(&dsid));
   %end;
   %else %put %sysfunc(sysmsg());
   &n
%mend nobs;

When macro processor executes this macro, the only object that gets passed to the SAS language compiler is the value shown in the line right before the %mend. This is the calculated value of the number of observations (denoted by &n ). This is the only thing that is visible by the SAS language compiler, the rest is the macro language code visible and being handled by SAS macro processor.

IMPORTANT: When defining SAS macro function always use %local statement to list ALL macro variables that are created in your macro to ensure they will not accidentally overwrite same-named macro variables in the calling environment. You don’t need to declare %local for macro parameters as they are always local automatically.

SAS macro functions usage examples

When a macro function is defined this way, wherever you place its invocation %nobs(SASHELP.CARS) in your SAS code it will be evaluated and replaced with the corresponding value (in this case it is number 428) by the SAS macro processor. That way you can avoid substandard hard-coding and make your SAS code dynamic and powerful. You can use macro functions in many SAS coding contexts. For example:

  • Assignment statements for macro variable: %let NOBS=%nobs(SASHELP.CARS);
  • Assignment statement in a DATA step: x = %nobs(SASHELP.CARS);
  • As a value of the iterative do loop: do i=1 to %nobs(SASHELP.CARS);
  • As part of condition in IF statement: if %nobs(SASHELP.CARS) > 500 then do;

And so on.

Your thoughts?

Do you find this post useful? Do you use SAS macro functions? Can you suggest other usage examples? Please share with us in the Comments below.

Additional resources

 

How to create and use SAS macro functions was published on SAS Users.

3月 112020
 

Automating SAS applications development

SAS variable labels are unique features of SAS data tables (aka data sets) that allow SAS users to enhance reading and interpretation of tables and reports.

Whether you use SAS data table as a data source in any of the reporting procedures or interactive interface such as SAS Visual Analytics, you will benefit from pre-assigning meaningful labels during the data preparation process. Besides being more efficient, such an early label assignment secures consistency of the data elements descriptions (labels) across different developers.

The most direct way of creating column labels is by explicitly assigning them to the data variables. You can do it during the data table creation in a DATA step using either LABEL statement or ATTRIB statement. Alternatively, you can do it after your data table is already created by using PROC DATASETS’ MODIFY statement with the LABEL= option.

However, in many situations there are ways of automating this tedious and voluminous process of column labels creation. Let’s look at one of them that I found useful for bulk column labeling. Plus, we are going to explore SAS coding technique using _DATA_and_LAST_special data sets.

Deriving variable labels from variable names

This method is suitable when variable names are well-formed, for example CUSTOMER_ADDRESS, FIRST_NAME, LAST_NAME, COMPANY_NAME, PLACE_OF_BIRTH, etc. Kudos to data designer!

We can transform these names into labels by replacing underscores with space characters and converting words from upper case to proper case. These are the labels we will get: Customer Address, First Name, Last Name, Company Name, Place Of Birth.

Let’s say our original data table is DEMO:

data DEMO;
   input CUSTOMER_CITY $ 1-15 FIRST_NAME $ 16-26 LAST_NAME $27-37 COMPANY_NAME $38-50 COUNTRY_OF_BIRTH $51-65;
   datalines;
Washington     Peter      Birn       Citibank     USA
Denver         Lisa       Roth       IBM          UK
Cary           Antony     Bessen     SAS          Spain
;

Then the following macro will create variable labels out of the variable names as described above:

options mprint; 
%macro ilabel (dataset);
   %local lbref dsname vname vlabel nvars;
 
   %if %index(&dataset,.) %then
   %do; /* 2-level dataset name */
      %let lbref  = %scan(&dataset,1,'.');
      %let dsname = %scan(&dataset,2,'.');
   %end;
   %else
   %do; /* 1-level dataset name */
      %let lbref  = WORK;
      %let dsname = &dataset;
   %end;
 
   /* get variable names */
   proc contents data=&dataset out=_data_(keep=name) noprint;
   run;
 
   /* create name/label pairs */
   data _null_;
      set _last_ end=eof nobs=n;
      call symput('vname'!!strip(put(_n_,best.)),name);
      lbl = propcase(translate(name,' ','_'));
      call symput('vlabel'!!strip(put(_n_,best.)),trim(lbl));
      if eof then call symputx('nvars',n);
   run;
 
   /* modify variable labels */
   proc datasets lib=&lbref nolist;
      modify &dsname;
         label
            %do i=1 %to &nvars;
               &&vname&i = "&&vlabel&i"
            %end;
         ;
   quit;
 
%mend ilabel;

You can invoke this macro by either one line of code:

%ilabel(DEMO)

or

%ilabel(WORK.DEMO)

Here are how our DEMO table looks before and after %ilabel macro modifies/assigns the labels based on the column names:

BEFORE:

Data table showing column names

 

AFTER:

Data table showing column labels

Macro code highlights

In this macro, we:

  1. Define local macro variables to make sure their names will not interfere with possible namesakes in the calling program.
  2. Determine libref and one-level data set name for the input data set.
  3. Create a table containing variable names in the input data set using PROC CONTENTS.
  4. Use DATA _NULL_ step to read through the variable names, and derive labels as

    lbl = propcase(translate(name,' ','_'));

    Here, transalate() function replaces underscores with blanks, then propcase() function converts every word in an argument to proper case (upper case for the first character and lower case for the remaining characters). We also create macro variables for each name/label pair (vname1, vlabel1, vname2, vlabel2, …) and macro variable nvars representing the number of such pairs.

  5. Use PROC DATASETS with MODIFY and LABEL statements to assign generated column labels to the source data set.

If some of the labels assigned by this macro are not what you need you may run another PROC DATASETS to individually adjust (re-assign) them according to your wishes or specification. But when you need to label data set columns on a large scale (many tables with dozens or hundreds of columns) this can be a good first draft that can save you time and efforts.

_DATA_ and _LAST_ special data sets

You might notice that I used _data_ data set name in the out= option of the PROC CONTENTS. This is not an explicit data set name; it is a keyword, a special data set that allows SAS to assign one of the available data set names dynamically. The created output data set will have a name that looks something like DATA1 or DATA2, etc. Try running this code:

data _data_;
   x=1;
run;

and look in the SAS LOG at what data set is created. I got:

NOTE: The data set WORK.DATA1 has 1 observations and 1 variables.

Special data set name _data_ tells SAS to create a data set in the WORK library from a list of names DATA1, DATA2, … according to the DATAn naming convention. These names (as well as WORK library) are unique for a given SAS session. The first time you use _data_ within a SAS session it will create data set named WORK.DATA1, the second time you use _data_ it will create WORK.DATA2, and so on.

Consequently, I used special data set name _last_ in the SET statement of the DATA step following the PROC CONTENTS. Again, here _last_ is a keyword, not a data set name; it is a special data set that refers to the name of the last created data set during your SAS session. That causes SAS to use the latest data set created prior to the _last_ reference.

Special data sets _data_ and _last_ are reserved names (or SAS keywords) along with special data set _null_ that is used in the DATA _NULL_ statement and causes SAS to execute the DATA step without creating a data set. (By the way, using DATA _NULL_ can increase your code efficiency when you use the DATA step for custom report writing or creating macro variables or other processing for which the output data set is not needed as it does not consume computer resources for writing and storing the output data set.)

If I were using an explicit table name in this macro instead, and your calling program accidentally were using the same table name, then the macro would overwrite your table which would wreak havoc to your program. Using _data_ and _last_ special data sets protect your SAS program from a possibility of inadvertently overwriting your other data set with the same name by executing the %ilabel macro. It is similar to using %LOCAL for macro variable names for protecting from possible overwrites of your %GLOBAL macro variables with the same names.

A WORD OF CAUTION: Remember, that the _data_ keyword creates table names that are unique only within a SAS session, so it works perfectly for the WORK data library which itself is a unique instance for a SAS session. While it is syntactically correct to use special data set notation _data_ for creating permanent data sets such as libref._data_ (including SASUSER._data_), I have to warn you against using it as it will not guarantee the name uniqueness in the permanent data library, and you may end up overwriting data sets that already exist there.

Your thoughts?

Do you find this post useful? How do you handle the task of assigning variable labels on a mass scale? Do you use _data_ and _last_ special data sets in your SAS coding? Please share in the comments section below.

Automating SAS variable labels creation was published on SAS Users.

1月 272020
 

Workday calendar including weekends and holidays

Way too often, SAS programmers run into a task when for a given date (e.g. event date) there is a need to shift (add or subtract) it by a specified number of days excluding weekends and holidays — in other words to move a date by a given number of workdays. It does not matter how many days off are in our date span as long as it contains exactly the required number of workdays.

For the purpose of this blog post, we will use the following words as synonyms: workdays, work days, working days, business days; as opposed to their antonym: days off.

In the ideal world

If not for gifts from governments and employers called holidays, shifting (incrementing or decrementing) a date by a number of workdays using SAS would be a piece of cake. It’s literally a single line of code using INTNX function with the WEEKDAY date interval. For example, the following DATA Step code:

data _null_;
   event_date = '02JAN2020'd;
   shift_date = intnx('weekday', event_date, -10);
   put shift_date= date9.;
run;

produces in the SAS log:

shift_date=19DEC2019

Here:

  • 'weekday' is date interval covering Monday through Friday;
  • event_date is starting date point;
  • -10 is number of workdays to shift by (positive number means increment; negative number means decrement).

Note, that the WEEKDAY date interval can be modified to accommodate different weekend days. For example:

  • WEEKDAY17W - five-day work week with a Sunday (1) and Saturday (7) weekend (same as WEEKDAY);
  • WEEKDAY1W - six-day week with Sunday (1) as the only weekend day;
  • WEEKDAY67W - five-day week with Friday (6) and Saturday (7) as weekend days, etc.

Holidays schedule

In the real world, however, weekends defined by the WEEKDAY interval are not the only days off, as they do not account for holidays. In the example above, when we shifted our starting date (2 January 2020) by -10 we arrived at 19 December 2019 which means we miscounted several holidays as workdays.

Which holidays (and how many) we miscounted depends on jurisdiction (country, state, business), as their holidays schedules vary. For example, for US federal agencies we would miss (1. New Year – 1Jan2020, 2. Christmas Day – 25Dec2019, and 3. Christmas Eve Day – 24Dec2019 – although this is not an official US federal holiday, most federal employees were given that day off by presidential executive order).

For SAS Institute (USA), we would miscue 6 non-weekend holiday days (Winter Holiday 25Dec2019 – 27Dec2019 and 30Dec2019 - 1Jan2020).

In other countries or businesses, this holidays schedule might be quite different, and that is why this date-shifting task that would account for holidays schedule causes so much confusion. Let’s straighten it out with the help of our old friend – SAS user-defined format. But first, let’s create a workday calendar – a data table listing all OUR work days as well as days off.

Workday calendar

Practically every organization has (or must have) a workday calendar that defines the valid working days and consists of a repeating pattern of days on and days off, as well as exceptions to that pattern. While such a calendar may span multiple years, for our purposes, we can use a subset of that calendar, which reliably covers the date range of our interest.

Let’s create an example of the workday calendar as a SAS data table:

data DAYS_OF_WEEK;
   format DATE date9.;
   do DATE='01JAN2019'd to '31JAN2020'd;
      WEEK_DAY = weekday(DATE);
      DAY_NAME = put(DATE,downame.);
      WORK_DAY = 1<WEEK_DAY<7;
      output;
   end;
run;
 
data DAYS_HOLIDAY;
   format DATE date9.;
   input DATE date9.;
   WORK_DAY = 0;
   datalines;
01JAN2019
21JAN2019
18FEB2019
27MAY2019
04JUL2019
02SEP2019
11NOV2019
28NOV2019
24DEC2019
25DEC2019
01JAN2020
20JAN2020
; 
 
/* Overlay holidays onto weekdays */
data DAYS_WEEKENDS_AND_HOLIDAYS;
   merge
      DAYS_OF_WEEK
      DAYS_HOLIDAY;
   by DATE;
run;

Here is a fragment of the resulting workday calendar table:
Workday calendar table

If date shifting is needed on an individual-level, then workday calendars should be created for every person and must include working days, weekends, holidays as well as personal days off such as vacations, sick days etc.

SAS format to distinguish workdays from days off

Now, for the dates range of our interest, we want to create a SAS user-defined format that lists all the days off while workdays are lumped into the other category. It’s just more efficient that way, as the number of days off is usually smaller than the number of work days so our explicit list of dates will be shorter. For example:

proc format;
   value dayoff
   '01DEC2019'd = 'Y'
   '07DEC2019'd = 'Y'
   '08DEC2019'd = 'Y'
   . . .
   '24DEC2019'd = 'Y'
   '25DeC2019'd = 'Y'
   '01JAN2020'd = 'Y'
   '20JAN2020'd = 'Y'
   other = 'N'
   ;
run;

In this user-defined SAS format values labeled 'Y' mean day off, and values labeled 'N' mean workday. That includes and takes care of both weekends and holidays.

The proc format above serves only for illustrational purposes of what kind of format we are going to create. However, by no means do I suggest implementing it this hard-coded way. Quite the contrary, we are going to create format dynamically and 100% data-driven. Here is how we do it:

data WORK.DAYSOFF (rename=(DATE=START));
   set DAYS_WEEKENDS_AND_HOLIDAYS(where=(WORK_DAY=0)) end=last;
   retain FMTNAME 'dayoff' TYPE 'n' LABEL 'Y';
   output;
   if last then do;
      HLO = 'O';
      LABEL = 'N';
      output;
   end;
run;
 
proc format cntlin=WORK.DAYSOFF;
run;

In the above code, HLO='O' and LABEL='N' are responsible for generating the OTHER category for the dayoff format.

Shifting dates by a number of workdays using dayoff format

With the dayoff user-defined format at hands, we can easily increment or decrement dates by a number of workdays. Here is how:

/* data table of some dates */
data EVENTS;
   do EVENT_DATE='01DEC2019'd to '31DEC2019'd;
      output;
   end;
   format EVENT_DATE date9.;
run;
 
/* Calculating new dates shifted by a number of workdays */
data EVENTS_WITH_SHIFTS;
   set EVENTS;
 
   /* Decrement EVENT_DATE by 10 workdays */ 
   d = EVENT_DATE;
   do i=1 to 10;
      d = d - 1;
      if put(d, dayoff.)='Y' then i = i - 1;
   end;
   BEFORE_DATE = d;
 
   /* Increment EVENT_DATE by 12 workdays */ 
   d = EVENT_DATE;
   do i=1 to 12;
      d = d + 1;
      if put(d, dayoff.)='Y' then i = i - 1;
   end;
   AFTER_DATE = d;
 
   format BEFORE_DATE AFTER_DATE date9.;
   drop d i;
run;

In this code, we decrement (d=d-1) or increment (d=d+1) our event date every time the do-loop iterates. It will iterate while counter i does not exceed the number of workdays. However, within the do-loop we modify counter i to i-1 every time we come across a day off as determined by condition put(d,dayoff.)='Y'. This will effectively exclude days off from counting towards the number of workdays. The do-loop will iterate the number of workdays plus the number of days off thus moving date d by the number of days that includes exactly the given number of workdays (plus some number of days off which we don’t care about). Just pause for a second and think to absorb this.

This simple technique can be modularized by implementing it as a SAS user-defined function or a SAS data-step macro.

User-defined function to shift a date by a number of workdays

Here is the user-defined function shiftwd() that shifts a beginning date specified in the first argument from_date by a number of workdays specified in the second argument shift_by. The second argument can be either positive or negative. Positive second argument means advancing the first argument (incrementing); negative second argument means subtracting workdays from the first argument (decrementing). Both arguments can be either variable names or numerals representing whole numbers.

libname funclib 'c:\projects\shift\functions';
proc fcmp outlib=funclib.funcs.dates; 
   function shiftwd(from_date, shift_by); 
      d = from_date; 
      do i=1 to abs(shift_by); 
         d = d + sign(shift_by); 
         if put(d,dayoff.)='Y' then i = i - 1; 
      end; 
      return(d); 
   endfunc; 
run;

Function usage example:

libname funclib 'c:\projects\shift\functions';
options cmplib= funclib.funcs;
 
data EVENTS_WITH_SHIFTS;
   set EVENTS;
   BEFORE_DATE = shiftwd(EVENT_DATE,-10); /* Decrement EVENT_DATE by 10 workdays */ 
   AFTER_DATE  = shiftwd(EVENT_DATE, 12); /* Increment EVENT_DATE by 12 workdays */ 
   format BEFORE_DATE AFTER_DATE date9.;
run;

SAS macro to shift a date by a number of workdays

Similarly, the same can be implemented as a data-step macro:

%macro shiftwd (fromvar=,endvar=,wdays=,sign=);
   &endvar = &fromvar;
   do i=1 to &wdays;
      &endvar = &endvar &sign 1;
      if put(&endvar, dayoff.)='Y' then i = i - 1;  
   end;
   drop i;
%mend;

This macro has 4 required parameters:

  • fromvar - variable name of the beginning date;
  • endvar - variable name of the ending date;
  • wdays - variable name or numeral representing number of workdays to move from the beginning date;
  • sign - operation sign defining direction of the date move (+ for incrementing, - for decrementing).

Macro usage example:

data EVENTS_WITH_SHIFTS;
   set EVENTS;
   %shiftwd(fromvar=EVENT_DATE,endvar=BEFORE_DATE,wdays=10,sign=-); /* Decrement EVENT_DATE by 10 workdays */ 
   %shiftwd(fromvar=EVENT_DATE,endvar=AFTER_DATE, wdays=12,sign=+); /* Increment EVENT_DATE by 12 workdays */ 
   format BEFORE_DATE AFTER_DATE date9.;
run;

Related materials

Calculating the number of working days between two dates (Blog post)

Custom Time Intervals (SAS Documentation)

Your thoughts?

Do you find this material useful? How do you handle the task of adding or subtracting workdays from a date? Please share in the comments section below.

Shifting a date by a given number of workdays was published on SAS Users.

12月 092019
 

Building on my last post, How to create checklist tables in SAS®, this one shows you how to compare SAS data Check mark and cross mark sets that include common and uncommon columns. You'll learn how to visualize side-by-side columns commonalities and differences in data tables.

As before, we're working with a comparison matrix (aka checklist table) where check-marks / x-marks indicate included / excluded columns.

Data tables will be comparable products while their columns (variables) will represent product features. We'll add background color to highlight which attributes are different in the common columns. Since there might be several different attributes for a given column, we will use a hierarchy typelengthlabel to indicate only the highest mismatched level of hierarchy. For example:

  • If same-named columns have different type (Numeric vs. Character), their corresponding check-mark will be shown on a light-red background, which indicates the highest degree of mismatch.
  • If same-named columns have the same type, a yellow background will indicate any difference in variables length.
  • When same-named variables type and length match, a light-blue background marks any difference in variables label.

SAS code to create color-enhanced comparison matrix

Let’s compare variable attributes in two data tables: one is SAS-supplied SASHELP.CARS, and another WORK.NEWCARS that I derive from the first one, slightly scrambling its column definitions:

data WORK.NEWCARS (drop=temp:);
   set SASHELP.CARS (rename=(Origin=Region EngineSize=temp1 Make=temp2));
   length EngineSize $3 Make $20;
   EngineSize = put(temp1,3.1);
   Make = temp2; 
   label Type='New Car Type';
run;

In this NEWCARS data table, I did the following:

  • Replaced column name Origin with Region
  • Changed type of column EngineSize from Numeric to Character
  • Changed length of column Make from $13 to $20
  • Changed label of column Type from blank to “New Car Type”

Now let’s build the comparison matrix:

proc contents data=SASHELP.CARS noprint out=DS1(keep=Name Type Length Label);
run;
 
proc contents data=WORK.NEWCARS noprint out=DS2(keep=Name Type Length Label);
run;
 
data comparison_matrix;
   merge
      DS1(in=in1 rename=(Type=Typ1 Length=Len1 Label=Lab1))
      DS2(in=in2 rename=(Type=Typ2 Length=Len2 Label=Lab2));
   by Name;
 
   /* set symbol shape: 1=V; 0=X */
   ds1 = 1; ds2 = 1;
   if in1 and not in2 then ds2 = 0; else
   if in2 and not in1 then ds1 = 0;
 
   /* add background color */
   if ds1=ds2=1 then
   select;
      when(Typ1^=Typ2) do; ds1=2; ds2=2; end;
      when(Len1^=Len2) do; ds1=3; ds2=3; end;
      when(Lab1^=Lab2) do; ds1=4; ds2=4; end;
      otherwise; 
   end;
 
   label
      Name = 'Column Name'
      ds1 = 'SASHELP.CARS'
      ds2 = 'WORK.NEWCARS'
      ;
run;
 
proc format;
   value chmark
      0   = '(*ESC*){unicode "2718"x}'
      1-4 = '(*ESC*){unicode "2714"x}'
      ;
   value chcolor
      0   = red
      1-4 = green
      ;
   value bgcolor
      2 = 'cxffccbb'
      3 = 'cxffe177'
      4 = 'cxd4f8d4' 
      ;
run;
 
ods html path='c:\temp' file='comp_marix.html' style=Seaside;
ods escapechar='^';
title 'Data set columns comparison matrix';
 
proc odstext;
   p '<div align="center">Mismatch Legend:'||
     '<span style="background-color:#ffccbb;margin-left:17px">^_^_^_^_</span> Type'||
     '<span style="background-color:#ffe177;margin-left:17px">^_^_^_^_</span> Length'||
     '<span style="background-color:#d4f8d4;margin-left:17px">^_^_^_^_</span> Label</div>'
   / style=[fontsize=9pt];
run;
 
title; 
proc print data=comparison_matrix label noobs;
   var Name / style={fontweight=bold width=100px};
   var ds1 ds2 / style={color=chcolor. backgroundcolor=bgcolor. just=center fontweight=bold width=120px};
   format ds1 ds2 chmark.;
run;
 
ods html close;

Here is a brief explanation of the code:

  1. Two PROC CONTENTS produce alphabetical lists (as datasets) of the data table column names, as well as their attributes (type, length, label)
  2. The DATA STEP merges these 2 lists and creates DS1 and DS2 variables indicating common name (values 1, 2, 3, 4) or uncommon name (value 0).
  3. PROC FORMAT creates 3 user-defined formats chmark, chcolor, bgcolor responsible for checkmark shape, checkmark color, and background color respectively. For checkmark shape, we use Unicode characters, and for colors we use both, color names (e.g. red, green) and hexadecimal RGB color notations (e.g. 'cxFFCCBB').
  4. PROC ODSTEXT’s P statement is used to display color legend for the comparison matrix.
  5. Finally, PROC PRINT with user-defined formats produces our color-enhanced comparison matrix.

Data tables comparison matrix – OUTPUT

The above code will generate the following HTML output with the comparison matrix for variables in two data sets:

Comparison matrix for common/uncommon variables in 2 datasets

Adding more detail to the comparison matrix chart

We can further enhance our output comparison matrix by adding detailed descriptive information about differences between variable attributes. For comprehensive view, we can add a COMMENTS column that spells out differences (attributes mismatches). In addition to the hierarchical logic defining only one mismatch of the highest degree indicated by color highlighting above, comments can include all found discrepancies. Simply add the following two pieces of SAS code:

1. Add the following group of statements to the above DATA Step (right after SELECT statement):

 length Comments $200;
   if ds1>1 then
   do;
      if Typ1^=Typ2 then Comments = catx(' ', Comments, 'Type1=',   Typ1, '; Type2=',   Typ2, ';');
      if Len1^=Len2 then Comments = catx(' ', Comments, 'Length1=', Len1, '; Length2=', Len2, ';');
      if Lab1^=Lab2 then Comments = catx(' ', Comments, 'Label1=',  Lab1, '; Label2=',  Lab2, ';');
   end;

Depending on your needs this Comments can be added unconditionally – you would just need to remove IF-THEN logic keeping only:

length Comments $200;
Comments = catx(' ', Comments, 'Type1=',   Typ1, '; Type2=',   Typ2, ';');
Comments = catx(' ', Comments, 'Length1=', Len1, '; Length2=', Len2, ';');
Comments = catx(' ', Comments, 'Label1=',  Lab1, '; Label2=',  Lab2, ';');

2. Add the following statement to the above PROC PRINT (right before the FORMAT statement):

var comments / style={width=250px};

Then your HTML output will look as follows:

Detailed comparison matrix for common/uncommon variables in 2 datasets

Conclusion

Comparison matrix charts are a convenient tool for data development and metadata validation when you're comparing a data table’s metadata against requirements descriptions.

It allows us to quickly identify tables’ common and uncommon variables, as well as common variable inconsistencies by type, length and other attributes, such as labels and formats.

We can easily add detailed descriptive information when needed.

On a related note

While this post focused on visualizing SAS data sets comparison vis-à-vis common and uncommon columns, it's worth noting SAS websites have plenty of info on finding common variables (or columns) in data sets. For example:

Your thoughts?

Do you find this material useful? What other usages of the checklist tables and color-enhanced comparison matrices can you suggest?

How to compare SAS data tables for common/uncommon columns was published on SAS Users.

10月 302019
 

Check mark
When I'm about to make a major purchase, I appreciate being able to compare products features at a glance, side by side. I am sure you have seen these ubiquitous comparison tables with check marks showing which features are characteristic of different products and which are not.

These data visualizations, sometimes called comparison matrixes, are also commonly known as checklist tables or checklist table charts. Such charts are extremely useful, persuasive visuals as they allow us to quickly identify differences as well as commonalities between comparable products or solutions and quickly decide which one of them is more desirable or suitable for our needs.

For example, here is such a table that I created in MS Word:
Checklist table example created in Word

SAS code to create checklist table chart

Thanks to SAS’ ability to use Unicode characters in formatted data, it’s very easy to create such a checklist table in SAS. Just imagine that each cell value with visible check mark is assigned value of 1 and each cell value with no check mark is assigned value of 0. That is exactly the data table that lies behind this data visualization. To print this data table with proper formatting, we will format number 1 to a more visually appealing check mark, and 0 to a “silent” blank. Here is the SAS code to accomplish this:

data CHECKLIST;
   length FEATURE $10;
   input FEATURE $1-10 A B C;
   label
      FEATURE = 'Feature'
      A = 'Product A'
      B = 'Product B'
      c = 'Product C';
   datalines;
Feature 1 1 1 1
Feature 2 1 0 1
Feature 3 0 1 1
Feature N 1 0 1
;
 
proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      other = ' ';
   value chcolor
      1 = green;
run;
 
ods html path='c:\temp' file='checklist1.html' style=HTMLBlue;
 
proc print data=CHECKLIST label noobs;
   var FEATURE / style={fontweight=bold};
   var A B C / style={color=chcolor. just=center fontweight=bold};
   format A B C chmark.;
run;
 
ods html close;

If you run this SAS code, your output will look much as the one above created in MS Word:
Checklist table created in SAS
Key elements of the SAS code that produce this checklist table are user-defined formats in the PROC FORMAT. You format the values of 1 to a Unicode 2714 corresponding to a checkmark character ✔ in a user-defined format chmark. Also, the value of 1 is formatted to green color in the chcolor user-defined format. The syntax for using Unicode symbols in user-defined formats is this:

value chmark
1 = '(*ESC*){unicode "2714"x}'

NOTE: ESC here must be upper-case; x at the end stands for “hexadecimal.”

Unicode characters for checklist tables

Unicode or Unicode Transformation Format (UTF) is an international encoding standard by which each letter, digit or symbol is assigned a unique numeric value that applies across different platforms and programs. The Unicode standard is supported by many operating systems and all modern browsers.

It is implemented in HTML, XML, Java, JavaScript, E-mail, ASP, PHP, etc. The most commonly used Unicode encodings standards are UTF-8 and UTF-16. HTML 5 supports both UTF-8 and UTF-16.

You can use this HTML Unicode (UTF-8) Reference to look up and choose symbols you can embed in your report using SAS user-defined formats. They are grouped by categories to make it easier to find the ones you needed.

Here is just a small random sample of the Unicode symbols that can be used to spice up your checklist tables to get their different flavors:

Unicode characters and codes
You can also apply colors to all these symbols the way we did it in the SAS code example above.

Different flavors of checklist tables

By just changing user-defined formats for the symbol shapes and colors we can get quite a variety of different checklist tables.

For example, we can format 0 to ✘ instead of blank and also make it red to explicitly visualize feature exclusion from product (in addition to explicit inclusion). All we need to do is to modify our PROC FORMAT to look like this:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      0 = '(*ESC*){unicode "2718"x}';
   value chcolor
      1 = green
      0 = red;
run;

SAS output comparison matrix will look a bit more dramatic and persuasive:
SAS-generated checklist table
Or, if you’d like, you can use the following format definition:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2611"x}'
      0 = '(*ESC*){unicode "2612"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following SAS-generated ballot-like table checklist:
Ballot-like checklist table created in SAS
Here is another one:

proc format;
   value chmark
      1 = '(*ESC*){unicode "1F5F9"x}'
      0 = '(*ESC*){unicode "20E0"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following variation of the checklist table:
Another SAS-generated checklist table
As you can see, the possibilities are endless.

Your thoughts?

Do you find these comparison matrixes or checklist tables useful? Do you envision SAS producing them for your presentation, documentation, data story or marketing materials? What Unicode symbols do you like? Can you come up with some creative usages of symbols and colors? For example, table cells background colors...

How to create checklist tables in SAS® was published on SAS Users.

10月 302019
 

Check mark
When I'm about to make a major purchase, I appreciate being able to compare products features at a glance, side by side. I am sure you have seen these ubiquitous comparison tables with check marks showing which features are characteristic of different products and which are not.

These data visualizations, sometimes called comparison matrixes, are also commonly known as checklist tables or checklist table charts. Such charts are extremely useful, persuasive visuals as they allow us to quickly identify differences as well as commonalities between comparable products or solutions and quickly decide which one of them is more desirable or suitable for our needs.

For example, here is such a table that I created in MS Word:
Checklist table example created in Word

SAS code to create checklist table chart

Thanks to SAS’ ability to use Unicode characters in formatted data, it’s very easy to create such a checklist table in SAS. Just imagine that each cell value with visible check mark is assigned value of 1 and each cell value with no check mark is assigned value of 0. That is exactly the data table that lies behind this data visualization. To print this data table with proper formatting, we will format number 1 to a more visually appealing check mark, and 0 to a “silent” blank. Here is the SAS code to accomplish this:

data CHECKLIST;
   length FEATURE $10;
   input FEATURE $1-10 A B C;
   label
      FEATURE = 'Feature'
      A = 'Product A'
      B = 'Product B'
      c = 'Product C';
   datalines;
Feature 1 1 1 1
Feature 2 1 0 1
Feature 3 0 1 1
Feature N 1 0 1
;
 
proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      other = ' ';
   value chcolor
      1 = green;
run;
 
ods html path='c:\temp' file='checklist1.html' style=HTMLBlue;
 
proc print data=CHECKLIST label noobs;
   var FEATURE / style={fontweight=bold};
   var A B C / style={color=chcolor. just=center fontweight=bold};
   format A B C chmark.;
run;
 
ods html close;

If you run this SAS code, your output will look much as the one above created in MS Word:
Checklist table created in SAS
Key elements of the SAS code that produce this checklist table are user-defined formats in the PROC FORMAT. You format the values of 1 to a Unicode 2714 corresponding to a checkmark character ✔ in a user-defined format chmark. Also, the value of 1 is formatted to green color in the chcolor user-defined format. The syntax for using Unicode symbols in user-defined formats is this:

value chmark
1 = '(*ESC*){unicode "2714"x}'

NOTE: ESC here must be upper-case; x at the end stands for “hexadecimal.”

Unicode characters for checklist tables

Unicode or Unicode Transformation Format (UTF) is an international encoding standard by which each letter, digit or symbol is assigned a unique numeric value that applies across different platforms and programs. The Unicode standard is supported by many operating systems and all modern browsers.

It is implemented in HTML, XML, Java, JavaScript, E-mail, ASP, PHP, etc. The most commonly used Unicode encodings standards are UTF-8 and UTF-16. HTML 5 supports both UTF-8 and UTF-16.

You can use this HTML Unicode (UTF-8) Reference to look up and choose symbols you can embed in your report using SAS user-defined formats. They are grouped by categories to make it easier to find the ones you needed.

Here is just a small random sample of the Unicode symbols that can be used to spice up your checklist tables to get their different flavors:

Unicode characters and codes
You can also apply colors to all these symbols the way we did it in the SAS code example above.

Different flavors of checklist tables

By just changing user-defined formats for the symbol shapes and colors we can get quite a variety of different checklist tables.

For example, we can format 0 to ✘ instead of blank and also make it red to explicitly visualize feature exclusion from product (in addition to explicit inclusion). All we need to do is to modify our PROC FORMAT to look like this:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2714"x}'
      0 = '(*ESC*){unicode "2718"x}';
   value chcolor
      1 = green
      0 = red;
run;

SAS output comparison matrix will look a bit more dramatic and persuasive:
SAS-generated checklist table
Or, if you’d like, you can use the following format definition:

proc format;
   value chmark
      1 = '(*ESC*){unicode "2611"x}'
      0 = '(*ESC*){unicode "2612"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following SAS-generated ballot-like table checklist:
Ballot-like checklist table created in SAS
Here is another one:

proc format;
   value chmark
      1 = '(*ESC*){unicode "1F5F9"x}'
      0 = '(*ESC*){unicode "20E0"x}';
   value chcolor
      1 = green
      0 = red;
run;

producing the following variation of the checklist table:
Another SAS-generated checklist table
As you can see, the possibilities are endless.

Your thoughts?

Do you find these comparison matrixes or checklist tables useful? Do you envision SAS producing them for your presentation, documentation, data story or marketing materials? What Unicode symbols do you like? Can you come up with some creative usages of symbols and colors? For example, table cells background colors...

How to create checklist tables in SAS® was published on SAS Users.

9月 162019
 

SAS SPDS is lightning fastJust when you think you’ve seen it all, life can surprise you in a big way, making you wonder what else you've missed.

That is what happened when I recently had a chance to work with the SAS® Scalable Performance Data Server, a product that's been around a while, but I never crossed paths with before. I open an SPDS table of a hundred million records in SAS® Enterprise Guide, and I can scroll it as fast as if it were an Excel “baby” spreadsheet of a hundred rows. That’s how powerful it feels, to say nothing about the lighting speed of the queries.

What is the SAS Scalable Performance Data Server?

Also known as the SAS SPD Server (or SPDS), it's a data storage system designed for high-performance data delivery. Its primary purpose is to provide rapid queries of vast amounts of data. We are talking terabytes of data with tables containing billions of rows. SPDS employs parallel storage and efficient indexing, coupled with a multi-threaded server system concurrently processing tasks distributed across multiple processors.

Availability of the SPDS client in SAS® Viya effectively integrates SAS SPDS with SAS Viya, extending functionality of its applications beyond the native Cloud Analytic Services (CAS) where you can continue reaping all the benefits of the SAS SPDS.

SPDS library

In addition to connecting to SPD Server with explicit SQL pass-through, connection to SPD Server with a LIBNAME statement is available as well, for example:

libname mylibref sasspds 'serverdomain' host='nodename_or_ip' service='5400'
                         user='mySPDuserid' password='{SAS003}XXXXXXX...XXX';

This effectively creates an SPDS library, and the tables in that library can be referenced by two-level name mylibref.tablename as if this were a SAS BASE library.

Cluster tables vs. member tables

Besides ordinary data tables, SPDS library offers so called dynamic cluster tables – or clusters for short – enabling transparent access to large amounts of data.

Dynamic cluster tables (cluster tables or clusters) are virtual tables that allow users to access many server tables (member tables) as if they were one table. A dynamic cluster table is a collection of SPD Server tables that are presented to the end-user application as a single table through a metadata layer acting like a view.

Member tables can be added to the cluster as well as replaced and removed from the cluster.

The role of PROC SPDO

PROC SPDO is the SAS procedure for the SPD Server operator interface. It performs a wide range of SPD server, user and table management tasks:

  • create, list, modify, destroy, and undo dynamic cluster tables
  • add, remove, replace, and fix cluster table members
  • add, modify, list, and delete access control lists (ACLs) for server resources
  • define, describe, and remove WHERE constraints on tables for row-level security definition and management
  • issue system commands on server nodes

In addition to PROC SPDO, SPD Server plug-in for SAS® Data Management Console is also available.

Retrieving SPDS library contents

If you open an SPDS library in SAS Enterprise Guide, you won’t be able to tell which table in that library is a member table and which is a cluster table – they all look the same. But in many cases, we need to know what is what. Moreover, for data-driven processing we need to capture the SPDS library objects into a dataset and identify them whether they are clusters or member tables.

Luckily, PROC CONTENTS with OUT= option allows us to do just that. While MEMTYPE column is equal to ‘DATA’ for both, clusters and member tables, there is another, less known column inversely called TYPEMEM that has value of 'DATA' for clusters and blank value ' ' for member tables. The following simple code allows you to retrieve SPDS library objects list into WORK.SPDSTYPES dataset where TABLETYPE column specifies whether it’s a cluster or a member for each library object MEMNAME:

proc contents data=SPDSLIB._all_ out=WORK.ALLOBJECTS (keep=MEMNAME TYPEMEM);
run;
 
proc sort data=WORK.ALLOBJECTS nodupkey;
   by MEMNAME;
run;
 
data WORK.SPDSTYPES;
   set WORK.ALLOBJECT;
   attrib TABLETYPE $7 label='SPDS table type';
   select(TYPEMEM);
      when('DATA') TABLETYPE = 'CLUSTER';
      when('')     TABLETYPE = 'MEMBER';
      otherwise    TABLETYPE = '';
   end;
run;

In this code PROC CONTENTS produces one record per column NAME in every object MEMNAME in the SPDS library; PROC SORT reduces (un-duplicates) this list to one record per MEMNAME; finally, data step creates TABLETYPE column indicating which MEMNAME is CLUSTER and which is MEMBER.

Retrieving SPDS cluster’s member list

In addition to retrieving a list of objects in the SPDS library described above, we also need a way of capturing the content (a list of members) of the cluster itself in order to control removing or replacing its members. PROC SPDO’s CLUSTER LIST statement produces such a list, and its OUT= option allows you to dump that list into a dataset:

proc spdo lib=SPDSLIB;
   cluster list CLUSTER1 out=CLUSTER1_MEMBERS;
   cluster list CLUSTER1 out=CLUSTER2_MEMBERS;
   /* ... */
   cluster list CLUSTER1 out=CLUSTERN_MEMBERS;
quit;

This approach creates one output table per cluster, and you can’t use the same OUT= destination table for different clusters, for they will be overwritten with each subsequent CLUSTER LIST statement, not appended.

If you need to capture contents of several clusters into one dataset, then instead of the above method of outputting each cluster content into separate table and then appending (concatenating) them, the good old ODS OUTPUT with CLUSTERLIST= option allows us to do it in a single step:

ods noresults;
ods output clusterlist=WORK.CLUSTER_MEMS;
proc spdo lib=SPDSLIB;
   cluster list CLUSTER1;
   cluster list CLUSTER2;
   /* ... */
   cluster list CLUSTERN;
quit;
ods output close;
ods results;

As additional bonus ODS NORESULTS suppresses printed output when it’s not needed, e.g. for automatic data-driven processing.

Your thoughts?

What is your experience with SAS SPDS? How might you use it in the future? Please comment below.

How to retrieve contents of SAS® Scalable Performance Data Server library was published on SAS Users.

8月 182019
 

Have you ever thought of selling sand on the beach? Neither have I. To most people the mere idea is preposterous. But isn’t it how all great discoveries and inventions are made? Someone comes up with an outwardly crazy, outlandish idea, and despite all the skepticism, criticism, ostracism, ridicule and [...]

Selling sand at the beach was published on SAS Voices by Leonid Batkhan

7月 172019
 

Problem solving: thinking inside the box

Have you ever tried to pass comma-delimited values to SAS macro or to a SAS macro function? How can SAS distinguish commas separating parameters or arguments from commas separating parts of the values?

Passing comma-delimited value as an argument to a SAS macro function

Let’s say you want to extract the first word from the following string of characters (these words represent column names in the SASHELP.CARS data table):

make, model, type, origin

If you run the following code:

%let firstvar = %scan(make, model, type, origin, 1);

you get is the following ERROR in your SAS log:

ERROR: Macro function %SCAN has too many arguments.

That is because %scan macro function sees and treats those make, model, type and origin as arguments since commas between them are interpreted as argument separators.

Even if you “hide” your comma-delimited value within a macro variable, it still won’t do any good since the macro variable gets resolved during macro compilation before being passed on to a macro or macro function for execution.

%let mylist = make, model, type, origin;
%let firstvar = %scan(&mylist, 1);

You will still get the same ERROR:

ERROR: Macro function %SCAN has too many arguments.

Passing comma-delimited value as a parameter to a SAS macro

Try submitting the following code that passes your macro variable value to a SAS macro as a parameter:

%let mylist = make, model, type, origin;
%macro subset(dsname=, varlist=);
   proc sql;
      select &varlist
      from &dsname;
   quit;
%mend subset;
%subset(dsname=SASHELP.CARS, varlist=&mylist)

You will get another version of the SAS log ERROR:

ERROR: All positional parameters must precede keyword parameters.
NOTE: Line generated by the macro variable "MYLIST".
1                 type, origin
                  ----
                  180
ERROR 180-322: Statement is not valid or it is used out of proper order.

In this case, macro %subset gets as confused as the %scan function above because your macro variable will get resolved during macro compilation, and SAS macro processor will see the macro invocation as:

%subset(dsname=SASHELP.CARS, varlist=make, model, type, origin)

treating each comma as a parameter separator.

All this confusion happens because SAS functions’ arguments and SAS macros’ parameters use commas as their separators, while resolved macro variables introduce their own values’ comma delimiters into the functions/macros constructs’ picture, thus wreaking havoc on your SAS program.

It’s time for a vacation

But don’t panic! To fight that chaos, you need to take a vacation. Not a stay-home, do-nothing vacation, but some serious vacation, with faraway destination and travel arrangements. While real vacation is preferable, an imaginary one would do it too. I mean to start fighting the mess with comma-separated values, pick your destination, book your hotel and flight, and start packing your stuff.

Do you have a “vacation items list”? In my family, we have an individual vacation list for every family member. How many items do you usually take with you? Ten, twenty, a hundred?

Regardless, you don’t show up at the airport checkpoint with a pile of your vacation items. That would’ve been too messy. I don’t think you would be even allowed boarding with an unpacked heap of your stuff. You come to an airport neatly rolling a single item that is called a suitcase. Well, I suppose that some of you may have two of them, but I can’t imagine more than that.

You only started your fantasy vacation, you haven’t even checked in to your flight, but you have already have a solution in your sight, a perfect combine-and-conquer solution for passing comma-delimited values. Even if you have not yet realized that it’s in your plain view.

Thinking inside the box

Forget about “thinking outside the box” metaphor. You can’t solve all your problems with a single strategy. Sometimes, you need to turn your thinking on its head to solve, or even to see the problem.

As for your airport check-in, instead of thinking outside the box, you thought “inside the box” and brought your many items “boxed” as a single item – a suitcase. A container, in a broader sense.

That is exactly how we are going to approach our comma-delimited lists problem. We are going to check them in to a macro or a macro function as a single, boxed item. Just like this:
Passing a comma-separated value to SAS macro or SAS macro function
Or like this:
passing SAS macro variable with comma-separated value to SAS macro or SAS macro function

Not surprisingly, SAS macro language provides a variety of these wonder boxes for many special occasions collectively known as macro quoting functions. Personally, I would prefer calling them “macro masking functions,” as they have nothing to do with “quoting” per se and have everything to do with masking various characters during macro compilation or macro processing. But that is what “macro quoting” means – masking, boxing, - similar to “quoting” a character string to make it a single entity.

Different macro quoting functions mask different special characters (+ - , / ; = etc.) and mnemonics (AND OR GT EQ etc.) so that the macro facility interprets them as text instead of as language symbols.

Here are all 7 SAS macro quoting functions, two of which work at macro compilation - %STR() and %NRSTR(), while other 5 work at macro execution - %QUOTE() and %NRQUOTE(), %BQUOTE() and %NRBQUOTE(), and %SUPERQ().

You may look up what symbols they mask and the timing they apply (macro compilation vs. macro execution) in this macro quoting functions summary. You may also want to look at the following cheat sheet: Deciding When to Use a Macro Quoting Function and Which Function to Use.

As general rule of thumb, use macro quoting functions at compilation time when you mask text constants - (make, model, type, origin); use macro quoting functions at execution time when you mask macro or macro variable references containing & or % - (&mylist).

NOTE: There are many other SAS macro functions that besides their main role also perform macro quoting, e.g. %QSCAN(), %QSUBSTR() and others; they all start with %Q.

Masking commas within a comma-delimited value passed as an argument or a parameter

It turns out that to mask (or to “box”) comma-separated values in a macro function or a SAS macro, any macro quoting function will work. In this case I would suggest using the simplest (and shortest) %STR(). %STR() applies during macro compilation and serves as a perfect “box” for our comma-delimited values to hide (mask) commas to receiving macro function or a macro does not confuse them with its own commas separating arguments / parameters.

With it we can re-write our above examples as:

%let firstvar = %scan(%str(make, model, type, origin), 1);
%put &=firstvar;

SAS log will produce exactly what we expected:

FIRSTVAR=make

Similarly, we can call the above SAS macro as:

%subset(dsname=SASHELP.CARS, varlist=%str(make, model, type, origin) )

It will run without ERRORs and produce a print of the SASHELP.CARS data table with 4 columns specified by the varlist parameter value:

SAS output table as a result of macro run

Masking commas within a macro variable value passed as an argument or parameter

When you assign a comma-delimited list as a value to a macro variable, we want to mask commas within the resolved value during execution. Any of the execution time macro quoting functions will mask comma.

Again, in case of multiple possibilities I would use the shortest one - %QUOTE().

With it we can re-write our above examples as:

%let mylist = make, model, type, origin;
 
%let firstvar = %scan(%quote(&mylist), 1);
 
%subset(dsname=SASHELP.CARS, varlist=%quote(&mylist))

But just keep in mind that the remaining 4 execution time macro quoting functions - %NRQUOTE(), %BQUOTE(), %NRBQUOTE() and %SUPERQ() - will work too.

NOTE: The syntax of the %SUPERQ() function is quite different from the rest of the pack. The %SUPERQ() macro function takes as its argument either a macro variable name without an ampersand or a macro text expression that yields a macro variable name without an ampersand.

Get it going

I realize that macro quoting is not a trivial matter. That is why I attempted to explain its concept on a very simple yet powerful use case. Hope you will expand on this to empower your SAS coding skills.

Passing comma-delimited values into SAS macros and macro functions was published on SAS Users.