Leonid Batkhan

10月 182022
 

Chameleons are examples of adaptability to environments for they change their colors to better blend into their surroundingThe SAS applications development process typically includes the following three phases of the software development life cycle (SDLC): Development (Dev), Testing (Test) and Production (Prod).

In order to protect the Production (or Operation) environment from unforeseen disruptions, these stages are usually implemented in their own separate environments. Sometimes, Development and Testing environments are combined into a single Development environment, which just treats Testing as an integral part of the Development stage.

(For brevity of the further narrative, we will use the two-stage process/environment paradigm, which can be easily expanded as needed.)

During the software development life cycle, when our programming modules are designed and built (developed), tested, validated and have proved to reliably produce the desired results, we promote (transition) them from Dev to the Prod environment.

The software promotions are often managed by change control procedures adopted by your organization.

Software promotion limitations

However, no matter how strict the change control procedures are or how scrupulously we conduct testing and promotion from Dev to Prod, we must recognize the limitations of this paradigm. The alternative would be a fallacy of dreams in “wishful thinking” territory.

Let’s look into two highly desirable, but rarely attainable, “dreams”.

Dream 1

If a software is validated and reliably works in Dev, it is guaranteed that it will work just fine in Prod and therefore no testing in Prod is required.

Reality 1

Although a thorough testing in Dev will keep failures in Prod to a minimum, it cannot completely eliminate them or guarantee your code’s success when running in Prod. Even if a promoted module does not require even a single change and you just copy it from Dev to Prod, there are still many possibilities that it will not work properly in Prod. Consider potential environmental differences such as data sources (structures and access), file system permissions and SAS metadata groups and permissions.

You will need to account for these environmental differences and still conduct some testing in Prod to make sure that the environment itself does not break your code/software.

Dream 2

When software is developed, tested and validated in Dev, promoting it to Prod is just a matter of copying it from Dev to Prod without any changes/adjustments.

Reality 2

However, in actuality, again environmental differences may require adjusting your code for the new environment. For example, data sources and data targets may have different library names or library references, or different directories or files names. If your application automatically sends out an email to a distribution list of your application users, you most likely will have different distribution lists in Dev vs. Prod.

Of course, you may capture all these differences into a configuration file, which will keep all the differences between Dev and Prod environments out of your code. However, in this case, your application becomes data-driven (with the configuration file being a data driver) and, as such, the configuration file itself becomes part of your software. Obviously, these configuration files simply cannot be copied from Dev to Prod during the promotion since they are inherently different.

Therefore, you cannot say that for promotion from Dev to Prod you can just copy your software. At least that is not true for that configuration file.

This reality makes the promotion process quite sensitive and susceptible to errors. Besides, we must maintain two versions of the application (Dev and Prod) and make efforts not to mix them up and not overwrite each other.

Adaptive programming

Adaptability is the ability to adjust to different environments, much like a chameleon changes its color to better blend into its surroundings.

If we can develop a program that automatically adjusts itself to run in either the Development or Production environment, it would mean that we have found a solution making Dream 2 a reality.

You can copy such a program between the environments back and forth and you don’t need to change a thing in it. Because it is identical in both, Dev and Prod.

You can even store a single version of this adaptive program in a centralized shared location and access it to run by either the Development or Production server. This holds true even though these environments may use different data sources, different data targets, different email distribution lists…

The key for such an implementation is the program’s self-awareness when it knows which environment it is running in, and based on that knowledge adapts to the environment by adjusting its configuration variables.

But these are just words of make-believe promises. Let’s get to the proof.

Identifying the environment your SAS program runs in

Suppose you have SAS software installed in the following two environments:

  • Development, where the SAS application server runs on computer DEVSRV.YOURDOMAIN.COM
  • Production, where the SAS application server runs on computer PRODSRV.YOURDOMAIN.COM

If you have SAS® Enterprise BI Server or SAS® BI Server installed you may have your metadata servers either on the same device/computer/machine/server as the application servers or on separate machines. Let’s say these are separate machines: DEVMETASRV.YOURDOMAIN.COM and PRODMETASRV.YOURDOMAIN.COM.

When we run a program in a particular environment (server), we usually know in which environment we run this program. However, in order to build an adaptive program we need to make sure the program itself knows which environment it is running in.

The key here is the SYSHOSTNAME automatic macro variable that the SAS System conveniently makes available within any SAS session. It contains the host name of the computer that is running the SAS program. In our adaptive program, &SYSHOSTNAME macro variable reference will be equal to either DEVSRV or PRODSRV depending on which environment our program runs in.

Making a SAS program adapt to the Development or Production environment

Now, when our program knows its running environment, we can make it adjust to that environment.

Let’s consider a simplified scenario where our program needs to do the following different things depending on the running environment.

Adaptability use case

1. Read a source data table SALES from a metadata-bound library:

  • In DEV: library named ‘SQL CORP DEV’;
  • In PROD: library named ‘SQL CORP’.

2. Produce target Excel file, report.xlsx and save it a specified directory on a non-SAS machine:

  • In DEV: \\DEVOUTSRV\outpath
  • In PROD: \\PRODOUTSRV\outpath

3. Send out an email to a distribution list (contact group) of users when the report is ready:

Adaptive code implementation

The following SAS code illustrates this adaptive technique:

/* ----- Configuration section ----- */
 
%if "&syshostname"="DEVSRV" %then
%do; /* Development server */
   %let metasrv = DEVMETASRV;
   %let lname = SQL CORP DEV;
   %let outsrv = DEVOUTSRV;
   %let tolist = "developers@yourdomain.com";
   options symbolgen mprint mlogic fullstimer;
%end;
%else
%if "&syshostname"="PRODSRV" %then
%do; /* Production server */
   %let metasrv = PRODMETASRV;
   %let lname = SQL CORP;
   %let outsrv = PRODOUTSRV;
   %let tolist = "businessusers@yourdomain.com" "developers@yourdomain.com";
   options nosymbolgen nomprint nomlogic nofullstimer;
 
   /* adjust email distribution list based on test_trigger_YYYYMMDD.txt file existence */
   %let yyyymmdd = %sysfunc(date(),yymmddn8.);
   %if %sysfunc(fileexist(&projpath\test_trigger_&yyyymmdd..txt) %then
      %let tolist = "testers@yourdomain.com";
%end;
%else
%do; /* Unauthorized server */
   %put ERROR: This program is not designed to run on &syshostname server;
   %put ERROR: SAS session has been terminated.;
   endsas;
%end;
 
options 
   metaserver="&metasrv" metaport=8561 metarepository=Foundation metaprotocol=bridge
   metauser='service-accout-ID' metapass="encrypted-sas-password";
 
libname SRCLIB meta "&lname";
 
%let outpath = \\&outsrv\outpath;
%let outname = Report.xlsx;
 
/* ----- End of Configuration section ----- */
 
/* Produce Excel report */
ods excel file="&outpath\&outname";
proc print data=SRCLIB.SOMETABLE;
   where Product='Loan';
run;
ods excel close;
 
/* Send out email */
filename fm email to=(&tolist) from='sender@mydomain.com' subject='Your subject';
 
data _null_;
   file fm;
   put '*** THIS IS AUTOMATICALLY GENERATED EMAIL ***' //
       "Loan report job completed successfully on &syshostname server." /
       "The following Excel file has been generated: &outpath\&outname" //
       'Sincerely,' / 'Your automation team';
run;

Code highlights

As you can see, in the configuration section we conditionally set various macro variables (including &tolist distribution list) and global options depending on where this code runs, in Dev or Prod, determined by the &SYSHOSTNAME macro variable. For unauthorized &SYSHOSTNAME values, we write a relevant message to the SAS log and terminate the SAS session.

Then we establish connection to the metadata server, assign source data library SRCLIB as well as output directory (&outpath) and output file name (&outname).

Based on these conditionally defined macro variables and libref, in the remaining sections, we:

You can wrap this configuration section into a macro and store it as a separate file; then the main program would just invoke this macro. In either case, the code achieves total environmental awareness and naturally, logically adapts to the environment in which we designed it to function.

This adaptive coding approach ensures 100% equality between Dev and Prod code versions. In fact, the two versions of the code are identical. They are self-contained, self-reliant, and do not require any adjustments during promotion throughout the development life cycle. You can copy them from one environment to another (and vice versa) without fear of accidentally wiping out and replacing the right file with the wrong one.

Can we say that this implementation makes Dream 2 come true?

Testing in the Production environment

Notice a little section dubbed /* adjust email distribution list based on test_trigger_YYYYMMDD.txt file existence */ within the Production Server logic. It allows for triggering a test run, in this case by limiting the scope of email distribution in Prod. Just plant / create a file named test_trigger_YYYYMMDD.txt in a specified location &projpath and run your code on a desired test date YYYYMMDD.  You can delete this file afterwards if a full-scale run is scheduled for the same date or otherwise keep it for future reference (it becomes “harmless” and irrelevant for any other dates). You can use the same trigger file tactic to modify other parts of your code as needed in both Prod and Dev.

Even though this feature does not make Dream 1 come true, it does alleviate its Reality.

Questions? Thoughts? Comments?

Do you find this blog post useful? Do you have questions, concerns, suggestions, or comments? Please share with us below in the Comments section.

Additional Resources

Adaptive SAS programming for the Software Development Life Cycle was published on SAS Users.

9月 202022
 

SAS automation using Windows batch scripts

Let’s consider the following ubiquitous scenario. You have created a SAS program that you want to run automatically on schedule on a SAS server under the Microsoft Windows operating system.

If you have SAS® Enterprise BI Server or SAS® BI Server and Platform Suite for SAS you can do the scheduling using the Schedule Manager plug-in of SAS Management Console.

But even if you only have SAS Foundation installed on a Windows server (or any Windows machine), you can automate/schedule your SAS program to run in batch using Windows Task Scheduler. In order to do that you would need a Windows batch file.

What are Windows batch files?

A Windows batch file (or batch script) is a text file consisting of a series of Windows commands executed by the command-line interpreter. It can be created using a plain text editor such as Notepad or Notepad++ and saved as a text file with the .bat file name extension. (Do not use a word processor like Microsoft Word for batch script writing!)

Historically, batch files originated in DOS (Disk Operating System), but modern versions of Windows continually support them. Batch files get executed when you double click on them in Windows File Explorer or type & enter their name in the Command Prompt window.

Even though Windows batch files are not as powerful as PowerShell scripts or Unix/Linux scripts they are still quite versatile and useful for automating processes running SAS in batch mode.

Besides simply running a bunch of OS command sequentially one after another, batch scripts can be more sophisticated. For example, they can use environment variables (which are similar to SAS macro variables). They can use functions and formats. They can capture exit code from one command (or SAS program) and then, depending on its value, conditionally execute a command or a group of commands (which might include another SAS program or another batch script). They can even do looping.

They can submit a SAS program to run by SAS System in batch mode and pass to this program a parameter string.

They can control where SAS program saves its log and generate a dynamic log file name adding a datetime stamp suffix to it.

Windows batch script examples

Let’s explore a couple of examples.

Example 1. Simple script running SAS program in batch mode

set sas=D:\SAS94\sashome\SASFoundation\9.4\sas.exe
set proj=C:\Projects\SAS_to_Excel
set name=mysasprogram
set pgm=%proj%\%name%.sas
set log=%proj%\logs\%name%.log
%sas% -sysin %pgm% -log %log% -nosplash -nologo -icon

Here, we define several environment variables (proj, name, pgm, log) and reference these variables by surrounding their names with %-sign as %variable-name% (analogous to SAS macro variables which are defined by %let mvar-name = mvar-value; and referenced as &mvar-name).

This script will initiate SAS session in batch mode which executes your SAS program mysasprogram.sas and outputs SAS log as mysasprogram.log file.

Example 2. Running SAS program in batch and date/time stamping SAS log

Batch scripts are often used to run a SAS programs repeatedly at different times. In order to preserve SAS log for each run, we can assign unique names for the log files by suffixing their names with a date/time. For example, instead of saving SAS log with the same name mysasprogram.log we can dynamically generate unique names for SAS log, e.g. mysasprogram_YYYYMMDD_HHMMSS, where YYYYMMDD_HHMMSS indicates the date (YYYYMMDD) and time (HHMMSS) of run. This will effectively keep all SAS logs and indicate when (date and time) each log file was created. Here is a Windows batch script that does it:

: generate suffix dt=YYYYMMDD_HHMMSS, pad HH with leading 0
set z=%time: =0%
set dt=%date:~-4%%date:~4,2%%date:~7,2%_%z:~0,2%%z:~3,2%%z:~6,2%

set sas=D:\SAS94\sashome\SASFoundation\9.4\sas.exe
set proj=C:\Projects\SAS_to_Excel
set name=mysasprogram
set pgm=%proj%\%name%.sas
set log=%proj%\logs\%name%_%dt%.log
%sas% -sysin %pgm% -log %log% -nosplash -nologo -icon

Windows batch scripts with conditional execution

Let’s enhance our script by adding the following functionality:

  • Captures exit code from the mysasprogram.sas (exit code 0 mean there are no ERRORs or WARNINGs)
  • If exit code is not equal to 0, conditionally execute another SAS program my_error_email.sas which sends out an email to designated recipients informing them that mysasprogram.sas failed (successful execution email can be sent from mysasprogram.sas itself).

One would expect that it can be achieved by adding the following scripting code to the above example 2:

: capture exit code from sas
set exitcode=%ERRORLEVEL%

: generate email if ERROR and/or WARNING
if not %exitcode% == 0 (
   set ename=my_error_email
   set epgm=%proj%\Programs\%ename%.sas
   set elog=%proj%\SASLogs\%ename%_%dt%.log
   %sas% -sysin %epgm% -log %elog% -nosplash -nologo -icon -sysparm %log%
)

However, you might be in for a big surprise (I was!) when you discover that my_error_email.sas program runs regardless of whether exitcode is equal on not equal to 0.  How is that possible!

It turned out that Windows script environment variable references in form of %variable-name% do not resolve at execution time like SAS macro variable references &mvar-name or Unix/Linux script variable references $variable-name . They resolve during the initial script parsing before the run-time exitcode is evaluated. As a result, all the commands within the parentheses of the IF-command (including SAS session kickoff) are resolved and executed unconditionally.

Initially, DOS (and later Windows) scripts were implemented without conditional execution (IF command) and looping (FOR command) functionality and their environment variable references were resolving during script parsing. Later, when scripting language was brought to a higher standard that did include conditional execution and looping, the developers decided to keep the original  behavior of the %variable-name% references intact, but added a new form of the environment variable references !variable-name! surrounding variable names with exclamation marks. They called it "delayed expansion" and cardinally altered the variable references behavior causing them to resolve (expand) during execution time rather than parse time.

The following scripting command enables delayed expansion:

SetLocal EnableDelayedExpansion

We can place this SetLocal command right before the IF section and replace variable references in it with !variable-name! . Alternatively, for consistency, we can place SetLocal EnableDelayedExpansion at the beginning of the script and replace all environment variable references with !variable-name! . In the latter case, all our variables will be resolved at execution time.  Here is the final script:

SetLocal EnableDelayedExpansion

: generate suffix dt=YYYYMMDD_HHMMSS, pad HH with leading 0
set z=!time: =0!
set dt=!date:~-4!!date:~4,2!!date:~7,2!_!z:~0,2!!z:~3,2!!z:~6,2!

set sas=D:\SAS94\sashome\SASFoundation\9.4\sas.exe
set proj=C:\Projects\SAS_to_Excel
set name=mysasprogram
set pgm=!proj!\!name!.sas
set log=!proj!\logs\!name!_!dt!.log
!sas! -sysin !pgm! -log !log! -nosplash -nologo -icon

: capture exit code from sas
set exitcode=!ERRORLEVEL!

: generate email if ERROR and/or WARNING
if not !exitcode! == 0 (
   set ename=my_error_email
   set epgm=!proj!\Programs\!ename!.sas
   set elog=!proj!\SASLogs\!ename!_!dt!.log
   !sas! -sysin !epgm! -log !elog! -nosplash -nologo -icon -sysparm !log!
)

Notice, how we pass in to the my_error_email.sas program the log name of failed mysasprogram.sas:

-sysparm !log!

This log name can be captured in the my_error_email.sas program by using the SYSPARM automatic macro variable:

%let failed_log = &sysparm;

Then, it can be used either to attach that log file to the automatically generated email or at least provide its path and name in the email body.

Job scheduling

With Windows batch script file in place, you can easily schedule and run your SAS program in batch mode on a SAS machine that have just SAS Foundation installed using Microsoft Windows Task Scheduler. In order to do that you would need to specify your script’s fully qualified name in the Windows Task Scheduler (Create Task →  New Action Program/script field) as shown below:

Creating a new Task in Windows Task Scheduler

Then you would need to specify (add) new Trigger(s) that ultimately define the scheduling rules:

Setting up a new Trigger to define scheduling rules

That’s all. You can now sleep well while your job runs at 3:00 am.

Questions? Thoughts? Comments?

Do you have questions, concerns, comments or use other ways of automating SAS jobs? Please share with us below in the Comments section.

Additional Resources

Automating SAS processes using Windows batch files was published on SAS Users.

7月 072022
 

Multi-tab address bookIn this post, we expand on my previous post Automating Excel workbooks creation using SAS and demonstrate how to automatically (programmatically) split a data table into separate worksheets (tabs) of a single Microsoft Excel workbook.

While there are multiple ways of splitting data tables, for the purpose of this post we are going to split up a data table based on the value of one of its categorical variables (columns).

The approach described here is applicable to both SAS data tables and non-SAS data tables. In either case, you don’t even have to explicitly create interim data table subsets – each data table gets split directly into multiple Excel sheets.

In addition, there are no manual steps, the whole process is data-driven and is suitable for automatic execution either on its own schedule or as part of another automated process.

SAS macro solution for splitting a dataset into Excel worksheets

Let’s consider the following task. We have a data table SASHELP.FISH that has 159 observations and 7 variables. One of the variables, SPECIES, has few unique values (such as ‘Perch’, ‘Whitefish, ‘Pike’, ‘Smelt’, etc.) We want to split this dataset by the unique SPECIES values so all the observations pertaining to each species are presented in separate worksheets of a single Excel workbook. These worksheets will be accessible by clicking on the corresponding tabs of the Excel workbook.

The following code demonstrates how this can be done using SAS macro with a macro loop.

/* Split data table into multiple sheets of an Excel workbook */
 
/* Create a list of and count unique SPECIES  */
proc sql noprint;
   select distinct SPECIES, count(distinct SPECIES)  
      into :sp_list separated by ' ', :nsp   
   from SASHELP.FISH;
quit;
 
/* Create TABS color list */
%let color_list = #feb8b6 #c4feb6 #fefdb6 #feb6fa #b6d5fe #f0b6fe #fee6b6;
 
/* Initiate ODS EXCEL destination */
filename target 'C:\Projects\SAS_to_Excel\Fish.xlsx';
ods excel file=target options (frozen_headers='on' embedded_titles='on');
 
/* Macro to loop through the list of unique TABS values */
%macro split_data_to_excel (intable=, tabvar=);
   %do i=1 %to &nsp;
      %let sp = %scan(&sp_list, &i, %str( ));
      %let tc = %scan(&color_list, &i, %str( ));
      ods excel options (sheet_name="&sp" tab_color="&tc");
      title "&intable where &tabvar=&sp";
      proc print data=&intable noobs; 
         where &tabvar="&sp";
      run;
   %end;
%mend split_data_to_excel;
 
%split_data_to_excel (intable=SASHELP.FISH, tabvar=SPECIES)
ods excel close;

Here are the code highlights:

  • The first step, PROC SQL, creates two macro variables - sp_list containing a list of unique (distinct) values of variable SPECIES, and nsp containing the number (count) of the unique values for SPECIES.
  • %let color_list assigns a list of colors (hexadecimal values) that we use to color-code our tabs.
  • Then we specify the output Excel workbook file and open ODS EXCEL destination. Here we specify options (frozen_headers='on' embedded_titles='on') which are going to apply to all the tabs (sheets) in our Excel workbook.
  • Then we define macro split_data_to_excel that accepts two parameters:
    • intable – input table name;
    • tabvar – tab variable whose unique values are used for subsetting the input data.
  • Within this macro we have a macro loop with macro variable index i that iterates from 1 to &nsp (number of unique &tabvar values).
  • Within the macro loop, we scan &sp_list and &color_list for &i-th word and use that word in the following statement:
    ods excel options (sheet_name="&sp" tab_color="&tc");
    With each iteration of the macro %DO-loop, this statement creates a new sheet, gives it a name and assigns a tab color.
  • The following TITLE and PROC PRINT will be generated for each %DO iteration and produce a subset of the input data table on the corresponding sheet.
  • After the macro definition, we invoke this macro as
    %split_data_to_excel (intable=SASHELP.FISH, tabvar=SPECIES)
    and close ODS Excel.

Running this code will produce the following Excel workbook:

Multi-tab Excel workbook with colored tabs
As you can see, each SPECIES is presented in its own worksheet (tab) and each tab is color-coded and properly labeled.

Data step solution for splitting a dataset into Excel worksheets

Besides the above macro solution, there is an alternative solution using a single SAS data step with CALL EXECUTE to dynamically generate SAS code and push it outside of the data step boundaries for execution. The following data step solution replaces the macro definition and macro invocation in the above macro solution:

data _null_;
   intable = 'SASHELP.FISH';
   tabvar = 'SPECIES';
   do i=1 to &nsp;
      sp = scan("&sp_list",i,' ');
      tc = scan("&color_list",i,' ');
      call execute('ods excel options (sheet_name="'||trim(sp)||'" tab_color="'||trim(tc)||'");');
      call execute('title "'||intable||' where '||tabvar||'='||trim(sp)||'";');
      call execute('proc print data='||intable||' noobs;');
      call execute('   where '||tabvar||'="'||trim(sp)||'";');
      call execute('run;');
   end;
run;

For each unique SPECIES value, CALL EXECUTE will generate SAS code and push it outside of the data step in a queue where SAS compiles and executes it after the data step in the order it was generated. For each do-loop iteration, you will see the following corresponding snippets (marked with + sign) of the generated code in the SAS log:

NOTE: CALL EXECUTE generated line.
1         + ods excel options (sheet_name="Bream" tab_color="#feb8b6");
2         + title "SASHELP.FISH where SPECIES=Bream";
3         + proc print data=SASHELP.FISH noobs;
4         +    where SPECIES="Bream";
5         + run;
 
NOTE: There were 35 observations read from the data set SASHELP.FISH.
      WHERE SPECIES='Bream';

The produced output will be identical to the above macro solution.

Questions

Which solution is more appealing to you? Do you have questions, comments, suggestions, ideas, other solutions, tips or tricks about splitting a data table into multiple Excel worksheets? Please share with us in the Comments section below.

Additional Resources

Splitting a data table into multiple sheets of an Excel workbook was published on SAS Users.

6月 162022
 

MS Excel logo No matter which powerful analytical tools data professionals use for their data processing, MS Excel remains the output of choice for many users and whole industries.

In banking and finance, for example, I have seen many SAS users create quite sophisticated data queries and data analysis projects in SAS Enterprise Guide. Yet, at the end, when the final datasets have been produced and validated, comes the manual part when users export those tables into Excel, and then combine and rearrange them by copying-pasting into a desired workbook for storing and distributing.

However, this heavily manual process can be not just fully automated, but also enhanced compared with the point-and-click “export to Excel” and “copy-paste” interactive process. Here is how.

Creating a single simple Excel sheet

Suppose, we want to convert SASHELP.CLASS data table to Excel. Here is a bare-bone solution using SAS Output Delivery System:

ods excel file='C:\Projects\SAS_to_Excel\Single_sheet.xlsx';
 
proc print data=SASHELP.CLASS noobs;
run;
 
ods excel close;

This code is pretty much self-explanatory. It will produce Single_sheet.xlsx Excel workbook file in the folder C:\Projects\SAS_to_Excel. When opened in Excel, it will look as follows:

If you browse through the ODS EXCEL documentation you will find a variety of options that allow you to customize Excel output. Let’s get a little fancy by utilizing some of them.

Creating several customized sheets in Excel workbook

The following code example creates two sheets (tabs) in a single workbook. In addition, it demonstrates some other features to enhance data visualization.

/* -------------------------------------------- */
/* Two sheets workbook with enhanced appearance */
/* -------------------------------------------- */
 
/* Formats for background & foreground coloring */
proc format;
   value hbg 
      50 <- 60   = #66FF99
      60 <- 70   = #FFFF99
      70 <- high = #FF6666
      ;
   value hfg
      low -< 50, 70 <- high = white;
run;
 
/* Define custom font style for ODS TEXT */
proc template;
   define style styles.MyStyle;
   parent=styles.htmlblue;
   style usertext from usertext /
      foreground  = #FF33CC
      font_weight = bold
      ;
   end;
run;
 
/* ODS Excel output file destination */
ods excel file = 'C:\Projects\SAS_to_Excel\Two_sheets_fancy.xlsx';
 
   /* Excel options for 1st sheet (tab) */
   ods excel options
      ( sheet_name      = 'SASHELP.CLASS'
        frozen_headers  = 'on'
        embedded_titles = 'on' )
      style = styles.MyStyle;
 
   title justify=left color='#4D7EBF' 'This is TITLE for SASHELP.CLASS';
 
   ods text='This is TEXT for SASHELP.CLASS';
 
   proc print data=SASHELP.CLASS noobs;
      var NAME;
      var SEX AGE / style = {just=C};
      var HEIGHT  / style = {background=hbg. foreground=hfg.};
      var WEIGHT;
   run;
 
   /* Excel options for 2nd sheet (tab) */
   ods excel options
      ( sheet_name      = 'SASHELP.CARS'
        frozen_headers  = 'on'
        embedded_titles = 'on' );
 
   title 'This is TITLE for SASHELP.CARS';
 
   proc print data=SASHELP.CARS noobs;
   run;
 
ods excel close;

Here are the code highlights:

  • PROC FORMAT creates two formats, HBG and HFG, which are used in the first PROC PRINT to illustrate cell text and background coloring.

    NOTE: In SAS, colors are specified as hexadecimal RGB values (cxRRGGBB). However, I found (although it seems undocumented) that in PROC FORMAT and PROC TEMPLATE these colors can be written as quoted (double or single) or unquoted values, as well as prefixed with either ‘#’ or ‘cx’ (for hexadecimal). For example, the following are all valid values: #FFAADD, '#FFAADD', cxFFAADD, 'cxFFAADD'.
  • PROC TEMPLATE defines usertext custom font style for ODS TEXT, which used in ODS EXCEL along with and as alternative to the TITLE statement.
  • ODS EXCEL FILE=file-specification statement opens the EXCEL destination and specifies the output file for ODS EXCEL. The output file-specification can be either a physical file name (must be in quotes) or a fileref (without quotes) assigned with FILENAME statement. It can point to a location on the machine where SAS is run (SAS server), or a network drive accessible from the SAS server.

    This statement follows by two separate ODS EXCEL OPTIONS statements – one per corresponding sheet (tab).
  • ODS EXCEL OPTIONS statement specifies destination-specific suboptions with space-delimited name='value' pairs. In particular, we use the following options:

    • SHEET_NAME= specifies the name for the next worksheet (worksheet names can be up to 28 characters long).
    • FROZEN_HEADERS='ON' specifies that headers are not scrolled when table is vertically scrolled (default is OFF). It is very convenient feature that keeps the title(s) and column names in view while user scrolls through the table rows.
    • EMBEDDED_TITLES='ON' specifies whether titles should appear in the worksheet (default is OFF).

    There is a variety of other useful ODS EXCEL options allowing further customization of your Excel workbook appearance and functionality.

  • TITLE statement is highly customizable with Output Delivery System as shown in this custom TITLE with ODS example.
  • ODS TEXT= statement inserts text into your ODS output. Unlike TITLE for which ODS EXCEL merges several cells, ODS TEXT places its text in a single cell (see screenshot below). The UserText style element that we modified using PROC TEMPLATE controls the font style, font color, and other attributes of the text that the ODS TEXT= statement produces.
  • In PROC PRINT, we use multiple VAR statements to select variables, determine their order and apply styles (text and background colors) to the printed values. There are much more ODS styles with PROC PRINT available for further customizations.
  • The next section of the code ( /* Excel options for 2nd sheet (tab) */ ) creates the second sheet in the same Excel workbook. Similarly, you can create as many sheets/tabs as you wish.
  • The last statement ods excel close; closes the ODS Excel destination so nothing more is written to the output file.

The following are the two screenshots illustrating the two sheets in the produced Excel workbook:

SAS creates Excel workbook with several sheets/tabs

Questions

Do you find this post useful? Do you have questions, comments, suggestions, other tips or tricks about creating MS Excel workbooks in SAS? Please share with us below.

Additional Resources

TUNE IN NOW | LEARN HOW TO READ AND WRITE EXCEL FILES WITH SAS

Automating Excel workbooks creation using SAS was published on SAS Users.

3月 142022
 

Irony of using square wheelsEvery year on March 14, we celebrate Pi Day in recognition of the famed mathematical constant π. This date in the format MM/DD is 3/14 which coincides with the first three digits of the π value 3.14.

As you all know, π represents the ratio of a circle's circumference to its diameter. Therefore, we can derive a circle's circumference C from its diameter d using the following famous formula:

C = πd (or C = 2πr using radius r).

While it is impossible to overstate the significance of π in its fundamental and legendary role of being a constant representing the circumference/diameter ratio, its usage for area and volume measurements is not as much of a “settled science” as you may think. In this post, I dare to debunk the notion of π being an inherent component of the omnipresent circle area and sphere volumes formulas:

A = πr2 = π/4 ⋅ d2

V = 4π/3 ⋅ r3 =  π/6 ⋅ d3

Believe it or not,  we can significantly simplify these formulas by getting rid of π altogether. Wanna know how? Just sit down, relax, open your mind and fasten your seat belt.

Misnomer alert!

(Definition: A misnomer is a wrong or inaccurate use of a name or term.)

Let’s start with some questionable and overreaching semantics in algebra. Have you ever noticed that we call a number to the power of 2 as the number squared? Similarly, we call a number to the power of 3 as the number cubed.  However, besides 2 and 3, we treat all other exponents Np equally by saying “N to the power of p”.

Here is how Wikipedia explains this exponentiation phenomenon:

“The expression b2 = bb is called "the square of b" or "b squared", because the area of a square with side-length b is b2.

Similarly, the expression b3 = bbb is called "the cube of b" or "b cubed", because the volume of a cube with side-length b is b3.

Wait a minute! Exponents b2 and b3 are mathematical operations from the Algebra Department. They have nothing to do with geometrical shapes and exist independently from geometry and area/volume calculations.

The same semantical problem persists with the reverse terms square root and cube root. All natural numbers N have their corresponding root operations, which we call their Nth root. So why do we need this suggestive verbal association of the area/volume measurement units to squares and cubes?

Moreover, the very word “area” itself is often replaced by square footage, square mileage, or square something thus further implying that areas and squares are synonymous. To me this is like calling a round wheel “a rolling square”.

Overall, this square/cube-centric language insidiously conditions our minds into thinking only in terms of squares and cubes regarding the units of area and volume measurement.

Why do we measure area of a circle in squares?

We all know from middle school’s elementary geometry that area of a circle of radius r is A = πr2 (or A = πd2/4 using diameter d since r = d/2). Many famous ancient mathematicians from Archimedes to Eudoxus of Cnidus to Hippocrates of Chios have proven this formula. There is also a multitude of its proofs using modern methods rooted in trigonometry and calculus.

However, all these methods have one funny thing in common. It is the presumption that an area must be measured in squares (square inches, square meters, square miles, etc.). Think about it for a minute: for thousands of years we have been measuring round areas by the number of squares with a side=1 covering that round area. C’mon! We invented the wheel, electricity, engines, airplanes, telephone, radio, TV, computers, spaceships, artificial intelligence and more, but we are still measuring round areas in squares!

It makes perfect sense measuring rectilinear areas in square units – they are a good fit for each other. But measuring round areas in squares?! What would you say if we were measuring square areas by the number of round coins covering them? How is that not as good as that the opposite?

Introducing circular units of areas

Defining units of measurement is totally arbitrary. All we need is a well-defined gauge. For example, we have a variety of length units such as foot (derived from the size of the human foot), inch (derived from the width of the human thumb), meter (derived from the Greek verb μετρέω  - measure, count) and so on. They are all convertible to each other, but their definitions are quite different.

When it comes to the units of area, it can be a square, a rectangle, a circle, an oval, a star, even a snowflake or any other two-dimensional shape of a specified size as long as its definition is clear and unambiguous.

Without further ado let’s define a circular unit of area as an area of a circle with the diameter d=1 unit of length (inch, foot, meter, etc.).

Measuring circle areas in circular units

In order to measure areas of different sized circles, we do not need to count the number of these single unit circles within the measured circle. We just place these two circles concentrically and expand or shrink our gauge circle until they align.  Here is a visual to illustrate this:

For a circle with the diameter d, we can express its area measured in the circular units Ac using the following formula:Ac(d) = d ⋅ d = d2

I would not call d2 as “d squared” here. The power of 2 is an indication of two-dimensional area while d is a one-dimensional length.

Let’s prove this formula. Suppose As(d) is a circle area with diameter d in square units and Ac(d) is the circle area in circular units.

As we know As(d) = πr2 = π (d/2)2 =  (π/4)d2. For the same circle its area in circular units is Ac(d).

For d=1, we have As(1) = πr2 = π (d/2)2 =  (π/4)d2 = π/4. The same circle area represents 1 circular area unit Ac(1)=1.

Therefore, Ac(d) =  Ac(1) d2 =  d2, that is we effectively eliminated π from the circle area calculation.

Measuring squares in circular units

Circle and square of equal size
At the same time, π is not the kind of number to easily get rid of. Similar to the law of conservation of energy, there is probably a law of conservation of π: whenever we get rid of it in one place, it appears in another place.

Let’s suppose we have a square with side length of x. Then its area in square units will be As(x) = x2. If we take a circle of the same area we get the following equation As(x) = x2 = (π/4)d2. From here d2 = (4/π)x2. Since d2 represents the number of circular units Ac we can say that the area of a square with side length of x in circular units is Ac(x) = (4/π)x2.

Notice that π-wise these two formulas for area, Circle_A=(π/4)d2 [square units]  and Square_A=(4/π)x2 [circular units], are literally upside-down version of each other.

Spheric units for 3D geometry

We can apply exact same approach to measuring volumes of three-dimensional objects. In the cubic unit’s paradigm, we define one cubic unit of volume as a cube with a side of one unit of length. Then the volume of a cube with a side of x is Cube_V=x3 [cubic units]  and the volume of a sphere with a diameter of d=2r is Sphere_V = (4/3)πr3 = (π/6)d3 [cubic units].

Alternatively, in the spheric unit’s paradigm we define one spheric unit of volume as a sphere with a diameter of one unit of length. Then the volume of a sphere with a diameter of d is Sphere_V=d3 [spheric units]  and the volume of a cube with a side of x is Cube_V=(6/π)x3 [spheric units].

Again, here I would not call d3 and x3 as “d cubed” and “x cubed” since the power of 3 is indicative of three-dimensional space while d and x are one-dimensional units of length.

Similarly to area calculations, π-wise these two formulas for volume, Sphere_V=(π/6)d3 [cubic units]  and Cube_V=(6/π)x3 [spheric units], are literally upside-down version of each other.

Is this practical?

If you still doubt the validity and/or practicality of my geometric deductions, check out circular mil units of area adopted for wire size measurement by the US National Electrical Code (NEC) and the Canadian Electrical Code (CEC).

A circular mil (cmil) is a unit of area equal to the area of a circle with a diameter of one mil (one thousandth of an inch or approximately 0.0254 mm).

While applied just to one particular unit of length – mil, it is exactly the same circular units of area concept as described in the previous sections.

As you can see, electrical engineers have figured out a use for circular units, perhaps because they are dealing with electrical wires, which are predominantly round in their cross section. Since electrical conductivity of wires is directly proportional to their cross-section area, it is just more convenient to use circular units (without π) for calculating the cross section and ultimately electrical conductivity. The same is true for the throughput capacity while transporting any substance via pipes (water, gas, oil, etc.) or moving blood through the blood vessels.

There are many other applications dealing with the circular shapes and spherical objects. In all these cases, it makes more sense to measure areas in circular units rather than squares and measure object volumes in spheric units rather than cubes. Take for example physics (gravitational, electrical and magnetic fields), astronomy (planets, stars, comets, and orbits), geology (areas around earthquake epicenters), telephony (areas of cellular tower coverage), etc.

Your thoughts, comments

I would like to hear from you. What do you think about all this? Please share your thoughts in the Comments section below.

ACCESS FREE SAS® SOFTWARE | SAS® OnDemand for Academics

Calculating circle areas and sphere volumes without π was published on SAS Users.

2月 162022
 

Editor’s note: SASjs, while free and useful for anyone, is a gateway to services from 4GL Apps, a company founded and operated by longtime SAS user Allan Bowe.Open code sharing


I would like to start this post with a well-known quote by George Bernard Shaw: “If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.”

SAS Software provides vast functionality delivered with it, but perhaps the biggest asset of the SAS ecosystem is its ever-growing community of SAS users around the globe readily willing to share their SAS code and solutions to various problems. Take for example, SAS Communities, SAS blog or SAS Software on GitHub.

With open-source code taking the world by storm, SAS maintains and nurtures a viable mix of its rock-solid backbone – commercial SAS Software backed by its unparalleled Technical Support and open-source SAS solutions developed and shared by SAS users.

An introduction to SASjs

Here, I am going to introduce you to one more publicly available open source treasure called SASjs (also available on GitHub), which stands out from many others because of its thought-provoking, multifaceted approach.

In a nutshell, the SASjs is a collection of tools aimed at enabling and accelerating development and operations (DevOps) for a broad range of SAS applications, including SAS web development.

SASjs consists of the following three major components:

  1. SAS macro library (MacroCore)
  2. JavaScript library (Adapter)
  3. Command line interface (CLI)

SASjs MacroCore library

The MacroCore library is a centerpiece of the SASjs framework. It includes several hundred SAS macros facilitating various aspects of SAS coding, application development and deployments. All these macros are production quality and free to use (MIT-licensed). The contents of the MacroCore library organized by the following sections:

  • BASE library (SAS9/Viya) - prefixes: mf (macro functions), mp (macro procedures - macros).
  • FCMP library (SAS9/Viya) - prefix: mcf; used to generate user-defined functions using PROC FCMP.
  • META library (SAS9 only) - prefix: mm; used in SAS EBI environments that connect to the metadata server.
  • SERVER library (@sasjs/server only) - prefix: ms; used for building applications using @sasjs/server - an open source REST API for Desktop SAS.
  • VIYA library (Viya only) - prefixes: mv, mvf; used for interfacing with SAS Viya.
  • METAX library (SAS9 only) - prefixes: mmw, mmu, mmx.
  • DDL Library - prefix: mddl; provides the structure for any permanent table used by the macros.
  • LUA library - prefix: ml

All these macros (except METAX) are Operating System independent and operate under NOXCMD option. That option prevents the SAS program from executing OS commands and is often implemented by SAS Administrators for security reasons.

DOCUMENTATION PAGE – here you can find detailed descriptions for all of the SASjs MacroCore macros.

DOWNLOAD PAGE – here you can grab all these macros FREE and deploy (save) anywhere in your SAS environment.

Alternatively, you can make all these macros available in your SAS program by %INCLUDE-ing them directly from the Web – just add this code snippet to the beginning of your SAS program:

filename mcore url "https://raw.githubusercontent.com/sasjs/core/main/all.sas";
%inc mcore;

To give you a taste of what these macros are here is a small sample of them with examples of usage and brief descriptions:

  • %mp_copyfolder(&rootdir,&copydir) - copies the whole directory structure including files and subdirectories from one location (&rootdir) to another (&copydir). Uses only internal SAS functions without OS commands (NOXCMD option).
  • %mp_dirlist(path=/some/location, outds=myTable, maxdepth=MAX) - returns SAS table myTable listing all files and sub-directories up to specified depth. It also uses only internal SAS functions without OS commands (NOXCMD option).
  • %mp_csv2ds(inref=mycsv,outds=myds,baseds=template_ds) - imports relevant columns from mycsv CSV file to myds data table using a template dataset template_ds to provide the types and lengths.
  • %mf_getvarcount(my_dataset) - this macro function returns the number of variables in a dataset.
  • %mf_getvarlist(sashelp.class,dlm=%str(,),quote=double) - this macro function returns the dataset variable list directly from its header. The list can be separated by blanks, commas or any other delimiter (dlm=), and be unquoted or quoted (quote=).

SASjs Javascript library (Adapter)

In my earlier posts about integrating SAS with Google Maps, I employed and implemented the idea of using SAS for generating data-driven HTML and Javascript code for creating web pages.

SASjs, on the other hand, turns on its head this SAS-to-JavaScript paradigm. In a total reverse, it uses JavaScript to generate SAS code for running on SAS servers as jobs or stored processes.

Previously, I naively believed that JavaScript is a web browser language that exists and used only within web browsers to enhance interactions with HTML web pages. SASjs proved me wrong. It turns out JavaScript is a general-purpose language that can be used everywhere outside of the web browser.

SASjs extensively employs this JavaScript functionality. Its second component, SASjs Adapter, is a server-side JavaScript library (and a set of SAS macros) to handle the communication between the frontend HTML5 or client application and the SAS 9 or Viya backend SAS services.

SASjs uses the following two client-side components:

  • Node.js JavaScript runtime environment.
  • Visual Studio Code (VS Code) source code editor.

Node.js is an asynchronous event-driven JavaScript runtime designed to build scalable network applications.

Visual Studio Code (VS Code) is a lightweight but powerful source code editor, which runs on your desktop and is available for Windows, macOS and Linux. It comes with built-in support for JavaScript, TypeScript and Node.js and has a rich collection of extensions for other languages (such as C++, C#, Java, Python, PHP, Go, etc.) and runtimes (such as .NET and Unity). In essence, it is more than just a source code editor, it is an Integrated Development Environment (IDE) allowing you to perform programming along with development as well as debugging and testing the application.

SASjs command line interface (CLI)

The SASjs CLI is the third major component of SASjs that ties together and facilitates creating, building, deploying and executing various SAS web applications (for SAS 9 and SAS Viya).

It is a powerful and flexible tool covering a wide range of features and functionalities to support successful applications development and operations (DevOps). CLI’s primary actions include:

  • Create a template SASjs Git project repository.
  • Compile dependent macros, macro variables and pre/post code.
  • Create/build the master SAS deployment, including Jobs, Services and Tests.
  • Deploy/execute local scripts and remote SAS programs to create your app on the SAS Server.
  • Configure and generate Doxygen documentation for all SAS content.

There is also a feature to let you deploy your frontend as a service, bypassing the need to access the SAS Web Server.

SASjs support

Even though SASjs is an open source platform, it provides different levels of support for its products.

Report defects. Through this GitHub web page SAS users can raise issues with SASjs such as report defects, suggest improvements, etc.

Engage with SASjs developers. SAS users can also communicate with SASjs developers, receive free clarifications and consultations on various aspects of SASjs usage, or even hire them for involvement that is more extensive.

Questions? Thoughts? Comments?

Do you find this post useful? Have you learned anything new? Do you have any other SAS open-source assets you can share? Do you have questions, concerns, comments? Please share with us below.

SAS open-source treasures from around the world: SASjs was published on SAS Users.

1月 132022
 

3 ways of calculating length of overlapping of two integer intervalsIn this post, we tackle the problem of calculating the length of overlap of two date/time intervals. For example, we might have two events — each lasting several days — and need to calculate the number of shared days between both events. In other words, we need to calculate the length of the events’ overlap.

Such tasks are typical in clinical trials programming, health (and non-health) insurance claim processing, project management, planning and scheduling, etc.

The length of the overlap can be measured not only in days, but also in any other units of the date/time dimension: hours, minutes, seconds, weeks, months, years and so on. Moreover, the date/time application is just one special case of a broader task of calculating the overlap length of two integer intervals. An integer interval [x .. y] is a set of all consecutive integer numbers beginning with x and ending with y (boundaries included and x ≤ y).

Instead of suggesting a single “best” way of solving this problem, I offer three different strategies. This allows you to compare and weigh their pros and cons and decide for yourself which approach is most suitable for your circumstances. All three solutions presented in this blog post are applicable to the date/time use case, as well as its superset of integer intervals’ overlap length calculation.

Problem description

Suppose we have two events A [A1 .. A2] and B [B1 .. B2]. Event A lasts from date A1 until date A2 (A1 ≤ A2), and event B lasts from date B1 until date B2 (B1 ≤ B2).  Both starting and ending dates are included in the corresponding event intervals.

We need to calculate the overlap between these two events, defined as the number of days that belong to both events A and B.

Overlapping intervals: sample data

Before solving the problem, let’s create sample data to which we are going to apply our possible solutions.

The following data step creates data set EVENTS representing such a data sample:

data EVENTS;
   input A1 A2 B1 B2;
   format A1 A2 B1 B2 date9.;
   informat A1 A2 B1 B2 mmddyy10.;
   lines;
01/02/2022 01/05/2022 01/06/2022 01/10/2022
01/22/2022 01/30/2022 01/16/2022 01/18/2022
01/02/2022 01/05/2022 01/03/2022 01/10/2022
01/02/2022 01/05/2022 01/03/2022 01/04/2022
01/10/2022 01/15/2022 01/06/2022 01/14/2022
01/01/2022 01/05/2022 01/05/2022 01/09/2022
01/07/2022 01/13/2022 01/10/2022 01/13/2022
;

Now let’s go over the following three distinct possible solutions, each with its own merit and downside. At the end, you decide which solution you like the most.

Solution 1: Brute force

In general, brute force algorithms solve problems by exhaustive iteration through all possible choices until a solution is found. It rarely results in a clever, efficient solution. However, it is a straightforward, valid and important algorithm design strategy that is applicable to a wide variety of problems.

For integer intervals overlap calculation, a brute force approach would involve the following steps:

  1. Determine the earliest date in the definitions of our two intervals
  2. Determine the latest date in the definitions of our two intervals
  3. Create an iterative loop with index variable ranging from the earliest to the latest date
  4. Within this loop, increment a counter by one if index variable falls within both intervals
  5. At the end of this loop, the counter will equal the desired value for the overlap.

Here is the code implementation of the described above algorithm:

data RESULTS;
   set EVENTS;
   OVERLAP = 0;
   do i=min(A1,B1) to max(A2,B2);
      if (A1<=i<=A2) and (B1<=i<=B2) then OVERLAP + 1;
   end;
run;

As you can see, the code implementation of the brute force solution is quite simple. However, it is hardly efficient as it involves extensive looping.

Solution 2: Exhaustive logic

Exhaustive logic algorithms solve problems by splitting them into several smaller problems (logical units or classes). This way, instead of working on a problem in its entirety, you can work separately on smaller and simpler problems that are easier to comprehend, digest and solve.

Here is an illustration of various arrangements of integer intervals representing separate logical units (numbered 1...5) when calculating the intervals overlaps:

Different arrangements of date/time integer intervals

The following SAS code demonstrates an implementation of the exhaustive logic algorithm (each IF-THEN statement corresponds to the numbered logical unit in the above figure):

data RESULTS;
   set EVENTS;
   if A1<=B1 and A2>=B2    then OVERLAP = B2 - B1 + 1; else
   if A1>=B1 and A2<=B2    then OVERLAP = A2 - A1 + 1; else
   if A1<=B1<=A2 and B2>A2 then OVERLAP = A2 - B1 + 1; else
   if B1<=A1<=B2 and A2>B2 then OVERLAP = B2 - A1 + 1; else
   if B1>A2 or A1>B2       then OVERLAP = 0;
run;

Notice that we need to add 1 to each of the two days difference. That is because the number of days spanned is always 1 less than the number of days contained in an interval. Here is an illustration:

Days spanned vs. number of days

Solution 3: Holistic Math

Finally, a holistic math method looks at the problem in its entirety and condenses it to a comprehensive mathematical formula. While it is usually not the most obvious approach, and often requires some brain stretching and internalization, the effort is often rewarded by achieving the best possible solution.

The exhaustive logic solution may help in better understanding the problem and arriving at the generalized mathematical formula.

If you dwell on the exhaustive logic solution for a while, you may come to realization that all the variety of the logical units boils down to the following single simple formula:

Overlap = min(A2, B2) - max(A1, B1) + 1.

In other words, the overlap of two integer intervals is a difference between the minimum value of the two upper boundaries and the maximum value of the two lower boundaries, plus 1.

Positive OVERLAP value represents actual overlap days, while zero or negative value means that intervals do not overlap at all. The higher absolute value of the negative result – the farther apart the intervals are.  To correct this “negativity” effect we can just set all negative values to zero.

The following SAS code implements the described holistic math approach:

data RESULTS;
   set EVENTS;
   OVERLAP = min(A2,B2) - max(A1, B1) + 1;
   if OVERLAP<0 then OVERLAP = 0;
run;

As you can see, this in the most concise, elegant, and at the same time the most efficient solution with no iterative looping and complicated exhaustive (and exhausting) logic.

Output

Here is the output data table RESULTS showing our sample intervals (columns A1, A2, B1 and B2) and the resulting column OVERLAP (all three solutions described above should produce identical OVERLAP values):

Output data set with calculated days overlaps

Questions? Thoughts? Comments?

Which of the presented here three methods do you like most? Which one would you prefer under different circumstances? Can you come up with yet another way of calculating date intervals overlaps? Do you have questions, concerns, comments? Please share with us below.

Additional resources

Calculating the overlap of date/time intervals was published on SAS Users.

10月 142021
 

Trimming strings left and right

I am pretty sure you have never heard of the TRIMS function, and I would be genuinely surprised if you told me otherwise. This is because this function does not exist (at least at the time of this writing).

But don’t worry, the difference between "nonexistence" and "existence" is only a matter of time, and from now it is less than a blog away. Let me explain. Recently, I published two complementary blog posts:

[1] Removing leading characters from SAS strings

[2] Removing trailing characters from SAS strings

While working on these pieces and researching “prior art” I stumbled upon a multipurpose function in the SAS FedSQL Language that alone does either one or both of these things – remove leading or/and trailing characters from SAS strings.

FedSQL Language and Proc FedSQL

The FedSQL language is the SAS proprietary implementation of the ANSI SQL:1999 core standard. Expectedly, the FedSQL language is implemented in SAS by means of the FedSQL procedure (PROC FEDSQL). This procedure enables you to submit FedSQL language statements from a Base SAS session, and it is supported in both SAS 9.4 and SAS Viya.

Using the FEDSQL procedure, you can submit FedSQL language statements to SAS and third-party data sources that are accessed with SAS and SAS/ACCESS library engines. Or, if you have SAS Cloud Analytic Services (CAS) configured, you can submit FedSQL language statements to the CAS server.

FedSQL TRIM function

FedSQL language has its own vast FedSQL Functions library with hundreds of functions many of which replicate SAS 9.4 Functions. Many, but not all. Deep inside this FedSQL functions library, there is a unique treasure modestly called TRIM Function which is quite different from the BASE SAS Language TRIM() function.

While SAS 9.4 BASE TRIM() function capabilities are quite limited - it removes just trailing blanks from a character string, the FedSQL TRIM() function is way much more powerful. This triple-action function can remove not just trailing blanks, but also leading blanks, as well as both, leading and trailing blanks. On top of it, it can remove not just blanks, but any characters (although one character at a time). See for yourself, this function has the following pretty self-explanatory syntax:

TRIM( [BOTH | LEADING | TRAILING] [trim-character] FROM column)

Here trim-character specifies one character (in single quotations marks) to remove from column. If trim-character is not specified, the function removes blanks.

While being called a function, it does not look like a regular SAS function where arguments are separated by commas.  It looks more like an SQL statement (which it understandably is – it is part of the FedSQL language). However, this function is available only in PROC FEDSQL; it’s not available in SAS DATA steps or other PROC steps. Still, it gives us pretty good idea of what such a universal function may look like.

User-defined function TRIMS to remove leading or/and trailing characters in SAS strings

Let’s build such a function by means of the PROC FCMP for the outside the FedSQL usage (it is worth noticing that the FCMP procedure is not supported for FedSQL). To avoid confusion with the existing TRIM function we will call our new function TRIMS (with an ‘S’ at the end) which suits our purpose quite well denoting its plural purpose. First, we define what we are going to create.

User-defined TRIMS function

TRIMS Function

Removes leading characters, trailing characters, or both from a character string.

Syntax

TRIMS(function-modifier, string, trim-list, trim-list-modifier)

Required Arguments

  • function-modifier is a case-insensitive character constant, variable, or expression that specifies one of three possible operations:
    'L' or 'l' – removes leading characters.
    'T' or 't' – removes trailing characters.
    'B' or 'b' – removes both, leading and trailing characters.
  • string is a case-sensitive character constant, variable, or expression that specifies the character string to be trimmed.
  • trim-list is a case-sensitive character constant, variable, or expression that specifies character(s) to remove from the string.
  • trim-list-modifier is a case-insensitive character constant variable, or expression that supplements the trim-list.
    The valid values are those modifiers of the FINDC function that “add” groups of characters (e.g. 'a' or 'A', 'c' or 'C', 'd' or 'D', etc.) to the trim-list.

The following user-defined function implementation is based on the coding techniques described in the two previous posts, [1] and [2] that I mentioned above. Here goes.

 
libname funclib 'c:\projects\functions';
 
/* delete previous function definition during debugging */
options cmplib=funclib.userfuncs;
proc fcmp outlib=funclib.userfuncs.package1;
   deletefunc trims;
run;
 
/* new function defintion */
proc fcmp outlib=funclib.userfuncs.package1;
   function trims(f $, str $, clist $, mod $) $32767;
      from = 1;
      last = length(str);
      if upcase(f) in ('L', 'B') then from = findc(str, clist, 'K'||mod);
      if from=0 then return('');
      if upcase(f) in ('T', 'B') then last = findc(str, clist, 'K'||mod, -last); 
      if last=0 then return('');
      return(substr(str, from, last-from+1));      
   endfunc; 
run;

Code highlights

  • In the function definition, we first assign initial values of the target substring positions as from=1 and last=length(str).
  • Then for Leading or Both character removal, we calculate an adjusted value of from as a position of the first character in str that is not listed in clist and not defined by the mod
  • If from=0 then we return blank and stop further calculations as this means that ALL characters are to be removed.
  • Then for Trailing or Both character removal, we calculate an adjusted value of last as a position of the last character in str that is not listed in clist and not defined by the mod
  • If last=0 then we return blank and stop further calculations as this means that ALL characters are to be removed.
  • And finally, we return a substring of str starting at the from position and ending at the last position, that is with the length of last-from+1.

TRIMS function usage

Let’s define SAS data set SOURCE as follows:

data SOURCE;
   input X $ 1-30;
   datalines;
*00It's done*2*1**-
*--*1****9*55
94*Clean record-*00
;

In the following DATA step, we will create three new variables with removed leading (variable XL), trailing (variable XT) and both - leading and trailing (variable XB) characters '*' and '-' as well as any digits:

options cmplib=funclib.userfuncs;
data TARGET;
   set SOURCE;
   length XB XL XT $30;
   XB = trims('b', X, '*-', 'd');
   XL = trims('L', X, '*-', 'd');
   XT = trims('t', X, '*-', 'd');
run;

In this code we use the TRIM function three times, each time with a different first argument to illustrate how this affects the outcome.

Arguments usage highlights

  • The first argument of the TRIMS function specifies whether we remove characters from both leading and trailing positions ('b'), from leading positions only ('L'), or from trailing positions only ('t'). This argument is case-insensitive. (I prefer using capital 'L' for clarity since lowercase 'l' looks like digit '1').
  • The second argument specifies the name of the variable (X) that we are going to remove characters from (variable X is coming from the dataset SOURCE).
  • The third argument '*-' specifies which character (or characters) to remove. In our example we are removing '*' and '-'. If you do not need to explicitly specify any character here, you still must supply a null value ('') since it is a required argument. In this case, the fourth argument (trim-list-modifier) will determine the set of characters to be removed.
  • And finally, the fourth argument (case-insensitive) of the TRIMS function specifies the FINDC function modifier(s) to remove certain characters in bulk (in our example 'd' will remove all digits). If such modifier is not needed, you still must supply a null value ('') since all four arguments of the TRIMS function are positional and required.

Here is the output data table TARGRET showing the original string X and the resulting strings XB (Both leading and trailing characters removed), XL (Leading characters removed) and XT (Trailing characters removed) side by side:

Result of leading and trailing characters trimming

Conclusion

The new TRIMS function presented in this blog post goes ways further the ubiquitous LEFT and TRIM functions that remove the leading (LEFT) or trailing (TRIM) blanks. The TRIMS function handles ANY characters, not just blanks. It also expands the character deletion functionality of the powerful  FedSQL TRIM function beyond just removing any single leading and/or trailing character. The TRIMS function single-handedly removes any number of explicitly specified characters from either leading, trailing or both (leading and trailing) positions. Plus, it removes in bulk many implicitly specified characters. For example 'd' modifier removes all digits, 'du' modifier removes all digits ('d') and all uppercase letters ('u'), 'dup' modifier removes all digits ('d'), all uppercase letters ('u') and all punctuation marks ('p'); and so on as described by the FINDC function modifiers. The order in which modifier characters are listed does not matter.

Additional resources

Questions? Thoughts? Comments?

Do you find this post useful? Please share your thoughts with us below.

Introducing TRIMS function to remove any leading and/or trailing characters from SAS strings was published on SAS Users.