Leonid Batkhan

11月 022017
 

The purpose of this blog post is to demonstrate a SAS coding technique that allows for calculations with multiple variables across a SAS dataset, whether or not their values belong to the same or different observations.

calculations across observations of a SAS data table

What do we want?

As illustrated in the picture on the right, we want to be able to hop, or jump, back and forth, up and down across observations of a data table in order to implement calculations not just on different variables, but with their values from different observations of a data table.

In essence, we want to access SAS dataset variable values similar to accessing elements of a matrix (aij), where rows represent dataset observations, and columns represent dataset variables.

Combine and Conquer

In the spirit of my earlier post Combine and Conquer with SAS, this technique combines the functionality of the LAG function, which allows us to retrieve the variable value of a previous observation from a queue, with an imaginary, non-existent in SAS, LEAD function that reads a subsequent observation in a data set while processing the current observation during the same iteration of the DATA step.

LAG function

LAG<n> function in SAS is not your usual, ordinary function. While it provides a mechanism of retrieving previous observations in a data table, it does not work “on demand” to arbitrarily read a variable value from a previous observation n steps back. If you want to use it conditionally in some observations of a data step, you still need to call it in every iteration of that data step. That is because it retrieves values from a queue that is built sequentially for each invocation of the LAG<n> function. In essence, in order to use the LAG function even just once in a data step, you need to call it every time in each data step iteration until that single use.

Moreover, if you need to use each of the LAG1, LAG2, . . . LAGn functions just once, in order to build these queues, you have to call each of them in every data step iteration even if you are going to use them in some subsequent iterations.

LEAD function

The LEAD function is implemented in Oracle SQL and it returns data from the next or subsequent row of a data table. It allows you to query more than one row in a table at a time without having to join the table to itself.

There is no such function in SAS. However, the POINT= option of the SET statement in a SAS data step allows retrieving any observation by its number from a data set using random (direct) access to read a SAS data set. This will allow us to simulate a LEAD function in SAS.

HOP function

But why do we need two separate functions like LAG and LEAD in order to retrieve non-current observations. In essence, these two functions do the same thing, just in opposite directions. Why can’t we get by with just one function that does both backwards and forwards “hopping?”

Let’s combine and conquer.

Ideally, we would like to construct a new single function - let’s call it HOP(x, j) - that combines the best qualities of both LAG and LEAD functions. The two arguments of the HOP function would be as follows:

x – SAS variable name (numeric or character) the value of we are retrieving;

j – hop distance (numeric) – an offset from the current observation; negative values being lagging (hopping back), positive values being leading (hopping forward), and a zero-value meaning staying within the current observation.

The sign of the second argument defines whether we lag (minus) or lead (plus). The absolute value of this second argument defines how far from the current observation we hop.

Alternatively, we could have the first argument as a column number, and the second argument as a row/observation number, if we wanted this function to deal with the data table more like with a matrix. But relatively speaking, the method doesn’t really matter as long as we can unambiguously identify a data element or a cell. To stay within the data step paradigm, we will stick with the variable name and offset from the current observation (_n_) as arguments.

Let’s say we have a data table SAMPLE, where for each event FAIL_FLAG=1 we want to calculate DELTA as the difference between DATE_OUT, one observation after the event, and DATE_IN, two observations before the event:

Calculations across observations of a SAS data table

That is, we want to calculate DELTA in the observation where FAIL_FLAG = 1 as

26MAR2017 18JAN2017 = 67 (as shown in light-blue highlighting in the above figure).

With the HOP() function, that calculation in the data step would look like this:

data SAMPLE;
   set SAMPLE;
   if FAIL_FLAG then DELTA = hop(DATE_OUT,1) - hop(DATE_IN,-2);
run;

It would be reasonable to suggest that the hop() function should return a missing value when the second argument produces an observation number outside of the dataset boundary, that is when

_n_ + j < 0 or _n_ + j > num, where _n_ is the current observation number of the data step iteration; num is the number of observations in the dataset; j is the offset argument value.

Do you see anything wrong with this solution? I don’t. Except that the HOP function exists only in my imagination. Hopefully, it will be implemented soon if enough SAS users ask for it. But until then, we can use its surrogate in the form of a %HOP macro.

%HOP macro

The HOP macro grabs the value of a specified variable in an offset observation relative to the current observation and assigns it to another variable. It is used within a SAS data step, but it cannot be used in an expression; each invocation of the HOP macro can only grab one value of a variable across observations and assign it to another variable within the current observation.

If you need to build an expression to do calculations with several variables from various observations, you would need to first retrieve all those values by invoking the %hop macro as many times as the number of the values involved in the expression.

Here is the syntax of the %HOP macro, which has four required parameters:

%HOP(d,x,y,j)

d – input data table name;

x – source variable name;

y – target variable name;

j – integer offset relative to the current observation. As before, a negative value means a previous observation, a positive value means a subsequent observation, and zero means the current observation.

Using this %HOP macro we can rewrite our code for calculating DELTA as follows:

 data SAMPLE (drop=TEMP1 TEMP2);
   set SAMPLE;
   if FAIL_FLAG then
   do;
      %hop(SAMPLE,DATE_OUT,TEMP1, 1)
      %hop(SAMPLE,DATE_IN, TEMP2,-2)
      DELTA = TEMP1 - TEMP2;
   end;
run;

Note that we should not have temporary variables TEMP1 and TEMP2 listed in a RETAIN statement, as this could mess up our calculations if the j-offset throws an observation number out of the dataset boundary.

Also, the input data table name (d parameter value) is the one that is specified in the SET statement, which may or may not be the same as the name specified in the DATA statement.

In case you are wondering where you can download the %HOP macro from, here it is in its entirety:

%macro hop(d,x,y,j);
   _p_ = _n_ + &j;
   if (1 le _p_ le _o_) then set &d(keep=&x rename=(&x=&y)) point=_p_ nobs=_o_;
%mend hop;

Of course, it is “free of charge” and “as is” for your unlimited use.

Your turn

Please provide your feedback and share possible use cases for the HOP function/macro in the Comment section below. This is your chance for your voice to be heard!

Hopping for the best - calculations across SAS dataset observations was published on SAS Users.

10月 202017
 

 

“The difference between style and fashion is quality.”

-Giorgio Armani

With an out-of-the-box SAS Enterprise Guide (EG) installation, when you build a report in SAS EG it is displayed in a nice-looking default style. If you like it, you can keep it, and continue reading. If you don’t quite like it, then stop, take a deep breath, and continue reading carefully – you are about to discover a wealth of styling options available in EG. In any case, you are not bound by the default style that is set during your SAS EG installation.

Changing your SAS EG report style on the fly

Let’s say we run the following SAS program in EG:

SAS code sample to run in SAS EG
When you run a SAS Program or a Process Flow that creates an output, it will open in the Results tab shown in a default style (HtmlBlue). For many it looks quite OK. However, SAS provides many other different styles you can choose from. To change your report style, just click on Properties of the workspace toolbar:
Results tab in SAS EG
This will open the Properties for SAS Report window where you can select any style in the Style drop-down list:

Properties for SAS Report
After you selected desired style, click OK; this will save your change and close the window. Your report will immediately be redrawn and displayed in your new style:

SAS report in new style

That new style will only apply to the Results element of the SAS Program or a Process Flow you ran. If you save the EG project and then re-open it in a new EG session and re-run, EG will still remember and use the style you previously selected for the Results element. All other elements will use the default style.

But how do you know what each style’s look and feel is before you select it? The following sections show how to browse different styles as well as how to change your default styles to new ones.

Browsing SAS EG styles

SAS Enterprise Guide interface provides a quick access to viewing different styles, whether built-in or external. Here is how to get an idea of different styles look and feel.

From the EG main menu click on Tools → Style Manager. In the opened Style Manager window you may browse through the Style List by clicking on each style listed in the left pane and also get a good idea of how each particular style looks by viewing it in the right Preview pane of the Style Manager window:

Style Manager window

From Style Manager window, you can also set up a default style by selecting a style you like from the Style List in the left pane and clicking Set as Default button. However, setting your default style using Style Manager will only affect SAS Report and HTML results formats. But what about other results formats? Not to worry, SAS EG interface has you covered.

Changing your default SAS EG report style

If you like a particular report style and don’t want to be stuck with a pre-set default style and necessity to change it every time you run a report, you may easily change your SAS EG default style for practically any results format.

From the EG main menu click on Tools → Options…, the Options window will open. I that window, under Results → Results General you may select (check) one or multiple Results Formats (SAS Report, HTML, PDF, RTF, Text Output, PowerPoint, and Excel) as well as choose your Default Result Format:

Options window to choose default report format

Let’s set up default styles for different results formats. First, let’s go to Results → SAS Report, you will see (under Appearance → Style) that your default style (set up during initial EG installation) is HtmlBlue. Click on the Style drop-down list to select a different style:

Select new report style

The same way you may set up default styles for other results formats (HTML, RTF, PDF, Excel, and PowerPoint). For Graph, you may select a Graph Format (ActiveX, Java, GIF, JPEG, etc.) When you are done, click OK button, the Options window will close and your selected styles become your new default. They are going to persist across EG sessions.

If you are a SAS Administrator, to ensure consistency across your organization, you may have all your SAS Enterprise Guide users set up the same Default styles for every Result format.

Server-side style templates

Server-side SAS style templates are created using the PROC TEMPLATE of the SAS Output Delivery System (ODS) and are stored in Template Stores within SAS libraries. By definition, a template store is an item store that stores items that were created by the TEMPLATE procedure. In particular, built-in server-side SAS style templates are stored in the SASHELP.TMPLMST item store.

Note, that you will not see these item stores / template stores in the EG Server→Library tree under the SASHELP library as it only shows data tables and views. While there is no access in EG to the Templates Window, you can access the Templates Window from SAS Display Manager.

In Enterprise Guide, in order to view a list of built-in server-side SAS styles in the SASHELP.TMPLMST item store, you may run the following code:

proc template;
   path sashelp.tmplmst;
   list styles;
run;

This will produce the following listing shown in the EG’s Results tab:

Report listing

If you want to view all the server-side styles including built-in and user-defined, you can do that in EG by running the following code:

proc template;
   list styles;
run;

Server-side templates are applied to ALL Results Formats.

CSS styles

Cascading Style Sheet (CSS) styles are available only for SAS Report and HTML result formats. The CSS stylesheet only styles the browser-rendered elements. It will not change a graph image style that is generated on the server.

In the SAS code generated by EG, CSS style is specified in STYLESHEET= option of the ODS statement. It can point to any local or network accessible CSS file, for example:

STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css")

In addition, STYLESHEET= option can point to a .css file located on the Internet, for example:

STYLESHEET=(URL="https://www.sas.com/etc/designs/saswww/static.css")

Server-side styles vs. CSS styles

With SAS Enterprise Guide you create Projects, Process Flows, Programs, Tasks, Reports, etc. on your local Window machine. When you Run your Project (or any part of it), EG generates SAS code which gets sent to and executed on the SAS server, and then any visual results are sent back to EG and displayed there.

For every Result Format, a server-side style template is always applied when SAS output is generated on the SAS server.

When that SAS output is returned to SAS EG, for SAS Report and HTML result formats only, an additional optional styling is applied in a form of CSS styles that controls what your SAS Report or HTML output looks like. This CSS styling affects only HTML elements of the output and do not affect graph images that are always generated and styled on the server.

These two kinds of styles are reflected in the EG-generated SAS code that gets shipped to SAS server for execution. If you look at the Code Preview area (Program → Export → Export Program) or Log tab, you will always see ODS statement with STYLE= option that specifies the server-side style. If your selected Result Format is either SAS Report or HTML, then in addition to STYLE= option the ODS statement also contains STYLESHEET= option that specifies HTML CSS stylesheet (external file) accessible via the client.

If you select as default a built-in style (e.g. Harvest) EG will find both server version of it and CSS version of it; you will see this in the SAS log:

STYLE=Harvest
STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/Harvest.css")

However, if you select as default some custom CSS or external CSS style (e.g. ABC) that does not have a match in the server template store, the server style will be set to the default server-side style HTMLBlue; you will see in the SAS log the following WARNING:

WARNING: Style ABC not found; Default style will be used instead.

This warning relates to the STYLE= option specifying the server-side style.

Adding your custom SAS EG report style

Even though SAS supplies dozens of styles for you to choose from (Built-in Styles), you can still modify existing styles and create your own custom styles for SAS Report and HTML output types only. You can do this via Style Manager.

Open Style Manager with either one of the following ways:

Tools → Style Manager

Tools → Options… → Results/SAS Reports → Manage Styles

Tools → Options… → Results/HTML → Manage Styles

Note, that style customization via Style Manager is only available for SAS Report and HTML output types.

In the left pane of the Style Manager there are 3 columns:

  1. Style representing style name;
  2. Location indicating whether it is Built-in Style (SAS-supplied CSS), My Style (your custom CSS), or External Style (any CSS - Cascading Style Sheet - on your local machine or on the Web; or a style template on a SAS server);
  3. URL showing the location of the CSS file.

Find a style in the left pane list you wish to modify. Notice that SAS-supplied built-in styles are not editable (Edit button is grayed out). First, make a copy of this style by pressing Create a Copy button. You can also make a copy of a style by right-clicking on it and selecting Create a Copy from the pop-up menu.  This will open Save Style As window where you can give it a name and select a Save in location.

Your new style appears in the Style List of the Style Manager. Click on the new style name and then press Edit button (alternatively, you may right-click on the new style name and select Edit from the pop-up menu):

Style Editor window

This will open the Style Editor window where you can modify text and border attributes, specify background and banner images, as well as assign any custom CSS property name / property value pairs.

Click OK button when you are done to return to the Style Manager. There you may even set your custom style as default, by selecting it first and then pressing the Set as Default button.

Besides editing your new style in Style Manager, you may also open your-new-style.css file in a Text Editor and edit CSS there.

Adding an external style to Enterprise Guide

You can add external styles to your Style List in the Style Manager. While in the Style Manager, click on Add button, this will open the Add New Style window:

Adding an external style

Make sure Add new external style radio box is selected. Type in a Style name for your external style and Style URL, which can be a folder/directory path name on your local machine or your network (e.g. C:\your_folder\your_css_file_name.css) or a location on the Web (e.g. http://www.some_domain.com/styles/your_special_style.css).

To make your custom styles available to all SAS EG users in your organization, you may create them as a SAS style template using PROC TEMPLATE and place on a SAS server (server-side style), see this SAS Code Sample.  In this case, you can add your custom style to the Style Manager by selecting This is a SAS server style only check box in the above Add New Style window. The Style URL field will become disabled, as it is only used to specify CSS stylesheet:

Checking This is a SAS server style only checkbox

You would select this checkbox if you only want to use server-side style (the STYLE= option is always present) and do not want to also provide and apply an optional CSS stylesheet (STYLESHEET=).

Conclusion

In this post I tried to present a comprehensive guide on using styles in SAS Enterprise Guide. Please use the Comments section below to share your experience with Enterprise Guide as it relates to reports styling.

Resources

Little SAS Enterprise Guide bookThe Little SAS Enterpriser Guide Book

Point-and-Click Style Editing in SAS® Enterprise Guide®

I Didn’t  Know SAS®  Enterprise Guide®  Could Do  That!

Creating reports in style with SAS Enterprise Guide was published on SAS Users.

9月 152017
 

Whether you are a SAS code creator, a blogger, a technical writer, an editor-in-chief, an executive, a secretary, a developer or programmer in any programming language or simply someone who uses computer or hand-held device for writing, you need to read this blog post – your life is about to change forever!

Did you know that you can use a web browser as a SAS code editor? I’m not talking about browser-based SAS programming interfaces like SAS University Edition or SAS Studio; these are full-blown applications. I’m talking about converting a regular web browser into a “notepad” where you can type, display, and save your SAS code. Or non-SAS code. Or practically anything. And you don’t even have to be connected to the Internet to use this browser functionality.

Converting a web browser into a notepad

This trick works with most modern browsers:

  • Chrome
  • Firefox
  • Opera
  • Safari

It will not work on Internet Explorer 11.

Try this: open your web browser (I am using Firefox in the examples below) or a new tab in your browser and type the following in the URL field (case insensitive):

data:text/html,<html contentEditable>

Hit Enter. Then click anywhere in the browser body.

Your browser has just turned into a Notepad. You can now type anything in it, including SAS code:

Using web browser as SAS code editor

In order to save your SAS code in Firefox, click on File ⇒ Save Page As… and save it as type Text Files (*.txt;*.text):

Saving SAS code in a browser as text file

This functionality is possible thanks to HTML5’s contentEditable attribute and the browsers’ ability to handle data URL.

I don’t know about you, but I find this browser-notepad feature very cool and handy. Not only does it allow you to type SAS code in your browser, but it also gives you the capability to take notes and copy & paste excerpts or code snippets from other web pages on non-web applications. If you use WebEx or Skype or Lync to present one of the SAS web browser-based products such as SAS Visual Analytics, SAS Visual Statistics, etc., you can share your browser to your audience and make one of the tabs a typeable area. Then during your presentation you may switch between browser tabs depending on whether you are presenting SAS VA/VS or your own on-the-fly typing.

Bookmark notepad in a browser

If you like this Notepad browser feature, you can easily bookmark it by placing it on the Bookmarks toolbar. In this case, I suggest typing the following line in the URL filed:

data:text/html,<html contentEditable><title>Notepad</title>

and then dragging the image in front of this URL string and dropping it to the Bookmarks toolbar to create a button. Then, every time you need a Notepad it is at your fingertips; you just need to click the button:

Bookmark SAS code editor in a browser

Styling your new SAS editor in a browser

By default, your browser editor does not look pretty. However, you can apply CSS styles to it to make it look better. You can control font (style, size, color), margins, paddings, background and other CSS style attributes. For example, try the following URL:

data:text/html, <textarea style="width:100%; height:100%; padding:20px; font-size:2em; font-family: SAS Monospace; color:darkblue; border:none; border-left: 10px solid lightblue; margin-left: 30px;" autofocus/>

Your web browser editor becomes much more presentable:

Customize SAS code editor in a browser

The autofocus attribute places cursor immediately in the typing area of the browser notepad, without having to click on the browser body first.

I want to hear from you!

Do you like this editable browser feature? Would you use it to enhance your presentations? Do you envision yourself writing SAS code in a browser? An article, a blog post? What other usages can you envision using such a web browser transformation? Do you have any ideas to expand this notepad browser functionality beyond presenting, typing, taking notes, copying/pasting, and saving your SAS code? Can you apply SAS color syntax highlighting in a browser? Or a background image? How about submitting your SAS code from a browser?

Using a web browser as a SAS code editor was published on SAS Users.

8月 022017
 

In my prior posts (Data-driven SAS macro loops, Modifying variable attributes in all datasets of a SAS library, Automating the loading of multiple database tables into SAS tables), I presented various data-driven applications using SAS macro loops.

However, macro loops are not the only tools available in SAS for developing data-driven programs.

CALL EXECUTE is one of them. The CALL EXECUTE routine accepts a single argument that is a character string or character expression. The character expression is usually a concatenation of strings containing SAS code elements to be executed after they have been resolved. Components of the argument expression can be character constants, data step variables, macro variable reference, as well as macro references. CALL EXECUTE dynamically builds SAS code during DATA step iterations; that code executes after the DATA step’s completion outside its boundary. This makes a DATA step iterating through a driver table an effective SAS code generator similar to that of SAS macro loops.

However, the rather peculiar rules of the CALL EXECUTE argument resolution may make its usage somewhat confusing. Let’s straighten things out.

Argument string has no macro or macro variable reference

If an argument string to the CALL EXECUTE contains SAS code without any macro or macro variable references, that code is simply pushed out (of the current DATA step) and appended to a queue after the current DATA step. As the DATA step iterates, the code is appended to the queue as many times as there are iterations of the DATA step. After the DATA step completes, the code in the queue gets executed in the order of its creation (First In First Out).

The beauty of this process is that the argument string can be a concatenation of character constants (in single or double quotes) and SAS variables which get substituted with their values by CALL EXECUTE for each DATA step iteration. This will produce data-driven, dynamically generated SAS code just like an iterative SAS macro loop.

Let’s consider the following example. Say we need to load multiple Oracle tables into SAS tables.

Step 1. Creating a driver table

In order to make our process data-driven, let’s first create a driver table containing a list of the table names that needed to be extracted and loaded:

/* create a list of tables to extract & load */
libname parmdl '/sas/data/parmdata';
data parmdl.tablelist;
        length tname $8;
        input tname;
        datalines;
ADDRESS
ACCOUNT
BENEFIT
FINANCE
HOUSING
;

This program runs just once to create the driver table parmdl.tablelist.

Step 2. Loading multiple tables

Then, you can use the following data-driven program that runs each time you need to reload Oracle tables into SAS:

/* source ORACLE library */
libname oralib oracle path="xxx" schema="yyy" user="uuu"
 	PASSWORD="{SAS002}ABCDEFG12345678RTUR" access=readonly;
 
/* target SAS library */
libname sasdl '/sas/data/appdata';
 
/* driver table SAS library */
libname parmdl '/sas/data/parmdata';
 
data _null_;
   set parmdl.tablelist;
   call execute(cats(
      'data sasdl.',tname,';',
         'set oralib.',tname,';',
      'run;'));
run;

In order to concatenate the components of the CALL EXECUTE argument I used the cats() SAS function which returns a concatenated character string removing leading and trailing blanks.

When this program runs, the SAS log indicates that after the data _null_ step the following statements are added and executed:

NOTE: CALL EXECUTE generated line.
1   + data sasdl.ADDRESS;set oralib.ADDRESS;run;
2   + data sasdl.ACCOUNT;set oralib.ACCOUNT;run;
3   + data sasdl.BENEFIT;set oralib.BENEFIT;run;
4   + data sasdl.FINANCE;set oralib.FINANCE;run;
5   + data sasdl.HOUSING;set oralib.HOUSING;run;

In this example we use data _null_ step to loop through the list of tables (parmdl.tablelist) and for each value of the tname column a new data step gets generated and executed after the data _null_ step. The following diagram illustrates the process:

Diagram explaining CALL EXECUTE for SAS data-driven programming

Argument string has macro variable reference in double quotes

If an argument to the CALL EXECUTE has macro variable references in double quotes, they will be resolved by the SAS macro pre-processor during the DATA step compilation. Nothing unusual. For example, the following code will execute exactly as the above, and macro variable references &olib and &slib will be resolved to oralib and sasdl before CALL EXECUTE takes any actions:

%let olib = oralib;
%let slib = sasdl;
 
data _null_;
   set parmdl.tablelist;
   call execute (
      "data &slib.."!!strip(tname)!!';'
         "set &olib.."!!strip(tname)!!';'!!
      'run;'
   );
run;

Argument string has macro or macro variable reference in single quotes

Here comes the interesting part. If the argument to CALL EXECUTE has macro or macro variable references in single quotes, they still will be resolved before the code is pushed out of the DATA step, but not by the SAS macro pre-processor during the DATA step compilation as it was in the case of double quotation marks. Macro or macro variable references in single quotes will be resolved by CALL EXECUTE itself. For example, the following code will execute exactly as the above, but macro variable references &olib and &slib will be resolved by CALL EXECUTE:

%let olib = oralib;
%let slib = sasdl;
 
data _null_;
   set parmdl.tablelist;
   call execute('data &slib..'!!strip(tname)!!';'!!
                'set &olib..'!!strip(tname)!!';'!!
                'run;'
               );
run;

Timing considerations

CAUTION: If your macro contains some non-macro language constructs for assigning macro variables during run time, such as a CALL SYMPUT or SYMPUTX statement (in a DATA step) or an INTO clause (in PROC SQL), resolving those macro variable references by CALL EXECUTE will happen too soon, before your macro-generated code gets pushed out and executed. This will result in unresolved macro variables. Let’s run the following code that should extract Oracle tables into SAS tables as above, but also re-arrange column positions to be in alphabetical order:

%macro onetable (tblname);
   proc contents data=oralib.&tblname out=one(keep=name) noprint;
   run;
 
   proc sql noprint;
      select name into :varlist separated by ' ' from one;
   quit;
   %put &=varlist;
 
   data sasdl.&tblname;
      retain &varlist;
      set oralib.&tblname end=last nobs=n;
      if last then call symput('n',strip(put(n,best.)));
   run;
   %put Table &tblname has &n observations.;
%mend onetable;
 
data _null_;
   set parmdl.tablelist;
   call execute('%onetable('!!strip(tname)!!');');
run;

Predictably, the SAS log will show unresolved macro variable references, such as:

WARNING: Apparent symbolic reference VARLIST not resolved.
WARNING: Apparent symbolic reference N not resolved.
Table ADDRESS has &n observations.

SOLUTION: To avoid the timing issue when a macro reference gets resolved by CALL EXECUTE too soon, before macro variables have been assigned during macro-generated step execution, we can strip CALL EXECUTE of the macro resolution privilege. In order to do that, we can mask & and % characters using the %nrstr macro function, thus making CALL EXECUTE “macro-blind,” so it will push the macro code out without resolving it. In this case, macro resolution will happen after the DATA step where CALL EXECUTE resides. If an argument to CALL EXECUTE has a macro invocation, then including it in the %nrstr macro function is the way to go. The following code will run just fine:

data _null_;
   set parmdl.tablelist;
   call execute('%nrstr(%onetable('!!strip(tname)!!'));');
run;

When this DATA step runs, the SAS log indicates that the following statements are added and executed:

NOTE: CALL EXECUTE generated line.
1   + %onetable(ADDRESS);
2   + %onetable(ACCOUNT);
3   + %onetable(BENEFIT);
4   + %onetable(FINANCE);
5   + %onetable(HOUSING);

CALL EXECUTE argument is a SAS variable

The argument to CALL EXECUTE does not necessarily have to contain or be a character constant. It can be a SAS variable, a character variable to be precise. In this case, the behavior of CALL EXECUTE is the same as when the argument is a string in single quotes. It means that if a macro reference is part of the argument value it needs to be masked using the %nrstr() macro function in order to avoid the timing issue mentioned above.

In this case, the argument to the CALL EXECUTE may look like this:

arg = '%nrstr(%mymacro(parm1=VAL1,parm2=VAL2))';
call execute(arg);

Making CALL EXECUTE totally data-driven

In the examples above we used the tablelist driver table to retrieve values for a single macro parameter for each data step iteration. However, we can use a driver table not only to dynamically assign values to one or more macro parameters, but also to control which macro to execute in each data step iteration. The following diagram illustrates the process of totally data-driven SAS program:

Diagram explaining using CALL EXECUTE for SAS data-driven programming

Conclusion

CALL EXECUTE is a powerful tool for developing dynamic data-driven SAS applications. Hopefully, this blog post demonstrates its benefits and clearly explains how to avoid its pitfalls and use it efficiently to your advantage. I welcome your comments, and would love to hear your experiences with CALL EXECUTE.

CALL EXECUTE made easy for SAS data-driven programming was published on SAS Users.

6月 072017
 

Combine and conquer with SASThe ancient political maxim “divide and conquer” (Lat. “dīvide et īmpera”) has been used for millennia in politics, sociology, and psychology, mainly to manipulate people. In the last two centuries, however, this principle has also been adopted by computer science. It is widely used in computer programming, primarily in computer algorithms such as binary search, quick sort, Fast Fourier Transform (FFT), parallel processing algorithms – to name a few.

A typical Divide and Conquer algorithm consists of the following steps:

  1. Divide – break the given problem into smaller pieces (sub-problems) of the same type;
  2. Conquer – solve these smaller pieces (sub-problems);
  3. Combine – aggregate the results of the solved sub-problems to deduce the result of the initial problem.

However, in this blog post we’ll reverse the “divide and conquer” principle and go backwards from step three to step two, to combine and then conquer.

In our day-to-day programming activities, we may not need to divide a problem in order to solve it. In many cases, we deal with small pieces of a puzzle (language elements) provided by the SAS programming language, and we solve some larger problems by combining those elements.

Let’s consider some examples.

Combining SAS Formats

Suppose you have a variable x in a SAS table on which you need to report with the following requirement. Depending on the variable value, you will need to apply different SAS formats:

Different SAS formats based on variable values

In other words, you need to display percentages that are greater than 10% as whole numbers, percentages between 1% and 10% as numbers with one decimal point, and percentages that are less than 1% as numbers with two decimal points.

You may get away with proc format’s picture statement with a proper multiplier – but good luck with that. Fortunately, there is a much better way of solving this problem. Did you know that you can use already existing formats as labels in proc format’s value statement? In other words, we can combine several SAS formats while defining another SAS format.

The above problem is easily solved by defining a new format as a combination of different PERCENTw.d formats as follows:

proc format;
   value pctmod
      -0.01 <-< 0.01 = [percent10.2] 
      -0.1 - -0.01, 0.01 - 0.1 = [percent10.1]
      low -< -0.1, 0.1 <- high = [percent10.0]
      ;
run;

Here is what this format does:

Unformatted and formatted values

You can combine any other existing formats as needed, including mixing and matching SAS-supplied formats and user-defined formats. You can even use existing formats within the PICTURE statement of the PROC FORMAT. For example, the following format will format values between 0 and 1 as cents (e.g. ¢75) while values greater than 1 as dollars (e.g. $9,340):

proc format;
   picture dolcent
	  0 <-< 1 = '99' (prefix='¢' mult=100)
	  1 - high = [dollar32.0]
	  ;
run;

Here is what this format does:

Formatted values vs. unformatted values

When using an existing format as a label, make sure to enclose the format name in square brackets, e.g. [date9.], and do not enclose the name of the existing format in single or double quotes.

Using an existing format in the format definition can be construed as format nesting. A nested level of one means that you are creating the format A with the format B as a formatted value. However, if you also create format B with the format C as a formatted value, then you have nested level of two. Avoid nesting formats with a level higher than one, as the resource requirements may increase dramatically with each additional level.

Combining SAS Libraries

Suppose you have defined several SAS data libraries, such as

libname apples 'c:\projects\garden\apples';
libname oranges 'c:\projects\garden\oranges';
libname tomatoes 'c:\projects\garden\tomatoes';

You can then logically combine (concatenate) them into a single library:

libname combolib (apples oranges tomatoes);

or

libname combolib ('c:\projects\garden\apples' 'c:\projects\garden\oranges' 'c:\projects\garden\tomatoes');

Yes, you can combine apples with oranges, and even with tomatoes, and that will enable you to access the SAS data sets in several libraries with one libref.

As you can see, you can combine two or more libraries by their librefs, physical names, or even a combination of librefs and physical names, such as:

libname combolib (apples oranges 'c:\projects\garden\tomatoes');

SAS will even allow using commas between the concatenated libraries:

libname combolib (apples, oranges, tomatoes);
libname combolib (apples, oranges, 'c:\projects\garden\tomatoes');

It is important to know that while SAS has no problem combining multiple libraries that contain same-named files, when you reference such a file, the concatenated libraries are searched in order and the first occurrence of the specified file is used.
This proves to be extremely useful when you have multiple versions of the same tables that you promote between different environments during a system development cycle. For example, you might have a SAS data table named SALES in either some or all of the development, testing, and production environments. You can define librefs for each environment as well as a libref for the combined environment:

libname devllib 'c:\projects\proj_x\devl';
libname testlib 'c:\projects\proj_x\test';
libname prodlib 'c:\projects\proj_x\prod';
 libname lastlib (prodlib testlib devllib);

Then when you reference your table as lastlib.SALES SAS will return the SALES table from production environment if it exists there; if not then from testing environment if it exists there; if not then from development environment it exists there. That logical hierarchy means that SAS will return the latest and greatest version of that SALES table. If you want that table from a specific environment, then you reference it using a specific libref: prodlib.SALES, testlib.SALES, or devllib.SALES.

Combining SAS Format Libraries

You can combine SAS format libraries to achieve similar version prioritization as in the above case of SAS data libraries using FMTSEARCH= System Option:

options fmtsearch=(prodlib testlib devllib);

This implies that your SAS formats are stored in the catalogs named FORMATS in the above specified libraries: prodlib.FORMATS, testlib.FORMATS, devllib.FORMATS.

One must remember that if not specified, WORK and LIBRARY libraries by default take precedence in the order in which SAS formats are searched, making the above options statement effectively look like:

options fmtsearch=(work library prodlib testlib devllib);

If WORK and LIBRARY libraries are specified in the fmtsearch= then the default is not applied and the search order is as specified.

Your turn

An unlimited space below is reserved for your contribution to the fascinating topic of combine and conquer with SAS. This is your chance to contribute to the list of coding techniques that you use to combine and conquer solving your SAS coding problems.

 

Combine and conquer with SAS was published on SAS Users.

4月 282017
 

ETL automationWhen developing SAS applications, you can feed database tables into your application by using the libname access engine either by directly referring a database table, or via SAS or database views that themselves refer to one or more of the database tables.

However, such on-the-fly data access may not be efficient enough, especially for interactive SAS applications when system response time is critical. In case of distributed systems where a database might reside on one server, and the SAS Application server – on a different physical server, user wait time caused by delays in data access across networks and databases might become intolerable. In such cases, it makes perfect sense to extract database tables and load them into SAS tables in advance, preferably placing them on (or close to) the SAS Application server. That will ensure higher operational efficiency (responsiveness) of the interactive SAS application.

SAS Data Integration Studio provides a powerful visual design tool for building, implementing, and managing data integration processes across different data sources, applications, and platforms.

However, in case of multiple tables, loading them one by one using even such a powerful tool might become a bit tedious. In an automation paradigm, being “visual” is not necessarily a good thing; when we automate it implies that we want to get things done without visualizing or even seeing them.

Here is a SAS coding approach, which I recently used for a customer project, to automate the repetitive process.

SAS code example of loading multiple Oracle tables into SAS tables

There were several Oracle tables that needed to be extracted and loaded into SAS data tables on a different physical server. The Oracle tables contained multi-year data, but we were only interested in the latest year’s data. That is those tables that have datetime stamp EVENT_DT column needed to be subset to the latest rolling 365 days.

Step 1. Creating a driver table

In order to make this process data-driven, let’s first create a driver table containing a list of the table names that needed to be extracted and loaded:

/* --------------------------------------------------- */
/* create table list to extract &amp; load,                */
/* dt_flag indicates whether EVENT_DT variable present */
/* --------------------------------------------------- */
 
libname parmdl '/sas/data/parmdata';
 
data parmdl.tablelist;
        length tname $8 dt_flag $1;
        input tname dt_flag;
        datalines;
ADDRESS N
ACCOUNT Y
BENEFIT Y
EXCLUSN N
FINANCE Y
HOUSING Y
;

This program runs just once to create the driver table parmdl.tablelist.

Step 2. Loading multiple tables

The following data-driven program runs every time when you need to reload Oracle tables into SAS:

/* source ORACLE library */
libname oralib oracle path="xxx" schema="yyy" user="uuu"
 	PASSWORD="{SAS002}D2DF612A161F7F874C4EF97F" access=readonly;
 
/* target SAS library */
libname sasdl '/sas/data/appdata';
 
/* driver table SAS library */
libname parmdl '/sas/data/parmdata';
 
options symbolgen mprint mlogic;
%let cutoff_date = %eval(%sysfunc(today()) - 365);
 
/* --------------------------------------------------- */
/* loop through table list to extract & load           */
/* --------------------------------------------------- */
%macro ETL;
   %let dsid = %sysfunc(open(parmdl.tablelist));
   %syscall set(dsid);
   %do %while(%sysfunc(fetch(&dsid)) eq 0);
      data sasdl.&tname;
         set oralib.&tname;
         %if &dt_flag eq Y %then %str(where datepart(EVENT_DT) ge &cutoff_date;);
      run;
   %end;
   %let rc = %sysfunc(close(&dsid));
%mend ETL;
%ETL

In this code, we use a similar coding technique to that described in my earlier blog post Modifying variable attributes in all datasets of a SAS library. We loop through the initially created table parmdl.tablelist. in a macro %do-loop, and for each observation containing a single table name, implement a data step extracting one table at a time.

If macro variable &dt_flag=Y, then we generate and insert a where statement sub-setting the extracted table to the latest year’s data.

Note a very useful feature here %syscall set(dsid) that links SAS data set variables to macro variables that have the same name and data type. Since we have not defined those macro variables, in essence, it creates a namesake macro variable for each SAS data set variable.

Implementing data-driven load in SAS Data Integration Studio

The above SAS code is run on the SAS Application server – the same server where the SAS target library resides. To fully automate the ETL process, this code may be embedded into a SAS job in SAS Data Integration Studio and then scheduled via either Platform Suite for SAS or your operating system scheduler. Alternately, it can be run in batch mode or via script that that in turn can be scheduled by the operating system scheduler.

In SAS DI Studio:

1.  Create a New job and give it a meaningful name (e.g. Loading Oracle Tables into SAS Tables.)

2.  Drag & drop User Written Code transformation onto the SAS Job canvas and give it a meaningful name (e.g. ETL macro):

SAS Data Integration Studio Job

3.  Go to the Properties of this transformation and click on the Code tab, then paste your SAS code in the text entry area:

SAS DI job properties

4.  Click OK and Save your SAS Job.

Note, that we do not add Source and Target tables to the Job canvas, only User Written Code transformation. Also note, that the %ETL macro has no direct references to the database table names being extracted and loaded into SAS datasets. Instead, it receives the names of the tables to be loaded from the PARMDL.TABLELIST dataset. This makes it extremely flexible to the task of processing a completely different set of tables; all you need to do is update the table list in the driver table. No change to your SAS DI job is needed.

Now it's your turn

What is your experience with automating ETL processes? Please feel free to share it here in the Comments section.

 

Automating the loading of multiple database tables into SAS tables was published on SAS Users.

4月 132017
 
The Penitent Magdalene

Titian (Tiziano Vecellio) (Italian, about 1487 - 1576) The Penitent Magdalene, 1555 - 1565, Oil on canvas 108.3 × 94.3 cm (42 5/8 × 37 1/8 in.) The J. Paul Getty Museum, Los Angeles; Digital image courtesy of the Getty's Open Content Program.

Even if you are a traditional SAS programmer and have nothing to do with cybersecurity, you still probably have to deal with this issue in your day-to-day work.

The world has changed, and what you do as a SAS programmer is not just between you and your computer anymore. However, I have found that many of us are still careless, negligent or reckless enough to be dangerous.

Would you scotch-tape your house key to the front door next to the lock and go on vacation? Does anybody do that? Still, some of us have no problem explicitly embedding passwords in our code.

That single deadly sin, the thing that SAS programmers (or any programmers) must not do under any circumstances, is placing unmasked passwords into their code. I must confess, I too have sinned, but I have seen the light and hope you will too.

Password usage examples

Even if SAS syntax calls for a password, never type it or paste it into your SAS programs. Ever.

If you connect to a database using a SAS/ACCESS LIBNAME statement, your libname statement might look like:

libname mydblib oracle path=airdb_remote schema=hrdept
	user=myusr1 password=mypwd1;

If you specify the LIBNAME statement options for the metadata engine to connect to the metadata server, it may look like:

libname myeng meta library=mylib
	repname=temp metaserver='a123.us.company.com' port=8561 
 		user=idxyz pw=abcdefg;

If you use LIBNAME statement options for the metadata engine to connect to a database, it may look like:

libname oralib meta library=oralib dbuser=orauser dbpassword=orapw;

In all of the above examples, some password is “required” to be embedded in the SAS code. But it does not mean you should put it there “as is,” unmasked. SAS program code is usually saved as a text file, which is stored on your laptop or somewhere on a server. Anyone who can get access to such a file would immediately get access to those passwords, which are key to accessing databases that might contain sensitive information. It is your obligation and duty to protect this sensitive data.

Hiding passwords in a macro variable or a macro?

I’ve seen some shrewd SAS programmers who do not openly put passwords in their SAS code. Instead of placing the passwords directly where the SAS syntax calls for, they assign them to a macro variable in AUTOEXEC.SAS, an external SAS macro file, a compiled macro or some other SAS program file included in their code, and then use a macro reference, e.g.

/* in autoexec.sas or external macro */
%let opw = mypwd1;
 
/* or */
 
%macro opw;
	mypwd1
%mend opw;
/* in SAS program */
libname mydblib oracle user=myusr1 password=&opw
        path=airdb_remote schema=hrdept;
 
/* or */
 
libname mydblib oracle user=myusr1 password=%opw
        path=airdb_remote schema=hrdept;

Clever! But it’s no more secure than leaving your house key under the door mat. In fact it is much less secure. One who wants to look up your password does not even need to look under the door mat, oh, I mean look at your program file where the password is assigned to a macro variable or a macro. For a macro variable, a simple %put &opw; statement will print the password’s actual value in the SAS log. In case of a macro, one can use %put %opw; with the same result.

In other words, hiding the passwords do not actually protect them.

What to do

What should you do instead of openly placing or concealing those required passwords into your SAS code? The answer is short: encrypt passwords.

SAS software provides a powerful procedure for encrypting passwords that renders them practically unusable outside of the SAS system.

This is PROC PWENCODE, and it is very easy to use. In its simplest form, in order to encrypt (encode) your password abc123 you would need to submit just the following two lines of code:

proc pwencode in="abc123";
run;

The encrypted password is printed in the SAS log:

1 proc pwencode in=XXXXXXXX;
2 run;

{SAS002}3CD4EA1E5C9B75D91A73A37F

Now, you can use this password {SAS002}3CD4EA1E5C9B75D91A73A37F in your SAS programs. The SAS System will seamlessly take care of decrypting the password during compilation.

The above code examples can be re-written as follows:

libname mydblib oracle path=airdb_remote schema=hrdept
	user=myusr1 password="{SAS002}9746E819255A1D2F154A26B7";
 
libname myeng meta library=mylib
	repname=temp metaserver='a123.us.company.com' port=8561 
 		user=idxyz pw="{SAS002}9FFC53315A1596D92F13B4CA";
 
libname oralib meta library=oralib dbuser=orauser
dbpassword="{SAS002}9FFC53315A1596D92F13B4CA";

Encryption methods

The {SAS002} prefix indicates encoding method. This SAS-proprietary encryption method which uses 32-bit key encryption is the default, so you don’t have to specify it in the PROC PWENCODE.

There are other, stronger encryption methods supported in SAS/SECURE:

{SAS003} – uses a 256-bit key plus 16-bit salt to encode passwords,

{SAS004} – uses a 256-bit key plus 64-bit salt to encode passwords.

If you want to encode your password with one of these stronger encryption methods you must specify it in PROC PWENCODE:

proc pwencode in="abc123" method=SAS003;
run;

SAS Log: {SAS003}50374C8380F6CDB3C91281FF2EF57DED10E6

proc pwencode in="abc123" method=SAS004;
run;

SAS Log: {SAS004}1630D14353923B5940F3B0C91F83430E27DA19164FC003A1

Beyond encryption

There are other methods of obscuring passwords to protect access to sensitive information that are available in the SAS Business Intelligence Platform. These are the AUTHDOMAIN= SAS/ACCESS option supported in LIBNAME statements, as well as PROC SQL CONNECT statements, SAS Token Authentication, and Integrated Windows Authentication. For more details, see Five strategies to eliminate passwords from your SAS programs.

Conclusion

Never place unencrypted password into your SAS program. Encrypt it!

Place this sticker in front of you until you memorize it by heart.

PROC PWENCODE sticker

 

One deadly sin SAS programmers should stop committing was published on SAS Users.

4月 012017
 

Solar farm on SAS campus The full text of Fermat's statement, written in Latin, reads "Cubum autem in duos cubos, aut quadrato-quadratum in duos quadrato-quadratos, et generaliter nullam in infinitum ultra quadratum potestatem in duos eiusdem nominis fas est dividere cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet."

The English translation is: "It is impossible for a cube to be the sum of two cubes, a fourth power to be the sum of two fourth powers, or in general for any number that is a power greater than the second to be the sum of two like powers. I have discovered a truly marvelous demonstration of this proposition that this margin is too narrow to contain."

Here at SAS, we don’t take challenges lightly. After a short but intensive brainstorming, we came up with a creative and powerful SAS code that effectively proves this long-standing theorem. And it is so simple and short that not only can it be written on the margins of this blog, it can be tweeted!

Drum roll, please!

Here is the SAS code:

data _null_; 
	do n=3 by 1; 
		do a=1 by 1; 
			do b=1 by 1; 
				do c=1 by 1; 
					e = a**n + b**n = c**n;
					if e then stop; 
				end;
			end; 
		end;
	end;
run;

Or written compactly, without unnecessary spaces:

data _null_;do n=3 by 1;do a=1 by 1;do b=1 by 1;do c=1 by 1;e=a**n+b**n=c**n;if e then stop;end;end;end;end;run;

which is exactly 112 character long – well below the Twitter 140-character threshold.

Don’t be fooled by the utter simplicity and seeming unfeasibility of this code.  For the naysayers, let me clarify that we run this code in a distributed multithreaded environment where each do-loop runs as a separate thread.

We also use some creative coding techniques:

1.     Do-loop with just two options, count= and by=, but without the to= option (e.g. do c=1 by 1;). It is a valid syntax in SAS and serves the purpose of creating infinite loops when they are necessary (like in this case). You can easily test it by running the following SAS code snippet:

data _null_;
	start = datetime();
	do i=1 by 1;
		if intck('sec',start,datetime()) ge 20 then leave;
	end;
run;

The if-statement here is added solely for the purpose of specifying a wait time (e.g. 20) sufficient for persuading you in the loop’s infiniteness. Skeptics may increase this number to their comfort level or even remove (or comment out) the if-statement and enjoy the unconstrained eternity.

2.     Expression with two “=” signs in it (e.g. e = a**n + b**n = c**n;) Again, this is a perfectly valid expression in SAS and serves the purpose of assigning a variable the value of 0 or 1 resulting from a logical comparison operation. This expression can be rewritten as

e = a**n + b**n eq c**n;

or even more explicitly as

e = (a**n + b**n eq c**n);

As long as the code runs, the theorem is considered proven. If it stops, then the theorem is false.

You can try running this code on your hardware, at your own risk, of course.

We have a dedicated 128-processor UNIX server powered by an on-campus solar farm that has been autonomously running the above code for 40 years now, and there was not a single instance when it stopped running. Except pausing for the scheduled maintenance and equipment replacements.

During the course of this historic experience, we have accumulated an unprecedented amount of big data (all in-memory), converted it into event stream processing, and become a leader in data mining and business analytics.

This leads us to the following scientific conclusion: whether you are a pure mathematician or an empiricist, you can rest assured that Fermat's Last Theorem has been proven with a probability asymptotic to 1 beyond a reasonable doubt.

Have a happy 91-st day of the year 2017!

 

SAS code to prove Fermat's Last Theorem was published on SAS Users.

3月 292017
 

There is a well-known Russian saying that goes “Если нельзя, но очень хочется, то можно.” The English translation of it can span anywhere from “If you can’t, but want it badly, then you can” to “If you shouldn’t, but want it badly, then you should” to “If you may not, but want it badly, then you may.” Depending on your situation, any possible combination of “may,” “can,” or “should” may apply. You can even replace “want” with “need” to get a slightly different flavor.

There are known means of modifying variable attributes with PROC DATASETS, but they are limited to variable name, format, informat, and label. But what if we want/need to modify a variable length, or change a variable type? And I am not talking about creating a new variable with a different length or converting a numeric variable value into a character value of another variable. We want to change variable type and/or variable length in place, without adding new variables. If you believe it can’t be done, read the first paragraph again.

Imagine that we have two data tables that we need to concatenate into one table. However, there is one common variable that is of different type in each table – in the first table it is numeric, but in the second table it is character.

Sample data

Let’s create some sample data to emulate our situation by running the following SAS code:

libname sasdl 'C:PROJECTS_BLOG_SASchanging_variable_type_and_length_in_sas_datasets';
 
/* create study2016 data table */
data sasdl.study2016;
	length subjid dob 8 state $2 start_date end_date 8;
	infile datalines truncover;
	input subjid dob : mmddyy10. state start_date : mmddyy10. end_date : mmddyy10.;
	format dob start_date end_date mmddyy10.;
	datalines;
123456 08/15/1960 MD 01/02/2016
234567 11/13/1970 AL 05/12/2016 12/30/2016
;
 
/* create study2017 data table */
data sasdl.study2017;
	length subjid $6 dob 4 state $2 start_date end_date 4;
	infile datalines truncover;
	input subjid dob : mmddyy10. state start_date : mmddyy10. end_date : mmddyy10.;
	format dob start_date end_date mmddyy10.;
	datalines;
987654 03/15/1980 VA 02/13/2017
876543 11/13/1970 NC 01/11/2017 01/30/2017
765432 12/15/1990 NY 03/14/2017
;

The produced data tables will look as follows:

Table STUDY2016:
SAS data table 1

Table STUDY2017:
SAS data table 2

If we look at the tables’ variable properties, we will see that the subjid variable is of different type in these two data tables: it is of type Numeric (length of 8) in STUDY2016 and of type Character (length of 6) in STUDY2017:

SAS variables properties

Also, notice that variables dob, start_date, and end_date, although of the same Numeric type, have different length attributes - 8 in the STUDY2016 table, and 4 in the STUDY2017 table.

Data table concatenating problem

If we try to concatenate these two tables using PROC APPEND, SAS will generate an ERROR:

proc append base=sasdl.study2016 data=sasdl.study2017;
run;
 
NOTE: Appending SASDL.STUDY2017 to SASDL.STUDY2016.
WARNING: Variable subjid not appended because of type mismatch.
WARNING: Variable dob has different lengths on BASE and DATA
         files (BASE 8 DATA 4).
WARNING: Variable start_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
WARNING: Variable end_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
ERROR: No appending done because of anomalies listed above.
       Use FORCE option to append these files.
NOTE: 0 observations added.

Even if we do use the FORCE option as the ERROR message suggests, the result will be disappointing:

proc append base=sasdl.study2016 data=sasdl.study2017 force;
run;
 
NOTE: Appending SASDL.STUDY2017 to SASDL.STUDY2016.
WARNING: Variable subjid not appended because of type mismatch.
WARNING: Variable dob has different lengths on BASE and DATA
         files (BASE 8 DATA 4).
WARNING: Variable start_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
WARNING: Variable end_date has different lengths on BASE and
         DATA files (BASE 8 DATA 4).
NOTE: FORCE is specified, so dropping/truncating will occur.

The resulting data table will have missing values for the appended subjid:

Missing values in SAS table after PROC APPEND

Solution

In order to concatenate these tables, we must make the mismatching variable subjid of the same type in both data tables, either Character or Numeric. Making them both of Character type seems more robust, since it would allow for the values to contain both digit and non-digit characters. But if you know for sure that the value contains only digits, making them both Numeric works just as well.

Let’s say we decide to make them of Character type. Also note that our numeric variables representing dates (dob, start_date and end_date) are of different lengths: they are length 8 in STUDY2016 and length 4 in STUDY2017. Let’s make them the same length as well. From the standpoint of numerical accuracy in SAS for the dates, a length of 4 seems to be quite adequate to represent them accurately.

Let’s apply all our modifications to the STUDY 2016 dataset. Even though we are going to re-build the dataset in order to modify variable type and length, we are going to preserve the variable order so it feels like we just modified those variable attributes.

Here is how it can be done.

/* create macrovariable varlist containing a list of variable names */
proc sql noprint;
	select name into :varlist separated by ' '
	from sashelp.vcolumn
	where upcase(libname) eq 'SASDL' and upcase(memname) eq 'STUDY2016';
quit;
 
/* modify variable type and length */
data sasdl.study2016 (drop=v1-v4);
	retain &varlist; *<-- preserve variable order ;
	length subjid $6 dob start_date end_date 4; *<-- define new types/lengths ;
	format dob start_date end_date mmddyy10.;   *<-- recreate formats ;
	set sasdl.study2016 (rename=(subjid=v1 dob=v2 start_date=v3 end_date=v4));
	subjid = put(v1,6.); *<-- redefine subjid variable ;
	dob = v2;            *<-- redefine dob variable ;
	start_date = v3;     *<-- redefine start_date variable ;
	end_date = v4;       *<-- redefine end_date variable ;
run;
 
/* make sure new concatenated file (study_all) does not exist */
proc datasets library=sasdl nolist;
	delete study_all;
quit;
 
/* append both (study2016 and study2017) to study_all */
proc append base=sasdl.study_all data=sasdl.study2016;
run;
proc append base=sasdl.study_all data=sasdl.study2017;
run;

In this code, first, using proc sql and SAS view sashelp.vcolumn, we create a macro variable varlist to hold the list of all the variable names in our table, sasdl.study2016.

Then in the data step, we use a retain statement to preserve the variable order. When we read the sasdl.study2016 dataset using the set statement, we rename our variables-to-be-modified to some temporary names (e.g. v1 – v4) which we eventually drop in the data statement.

Then we re-assign the values of those temporary variables to the original variable names, thereby essentially creating new variables with new type and length. Since these new variables are named exactly as the old ones, the effect is as if their type and length attributes where modified, while in fact the whole table was rebuilt and replaced. Problem solved.

When we concatenate the data tables we create a new table sasdl.study_all. Before concatenating our two tables using proc append twice, we use proc datasets to delete that new table first. Even if the table does not exist, proc datasets will at least attempt to delete it. With all the seeming redundancy of this step, you will definitely appreciate it when you try running this code more than one time.

Changing variable type and variable length in SAS datasets was published on SAS Users.

2月 102017
 

Small matters matter. Imagine saving (or spending wisely) just 1 second of your time every hour. One measly second! During your lifespan you would save or spend wisely (1 sec-an-hour * 24 hours-a-day * 365 days-a-year x 100 years) / (3600 seconds-an-hour * 24 hours-a-day) = 10 days, a whole two week vacation!

While truncation vs rounding may seem to be insignificant in a given instance, the cumulative effect of either could be truly enormous, whether it’s truncation vs rounding of decimal numbers or of the SAS time values presented below.

From my prior post Truncating decimal numbers in SAS without rounding, we know that SAS formats such as w.d, DOLLARw.d, and COMMAw.d do not truncate decimal numbers, but rather round them.

However, SAS time value formats are somewhat different. Let’s take a look.

Suppose we have a SAS time value of '09:35:57't. As a reminder, a SAS time value is a value representing the number of seconds since midnight of the current day. SAS time values are between 0 and 86400.

TIMEw.d Format

Let’s apply the TIMEw.d format to our time value and see what it does.

If you run the following SAS code:

data _null_;
	t = '09:35:57't;
	put t= time5.;
	put t= time2.;
run;

you will get in the SAS log:

t=9:35
t=9

which means that this format does truncate both seconds and minutes. Conversely, if rounding were taking place we would have gotten:

t=9:36
t=10

HHMMw.d Format

Let’s run the same SAS code with HHMMw.d format:

data _null_;
	t = '09:35:57't;
	put t= hhmm5.;
	put t= hhmm2.;
run;

SAS log will show:

t=9:36
t=9

What does that mean? It means that HHMMw.d format rounds seconds (in case of truncating I would expect to get t=9:35), but truncates minutes (in case of rounding I would expect to get t=10, as 35 minutes are closer to 10 than to 9). A bit inconsistent, at least for our purposes.

Truncating SAS time values

This little research above shows that out of the two formats, TIMEw.d and HHMMw.d, it is perfectly fine to use the TIMEw.d format for the purpose of SAS time value truncation, for both minutes and seconds.

Regardless of the format used, you can also truncate your time value computationally, before applying a format, by subtracting from that value a remainder of division of that value by 60 (for seconds truncation) or by 3600 (for minutes truncation). For example, the following code:

data _null_;
	t = '09:35:57't;
	t_m = t - mod(t,60);
	t_h = t - mod(t,3600);
	put t= hhmm5.;
	put t_m= hhmm5.;
	put t_h= hhmm5.;
run;

produces the following SAS log:

t=9:36
t_m=9:35
t_h=9:00

Rounding SAS time values

Now that we’ve learned both the computational method and the TIMEw.d format method of truncation, how do we go about rounding? As long as the format behavior is consistent we can use its truncating functionality to convert it into the rounding functionality. In order to do that we just need to increase the original time value by 60 (seconds) for seconds rounding, and by 3600 (seconds) for minutes rounding. Truncation of that new value is equivalent to rounding of the original value.

Let’s run the following SAS code:

data _null_;
	t = '09:35:57't;
	t_m = t + 60;
	t_h = t + 3600;
	put t_m= time5.;
	put t_h= time2.;
run;

SAS log will show:

t_m=9:36
t_h=10

which means that our original time value '09:35:57't was rounded in both cases – seconds rounding and minutes rounding.

Now you know how to truncate and how to round SAS time values. And don’t forget about your lifetime 2-week vacation opportunity by saving a second every hour; or make it 2 seconds per hour and enjoy the full month off.

tags: SAS Professional Services, SAS Programmers, tips & techniques

Truncating vs rounding SAS time values was published on SAS Users.