Here I share with you a data preparation approach and SAS coding technique that will significantly simplify, unclutter and streamline your SAS programming life by using data templates.
Dictionary.com defines template as “anything that determines or serves as a pattern; a model.” However, I was flabbergasted when my “prior art research” for the topic of this blog post ended rather abruptly: “No results found for data template.”
What do you mean “no results?!” (Yes, sometimes I talk to the Internet. Do you?) We have templates for everything in the world: MS Word templates, C++ templates, Photoshop templates, website templates, holiday templates, we even have our own PROC TEMPLATE. But no templates for data?
At that point I paused, struggling to accept reality, but then felt compelled to come up with my own definition:
A data template is a well-defined data structure containing a data descriptor but no data.
Therefore, a SAS data template is a SAS dataset (data table) containing the descriptor portion with all necessary attributes defined (variable types, labels, lengths, formats, and informats) and empty (zero observations) data portion.
Less clutter = greater efficiency
When you construct SAS data tables using SAS code or data management tools such as using data design documentation as a feed to code-generating SAS program.
Unfortunately, despite all these benefits the data template concept is not explicitly and consistently employed and is noticeably absent from data development methodologies and practices.
Let’s try to change that!
How to create SAS data templates from scratch
It is very easy to create SAS data template. Here is an example:
libname PARMSDL 'c:\projects\datatemplates'; data PARMSDL.MYTEMPLATE; label newvar1 = 'Label for new variable 1' newvar2 = 'Label for new variable 2' /* ... */ newvarN = 'Label for new variable N' ; length newvar1 newvar2 $40 newvarN 8 ; format newvarN mmddyy10.; informat newvarN date9.; stop; run;
First, you need to assign a permanent library (e.g. PARMSDL) where you are going to store your SAS dataset template. I usually do not store data templates in the same library as data. Nor do I store it in the same directory/folder where you store your SAS code. Ordinarily, I store data templates in a so-called parameter data library (that is why I use PARMSDL as a libref), along with other data defining SAS code structure.
In the data step, the very first statement LABEL defines all variables’ labels as well as variable position determined by the order in which they are listed.
Statement LENGTH defines variables’ types (numeric or character) and their length in bytes. Here you may group variables of the same length to shorten your code or define them individually to be more explicit.
Statement FORMAT defines variables’ formats as needed. You don’t have to define formats for all the variables; define them only if necessary.
Statement INFORMAT (also optional) defines informats that come handy if you use this data template for creating SAS datasets by reading external raw files. With informats defined on the data template, you won’t have to specify informats in your INPUT statement while reading external file, as the informats will be inherently associated with the variable names. That is why SAS data sets have informat attribute for its variables in the first place (if you ever wondered why.)
Finally, don’t forget the STOP statement at the end of your data step, just before the RUN statement. Otherwise, instead of zero observations, you will end up with a data table that has a single observation with all missing variable values. Not what we want.
It is worth noting that obs=0 system option will not work instead of the STOP statement as it is applied only to the data being read, but we read no data here. For the same reason, (obs=0) data set option will not work either. Try it, and SAS log will dispel your doubts:
data PARMSDL.MYTEMPLATE (obs=0); --- 70 WARNING 70-63: The option OBS is not valid in this context. Option ignored.
How to create SAS data templates by inheritance
If you already have some data table with well-defined variable attributes, you may easily create a data template out of that data table by inheriting its descriptor portion:
data PARMSDL.MYTEMPLATE; set SASDL.MYDATA (obs=0); run;
Option (obs=0) does work here as it is applied to the dataset being read, and therefore STOP statement is not necessary.
You can also combine inheritance with defining new variables, as in the following example:
data MYTEMPLATE; set SASDL.MYDATA (obs=0); *<-- inherited template; * variables definition: ; label newvar1 = 'Label for new variable 1' newvarN = 'Label for new variable N' oldvar = 'New Label for OLD variable’ ; length newvar1 $40 newvarN 8 oldvar $100 /* careful here, see notes below */ ; format newvarN mmddyy10.; informat newvarN date9.; run;
A word of warning
Be careful when your new variable definition type and length contradicts inherited definition.
You can overwrite/re-define inherited variable attributes such as labels, formats and informats with no problem, but you cannot overwrite type and in some cases length. If you do need to have a different variable type for a specific variable name on your data template, you should first drop that variable on the SET statement and then re-define it in the data step.
With the length attribute the picture is a bit different. If you try defining a different length for some variable, SAS will produce the following WARNING in the LOG:
WARNING: Length of character variable has already been set. Use the LENGTH statement as the very first statement in the DATA STEP to declare the length of a character variable.
You can either use the advice of the WARNING statement and place the LENGTH statement as the very first statement or at least before the SET statement. In this case, you will find that you can increase the length without a problem, but if you try to reduce the length relative to the one on the parent dataset SAS will produce the following WARNING in the LOG:
WARNING: Multiple lengths were specified for the variable by input data set(s). This can cause truncation of data.
In this case, a cleaner way will also be to drop that variable on the SET statement and redefine it with the LENGTH statement in the data step.
Keep in mind that when you drop these variables from the parent data set, besides losing their type and length attributes, you will obviously lose the rest of the attributes too. Therefore, you will need to re-define all the attributes (type, length, label, format, and informat) for the variables you drop. At least, this technique will allow you to selectively inherit some variables from a parent data set and explicitly define others.
How to use SAS data templates
One way to apply your data template to a newly created dataset is to: 1) Copy your data template in that new dataset; 2) Append your data table to that new data set. Here is an example:
/* copying data template into dataset */ data SASDL.MYNEWDATA set PARMSDL.MYTEMPLATE; run; /* append data to your dataset with descriptor */ proc append base=SASDL.MYNEWDATA data=WORK.MYDATA; run;
Your variable types and lengths should be the same on the BASE= and DATA= tables; labels, formats and informats will be carried over from the BASE= dataset/template.
It is simple, but could be simplified even more to reduce your code to just a single data step:
data SASDL.MYNEWDATA; if 0 then set PARMSDL.MYTEMPLATE; set WORK.MYDATA; /* other statements */ run;
Even though set PARMSDL.MYTEMPLATE; statement has never executed because of the explicitly FALSE condition (0 means FASLE) in the IF statement, the resulting dataset SASDL.MYDATA gets all its variable attributes carried over from the PARMSDL.MYTEMPLATE data template during data step compilation.
This same coding technique can be used to implicitly apply data variable attributes from a well-defined data set by inheritance even though that data set is not technically a data template (has more than 0 observations.) Run the following code to make sure MYDATA table has all the variables and attributes of the SASHELP.CARS data table while data values come from the ABC data set:
data ABC; make='Toyota'; run; data MYDATA; if 0 then set SASHELP.CARS; set ABC; run;
Perhaps the benefits of SAS data templates are best demonstrated when you read external data into SAS data table. Here is an example for you to run (of course, in real life MYTEMPLATE should be a permanent data set and instead of datalines it should be an external file):
data MYTEMPLATE; label fdate = 'Flight Date' count = 'Flight Count' fdesc = 'Flight Description' reven = 'Revenue, $'; length fdate count reven 8 fdesc $22; format fdate date9. count comma12. reven dollar12.2; informat fdate mmddyy10. count comma8. fdesc $22. reven comma10.; stop; run; data FLIGHTS; if 0 then set MYTEMPLATE; input fdate count fdesc & reven; datalines; 12/05/2018 500 Flight from DCA to BOS 120,034 10/01/2018 1,200 Flight from BOS to DCA 90,534 09/15/2018 2,234 Flight from DCA to MCO 1,350 ;
Here is how the output data set looks:
Notice how simple the last data step is. No labels, no lengths, no formats, no informats – no clutter. Yet, the raw data is read in nicely, with proper informats applied, and the resulting data set has all the proper labels and variable formatting. And when you repeat this process for another sample of similar data you can still use the same data template, and your read-in data step stays the same – simple and concise.
Do you find SAS data templates useful? Do you use them in any shape or form in your SAS data development projects? Please share your thoughts.
Simplify data preparation using SAS data templates was published on SAS Users.