Data Management

February 25, 2021
 

The people, the energy, the quality of the content, the demos, the networking opportunities…whew, all of these things combine to make SAS Global Forum great every year. And this year is no exception.

Preparations are in full swing for an unforgettable conference. I hope you’ve seen the notifications that we set the date, actually multiple dates around the world so that you can enjoy the content in your region and in your time zone. No one needs to set their alarm for 1:00am to attend the conference!

Go ahead and save the date(s)…you don’t want to miss this event!

Content, content, content

We are working hard to replicate the energy and excitement of a live conference in the virtual world. But we know content is king, so we have some amazing speakers and content lined up to make the conference relevant for you. There will be more than 150 breakout sessions for business leaders and SAS users, plus the demos will allow you to see firsthand the innovative solutions from SAS, and the people who make them. I, for one, am looking forward to attending live sessions that will allow attendees the opportunity to ask presenters questions and have them respond in real time.

Our keynote speakers, while still under wraps for now, will have you on the edge of your seats (or couches…no judgement here!).

Networking and entertainment

You read that correctly. We will have live entertainment that'll have you glued to the screen. And you’ll be able to network with SAS experts and peers alike. But you don’t have to wait until the conference begins to network: the SAS Global Forum virtual community is up and running. Join the group to start engaging with other attendees, and maybe take a guess or two at who the live entertainment might be.

A big thank you

We are working hard to bring you the best conference possible, but this isn’t a one-woman show. It takes a team, so I would like to introduce and thank the conference teams for 2021. The Content Advisory Team ensures the Users Program sessions meet the needs of our diverse global audience. The Content Delivery Team ensures that conference presenters and authors have the tools and resources needed to provide high-quality presentations and papers. And, finally, the SAS Advisers help us in a multitude of ways. Thank you all for your time and effort so far!

Registration opens in April, so stay tuned for that announcement. I look forward to “seeing” you all in May.

What makes SAS Global Forum great? was published on SAS Users.

December 17, 2020
 

There’s nothing worse than being in the middle of a task and getting stuck. Being able to find quick tips and tricks to help you solve the task at hand, or simply entertain your curiosity, is key to maintaining your efficiency and building everyday skills. But how do you get quick information that’s ALSO engaging? By adding some personality to traditionally routine tutorials, you can learn and may even have fun at the same time. Cue the SAS Users YouTube channel.

With more than 50 videos that show personality published to date and over 10,000 hours watched, there’s no shortage of learning going on. Our team of experts loves to share their knowledge and passion (with personal flavor!) to give you solutions to those everyday tasks.

What better way to round out the year than provide a roundup of our most popular videos from 2020? Check out these crowd favorites:

Most viewed

  1. How to convert character to numeric in SAS
  2. How to import data from Excel to SAS
  3. How to export SAS data to Excel

Most hours watched

  1. How to import data from Excel to SAS
  2. How to convert character to numeric in SAS
  3. Simple Linear Regression in SAS
  4. How to export SAS data to Excel
  5. How to Create Macro Variables and Use Macro Functions
  6. The SAS Exam Experience | See a Performance-Based Question in Action
  7. How to Import CSV files into SAS
  8. SAS Certification Exam: 4 tips for success
  9. SAS Date Functions FAQs
  10. Merging Data Sets in SAS Using SQL

Latest hits

  1. Combining Data in SAS: DATA Step vs SQL
  2. How to Concatenate Values in SAS
  3. How to Market to Customers Based on Online Behavior
  4. How to Plan an Optimal Tour of London Using Network Optimization
  5. Multiple Linear Regression in SAS
  6. How to Build Customized Object Detection Models

Looking forward to 2021

We’ve got you covered! SAS will continue to publish videos throughout 2021. Subscribe now to the SAS Users YouTube channel, so you can be notified when we’re publishing new videos. Be on the lookout for some of the following topics:

  • Transforming variables in SAS
  • Tips for working with SAS Technical Support
  • How to use Git with SAS

2020 roundup: SAS Users YouTube channel how to tutorials was published on SAS Users.

December 7, 2020
 

In my previous blog post, I talked about using PROC CAS to accomplish various data preparation tasks. Since then, my colleague Todd Braswell and I worked through some interesting challenges implementing an Extract, Transform, Load (ETL) process that continuously updates data in CAS. (Todd is really the brains behind getting this to work – I am just along for the ride.)

In a nutshell, the process goes like this:

  1. PDF documents drop into a "receiving" directory on the server. The documents have a unique SubmissionID. Some documents are very large – thousands of pages long.
  2. A Python job runs and converts PDFs into plain text. Python calls an API that performs Optical Character Recognition (OCR) and saves the output as a CSV file, with one row per page of the PDF document.
  3. A SAS program, running in batch, loads the CSV file with new records into a CAS table. SubmissionID is passed to the batch program as a macro variable, which is used as part of the CAS table name.
  4. Records loaded from the CSV file are appended to the Main table. If records with the current SubmissionID already exist in the Main table, they are deleted and replaced with new records.
    The Main table is queried by downstream processes, which extract subsets of data, apply model score code, and generate results for the customer.

Continuously update data process flow

Due to the volume of data and the amount of time it takes to OCR large PDFs, the ETL process runs in multiple sessions simultaneously. And here is a key requirement: the Main table is always available, in a promoted state, with all up-to-date records, in order for the model score code to pick up the needed records.

What does "promoted state" mean?

The concept of table scope, which was introduced with the first release of SAS Viya, presents a challenge. CAS tables are in-memory tables that can have one of two "scopes":

  • Session scope – the table exists within the scope of your current CAS session and drops from memory as soon as the session ends. Functionally, this is somewhat similar to the data you write to the WORK library in SAS 9 – once you disconnect, the data drops from the WORK library.
  • Global scope – the table is available to all sessions. If your session ends, you will still have access to it when you start a new session. Your colleagues also maintain access, assuming they have the necessary permissions. For the table to assume global scope, it needs to be promoted.

Common promotion techniques for a table are the DATA STEP, PROC CAS, or PROC CASUTIL. For example:

/*promote a table using DATA STEP*/
*this snippet copies a SAS 9 table into CAS and promotes in one step;
data mycaslib.my_table (promote=yes);
     set mylib.my_table;
     run;
 
/*promote using PROC CASUTIL*/
*this snippet promotes a session-scope CAS table to global scope;
proc casutil incaslib='mycaslib' outcaslib='mycaslib';
     promote casdata='my_table';
     quit;
 
/*Promote using PROC CAS*/
*same as above, this snippet promotes a session-scope table to global scope;
proc cas;
table.promote / 
     caslib='mycaslib'
     targetcaslib='mycaslib' 
     name='my_table' 
     target='my_table';
    quit;

Fun Facts About Table Promotion

You cannot promote a table that has already been promoted. If you need to promote a new version of the same table, you need to first drop the existing table and promote the new version.
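
A minimal sketch of that drop-then-promote sequence with PROC CASUTIL follows. It assumes the new version has already been staged as a session-scope table under a different, hypothetical name (my_table_new) in the same caslib, so that DROPTABLE can only hit the global-scope table.

```sas
/* Assumption: my_table_new is a session-scope table holding the new version */
proc casutil incaslib='mycaslib' outcaslib='mycaslib';
     /* drop the currently promoted (global-scope) table */
     droptable casdata='my_table' quiet;
     /* promote the staged table under the original name */
     promote casdata='my_table_new' casout='my_table';
quit;
```

Note that between the two statements the table is briefly unavailable to other sessions, which matters when other sessions query it concurrently.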

To discover the current table state, use the table.tableInfo action:

proc cas;
     table.tableinfo / 
     caslib='mycaslib' 
     name='main';
     quit;

If you append rows to a promoted table using the DATA STEP append option, the new rows are automatically promoted. For example, in this snippet the mycaslib.main table, which is promoted, remains promoted when the rows from mycaslib.new_rows are appended to it:

data mycaslib.main(append=yes);
     set mycaslib.new_rows;
     run;

When you manipulate a promoted table using the DATA STEP apart from appending rows, it creates a new, session-scope version of the same table. You will have two versions of the table: the global-scope table, which remains unchanged, and the session-scope version which has the changes you implemented. Even if you don't change anything in the table and simply run:

data mycaslib.my_table;
     set mycaslib.my_table;
     run;

in which mycaslib.my_table is promoted, you end up with a promoted and an unpromoted version of this table in the mycaslib library – a somewhat unexpected and hardly desired result. Appendix 1 walks through a quick exercise you can try to verify this.

As you probably guessed, this is where we ran into trouble with our ETL process: the key requirement was for the Main table to remain promoted, yet we needed to continuously update it. The task was simple if we just needed to append the rows; however, we also needed to replace the rows if they already existed. If we tried to delete the existing rows using the DATA STEP, we would have to deal with the changes applied to a session-scope copy of the global-scope table.

Initially, we designed the flow to save off the session-scope table with changes, then drop the (original) global-scope version, and finally reload the up-to-date version. This was an acceptable workaround, but errors started to pop up when a session looked for the Main table to score data, while a different concurrent session reloaded the most up-to-date data. We were uncertain how this would scale as our data grew.

PROC CAS to the rescue!

After much research, we learned about the table.deleteRows action, which allows you to delete rows directly from a global-scope table. The data is never dropped to session-scope – exactly what we needed. Here's an example:

proc cas;
     table.deleteRows /
     table={caslib="mycaslib", name="Main", where="SubmissionID = 12345"};
     quit;

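In case you are wondering, the Tables action set also has an update action, which modifies existing rows in place.

Putting it all together, the ETL flow for each submission looks like this: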

/*1. Load new rows. SubmissionID macro variable is a parameter passed to the batch program*/
/*New rows are written to the casuser library, but it does not really matter which caslib you choose – 
   we are not persisting them across sessions*/
proc casutil;
       load file="/path_to_new_rows/New_rows_&SubmissionID..csv" outcaslib="casuser" casout="new_rows";
       quit;
/*2. Delete rows with the current SubmissionID */
proc cas;
       table.deleteRows /
       table={caslib="prod", name="Main", where="SubmissionID = &SubmissionID."};
       quit;
/*3. Append new rows*/
data prod.main(append=yes);
	set casuser.new_rows;
	run;
/*4. Save the main table to ensure we have a disk backup of in-memory data*/
proc casutil incaslib="prod" outcaslib="prod";
	save casdata="main" replace;
	quit;

Conclusion

We learned how to continuously update data in CAS while ensuring the data remains available to all sessions accessing it asynchronously. We learned the append option in DATA STEP automatically promotes new rows but manipulating the data in a global-scope table through DATA STEP in other ways leads to data being copied to session-scope. Finally, we learned that to ensure the table remains promoted while it is updated, we can fall back on PROC CAS.

Together, these techniques enabled implementation of a robust data flow that overcomes concurrency problems due to multiple processes updating and querying the data.

Acknowledgement

We thank Brian Kinnebrew for his generous help in investigating this topic and the technical review.

Appendix 1

Try the following exercise to verify that manipulating a promoted table in DATA STEP leads to two copies of the table – session- AND global-scope.

/*copy and promote a sample SAS 9 table*/
data casuser.cars(promote=yes);
     set sashelp.cars;
     run;
/*check the number of rows and confirm that the table is promoted*/
proc cas;
     table.tableinfo / caslib='casuser' name='cars';
     quit; /*The table is promoted and has 428 rows*/
 
/*delete some rows in the promoted table*/
data casuser.cars;
     set casuser.cars;
     if make='Acura' then delete;
     run;
/*check again – how many rows does the table have? Is it promoted?*/
proc cas;
     table.tableinfo / caslib='casuser' name='cars';
     quit;  /*The table has 421 rows but it is no longer promoted*/
 
/*reset your CAS session*/
/*kill your current CAS session */
cas _all_ terminate;
/*start a new CAS session and assign caslibs*/
cas; 
caslib _all_ assign;
 
/*check again – how many rows does the table have? Is it promoted?*/
proc cas;
     table.tableinfo / caslib='casuser' name='cars';
     quit;  /*The table is promoted and has 428 rows*/

What we see here is that manipulating a global-scope table in the DATA STEP duplicates the data. CAS copies the table to session-scope and applies the changes there. The changes go away when you terminate the session. One way around this issue is to create a new table instead of trying to overwrite the promoted one, then drop the old table and promote the new table under the old table's name. Otherwise, use the table.deleteRows or table.update actions to delete or update rows in place, as described in this post.
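
As a sketch of the in-place alternative, the deletion from the exercise can be repeated with table.deleteRows. This assumes casuser.cars is still promoted with its original 428 rows, as it is at the end of the exercise.

```sas
proc cas;
     /* delete the Acura rows directly in the global-scope table */
     table.deleteRows /
          table={caslib='casuser', name='cars', where="Make = 'Acura'"};
     /* check the row count and the promoted flag again */
     table.tableInfo / caslib='casuser' name='cars';
quit;
```

If this behaves as described in the post, tableInfo should report the reduced row count while the table stays promoted, and the change should survive a session restart.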

Append and Replace Records in a CAS Table was published on SAS Users.

November 20, 2020
 

If you’re like me and the rest of the conference team, you’ve probably attended more virtual events this year than you ever thought possible. You can see the general evolution of virtual events by watching the early ones from April or May and comparing them to the recent ones. We at SAS Global Forum are studying the virtual event world, and we’re learning what works and what needs to be tweaked. We’re using that knowledge to plan the best possible virtual SAS Global Forum 2021.

Everything is virtual these days, so what do we mean by virtual?

Planning a good virtual event takes time, and we’re working through the process now. One thing is certain -- we know the importance of providing quality content and an engaging experience for our attendees. As always, but this time virtually, we want to give attendees the opportunity to keep learning from other SAS users, hear about new and exciting developments from SAS, and connect and network with experts, peers, partners and SAS. Yes, I said network. We realize it won’t be the same as a live event, but we are hopeful we can provide attendees with an incredible experience where they can connect, learn and share with others.

Call for content is open

One of the differences between SAS Global Forum and other conferences is that SAS users are front and center, and the soul of the conference. We can’t have an event without user content. And that’s where you come in! The call for content opened November 17 and lasts through December 21, 2020. Selected presenters will be notified in January 2021. Presentations will be different in 2021; they will be 30 minutes in length, including time for Q&A when able. And since everything is virtual, video is a key component to your content submission. We ask for a 3-minute video along with your title and abstract.

The Student Symposium is back

Calling all postsecondary students -- there’s still time to build a team for the Student Symposium. If you are interested in data science and want to showcase your skills, grab a teammate or two and a faculty advisor and put your thinking caps on. Applications are due by December 21, 2020.

Learn more

I encourage you to visit the SAS Global Forum website for up-to-date information, follow #SASGF on social channels and join the SAS communities group to engage with the conference team and other attendees.

Connect, learn and share during virtual SAS Global Forum 2021 was published on SAS Users.

November 4, 2020
 

In this blog post we are going to tackle a data cleansing task of removing unwanted repeated characters in SAS character variables.

Character repetition can stem from various stages of data life cycle: from data collection, to data transmission, to data transformation. It can be accidental or intentional by design. It can be sporadic or consistent. In either case, it needs to be addressed by robust data cleansing processes to ensure adequate data quality that is imperative for the data usability.

Character repetition examples

Example 1. Data entry, especially manual data entry, can be a high-risk factor for accidental character duplication. Have you ever pressed a key on your computer keyboard for a bit longer than intended, so it started automatically typing multiple characters???????????????

Keyboard properties adjustment

 Tip: You can adjust your Keyboard Properties to control “Repeat delay” and “Repeat rate” settings (on Windows computer, search for “Keyboard” and click on Keyboard in Control Panel).

Example 2. Recently, I had to deal with data that contained multiple consecutive double quotation marks all over the character string values. Even though we don’t know the exact cause, we still needed to replace each run of duplicated quotation marks with a single quotation mark.

Removing repeated blanks

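There is a very useful COMPBL function in SAS that compresses multiple consecutive blanks into a single blank. For characters other than blanks, drawing on the post Removing unwanted characters from text strings by Amadeus Software, we developed a prototype using the FIND and TRANWRD functions: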

data D;
   c = ','; *<- character to un-duplicate;
   cc = c||c; *<- double character;
   string = 'Many,,,,,, commas,,,,, in,,, this,, sentence.,'; *<- source string;
   put 'BEFORE:' string=; *<- output initial string;
   do while (find(string,cc)); *<- loop through while there are doubles;
      string = tranwrd(string,cc,c); *<- replace double with a single character;
   end;
   put 'AFTER: ' string=; *<- output unduplicated string;
run;

This code will produce the following in the SAS log:

BEFORE:string=Many,,,,,, commas,,,,, in,,, this,, sentence.,
AFTER: string=Many, commas, in, this, sentence.,

which shows that this approach correctly un-duplicates the source string removing and replacing all repeated characters (commas in our example) with a single one.
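
For comparison only, and not the approach used in this post, the same single-character clean-up can be sketched with a Perl regular expression through the PRXCHANGE function. The variable name unduplicated is illustrative.

```sas
data _null_;
   string = 'Many,,,,,, commas,,,,, in,,, this,, sentence.,';
   /* s/,{2,}/,/ replaces each run of two or more commas with one comma; */
   /* -1 tells PRXCHANGE to replace every match in the string            */
   unduplicated = prxchange('s/,{2,}/,/', -1, string);
   put unduplicated=;
run;
```

This should write the same un-duplicated sentence to the log as the loop above. The loop-based version stays the basis for the rest of this post because it needs no escaping when the character to un-duplicate is a regular expression metacharacter such as [ or ].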

User-defined SAS function for removing any repeated characters

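Let’s use PROC FCMP to turn this prototype into a user-defined function that removes repeats of any listed characters: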

libname funclib 'c:\projects\functions';
proc fcmp outlib=funclib.userfuncs.package1;
   function undupc(str $, clist $) $;
      length x $32767 c $1 cc $2;
      x = str; 
      do i=1 to length(clist);
         c = char(clist,i);
         cc = c||c;
         do while (find(trim(x),cc));
            x = tranwrd(x,cc,c);
         end;
      end;
      return (x); 
   endfunc; 
run;

Code highlights

  • We introduce an interim variable x to which we will iteratively apply replacing double characters with a single one.
  • We assign length attribute of this variable to be maximum allowable character length of 32767 bytes to accommodate any character length used in the calling program.
  • Outer do-loop loops through the clist containing characters we want to unduplicate.
  • Variable c is assigned a single character from clist; variable cc is assigned double of the c value.
  • Inner do-loop iterates through trimmed characters in x while doubles are found; using trim(x) is essential: not only does it speed up processing by searching through a shorter string (without trailing blanks), it also prevents an infinite loop when clist contains the blank character (cc then equals double blanks, which are always going to be found among trailing blanks).
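
The trailing-blank hazard from the last bullet can be seen in a small DATA step sketch. The variable names with_padding and without_padding are illustrative only.

```sas
data _null_;
   length x $20;
   x = 'a b';   /* stored as 'a b' followed by 17 trailing blanks */
   /* without TRIM, FIND sees the padding and reports a double blank */
   with_padding = find(x, '  ');
   /* with TRIM, the padding is gone and no double blank is found */
   without_padding = find(trim(x), '  ');
   put with_padding= without_padding=;
   /* log: with_padding=4 without_padding=0 */
run;
```

Because TRANWRD writes its result back into a fixed-length variable, the padding returns after every replacement, so a loop conditioned on the untrimmed search would never end.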

Let’s test our newly minted UNDUPC function on the following data:

data SOURCE;
   infile datalines truncover;
   input str $50.;
   datalines;
"""Repeated "double quotes""""
Repeated,,,,,commas,,,,,,,,,,,
[[[""Mixed""]]   characters,,,
;

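Since our user-defined function is permanently stored in the location specified in the outlib= option of PROC FCMP, we can use it in any SAS session once the CMPLIB= system option points to that location: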

options cmplib=funclib.userfuncs;
data TARGET;
   set SOURCE;
   length new_str $50;
   new_str = undupc(str, ' ,"][');
run;

This code will replace every repeated sequence of the characters ' ', ',', '"', ']', and '[' with a single occurrence. The order of these characters listed in the second argument doesn’t matter. Here is what we get:

Duplicate characters removal results
As you can see, we get what we wanted including the functionality of the COMPBL function.

User-defined CALL routine for removing any repeated characters

As much as I love user-defined functions, I have an issue with the above undupc user-defined function implementation. It has to do with how the PROC FCMP handles interim character variables length attribute assignment. It does not implicitly inherit their length attribute from another variable as SAS data step does. For example, if you run the following data step:

data a;
   length x $99;
   y = x;
run;

variable y will have the length attribute $99 implicitly inherited from the x variable.

In a PROC FCMP function, you can either assign the length attribute to a character variable explicitly with a LENGTH or ATTRIB statement (as we did by using length x $32767), or it will be set to $33 if you use any other way of implicit assignment. (I leave it up to you to guess why 33 and not any other number.) Since we wanted to accommodate SAS character strings of any length, we had to explicitly assign our interim variable x the maximum valid length of $32767. This will inevitably take a toll on the function performance as we will have to process longer strings.

However, we can avoid this issue by using CALL routine instead:

libname funclib 'c:\projects\functions';
proc fcmp outlib=funclib.usercalls.package1;
   subroutine undupc(str $, clist $, x $);
      outargs x;
      length c $1 cc $2;
      x = str;
      do i=1 to length(clist);
         c = char(clist,i);
         cc = c||c;
         do while (find(trim(x),cc));
            x = tranwrd(x,cc,c);
         end;
      end;
   endsub; 
run;

This code is very similar to the user-defined function above with a slight difference. Here, the x variable is listed as an argument in the subroutine definition and refers to a SAS data step variable whose length attribute is assigned in the calling data step. Unlike a SAS function, a SAS subroutine does not return a value; instead, it uses the OUTARGS statement to name the arguments it modifies and passes back to the calling program. Here is how it is called:

options cmplib=funclib.usercalls;
data TARGET;
   set SOURCE;
   length new_str $50;
   call undupc(str, ' ,"][', new_str);
run;

And we will get the same results as with the UNDUPC function above.

Store user-defined functions and subroutines separately

You can create both a user-defined function and a CALL routine with the same name. However, to avoid confusion (and errors), do not store their definitions in the same data table (outlib= option of PROC FCMP). If they are stored in the same data table, then when used in a DATA step, SAS will pull the latest definition by its name only, and that may not be the entity you want.

Performance benchmarking

To compare performances of the UNDUPC function vs UNDUPC subroutine we created a rather large data table (1 Million observations) with randomly generated strings (1000 characters long):

libname SASDL 'C:\PROJECTS\TESTDATA';
 
data SASDL.TESTDATA (keep=str);
   length str $1000;
   do i=1 to 1000000;
      str = '';
      do j=1 to 1000;
         str = cats(str,byte(int(rank(' ')+38*rand('UNIFORM'))));
      end;
      output;
   end;
run;

Then we ran the following 2 data steps, one using the undupc() function, and the other using undupc() call routine:

options cmplib=funclib.userfuncs;
 
data SASDL.TESTDATA_UNDUPC_FUNC;
   set SASDL.TESTDATA;
   length new_str $1000;
   new_str = undupc(str, '#+');
run;
 
options cmplib=funclib.usercalls;
 
data SASDL.TESTDATA_UNDUPC_CALL;
   set SASDL.TESTDATA;
   length new_str $1000;
   call undupc(str, '#+', new_str);
run;

A quick SAS log inspection reveals that CALL UNDUPC works as much as 3 times faster than UNDUPC function (10 seconds vs. 30 seconds). The time savings may vary depending on your data composition and computing environment, but in any case, if you process high volumes of data you may consider using CALL routine over function. This is not a blanket statement, as it only pertains to this particular algorithm of eliminating character repetitions where we had to explicitly assign the highest possible length attribute value to the interim variable in the function, but not in the CALL routine.

When we reduced declared length of x from $32767 to $1000 within the user-defined function definition its performance became on par with the CALL routine.

Your thoughts?

Have you found this blog post useful? Would you vote for implementing UNDUPC as a native built-in SAS function? Please share your thoughts and feedback in the comments section below.

Removing repeated characters in SAS strings was published on SAS Users.

September 17, 2020
 

Before we delve into unquoting SAS character variables, let’s briefly review existing SAS functionality related to quoting and unquoting character strings.

%QUOTE and %UNQUOTE macro functions

Don’t be fooled by these macro functions’ names. They have nothing to do with quoting or un-quoting character variables’ values. Moreover, they have nothing to do with quoting or un-quoting even macro variables’ values. According to the %QUOTE Macro Function documentation it masks special characters and mnemonic operators in a resolved value at macro execution.  %UNQUOTE Macro Function unmasks all special characters and mnemonic operators so they are interpreted as macro language elements instead of as text. There are many other SAS “macro quoting functions” (%SUPERQ, %BQUOTE, %NRBQUOTE, all macro functions whose name starts with %Q: %QSCAN, %QSUBSTR, %QSYSFUNC, etc.) that perform some action including masking.

Historically, however, SAS Macro Language uses terms “quote” and “unquote” to denote “mask” and “unmask”. Keep that in mind when reading SAS Macro documentation.

QUOTE function

Most SAS programmers are familiar with the QUOTE function that adds quotation marks around a character value. It can add double quotation marks (by default) or single quotation marks if you specify that in its second argument.

This function goes even further as it doubles any quotation mark that already existed within the value to make sure that an embedded quotation mark is escaped (not treated as an opening or closing quotation mark) during parsing.
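
A short sketch of both behaviors, using an illustrative string with embedded double quotation marks:

```sas
data _null_;
   s = 'She said "hi"';
   /* default: wrap in double quotes and double the embedded ones */
   q_double = quote(s);
   /* second argument: wrap in single quotes; embedded double quotes stay as they are */
   q_single = quote(s, "'");
   put q_double= / q_single=;
run;
```

In the log, the embedded quotation marks around hi should appear doubled in q_double and untouched in q_single.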

DEQUOTE function

There is also a complementary DEQUOTE function that removes matching quotation marks from a character string that begins with a quotation mark. But be warned that it also deletes all characters to the right of the first matching quotation mark. In my view, deleting those characters is overkill because when writing a SAS program, we may not know what is going to be in the data and whether it’s okay to delete its part outside the first matching quotes. That is why you need to be extra careful if you decide to use this function. Here is an example of what I mean. If you run the following code:

data a;
   input x $ 1-50;
   datalines;
'This is what you get. Let's be careful.'
;
 
data _null_;
   set a;
   y = dequote(x);
   put x= / y=;
run;

you will get the following in the SAS log:

y=This is what you get. Let

This is hardly what you really wanted as you have just lost valuable information – part of the original character value got deleted: 's be careful. I would rather not remove the quotation marks at all than remove them at the expense of losing meaningful information.

$QUOTE informat

The $QUOTE informat does exactly what the DEQUOTE() function does; that is, it removes matching quotation marks from a character string that begins with a quotation mark. You can use it in the example above by replacing

y = dequote(x);

with the INPUT() function

y = input(x, $quote50.);

Or you can use it directly in the INPUT statement when reading raw data from datalines or an external file:

input x $quote50.;

Both the $QUOTE informat and the DEQUOTE() function, in addition to removing all characters to the right of the closing quotation mark, do the following unconventional, peculiar things:

  • Remove a lone quotation mark (either double or single) when it’s the only character in the string; apparently, the lone quotation mark is matched to itself.
  • Match single quotation mark with double quotation mark as if they are the same.
  • Remove matching quotation marks from a character string that begins with a quotation mark; if your string has one or more leading blanks (that is, a quotation mark is not the first character), nothing gets removed (un-quoted).

If the described behavior matches your use case, you are welcome to use either $QUOTE informat or DEQUOTE() function. Otherwise, please read on.

UNQUOTE function definition

Up to this point such a function did not exist, but we are about to create one to justify the title. Let’s keep it simple and straightforward. Here is what I propose our new unquote() function to do:

  • If first and last non-blank characters of a character string value are matching quotation marks, we will remove them. We will not consider quotation marks matching if one of them is a single quotation mark and another is a double quotation mark.
  • We will remove those matching quotation marks whether they are both single quotation marks OR both double quotation marks.
  • We are not going to remove or change any other quotation marks that may be present within those matching quotation marks that we remove.
  • We will remove leading and trailing blanks outside the matching quotation marks that we delete.
  • However, we will not remove any leading or trailing blanks within the matching quotation marks that we delete. You may additionally apply the STRIP() function if you need to do that.

To summarize these specifications, our new UNQUOTE() function will extract a character substring within matching quotation marks if they are the first and the last non-blank characters in a character string. Otherwise, it returns the character argument unchanged.

UNQUOTE function implementation

Here is how such a function can be implemented using PROC FCMP:

libname funclib 'c:\projects\functions';
 
proc fcmp outlib=funclib.userfuncs.v1; /* outlib=libname.dataset.package */
   function unquote(x $) $32767;
      pos1 = notspace(x); *<- first non-blank character position;
      if pos1=0 then return (x); *<- empty string;
 
      char1 = char(x, pos1); *<- first non-blank character;
      if char1 not in ('"', "'") then return (x); *<- first non-blank character is not " or ' ;
 
      posL = notspace(x, -length(x)); *<- last non-blank character position;
 
      if pos1=posL then return (x); *<- single character string;
 
      charL = char(x, posL); *<- last non-blank character;
      if charL^=char1 then return (x); *<- last non-blank character does not match first;
 
      /* at this point we should have matching quotation marks */
      return (substrn(x, pos1 + 1, posL - pos1 - 1)); *<- remove first and last quotation character;
   endfunc; 
run;

Here are the highlights of this implementation:

We use multiple RETURN statements: we sequentially check for different special conditions and if one of them is met we return the argument value intact. The RETURN statement does not just return the value, but also stops any further function execution.

At the very end, after making sure that none of the special conditions is met, we strip the argument value from the matching quotation marks along with the leading and trailing blanks outside of them.

NOTE: SAS user-defined functions are stored in a SAS data set specified in the OUTLIB= option of the PROC FCMP statement. It requires a 3-level name (libref.datasetname.packagename) for the function definition location, which allows several versions of a same-name function to be stored there.

However, when a user-defined function is used in a SAS DATA Step, only a 2-level name can be specified (libref.datasetname). If that data set has several same-name functions stored in different packages the DATA Step uses the latest function definition (found in a package closest to the bottom of the data set).

UNQUOTE function results

Let’s use the following code to test our newly minted user-defined function UNQUOTE():

libname funclib 'c:\projects\functions';
options cmplib=funclib.userfuncs;
 
data A;
   infile datalines truncover;
   input @1 S $char100.;
   datalines;
'
"
How about this?
    How about this?
"How about this?"
'How about this?'
"How about this?'
'How about this?"
"   How about this?"
'      How about this?'
'      How "about" this?'
'      How 'about' this?'
   "     How about this?"
   "     How "about" this?"
   "     How 'about' this?"
   '     How about this?'
;
 
data B;
   set A;
   length NEW_S $100;
   label NEW_S = 'unquote(S)';
   NEW_S = unquote(S);
run;

This code produces the following output table:

Example of character string unquoting
As you can see it does exactly what we wanted it to do – removing matching first and last quotation marks as well as stripping out blanks outside the matching quotation marks.
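If blanks inside the removed quotation marks are also unwanted, the STRIP() function can be layered on top, as suggested in the function definition. The following is a minimal sketch that assumes the funclib library, the UNQUOTE() function, and data set A from the code above already exist; the data set name C and the variable TIGHT_S are made up for illustration.

```sas
options cmplib=funclib.userfuncs;

data C;   /* hypothetical output data set */
   set A;
   length NEW_S TIGHT_S $100;
   NEW_S   = unquote(S);          /* blanks inside the removed quotes are kept */
   TIGHT_S = strip(unquote(S));   /* blanks inside the removed quotes are dropped too */
run;
```

Because both variables have a fixed length of 100 bytes, SAS pads their values with trailing blanks again on assignment, so the visible difference between the two is in the leading blanks.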

DSD (Delimiter-Sensitive Data) option

This INFILE statement option is particularly useful when using LIST input to read and un-quote comma-delimited raw data. In addition to removing enclosing quotation marks from character values, the DSD option specifies that when data values are enclosed in quotation marks, delimiters within the value are masked, that is, treated as character data (not as delimiters). It also sets the default delimiter to a comma and treats two consecutive delimiters as a missing value.

In contrast with the above UNQUOTE() function, the DSD option will not remove enclosing quotation marks if quotation marks of the same kind are also present inside the character value. When the DSD option does strip enclosing quotation marks, it also strips leading and trailing blanks both outside and within the removed quotation marks.
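As an illustration of the DSD behavior described above, here is a small sketch that reads comma-delimited records with LIST input. The data set name, variable names, and data values are invented for this example.

```sas
data contacts;   /* hypothetical data set */
   infile datalines dsd truncover;
   length name $30 city $20;
   input name $ city $ amount;
   datalines;
"Smith, John",Boston,100
"Lee, Ann",,250
;

proc print data=contacts;
run;
```

The comma inside the quoted name is read as part of the value, the enclosing quotation marks are removed, and the two consecutive commas in the second record leave city missing.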

Your thoughts?

Have you found this blog post useful? Please share your use cases, thoughts and feedback in the comments below.

How to unquote SAS character variable values was published on SAS Users.

September 17, 2020
 

Before we delve into unquoting SAS character variables, let’s briefly review the existing SAS functionality related to quoting and unquoting character strings.

%QUOTE and %UNQUOTE macro functions

Don’t be fooled by these macro functions’ names. They have nothing to do with quoting or un-quoting character variables’ values. Moreover, they have nothing to do with quoting or un-quoting even macro variables’ values. According to the %QUOTE Macro Function documentation it masks special characters and mnemonic operators in a resolved value at macro execution.  %UNQUOTE Macro Function unmasks all special characters and mnemonic operators so they are interpreted as macro language elements instead of as text. There are many other SAS “macro quoting functions” (%SUPERQ, %BQUOTE, %NRBQUOTE, all macro functions whose name starts with %Q: %QSCAN, %QSUBSTR, %QSYSFUNC, etc.) that perform some action including masking.

Historically, however, the SAS Macro Language uses the terms “quote” and “unquote” to denote “mask” and “unmask”. Keep that in mind when reading SAS Macro documentation.

QUOTE function

Most SAS programmers are familiar with the QUOTE function that adds quotation marks around a character value. It can add double quotation marks (by default) or single quotation marks if you specify that in its second argument.

This function goes even further as it doubles any quotation mark that already existed within the value to make sure that an embedded quotation mark is escaped (not treated as an opening or closing quotation mark) during parsing.
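A short sketch of both behaviors follows; the variable names and the sample value are made up for illustration. STRIP() is applied first because QUOTE() would otherwise keep the trailing blanks of a fixed-length variable inside the quotation marks.

```sas
data _null_;
   length s $20 q1 q2 $40;
   s  = "Let's go";
   q1 = quote(strip(s));        /* double quotation marks by default */
   q2 = quote(strip(s), "'");   /* single quotation marks; the embedded one is doubled */
   put q1= / q2=;
run;
```

In the log, q1 should appear as "Let's go" and q2 as 'Let''s go', with the embedded apostrophe doubled only when single quotation marks are requested.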

DEQUOTE function

There is also a complementary DEQUOTE function that removes matching quotation marks from a character string that begins with a quotation mark. But be warned that it also deletes all characters to the right of the first matching quotation mark. In my view, deleting those characters is overkill because when writing a SAS program, we may not know what is going to be in the data and whether it’s okay to delete its part outside the first matching quotes. That is why you need to be extra careful if you decide to use this function. Here is an example of what I mean. If you run the following code:

data a;
   input x $ 1-50;
   datalines;
'This is what you get. Let's be careful.'
;
 
data _null_;
   set a;
   y = dequote(x);
   put x= / y=;
run;

you will get the following in the SAS log:

y=This is what you get. Let

This is hardly what you really wanted as you have just lost valuable information – part of the y character value got deleted: 's be careful. I would rather not remove the quotation marks at all than remove them at the expense of losing meaningful information.

$QUOTE informat

The $QUOTE informat does exactly what the DEQUOTE() function does, that is removes matching quotation marks from a character string that begins with a quotation mark. You can use it in the example above by replacing

y = dequote(x);

with the INPUT() function

y = input(x, $quote50.);

Or you can use it directly in the INPUT statement when reading raw data from datalines or an external file:

input x $quote50.;

Both the $QUOTE informat and the DEQUOTE() function, in addition to removing all characters to the right of the closing quotation mark, do the following peculiar things:

  • Remove a lone quotation mark (either double or single) when it’s the only character in the string; apparently, the lone quotation mark is matched to itself.
  • Match single quotation mark with double quotation mark as if they are the same.
  • Remove matching quotation marks from a character string that begins with a quotation mark; if your string has one or more leading blanks (that is, a quotation mark is not the first character), nothing gets removed (un-quoted).

If the described behavior matches your use case, you are welcome to use either $QUOTE informat or DEQUOTE() function. Otherwise, please read on.

UNQUOTE function definition

Up to this point such a function did not exist, but we are about to create one to justify the title. Let’s keep it simple and straightforward. Here is what I propose our new UNQUOTE() function should do:

  • If the first and last non-blank characters of a character string value are matching quotation marks, we will remove them. We will not consider quotation marks matching if one of them is a single quotation mark and the other is a double quotation mark.
  • We will remove those matching quotation marks whether they are both single quotation marks OR both double quotation marks.
  • We are not going to remove or change any other quotation marks that may be present within those matching quotation marks that we remove.
  • We will remove leading and trailing blanks outside the matching quotation marks that we delete.
  • However, we will not remove any leading or trailing blanks within the matching quotation marks that we delete. You may additionally apply the STRIP() function if you need to do that.

To summarize these specifications, our new UNQUOTE() function will extract a character substring within matching quotation marks if they are the first and the last non-blank characters in a character string. Otherwise, it returns the character argument unchanged.

UNQUOTE function implementation

Here is how such a function can be implemented using PROC FCMP:

libname funclib 'c:\projects\functions';
 
proc fcmp outlib=funclib.userfuncs.v1; /* outlib=libname.dataset.package */
   function unquote(x $) $32767;
      pos1 = notspace(x); *<- first non-blank character position;
      if pos1=0 then return (x); *<- empty string;
 
      char1 = char(x, pos1); *<- first non-blank character;
      if char1 not in ('"', "'") then return (x); /* first non-blank character is not a quotation mark */
 
      posL = notspace(x, -length(x)); *<- last non-blank character position;
 
      if pos1=posL then return (x); *<- single character string;
 
      charL = char(x, posL); *<- last non-blank character;
      if charL^=char1 then return (x); *<- last non-blank character does not match first;
 
      /* at this point we should have matching quotation marks */
      return (substrn(x, pos1 + 1, posL - pos1 - 1)); *<- remove first and last quotation character;
   endfunc; 
run;

Here are the highlights of this implementation:

We use multiple RETURN statements: we sequentially check for different special conditions and if one of them is met we return the argument value intact. The RETURN statement does not just return the value, but also stops any further function execution.

At the very end, after making sure that none of the special conditions is met, we strip the argument value from the matching quotation marks along with the leading and trailing blanks outside of them.

NOTE: SAS user-defined functions are stored in a SAS data set specified in the OUTLIB= option of the PROC FCMP statement. It requires a 3-level name (libref.datasetname.packagename) for the function definition location, which allows several versions of a same-name function to be stored there.

However, when a user-defined function is used in a SAS DATA Step, only a 2-level name can be specified (libref.datasetname). If that data set has several same-name functions stored in different packages the DATA Step uses the latest function definition (found in a package closest to the bottom of the data set).

UNQUOTE function results

Let’s use the following code to test our newly minted user-defined function UNQUOTE():

libname funclib 'c:\projects\functions';
options cmplib=funclib.userfuncs;
 
data A;
   infile datalines truncover;
   input @1 S $char100.;
   datalines;
'
"
How about this?
    How about this?
"How about this?"
'How about this?'
"How about this?'
'How about this?"
"   How about this?"
'      How about this?'
'      How "about" this?'
'      How 'about' this?'
   "     How about this?"
   "     How "about" this?"
   "     How 'about' this?"
   '     How about this?'
;
 
data B;
   set A;
   length NEW_S $100;
   label NEW_S = 'unquote(S)';
   NEW_S = unquote(S);
run;

This code produces the following output table:

Example of character string unquoting
As you can see it does exactly what we wanted it to do – removing matching first and last quotation marks as well as stripping out blanks outside the matching quotation marks.

DSD (Delimiter-Sensitive Data) option

This INFILE statement option is particularly useful when using LIST input to read and un-quote comma-delimited raw data. In addition to removing enclosing quotation marks from character values, the DSD option specifies that when data values are enclosed in quotation marks, delimiters within the value are masked, that is, treated as character data (not as delimiters). It also sets the default delimiter to a comma and treats two consecutive delimiters as a missing value.

In contrast with the above UNQUOTE() function, the DSD option will not remove enclosing quotation marks if quotation marks of the same kind are also present inside the character value. When the DSD option does strip enclosing quotation marks, it also strips leading and trailing blanks both outside and within the removed quotation marks.

Your thoughts?

Have you found this blog post useful? Please share your use cases, thoughts and feedback in the comments below.

How to unquote SAS character variable values was published on SAS Users.

August 25, 2020
 

Analytics is playing an increasingly strategic role in the ongoing digital transformation of organizations today. However, to succeed and scale your digital transformation efforts, it is critical to enable analytics skills at all tiers of your organization. In a recent blog post covering 4 principles of analytics you cannot ignore, SAS COO Oliver Schabenberger articulated the importance of democratizing analytics. By scaling your analytics efforts beyond traditional data science teams and involving more people with strong business domain knowledge, you can gain more valuable insights and make more significant impacts.

SAS Viya was built from the ground up to fulfill this vision of democratizing analytics. At SAS, we believe analytics should be accessible to everyone. While SAS Viya offers tremendous support and will continue to be the tool of choice for many advanced users and programmers, it is also highly accessible for business analysts and insights teams who prefer a more visual approach to analytics and insights discovery.

Self-service data management

First of all, SAS Viya makes it easy for anyone to ingest and prepare data without a single line of code. The integrated data preparation components within SAS Viya support ad-hoc, agile-oriented data management tasks where you can profile, cleanse, and join data easily and rapidly.

Automatically Generated Data Profiling Report

You can execute complex joins, create custom columns, and cleanse your data via a completely drag-and-drop interface. The automation built into SAS Viya eases the often tedious task of data profiling and data cleansing via automated data type identification and transform suggestions. In an area that can be both complex and intimidating, SAS Viya makes data management tasks easy and approachable, helping you to analyze more data and uncover more insights.

Data Join Using a Visual Interface

A visual approach supporting low-code and no-code programming

Speaking of no-code, SAS Viya’s visual approach and support extend deep into data exploration and advanced modeling. Not only can you quickly build charts such as histograms and box plots using a drag and drop interface, but you can also build complex machine learning models using algorithms such as decision trees and logistic regression on the same visual canvas.

Building a Decision Tree Model Using SAS Viya

By putting appropriate guard rails in place and providing relevant, context-rich help, SAS Viya empowers users to undertake data analysis using other advanced analytics techniques such as forecasting and correlation analysis. These techniques let users ask more complex questions and can potentially help uncover more actionable and valuable insights.

Correlation Analysis Using the Correlation Matrix within SAS Viya

Augmented analytics

Augmented analytics is an emerging area of analytics that leverages machine learning to streamline and automate the process of doing analytics and building machine learning models. SAS Viya leverages augmented analytics throughout the platform to automate various tasks. My favorite use of augmented analytics in SAS Viya, though, is the hyperparameters autotuning feature.

In machine learning, hyperparameters are parameters that you need to set before the learning process can begin. They are only used during training, yet they contribute significantly to how well the model is trained. It can often be challenging to find the optimal hyperparameter settings, especially if you are not an experienced modeler. This is where SAS Viya can help by making building machine learning models easier for everyone, one hyperparameter at a time.

Here is an example of using the SAS Viya autotuning feature to improve my decision tree model. Using the autotuning window, all I needed to do was tell SAS Viya how long I want the autotuning process to run for. It will then work its magic and determine the best hyperparameters to use, which, in this case, include the Maximum tree level and the number of Predictor bins. In most cases, you get a better model after coming back from getting a glass of water!

Hyperparameters Autotuning in SAS Viya

Under the hood, SAS Viya uses complex optimization techniques to try to find the best hyperparameter combinations to use, all without you having to understand how it manages this impressive feat. I should add that hyperparameter autotuning is supported with many other algorithms in SAS Viya, and you have even more autotuning options when using it via the programmatic interface!

By leveraging a visually oriented framework and augmented analytics capabilities, SAS Viya is making analytics easier and machine learning models more accessible for everyone within an organization. For more on how SAS Viya enables everyone to ask more complex questions and uncover more valuable insights, check out my book Smart Data Discovery Using SAS® Viya®.

Analytics for everyone with SAS Viya was published on SAS Users.

August 12, 2020
 

In my earlier blog post, Changing variable type and variable length in SAS datasets, I showed how you can effectively change variable lengths in a SAS data set. That approach works fine when you need to change the length attribute for a few variables, on a case-by-case basis. But what if you need to change lengths for all character variables in a data set? Or if you need to do this for all data sets in a data library? For example, you need to expand (increase) all your character variable lengths by 50%. Well, then the case-by-case approach becomes too laborious and inefficient.

What is a character variable’s length attribute?

Before reading any further, let’s take a quick quiz:

Q: A character variable length attribute represents a number of:

  A. Bits
  B. Bytes
  C. Centimeters
  D. Characters

If your answer is anything but B, it’s incorrect. According to the SAS documentation, length refers to the number of bytes used to store each of the variable's values in a SAS data set. You can use a LENGTH statement to set the length of both numeric and character variables.

It is true, though, that for some older encoding systems (ASCII, ISO/IEC 8859, EBCDIC, etc.) there was no difference between the number of bytes and the number of characters, as those systems were based on exactly one byte per character. They are even called Single-Byte Character Sets (SBCS) for that reason. The problem is that they can accommodate a maximum of only 2^8 = 256 symbols, which is not nearly enough to cover the variety of natural languages, special characters, emojis, etc.

Why would we want to expand character variable lengths?

Use case 1. Expanding character values range

For this scenario, let’s consider Internet traffic analysis where your data contains multiple character columns for Internet Protocol addresses (IP addresses) in 32-bit version 4 (IPv4, e.g. ‘125.255.101.78’). You transition to the newer 128-bit IPv6 standard (e.g. ‘2001:0000:3238:DFE1:0063:0000:0000:FEFB’) and need to modify your data structure to accommodate the new standard with its longer character values.

Use case 2. Migrating SAS data to multi-byte encoding environment

In this scenario, you migrate/move SAS data sets from older SBCS environments to newer Multi-Byte-Character Set (MBCS) encoding environments. For such a case, the ability to increase character variables lengths in bulk with a simple action becomes especially significant and critical.

Currently, the most commonly used MBCS is Unicode, which is supported by all modern operating systems, databases and web browsers. Of the different flavors of Unicode (UTF-8, UTF-16, UTF-32), the most popular is UTF-8. UTF-8 (8-bit Unicode Transformation Format) is a variable-width character encoding that uses from 1 to 4 one-byte (8-bit) code units per character; it is capable of encoding 1,112,064 different characters that cover most modern languages, including Arabic and Hebrew characters, hieroglyphs, emojis, as well as many other special characters.

Since each UTF-8 encoded character may require somewhere between one and four bytes, and not all SBCS characters are represented by one byte in UTF-8, data migration from SBCS to UTF-8 may cause data truncation and subsequently data loss.
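The difference between bytes and characters can be checked directly. The sketch below assumes a SAS session running with UTF-8 encoding and a program file saved in that same encoding; the sample value is made up. LENGTH() counts bytes, while KLENGTH() counts characters.

```sas
data _null_;
   s = 'café';              /* the accented letter takes 2 bytes in UTF-8 */
   bytes = length(s);       /* length in bytes */
   chars = klength(s);      /* length in characters */
   put bytes= chars=;
run;
```

Under those assumptions the log should show bytes=5 and chars=4, whereas in a single-byte session such as Latin1 both values would be 4. A variable defined with a length of 4 would therefore hold this value in an SBCS session but truncate it after transcoding to UTF-8.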

When SAS reads an SBCS-encoded data set and writes its records into UTF-8-encoded data set it throws an ERROR message in the log and stops execution:

ERROR: Some character data was lost during transcoding in the dataset LIBREF.DSNAME. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.

When SAS reads an SBCS-encoded data set and produces a UTF-8-encoded printed report only (without generating a UTF-8-encoded output data set) it generates a WARNING message (with identical description as the above ERROR message) while continuing execution:

WARNING: Some character data was lost during transcoding in the dataset LIBREF.DSNAME. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.

Either ERROR or WARNING is unacceptable and must be properly addressed.

How to expand all character variables lengths?

Regardless of character transcoding, the SAS CVP engine is a short and effective answer to this question. CVP stands for Character Variable Padding, which is exactly what this special-purpose engine does: it pads (expands) character variable lengths by a number of bytes. The CVP engine is part of Base SAS and does not require any additional licensing.

The CVP engine is a read-only engine for SAS data sets only. You can think of it as a magnifying glass: it creates an expanded view of the character data descriptors (lengths) without changing them. Still, we can use the CVP engine to actually convert a data set or a whole data library to a version with expanded character variables. All we need to do is define our source library as a CVP library, for example:

libname inlib cvp 'c:\source_folder';

Then use PROC COPY to create expanded versions of our original data sets in a target library:

libname outlib 'c:\target_folder';
proc copy in=inlib out=outlib noclone;
   select dataset1 dataset2;
run;

Or, if we need to expand character variable lengths for the whole library, then we use the same PROC COPY without the SELECT statement:

proc copy in=inlib out=outlib noclone;
run;

It’s that easy. And the icing on the cake is that CVP engine automatically adjusts the variables format widths to meet the expanded byte lengths for all converted character variables.
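To see the magnifying-glass effect before copying anything, one option is to point two librefs at the same folder, one with the default engine and one with the CVP engine, and compare the PROC CONTENTS output. This is only a sketch: the libref orig is a made-up name, and the folder and data set names follow the examples above.

```sas
libname orig  'c:\source_folder';       /* default engine: lengths as stored */
libname inlib cvp 'c:\source_folder';   /* CVP engine: expanded view of the same files */

proc contents data=orig.dataset1;
run;

proc contents data=inlib.dataset1;
run;
```

The second listing should report longer character variables than the first, even though the data set on disk has not been changed.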

Avoiding character data truncation by using the CVP Engine

CVP Engine is a near-perfect SAS solution to the problem of potential data truncation when data is transcoded during migration or move from SBCS-based to MBCS-based systems.

To avoid data loss from possible data truncation during transcoding we can use the above code with a slight but important modification – define the target library with outencoding='UTF-8' option. It will result in our target data not only expanded lengthwise but properly encoded as well. Then we run this modified code in the old SBCS environment before moving/migrating our data sets to the new MBCS environment:

libname inlib cvp 'c:\source_folder';
libname outlib 'c:\utf8_target_folder' outencoding='UTF-8';
proc copy in=inlib out=outlib noclone;
   select dataset1 dataset2;
run;

Again, if you need to expand character variable lengths for the whole library, then you can use the same PROC COPY without the SELECT statement:

proc copy in=inlib out=outlib noclone;
run;

After that we can safely move our expanded, UTF-8-encoded data to the new UTF-8 environment.

Code notes

  • The code above will create a new version of your original data sets with the desired encoding and with character variable lengths expanded by 50% (the default). As shown below, this default behavior can be changed by using the CVPBYTES= or CVPMULTIPLIER= options, which explicitly define the byte expansion rate.
  • It is important to note that the CVP engine is specified on the input library, since it is a read-only engine and thus available for input (read) processing only.
  • For the output library you specify your desired encoding option, in this case outencoding='UTF-8'.
  • The noclone option specifies not to copy data set attributes. This is needed to make sure the attributes are recreated rather than duplicated.
  • If you want to migrate your data sets using PROC MIGRATE, you should first expand column lengths using PROC COPY as shown above, since the CVP engine is not currently supported with PROC MIGRATE.
  • The CVP engine supports only SAS data files (no SAS views, catalogs, item stores, and so on).

CVP Engine options

There are several options available with the CVP Engine. Here are the most widely used:

CVPBYTES=bytes - specifies the number of bytes by which to expand character variable lengths. The lengths of character variables are increased by adding the specified bytes value to the current length.

Example: libname inlib 'SAS data-library' cvpbytes=5;

The CVPBYTES= option implicitly specifies the CVP engine, that is if you specify the CVPBYTES= option you don’t have to specify CVP engine explicitly as SAS will use it automatically.

CVPMULTIPLIER=multiplier - specifies a multiplier value by which to expand character variable lengths. The lengths of character variables are increased by multiplying the current length by the specified multiplier value. You can specify a multiplier value from 1 to 5, or you can specify 0 and then the CVP engine determines the multiplier automatically.

Example: libname inlib 'SAS data-library' cvpmultiplier=2.5;

The CVPMULTIPLIER= option also implicitly specifies the CVP engine, that is if you specify the CVPMULTIPLIER= option, you don’t have to specify CVP engine explicitly as SAS will use it automatically.

Note:

  • You cannot specify both the CVPMULTIPLIER= option and the CVPBYTES= option. Specify only one of these options.
  • If you explicitly assign the CVP engine but do not specify either CVPBYTES= or CVPMULTIPLIER=, then SAS defaults to using CVPMULTIPLIER=1.5 to increase the lengths of the character variables.

Your thoughts?

Have you found this blog post useful? Please share your use cases, thoughts and feedback in the comments section below.

Expanding lengths of all character variables in SAS data sets was published on SAS Users.

February 5, 2020
 

One of the first and most important steps in analyzing data, whether for descriptive or inferential statistical tasks, is to check for possible errors in your data. In my book, Cody's Data Cleaning Techniques Using SAS, Third Edition, I describe a macro called %Auto_Outliers. This macro allows you to search for possible data errors in one or more variables with a simple macro call.

Example Statistics

To demonstrate how useful and necessary it is to check your data before starting your analysis, take a look at the statistics on heart rate from a data set called Patients (in the Clean library) that contains an ID variable (Patno) and another variable representing heart rate (HR). This is one of the data sets I used in my book to demonstrate data cleaning techniques. Here is output from PROC MEANS:

The mean of 79 seems a bit high for normal adults, but the standard deviation is clearly too large. As you will see later in the example, there was one person with a heart rate of 90.0 but the value was entered as 900 by mistake (shown as the maximum value in the output). A severe outlier can have a strong effect on the mean but an even stronger effect on the standard deviation. If you recall, one step in computing a standard deviation is to subtract each value from the mean and square that difference. This causes an outlier to have a huge effect on the standard deviation.

Macro

Let's run the %Auto_Outliers macro on this data set to check for possible outliers (that may or may not be errors).

Here is the call:

%Auto_Outliers(Dsn=Clean.Patients,
               Id=Patno,
               Var_List=HR SBP DBP,
               Trim=.1,
               N_Sd=2.5)

This macro call is looking for possible errors in three variables (HR, SBP, and DBP); however, we will only look at HR for this example. Setting the value of Trim equal to .1 specifies that you want to remove the top and bottom 10% of the data values before computing the mean and standard deviation. The value of N_Sd (number of standard deviations) specifies that you want to list any heart rate beyond 2.5 trimmed standard deviations from the mean.
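The trimming idea can be sketched in a few steps for a single variable. This is not the %Auto_Outliers macro itself, which may differ in its details (for example, in how it scales the trimmed standard deviation); it is only an illustration of the approach, and the data set names ranked, trimmed, and possible_outliers are made up.

```sas
/* Step 1: split HR into 10 groups of roughly equal size (0 = lowest, 9 = highest) */
proc rank data=Clean.Patients(keep=Patno HR) out=ranked groups=10;
   var HR;
   ranks HR_group;
run;

/* Step 2: mean and standard deviation of the middle 80% (groups 1 through 8) */
proc means data=ranked(where=(HR_group between 1 and 8)) noprint;
   var HR;
   output out=trimmed(drop=_type_ _freq_) mean=T_mean std=T_sd;
run;

/* Step 3: list values more than 2.5 trimmed standard deviations from the trimmed mean */
data possible_outliers;
   if _n_ = 1 then set trimmed;
   set ranked;
   if not missing(HR) and abs(HR - T_mean) > 2.5 * T_sd;
   keep Patno HR T_mean T_sd;
run;
```

Computing the mean and standard deviation from the trimmed data keeps an extreme value such as 900 from inflating the very statistics that are used to detect it.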

Result

Here is the result:

After checking every value, it turned out that every value except the one for patient 003 (HR = 56) was a data error. Let's see the mean and standard deviation after these data points are removed.

Notice the mean is now 71.3 and the standard deviation is 11.5. You can see why it is so important to check your data before performing any analysis.

You can download this macro and all the other macros in my data cleaning book by going to support.sas.com/cody. Scroll down to Cody's Data Cleaning Techniques Using SAS, and click on the link named "Example Code and Data." This will download a file containing all the programs, macros, and data files from the book.  By the way, you can do this with any of my books published by SAS Press, and it is FREE!

Let me know if you have questions in the comments section, and may your data always be clean! To learn more about SAS Press, check out up-and-coming titles, and to receive exclusive discounts make sure to subscribe to the newsletter.

Finding Possible Data Errors Using the %Auto_Outliers Macro was published on SAS Users.