learn sas

5月 112021
 

It’s safe to say that SAS Global Forum is a conference designed for users, by users. As your conference chair, I am excited by this year’s top-notch user sessions. More than 150 sessions are available, many by SAS users just like you. Wherever you work or whatever you do, you’ll find sessions relevant to your industry or job role. New to SAS? Been using SAS forever and want to learn something new? Managing SAS users? We have you covered. Search for sessions by industry or topic, then add those sessions to your agenda and personal calendar.

Creating a customizable agenda and experience

Besides two full days of amazing sessions, networking opportunities and more, many user sessions will be available on the SAS Users YouTube channel on May 20, 2021 at 10:00am ET. After you register, build your agenda and attend the sessions that most interest you when the conference begins. Once you’ve viewed a session, you can chat with the presenter. Don’t know where to start? Sample agendas are available in the Help Desk.

For the first time, proceedings will live on SAS Support Communities. Presenters have been busy adding their papers to the community. Everything is there, including full paper content, video presentations, and code on GitHub. It all premiers on “Day 3” of the conference, May 20. Have a question about the paper or code? You’ll be able to post a question on the community and ask the presenter.

Want training or help with your code?

Code Doctors are back this year. Check out the agenda for the specific times they’re available and make your appointment, so you’ll be sure to catch them and get their diagnosis of code errors. If you’re looking for training, you’ll be quite happy. Training is also back this year and it’s free! SAS instructor-led demos will be available on May 20, along with the user presentations on the SAS Users YouTube channel.

Chat with attendees and SAS

It is hard to replicate the buzz of a live conference, but we’ve tried our best to make you feel like you’re walking the conference floor. And we know networking is always an important component to any conference. We’ve made it possible for you to network with colleagues and SAS employees. Simply make your profile visible (by clicking on your photo) to connect with others, and you can schedule a meeting right from the attendee page. That’s almost easier than tracking down someone during the in-person event.

We know the exhibit hall is also a big draw for many attendees. This year’s Innovation Hub (formerly known as The Quad) has industry-focused booths and technology booths, where you can interact in real-time with SAS experts. There will also be a SAS Lounge where you can learn more about various SAS services and platforms such as SAS Support Communities and SAS Analytics Explorers.

Get started now

I’ve highlighted a lot in this blog post, but I encourage you to view this 7-minute Innovation Hub video. It goes in depth on the Hub and all its features.

This year there is no reason not to register for SAS Global Forum…and attend as few or as many sessions as you want. Why? Because the conference is FREE!

Where else can you get such quality SAS content and learning opportunities? Nowhere, which is why I encourage you to register today. See you soon!

SAS Global Forum: Your experience, your way was published on SAS Users.

2月 222021
 

Removing a piece from character string In my previous post, we addressed the problem of inserting substrings into SAS character strings. In this post we will solve a reverse problem of deleting substrings from SAS strings.

These two complementary tasks are commonly used for character data manipulation during data cleansing and preparation to transform data to a shape suitable for analysis, text mining, reporting, modeling and decision making.

As in the previous case of substring insertion, we will cover substring deletion for both, character variables and macro variables as both data objects are strings.

The following diagram illustrates what we are going to achieve by deleting a substring from a string:

Removing a substring from SAS string illustration

Have you noticed a logical paradox? We take away a “pieceof” cake and get the whole thing as result! 😊

Now, let’s get serious.

Deleting all instances of a substring from a character variable

Let’s suppose we have a variable STR whose values are sprinkled with some undesirable substring ‘<br>’ which we inherited from some HTML code where tag <br> denotes a line break. For our purposes, we want to remove all instances of those pesky <br>’s. First, let’s create a source data set imitating the described “contaminated” data:

data HAVE;
   infile datalines truncover;
   input STR $100.;
   datalines;
Some strings<br> have unwanted sub<br>strings in them<br>
<br>A s<br>entence must not be cont<br>aminated with unwanted subs<br>trings
Several line<br> breaks<br> are inserted here<br><br><br>
<br>Resulting st<br>ring must be n<br>eat and f<br>ree from un<br>desirable substrings
Ugly unwanted substrings<br><br> must <br>be<br> removed
<br>Let's remove them <br>using S<br>A<br>S language
Ex<br>periment is a<br>bout to b<br>egin
<br>Simpli<br>city may sur<br>prise you<br><br>
;

This DATA step creates WORK.HAVE data set that looks pretty ugly and is hardly usable:
Source data to be cleansed
The following code, however, cleans it up removing all those unwanted substrings ‘<br>’:

data WANT (keep=NEW_STR);
   length NEW_STR $100;
   SUB = '<br>';
   set HAVE;
   NEW_STR = transtrn(STR,'<br>',trimn(''));
run;

After this code runs, the data set WANT will look totally clean and usable:
Cleaned data

Code highlights

  • We use .

The TRANSTRN function is similar to TRANWRD function which replaces all occurrences of a substring in a character string. While TRANWRD uses a single blank when the replacement string has a length of zero, TRANSTRN does allow the replacement string to have a length of zero which essentially means removing.

  • TRIM() function which removes trailing blanks from a character string and returns one blank if the string is missing. However, when it comes to removing (which is essentially replacement with zero length substring) the ability of TRIMN function to return a zero-length string makes all the difference.

Deleting all instances of a substring from a SAS macro variable

For macro variables, I can see two distinct methods of removing all occurrences of undesirable substring.

Method 1: Using SAS data step

Here is a code example:

%let STR = Some strings<br> have unwanted sub<br>strings in them<br>;
%let SUB = <br>;
 
data _null_;
   NEW_STR = transtrn("&STR","&SUB",trimn(''));
   call symputx('NEW',NEW_STR);
run;
 
%put &=STR;
%put &=NEW;

In this code, we stick our macro variable value &STR in double quotes in the transtrn() function as the first argument (source). The macro variable value &SUB, also double quoted, is placed as a second argument. After variable NEW_STR is produced free from the &SUB substrings, we create a macro variable NEW using

%let STR = Some strings<br> have unwanted sub<br>strings in them<br>;
%let SUB = <br>;
 
%let NEW = %sysfunc(transtrn(&STR,&SUB,%sysfunc(trimn(%str()))));
 
%put &=STR;
%put &=NEW;

Deleting selected instance of a substring from a character variable

In many cases we need to remove not all substring instances form a string, but rather a specific occurrence of a substring. For example, in the following sentence (which is a quote by Albert Einstein) “I believe in intuitions and inspirations. I sometimes feel that I am right. I sometimes do not know that I am.” the second word “sometimes” was added by mistake. It needs to be removed. Here is a code example presenting two solutions of how such a deletion can be done:

data A;
   length STR STR1 STR2 $250;
   STR = 'I believe in intuitions and inspirations. I sometimes feel that I am right. I sometimes do not know that I am.';
   SUB = 'sometimes';
   STR_LEN = length(STR);
   SUB_LEN = length(SUB);
   POS = find(STR,SUB,-STR_LEN);
   STR1 = catx(' ', substr(STR,1,POS-1), substr(STR,POS+SUB_LEN)); /* solution 1 */
   STR2 = kupdate(STR,POS,SUB_LEN+1);                              /* solution 2 */
   put STR1= / STR2=;
run;

The code will produce two correct identical values of this quote in the SAS log (notice, that the second instance of word “sometimes” is gone):

STR1=I believe in intuitions and inspirations. I sometimes feel that I am right. I do not know that I am.
STR2=I believe in intuitions and inspirations. I sometimes feel that I am right. I do not know that I am.

Code highlights

  • FIND() function determines position POS of the substring SUB to be deleted in the string STR. In this particular example, we used the fact, that the second occurrence of word “sometimes” is the first occurrence of this word when counted from right to left. That is indicated by the negative 3-rd argument (-STR_LEN) which means that FIND function searches STR for SUB starting from position STR_LEN from right to left.

Solution 1

This is the most traditional solution that cuts out two pieces of the string – before and after the substring being deleted – and then concatenates them together thus removing that substring:

  • substr(STR,1,POS-1) extracts the first part of the source string STR before the substring to be deleted: from position 1 to position POS-1.
  • substr(STR,POS+SUB_LEN) extracts the second part of the source string STR after the substring to be deleted: from position POS+SUB_LEN till the end of STR value (since the third argument, length, is not specified).
  • Solution 2

    Finding n-th instance of a substring within a string .

Deleting selected instance of a substring from a SAS macro variable

Here is a code example of how to solve the same problem as it relates to SAS macro variables. For brevity, we provide just one solution using %sysfunc and KUPDATE() function:

%let STR = I believe in intuitions and inspirations. I sometimes feel that I am right. I sometimes do not know that I am.;
%let SUB = sometimes;
%let POS = %sysfunc(find(&STR,&SUB,-%length(&STR)));
%let STR2 = %sysfunc(kupdate(&STR,&POS,%eval(%length(&SUB)+1)));
%put "&STR2";

This should produce the following corrected Einstein’s quote in the SAS log:

"I believe in intuitions and inspirations. I sometimes feel that I am right. I do not know that I am."

Additional Resources for SAS character strings processing

Your thoughts?

Have you found this blog post useful? Please share your thoughts and feedback in the comments section below.

Deleting a substring from a SAS string was published on SAS Users.

2月 162021
 

Insert a piece into a stringSAS provides an extensive set of tools for data cleansing and preparation – transforming data to a shape suitable for analysis, text mining, reporting, modeling and ultimately decision making.

In this post we will cover one of the common tasks of character data manipulation – inserting a substring into a SAS character string.

A diagram below illustrates what we are going to achieve:

Illustration for inserting a substring into a string

SAS character strings come in two different incarnations: character variables and macro variables. Since these two are quite different SAS language objects, let’s cover them one by one separately.

Inserting a substring into a character variable

Here is our task: we have a SAS character variable (string) and we want to insert in it a value of another character variable (substring) starting at a particular specified position.

Let’s say we have a string BASE in which we want to insert a COUNTRY name right before word "stays" to make different variation of the resultant phrase. Here is an example of how this can be easily done:

data COUNTRIES;
   length COUNTRY $20;
   input COUNTRY;
   datalines;
Spain
Argentina
Slovenia
Romania
USA
Luxembourg
Egypt
Switzerland
;
 
data NEW (keep=COUNTRY PHRASE);
   BASE = 'The rain in stays mainly in the plain';
   INSPOS = find(BASE,'stays');
   set COUNTRIES;
   length PHRASE $50;
   PHRASE = catx(' ',substr(BASE,1,INSPOS-1),COUNTRY,substr(BASE,INSPOS));
run;

This code dynamically creates variable PHRASE out of values of variable BASE and the values of variable COUNTRY, thus making it data-driven.

After this code runs, the data set NEW will look like this:

Results after inserting a substring into a character string

Here are the code highlights:

  • maximum length of varying-length character variables is 536,870,911 characters (UTF-8 encoding).

    Inserting a substring into a SAS macro variable

    Let’s solve a similar task, but now instead of SAS variables we will operate with SAS macro variables, since they are strings too.

    Here is our problem to solve: we have a SAS macro variable (string) and we want to insert in it a value of another macro variable (substring) starting at a particular specified position.

    Let’s say we have a macro variable BASE with value of The rain in stays mainly in the plain in which we want to insert a country name defined by macro variable COUNTRY with value of Spain right before word stays. Here is an example of how this can be done:

    %let BASE = The rain in stays mainly in the plain;
    %let COUNTRY = Spain;
    %let W = stays;
     
    %let INSPOS = %index(&BASE,&W);
    %let PHRASE = %substr(&BASE,1,%eval(&INSPOS-1))&COUNTRY %substr(&BASE,&INSPOS);
    %put ***&PHRASE***;

    This code will insert the country name in the appropriate place within the BASE macro variable which will be printed in the SAS log by %put statement:

    ***The rain in Spain stays mainly in the plain***

    Here are the code highlights:

    • %substr() macro function to extract two parts of its first argument (&BASE) - before and after insertion:
      • %substr(&BASE,1,%eval(&INSPOS-1))captures the first part of &BASE (before insertion): substring of &BASE starting from the position 1 with a length of %eval(&INSPOS-1).
      • %substr(&BASE,&INSPOS) captures the second part of &BASE (after insertion): substring of &BASE starting from the position &INSPOS till the end of &BASE (since the third argument is not specified).
    • In case of macro variables, we don’t need any concatenation functions – we just list the component pieces of the macro variable value in a proper order with desired separators (blanks in this case).

    NOTE: Unlike for SAS variables, you don’t need to assign the length of SAS macro variables which are automatically defined by their assigned values. The maximum length of SAS macro variables is 65,534 bytes.

    Inserting multiple instances of a substring into a SAS character string

    Sometimes you need to insert a substring into several places (positions p1, p2, …, pn) of a character string. In this case you can use the above strategy repeatedly or iteratively with one little caveat: start inserting from the highest position and moving backwards to the lowest position. This will preserve your pre-determined positions because positions are counted from left to right and inserting a substring at a higher position won’t change the lower position number. Otherwise, after insertion of a substring into lower position, all your higher positions will shift by the length of the inserted substring.

    Additional Resources for SAS character strings processing

    Your thoughts?

    Have you found this blog post useful? Please share your thoughts and feedback in the comments section below.

    Inserting a substring into a SAS string was published on SAS Users.

  • 1月 272021
     

    SASensei logoSAS offers myriad ways to level up your SAS skills (scroll to the bottom to see a list of SAS-provided learning paths and channels). In this post, I introduce you to SASensei, an independent, third-party online SAS learning resource that I enjoy a lot.

    Learning: dull or fun?

    Learning is not always associated with fun. Sometimes it feels difficult and exhausting. New concepts often contradict our prior knowledge and experience, compelling us to rethink, adjust, change and adapt to new paradigms.

    Learning new ideas, skills and technologies can be intimidating, challenging and demanding. While learning, you are stretching out of your comfort zone. But that feeling is only transient. As a matter of attitude, learning is not about pushing yourself out of your comfort zone, it’s about expanding your comfort zone. And that is long lasting. The more you learn, the more comfortable and self-confident you become.

    Learning does not have to be tedious. Look at pre-school kids. They learn basic life skills like walking (rolling, crawling), talking (in one or more languages), asking questions (a lot) – all without taking classes, just through their natural curiosity and ... playing games.

    What is SASensei? Gamified SAS learning

    When I first discovered the SASensei online SAS learning game/application I was pleasantly surprised by its non-traditional approach to learning such a serious and well-established platform as SAS.

    As stated on its website, “Sasensei is a question based learning system. You must demonstrate your command of SAS® to earn Tokens - which should be wisely invested, to enable you to unlock new levels within the game...”

    The following screenshot shows the main page of the SASensei website that displays a dashboard of the top players (they call it leaderboard). You can filter it geographically - by Country, Continent, or World, as well as by the timeline – by Past Month, Past Year, or All Time.

    SASensei leaderboard

    Privacy or prominence

    Users have full control of their privacy or prominence. As you can see in the screenshot above, registered users are displayed by their screen names. This allows the users to either remain anonymous by selecting some fictitious obscure screen name or use their real name. Users can change their screen name at any time.

    Rules of the game

    In this blog post I provide just an overview of the main functionality and features of the SASensei learning platform. For detailed rules of the game, see SASensei Documentation.

    Play and learn

    Users are offered a variety of learning activities:

    • Viewing, reviewing and submitting SAS-learning flashcards;
    • Playing, reviewing and submitting questions by different SAS-related topics;
    • Taking and creating public, private, multi-player and custom quizzes;
    • Providing feedback on questions and flashcards by voting and commenting.

    Users can challenge themselves by delving into different topics. Your successes and failures will provide you an honest and objective estimation of your SAS strengths as well as weaknesses. A healthy competition with other users encourages you to learn more and hone your SAS skills. When you fail a question, you can review the explanation of the correct answer and thus learn why you failed and acquire new knowledge, tips and tricks quickly and efficiently.

    Invest, score, win and build a reputation

    To play you will need to earn and spend tokens which are essentially the game’s currency. To motivate you further, you also earn reputation points, which is your ultimate score, a level of achievement in demonstrating SAS skills. Your reputation score is prominently displayed in your public profile. As you progress in the game and your reputation grows, additional functionality unlocks and becomes available to you. Your reputation score determines your SASensei standing level which is derived from those used in martial arts:

    Sasensei title Sasamurai title

    • White Belt (new players)
    • Yellow Belt (reputation ≥ 50)
    • Green Belt (reputation ≥ 100)
    • Black Belt (reputation ≥ 200)
    • Sasamurai (reputation ≥ 500)
    • Assassin (reputation ≥ 1000)
    • Sasensei (reputation ≥ 5000)

    Sample SASensei question

    When you play a question, you select a topic, and then you are presented with a randomly selected multiple-choice question of a specified time limit (30, 60, 90 or 120 seconds). Here is a sample of such question:

    Question:

    What is wrong with the following LIBNAME statement?
    libname fruits (apples oranges tomatoes);

    Answers:

    • Incorrect syntax
    • You cannot mix apples and oranges in LIBNAME statement
    • Nothing is wrong, valid LIBNAME statement
    • Tomatoes are not fruits, therefore the statement is not correct

    Correct answer:

    Nothing is wrong, valid LIBNAME statement

    Explanation:

    According to Combine and conquer with SAS for examples of usage.

    Question
    Try tackling a question on your own in the SASensei environment to get real life experience: Sample Question.

    Take a SASensei sample quiz

    There are various quizzes available at SASensei: public quizzes, multiplayer quiz games, private quizzes (tests) for students.

    A public quiz contains 12 questions with a total time cap of 12 minutes, and costs eight tokens to play. You can choose a single topic (sas statements, sas macro, procedures, options, etc.), and if you pass (75% or more) you get 12 tokens back, plus 20 point to your reputation. If you get 100%, you get 30 reputation points plus Top Student badge. A count of passed sessions (by topic) is displayed on your public profile.

    QuizAlthough public quizzes are unlocked at the SASamurai level, for the readers of this blog, I have created a special custom quiz sample so you can experience it firsthand right here, right now. Just click on this link, Sample Quiz, register, and enjoy your ride.

    See you at the top of the SASensei dashboard!

    Credit

    Big THANKS to Allan Bowe (United Kingdom) – SAS innovator and entrepreneur who created and founded SASensei learning platform.

    Other SAS learning resources

    Game on! SASensei: a fun way to learn SAS was published on SAS Users.

    1月 132021
     

    Running SAS programs in parallel reduces run time

    As earth completes its routine annual circle around the sun and a new (and hopefully better) year kicks in, it is a perfect occasion to reflect on the idiosyncrasy of time.

    While it is customary to think that 3+2=5, it is only true in sequential world. In parallel world, however, 3+2=3. Think about it: if you have two SAS programs one of which runs 3 hours, and the second one runs 2 hours, their total duration will be 5 hours if you run them one after another sequentially, but it will take only 3 hours if you run them simultaneously, in parallel.

    I am sure you remember those “filling up a swimming pool” math problems from elementary school. They clearly and convincingly demonstrate that two pipes will fill up a swimming pool faster than one. That’s the power of running water in parallel.

    The same principle of parallel processing (or parallel computing) is applicable to SAS programs (or non-SAS programs) by running their different independent pieces in separate SAS sessions at the same time (in parallel).  Divide and conquer.

    You might be surprised at how easily this can be done, and at the same time how powerful it is. Let’s take a look.

    SAS/CONNECT

    MP CONNECT) parallel processing functionality was added to SAS/CONNECT enabling you to execute multiple SAS sessions asynchronously. When a remote SAS session kicks off asynchronously, a portion of your SAS program is sent to the server session for execution and control is immediately returned to the client session. The client session can continue with its own processing or spawn one or more additional asynchronous remote server sessions.

    Running programs in parallel on a single machine

    Sometimes, what comes across as new is just well forgotten old. They used to be Central Processing Units (CPU), but now they are called just processors. Nowadays, practically every single computer is a “multi-machine” (or to be precise “multi-processor”) device. Even your laptop. Just open Task Manager (Ctrl-Alt-Delete), click on the Performance tab and you will see how many physical processors (or cores) and logical processors your laptop has:

    Parallel processing on a single machine

    That means that this laptop can run eight independent SAS processes (sessions) at the same time. All you need to do is to say nicely “Dear Mr. & Mrs. SAS/CONNECT, my SAS program consists of several independent pieces. Would you please run each piece in its own SAS session, and run them all at the same time?” And believe me, SAS/CONNECT does not care how many logical processors you have, whether your logical processors are far away from each other “remote machines” or they are situated in a single laptop or even in a single chip.

    Here is how you communicate your request to SAS/CONNECT in SAS language.

    Spawning multiple SAS sessions using MP Connect

    Suppose you have a SAS code that consists of several pieces – DATA or PROC steps that are independent of each other, i.e. they do not require to be run in a specific sequence. For example, each of the two pieces generates its own data set.

    Then we can create these two data sets in two separate “remote” SAS sessions that run in parallel. Here is how you do this.  (For illustration purposes, I just create two dummy data sets.)

    options sascmd="sas";
     
    /* Current datetime */
    %let _start_dt = %sysfunc(datetime());
     
    /* Prosess 1 */
    signon task1;
    rsubmit task1 wait=no;
     
       libname SASDL 'C:\temp';
     
       data SASDL.DATA_A (keep=str);
          length str $1000;
          do i=1 to 1150000;
             str = '';
             do j=1 to 1000;
                str = cats(str,'A');
             end;
             output;
          end;
       run;
     
    endrsubmit;
     
    /* Process 2 */
    signon task2;
    rsubmit task2 wait=no;
     
       libname SASDL 'C:\temp';
     
       data SASDL.DATA_B (keep=str);
          length str $1000;
          do i=1 to 750000;
             str = '';
             do j=1 to 1000;
                str = cats(str,'B');
             end;
             output;
          end;
       run;
     
    endrsubmit;
     
    waitfor _all_;
    signoff _all_;
     
    /* Print total duration */
    data _null_;
       dur = datetime() - &_start_dt;
       put 30*'-' / ' TOTAL DURATION:' dur time13.2 / 30*'-';
    run;

    In this code, the key elements are:

    SIGNON Statement - initiates a connection between a client session and a server session.

    ENDRSUBMIT statement - marks the end of a block of statements that a client session submits to a server session for execution.

    SIGNOFF Statement - ends the connection between a client session and a server session.

    Parallel processing vs. threaded processing

    There is a distinction between parallel processing described above and threaded processing (aka multithreading). Parallel processing is achieved by running several independent SAS sessions, each processing its own unit of SAS code.

    Threaded processing, on the other hand, is achieved by developing special algorithms and implementing executable codes that run on multiple processors (threads) within the same SAS session. Amdahl's law, which provides theoretical background and estimation of potential time saving achievable by parallel computing on multiple processors.

    Passing information to and from “remote” SAS sessions

    Besides passing pieces of SAS code from client sessions to server sessions, MP CONNECT allows you to pass some other SAS objects.

    Passing data library definitions

    For example, if you have a data library defined in your client session, you may pass that library definition on to multiple server sessions without re-defining them in each server session.

    Let’s say you have two data libraries defined in your client session:

    libname SRCLIB oracle user=myusr1 password=mypwd1 path=mysrv1;
    libname TGTLIB '/sas/data/datastore1';

    In order to make these data libraries available in the remote session all you need is to add

    rsubmit task1 wait=no inheritlib=(SRCLIB TGTLIB);

    This will allow libraries that are defined in the client session to be inherited by and available in the server session. As an option, each client libref can be associated with a libref that is named differently in the server session:

    rsubmit task1 wait=no inheritlib=(SRCLIB=NEWSRC TGTLIB=NEWTGT);

    Passing macro variables from client to server session

    options sascmd="sas";
    %let run_dt = %sysfunc(datetime());
    signon task1;
    %syslput rem_run_dt=&run_dt / remote=task1;
    rsubmit task1 wait=no;
       %put &=rem_run_dt;
    endrsubmit;
     
    waitfor task1;
    signoff task1;

    Passing macro variables from server to client session

    Similarly,

  • %SYSRPUT_USER_ </LIKE=‘character-string’>;
  • (/LIKE=<‘character-string’ >specifies a subset of macro variables whose names match a user-specified character sequence, or pattern.)

    Here is a code example that passes two macro variables, rem_start and rem_year from the remote session and outputs them to the SAS log in the client session:

    options sascmd="sas";
    signon task1;
    rsubmit task1 wait=no;
       %let start_dt = %sysfunc(datetime());
       %sysrput rem_start=&start_dt;
       %sysrput rem_year=2021;
    endrsubmit;
     
    waitfor task1;
    signoff task1;
    %put &=rem_start &=rem_year;

    Summary

    SAS’ Multi-Process Connect is a simple and efficient tool enabling parallel execution of independent programming units. Compared to sequential processing of time-intensive programs, it allows to substantially reduce overall duration of your program execution.

    Additional resources

    Running SAS programs in parallel using SAS/CONNECT® was published on SAS Users.

    11月 042020
     

    Removing duplicate charactersIn this blog post we are going to tackle a data cleansing task of removing unwanted repeated characters in SAS character variables.

    Character repetition can stem from various stages of data life cycle: from data collection, to data transmission, to data transformation. It can be accidental or intentional by design. It can be sporadic or consistent. In either case, it needs to be addressed by robust data cleansing processes to ensure adequate data quality that is imperative for the data usability.

    Character repetition examples

    Example 1. Data entry, especially manual data entry, can be a high-risk factor for accidental character duplication. Have you ever pressed a key on your computer keyboard for a bit longer than intended, so it started automatically typing multiple characters???????????????

    Keyboard properties adjustment

     Tip: You can adjust your Keyboard Properties to control “Repeat delay” and “Repeat rate” settings (on Windows computer, search for “Keyboard” and click on Keyboard in Control Panel).

    Example 2. Recently, I had to deal with the data that contained multiple consecutive double quotation marks all over the character string values. Even though we don’t know the exact cause of it, still for each of these duplicated quotation marks occurrences we needed to replace them with a single quotation mark.

    Removing repeated blanks

    There is a very useful Removing unwanted characters from text strings by Amadeus Software we developed a prototype using

    data D;
       c = ','; *<- character to un-duplicate;
       cc = c||c; *<- double character;
       string = 'Many,,,,,, commas,,,,, in,,, this,, sentence.,'; *<- source string;
       put 'BEFORE:' string=; *<- output initial string;
       do while (find(string,cc)); *<- loop through while there are doubles;
          string = tranwrd(string,cc,c); *<- replace double with a single character;
       end;
       put 'AFTER: ' string=; *<- output unduplicated string;
    run;

    This code will produce the following in the SAS log:

    BEFORE:string=Many,,,,,, commas,,,,, in,,, this,, sentence.,
    AFTER: string=Many, commas, in, this, sentence.,

    which shows that this approach correctly un-duplicates the source string removing and replacing all repeated characters (commas in our example) with a single one.

    User-defined SAS function for removing any repeated characters

    Let’s use

    libname funclib 'c:\projects\functions';
    proc fcmp outlib=funclib.userfuncs.package1;
       function undupc(str $, clist $) $;
          length x $32767 c $1 cc $2;
          x = str; 
          do i=1 to length(clist);
             c = char(clist,i);
             cc = c||c;
             do while (find(trim(x),cc));
                x = tranwrd(x,cc,c);
             end;
          end;
          return (x); 
       endfunc; 
    run;

    Code highlights

    • We introduce an interim variable x to which we will iteratively apply replacing double characters with a single one.
    • We assign length attribute of this variable to be maximum allowable character length of 32767 bytes to accommodate any character length used in the calling program.
    • Outer do-loop loops through the clist containing characters we want to unduplicate.
    • Variable c is assigned a single character from clist, variable cc is assigned double of the cc value.
    • Inner do-loop iterates through trimmed characters in x while doubles are found; using trim(x) is essential as it not only speeds up processing while searching through a shorter string (without trailing blanks), it also prevents from falling into an infinite loop in case clist contains blank character to unduplicate (cc equals to double blanks which are always going to be found among trailing blanks).

    Let’s test our newly minted UNDUPC function on the following data:

    data SOURCE;
       infile datalines truncover;
       input str $50.;
       datalines;
    """Repeated "double quotes""""
    Repeated,,,,,commas,,,,,,,,,,,
    [[[""Mixed""]]   characters,,,
    ;

    Since our user-defined function is permanently stored in the location specified in the

    options cmplib=funclib.userfuncs;
    data TARGET;
       set SOURCE;
       length new_str $50;
       new_str = undupc(str, ' ,"][');
    run;

    This code will remove and replace all repeated sequences of characters ' ',',', '"', ']', and '['. The order of these characters listed in the second argument doesn’t matter. Here is what we get:

    Duplicate characters removal results
    As you can see, we get what we wanted including the functionality of the COMPBL function.

    User-defined CALL routine for removing any repeated characters

    As much as I love user-defined functions, I have an issue with the above undupc user-defined function implementation. It has to do with how the PROC FCMP handles interim character variables length attribute assignment. It does not implicitly inherit their length attribute from another variable as SAS data step does. For example, if you run the following data step:

    data a;
       length x $99;
       y = x;
    run;

    variable y will have the length attribute $99 implicitly inherited from the x variable.

    In PROC CFMP function, you can either assign the length attribute to a character variable explicitly with LENGTH or ATTRIB statement (as we did by using length x $32767 ), or it will be set to $33 if you use any other way of implicit assignment. (I leave it up to you guessing why 33 and why not any other number.) Since we wanted to accommodate SAS character strings of any length, we had to explicitly assign our interim variable x length attribute the maximum valid value of $32767. This will inevitably take tall on the function performance as we will have to process longer strings.

    However, we can avoid this issue by using CALL routine instead:

    libname funclib 'c:\projects\functions';
    proc fcmp outlib=funclib.usercalls.package1;
       subroutine undupc(str $, clist $, x $);
          outargs x;
          length c $1 cc $2;
          x = str;
          do i=1 to length(clist);
             c = char(clist,i);
             cc = c||c;
             do while (find(trim(x),cc));
                x = tranwrd(x,cc,c);
             end;
          end;
       endsub; 
    run;

    This code is very similar to the user-defined function above with a slight difference. Here, x variable is listed as an argument in the subroutine definition and refers to a SAS data step variable whose length attribute is assigned in the calling data step. Unlike SAS function, SAS subroutine does not return a value; instead, it uses

    options cmplib=funclib.usercalls;
    data TARGET;
       set SOURCE;
       length new_str $50;
       call undupc(str, ' ,"][', new_str);
    run;

    And we will get the same results as with the UNDUPC function above.

    Store user-defined functions and subroutines separately

    You can create and have both, user-defined function and call routine with the same name. However, to avoid confusion (and errors) do not store their definitions in the same data table (outlib= option of the PROC FCMP). If they are stored in the same data table, then when used in a DATA step, SAS will pull the latest definition by its name only and that may not be the entity you want.

    Performance benchmarking

    To compare performances of the UNDUPC function vs UNDUPC subroutine we created a rather large data table (1 Million observations) with randomly generated strings (1000 characters long):

    libname SASDL 'C:\PROJECTS\TESTDATA';
     
    data SASDL.TESTDATA (keep=str);
       length str $1000;
       do i=1 to 1000000;
          str = '';
          do j=1 to 1000;
             str = cats(str,byte(int(rank(' ')+38*rand('UNIFORM'))));
          end;
          output;
       end;
    run;

    Then we ran the following 2 data steps, one using the undupc() function, and the other using undupc() call routine:

    options cmplib=funclib.userfuncs;
     
    data SASDL.TESTDATA_UNDUPC_FUNC;
       set SASDL.TESTDATA;
       length new_str $1000;
       new_str = undupc(str, '#+');
    run;
     
    options cmplib=subrlib.usercalls;
     
    data SASDL.TESTDATA_UNDUPC_CALL;
       set SASDL.TESTDATA;
       length new_str $1000;
       call undupc(str, '#+', new_str);
    run;

    A quick SAS log inspection reveals that CALL UNDUPC works as much as 3 times faster than UNDUPC function (10 seconds vs. 30 seconds). The time savings may vary depending on your data composition and computing environment, but in any case, if you process high volumes of data you may consider using CALL routine over function. This is not a blanket statement, as it only pertains to this particular algorithm of eliminating character repetitions where we had to explicitly assign the highest possible length attribute value to the interim variable in the function, but not in the CALL routine.

    When we reduced declared length of x from $32767 to $1000 within the user-defined function definition its performance became on par with the CALL routine.

    Additional Resources for SAS character strings processing

    Your thoughts?

    Have you found this blog post useful? Would you vote for implementing UNDUPC as a native built-in SAS function? Please share your thoughts and feedback in the comments section below.

    Removing repeated characters in SAS strings was published on SAS Users.

    7月 292020
     

    Students in my classes often want to know about how to discover what's going on in their SAS environment. They get auditing questions from others in their organizations and don’t know how to find the answers. I honestly get a bit giddy when I can point them to SAS® software's Report Center, so it seems fitting to tout it now, as Friday is System Administrator Appreciation Day.

    The Report Center, available since SAS 9.4M3, is a collection of stored processes that produce reports from data in the SAS Environment Manager Data Mart. These reports are a window into the performance and status of your SAS environment and its resources. They're samples of the types of reports you can produce using available metric data. You can also create your own reports to meet your individual requirements.

    How Report Center works

    The stored processes in the Report Center are created when you initialize SAS Environment Manager Extended Monitoring. However, the stored processes operate only on data that was stored in the SAS Environment Manager Data Mart by the APM or ACM ETL processes. Unless you initialize and enable one of those packages, no reports are produced.

    The information in SAS Environment Manager Data Mart is the storage area for the Audit Performance Metrics (APM) and Agent Collected Metrics (ACM). APM scans the components in your SAS system for SAS server logs, SAS job logs, SAS Metadata and HTTP access logs; basically, everything SAS. The ACM collects information such as workload, CPU usage, and memory.

    The current SAS Administration Fast Track course, which includes information about the Report Center, is taught on SAS 9.4M6. Before I decided to write this post, I searched and found others have provided information about the Report Center. There are several other blog posts and videos about it, so it surprises me how many I teach are unaware of the Report Center.

    After generating reports in class, I have a whole new band of converts for the SAS Report Center. The great news is that it is never too late to initialize the Report Center in your SAS 9 environment.

    Learn more about how to use the Report Center in the SAS Environment manager 2.5 User's Guide. Training for the SAS Report Center is found in the SAS® Platform Administration: Fast Track. Happy SysAdmin Day a couple of days early! I look forward to the opportunity to meet you in one of our upcoming classes!

    TAKE THE CLASS | SAS® PLATFORM ADMINISTRATION: FAST TRACK

    Report Center: SAS SysAdmin's secret weapon was published on SAS Users.

    7月 282020
     

    I spend my days helping SAS Platform Administrators develop their skills. The best part of my job? Helping them understand the power and importance of SAS as an analytics platform. Watching them move from no knowledge of SAS to feeling confident in their newfound abilities brings me joy.

    Most of the future SAS Administrators I teach function as “SAS Administrator and <fill in the blank>.” Their dilemma inspired this post describing a typical day in the life of a SAS Administrator.

    Customer needs vary, and SAS adapts with them, so there's really no by-the-book, play-by-play list of the daily activities. Every SAS Administrator's day is a little different, but below are some of the common tasks.

    SAS server management

    The day starts with checking your servers. As the admin, I check my servers every day. Without the ability to process their code, your users will not be happy. I know, I know, making users happy might not be high on your bucket list, but it’s a part of the job. Besides, don’t you want the satisfaction of knowing that your finely tuned SAS environment remained finely tuned over night?

    Which SAS components you have and the operating system determine how you check your servers.

    SAS® Viya provides scripts in /etc/init.d that you use to stop, start, restart, and check the status of an individual SAS Viya server and service.

    How you run the individual server and service scripts depends on your operating system:

    To check status of all servers and services:

    sudo /etc/init.d/sas-viya-all-services status

    For SAS® 9, there are scripts in <config-directory>/config/Lev1 to control the servers.

    To check status of all servers and services:

    <config-directory>/config/Lev1/sas.servers {start|stop|restart|status}

    The good news is you do not have to run these scripts manually every day. You can schedule them using third party tools so that when you arrive at work, server status greets you.

    User management

    Many administrators prefer dealing with the servers more than the users. Like I mentioned earlier, users are really our primary reason for being here. Not a day goes by that you will not have to perform some task associated with users. Users come, so you must give access. Users go, so you must remove access. User A is having trouble with his report. User B is having trouble with her code. Solve the issue and you are a genius. If they stump you, no worries! There is always SAS Technical Support for help. With SAS, you are never alone.

    Data management

    I often tell my students to become friends with their database administrators (DBAs). Unless, the <fill in the blank> part of their “SAS and…” is DBA. Then they should be their own best friend because SAS is all about accessing data of all sizes. If I had a nickel for every story I’ve heard about turf wars between SAS Administrators and DBAs….

    The SAS Administrator must make sure the data is accessible. It may not be your job, but it is your responsibility. It doesn’t matter that you had nothing to do with the database password changing overnight and now this morning no one can access tables they could access yesterday. Today you solve everyone’s data access issues.

    People often ask me if I get tired of teaching the same administration classes over and over again. I tell them, "No Way!" The content may be the same, but because every customer can use SAS differently, every class is different. I think the same is true for administering SAS. Every day brings the opportunity for learning, fun and excitement. There will be smooth days, when all is well. There will be challenging days, when you wonder what in the world is going on. Regardless, as the SAS Administrator you have many resources to help. There’s a whole SAS Administrator Community if you feel alone, SAS Technical Support if you run into trouble, and SAS Education training if you need a bump or reset to your administration skills. There's also a way to be certified as a SAS Platform Administrator.

    BEGIN YOUR JOURNEY | TRAINING FOR SAS PLATFORM ADMINISTRATOR

    A day in the life of the SAS Platform Administrator was published on SAS Users.

    6月 242020
     

    A lookup table is a programming technique where one or more values can be used to retrieve another value. For example, many years ago, I had benzene exposure estimates for 10 years (1940 to 1949) for each of five locations in a factory. Given a year and a job location, I needed to know the benzene concentration.

    I would be terribly embarrassed today if anyone saw the first program I wrote to solve the problem! This blog shows a better way that uses temporary arrays to create an n-way lookup table. To keep the example simple, let's use five years of data (1944 to 1948) and four locations (1 to 4).

    Temporary arrays

    Before we get into the program, let's discuss temporary arrays, one of my favorite SAS tools. Here is an example of a one-dimensional temporary array:

    Data Pass_Fail;
       input ID $ Grade1 - Grade5;
       array PF[5] _temporary_ (65 70 55 65 55);
       array Grade[5]; *If you leave off the variable list SAS will use the
                        array name with numbers 1-5 added. In this example
                        the variables will be Grade1, Grade2, etc.;
       array Pass_or_Fail[5] $ 4;
       do i = 1 to 5;
          if Grade[i] ge PF[i] then Pass_or_Fail[i] = 'Pass';
          else if not missing(Grade[i]) then Pass_or_Fail[i] = 'Fail';
       end;
       drop i;
    datalines;
    001 90 68 52 70 72
    002 56 69 72 75 88
    ;
    Title "Listing of Data Set Pass_Fail";
    Proc print data=Pass_Fail noobs;
    Run;

    In this example, the temporary array is called PF (pass fail values), and it has 5 elements. There are no actual variables PF1, PF2, and so on, only array elements PF[1], PF[2], and so on. The initial values of the five passing grades are placed in parentheses following the key word _temporary_. In many situations, you load the values of the temporary array from a data file.

    To keep this first example easy to understand, we will put the initial values in the array statement. You can now compare each student's grade for every test and assign a value of "Pass" or "Fail."

    Here is the output:

    Note: You can read a blog that I wrote years ago on temporary arrays for another example. 

    Example

    Now for the two-way table lookup example.

    *Two-dimensional table lookup using a temporary array;
    data Lookup;
       array Benzene[1944:1948,4] _temporary_; ①
     
       /* Populate the array */
       if _n_ = 1 then do Year = 1944 to 1948; ②
           do Location = 1 to 4;
    	      input Benzene[Year,Location] @; ③
    	   end;
    	end;
     
       input Subj $ Year Location;
       Benzene_Level = Benzene[Year, Location]; ④
    datalines;
    250 200 150 130
    90 180 155 90
    95 35 170 140 
    80 50 45 100 
    40 50 25 15
    001 1944 3
    002 1948 1
    003 1945 4
    ;
    title "Listing od Data Set Lookup";
    proc print data=Lookup noobs;
    run;

    ① This ARRAY statement creates an array with two dimensions (you use a comma to create multiple dimensions). To make programming easier to understand, the first dimension of the array uses subscripts 1944 to 1948, rather than 1 to 5 (the colon enables you to specify the lower and upper bounds of an array). Also notice that there are no initial values in this statement—they will be read from data.
    ② This section of code populates the values in the Benzene temporary array. You use the statement if _n_ = 1 to ensure that this section of code executes only once.
    ③ The INPUT statement reads in a value for Year and Location. The single trailing @ sign prevents SAS from going to a new line each time to DO loop iterates.
    ④ Notice how easy it is to retrieve an exposure value, given a value of Year and Location. The first five lines of data are the values used to populate the temporary array.

    You can read more about temporary arrays in my book, Learning SAS by Example: A Programmers Guide, Second Edition.

    Comments on this blog are welcome.

    Multi-way lookup tables was published on SAS Users.

    6月 172020
     

    Photo by AbsolutVision on Unsplash

    As a SAS user for most of my professional career and SAS employee for eight years, I thought I had a pretty good idea of what SAS offers and how SAS® Analytics can solve problems. Yet even I can experience an "aha" moment when I learn something new about SAS and I wish I had known before.

    My most recent "aha" moment came a few months ago, right as the Covid-19 pandemic started to unfold. I learned about the SASEFRED interface engine, a component of SAS/ETS® software that allows you to retrieve a wide range of economic and financial data from the Federal Reserve Economic Data (FRED) site. Hosted by the Federal Reserve Bank of St. Louis, FRED is a treasure trove of about 765,000 US and international times series data reported at the national, state and county levels.

    Being able to retrieve data from FRED directly into SAS has been extremely handy as I've developed interactive dashboards to capture key economic and financial indicators, such as unemployment claims, confidence sentiment, stock market index, volatility, etc. My SAS US Public Sector team's focus on this is part of a bigger effort to come alongside customers since the pandemic began. Our goal is to help all levels of government quickly set up analytical environments and provide timely situational awareness and analytical services. We're glad we're positioned to help public officials address public health and economic consequences of the pandemic.

    How to get FRED data

    SAS FRED documentation is self-explanatory. The first step is to obtain a unique FRED API key on the FRED site: https://api.stlouisfed.org/api_key.html. Once that’s done, you are off to the race.

    Below is the snippet of the code that I used to pull unemployment data into SAS with SASEFRED.

    options validvarname=any
       sslcalistloc="/opt/sas/viya/config/etc/SASSecurityCertificateFramework/cacerts/trustedcerts.pem";
     
    libname _all_ clear;
    libname mylib "/opt/sas/viya/config/data/cas/default/public"; /** Folder for final datasets **/
    libname fred "/opt/sas/viya/config/data/cas/default/public/fred"; /** Folder for intermediate datasets **/
     
    /** Ingest FRED Data **/
     
    libname fred sasefred "/opt/sas/viya/config/data/cas/default/public/fred"
       OUTXML=UnemploymentClaims
       AUTOMAP=replace
       APIKEY='XXXXXXXXXXXXXXXXXXXX'  /** please request your API at this site https://api.stlouisfed.org/api_key.html **/
       IDLIST='ICSA,ICNSA,IC4WSA,CCSA,CCNSA'
       START='2008-01-01'
       END='2020-06-30'
       freq='w'
       ;
    data mylib.UnemploymentClaims;
       set fred.UnemploymentClaims;
    run;
    proc print data=mylib.UnemploymentClaims;
    run;

    The output of the above code execution is the table ‘UnemploymentClaims’ with information on weekly reported initial unemployment claims, four-week moving average claims, and continued claims (see Figure 1 below).

    Figure 1

    An interactive dashboard with FRED data

    The work was done on SAS® Viya, the next generation of SAS Analytics. SAS Viya offers a wide range of robust analytical capabilities, including visual data exploration and reporting. With data ingested into SAS Viya, I am now able to quickly develop an interactive economic dashboard with relevant indicators that will automatically update as the new economic data is reported – all thanks to SASEFRED (see image below).

    Check the resources below to learn more. Hope you found this post relevant and useful. Feel free to reach out if you have any questions!

    SGF PAPER | Using SAS® Forecast Server and the SASEFRED Engine to Enhance Your Forecast YOUTUBE | Extracting a Common Time Trend in SAS/ETS

    How to access Federal Reserve Economic Data (FRED) with SASEFRED in SAS/ETS® software was published on SAS Users.