March 4, 2021
 

Data scientist has been on the career hot list for years, but how does one become a data scientist?

Well, my journey to becoming a data scientist was not an intentional journey - I didn't start with a plan to end up in a data science career. In fact, I ended up doing data science for years before I realized it had a name.

What is a data scientist?

Data scientists have been described as a mashup between mathematicians and computer scientists with a curiosity to solve unexplored problems. There are plenty of data scientist job descriptions out there, but I think this description sums it up. If you'd like to learn more about what it means to be a data scientist, I recommend this article.

They’re part mathematician, part computer scientist and part trend-spotter.

Where did I start? Math.

If I were to try to pinpoint the start of my journey, I would have to say it started with a love of math. I started college as a mathematics major and briefly had the idea that I was going to become a mathematician.

My math program required an intro to computer science class, which I took in my first semester and where I met my eventual undergraduate advisor. As part of the class, she met with each person individually to discuss their goals for the class and their career. In that meeting, she showed me the job opportunities (and salaries) for a mathematician versus a computer scientist. Let's just say that computer science looked more promising, and as someone who always wanted to say they were a scientist, I was sold.

Shortly after that class, I started doing research under her. I was using machine learning to predict fraudulent energy consumption, which continued until I graduated and led me to apply for Ph.D. programs. At that point, I had already been doing and loving data science, but I wasn't aware of it yet.

Ph.D.?

I applied to several grad schools and decided to seek out labs that worked with educational data. Education provided me with many opportunities, and I wanted to be able to contribute to research that made education more accessible and attainable for others. After getting acceptance and rejection letters, I reached out to a research professor at one of the schools I was accepted to whose research focused on educational data. I met with her and made my choice to join their 5-year Ph.D. program.

You may be wondering at this point if you need a Ph.D. to become a data scientist and the answer is, as always, it depends. Data science is a quickly expanding career and multiple levels of data scientists exist. Some positions may require a Ph.D., but you will find many more that are just looking for a particular skillset and analytical experience.

Academia or industry?

During the program, I learned experimental design and setup, advanced analytics techniques and more about machine learning. After finishing coursework requirements in the first two years, I spent most of my time doing data analytics and writing up the results for academic papers.

Here, I also discovered a love of teaching through being an instructor. I dreamed up a new idea of becoming a research professor, but after I witnessed the day-to-day lives of professors, I started to doubt that I would enjoy it as a career. So, industry it was.

Luckily, I found that the opportunity to teach extends well past the academic world - after all, people want to learn about data science. In my current job at SAS within the Education department, I teach and design courses for our machine learning and analytical tools, although I spend most of my time doing typical data science work.

I use virtually all the skills I learned from my Ph.D. program, including skills I learned working with educational data. On a day-to-day basis, I work on designing, monitoring, and analyzing experiments to address business concerns. I also spend a considerable amount of time doing a variety of data processing tasks from cleaning to feature engineering, as well as visualization of data to help inform and improve internal processes in the SAS Education team.

What are some challenges of transitioning from academia to industry?

New domain, new audience, and most importantly, new goals!

Transitioning from academia to industry was, for me, a relatively easy process; however, there were a few challenges.

The biggest challenge I had was changing my mindset when it came to analytical goals. In academia, the analytical process you follow to answer a question is just as important as the goal (answering the research question); in industry, the goal tends to be the focus. For me, this meant updating how I communicate results and their implications. I also needed to learn an entirely new domain (business operations) and how to communicate results to a non-technical audience.

Understanding the needs of your company is the most difficult challenge by far, so asking questions is key! Presenting your interpretations of the goal, or your concerns about it, is crucial to doing the job right. If you are in the process of transitioning from the academic world to industry, be ready to adapt to new processes and shift your mindset toward solving business challenges, as the goals in business will differ from the goals of academic research.

Interested in becoming a data scientist?

If you are interested in becoming a data scientist, ask yourself:

  • "Am I the type of person who digs deep into the problem to find the answer - often generating additional questions and answers in the process?" or
  • "Would I rather focus on more defined, straightforward issues?"

Feel like you fall into the first type? Check out SAS Academy for Data Science (free for 30 days)

International Women's Day - March 8

In honor of International Women's Day coming up on March 8, I would like to thank all of the powerful women who have supported my journey.

I’ve met many inspiring women along the way. Both my undergraduate and graduate advisors happened to be women as well as my current boss, and I’m incredibly grateful to have had these leaders to look up to.

 

How I became a data scientist was published on SAS Users.

March 2, 2021
 

The more I use SAS Studio in the cloud via SAS OnDemand for Academics, the more I like it. To demonstrate how useful the Files tab is, I'm going to show you what happens when you drag a text file, a SAS data set, and a SAS program into the Editor window.

I previously created a folder called MyBookFiles and uploaded several files from my local computer to that folder.  You can see a partial list of files in the figure below.

Notice that there are text files, SAS data sets, SAS programs, and some Excel workbooks. Look what happens when I drag a text file (Blank_Delimiter.txt) into the Editor window.

No need to open Notepad to view this file—SAS Studio displays it for you. What about a SAS data set? As an example, I dragged a SAS data set called blood_pressure into the Editor.

You see a list of variables and some of the observations in this data set.  There are vertical and horizontal scroll bars (not shown in the figure) to see more rows or columns. If you want to see a listing of the entire data set or the first 'n' observations, you can run the List Data task, located under the Tasks and Utilities tab.

For the last example, I dragged a SAS program into the editor. It appears exactly the same as if I opened it in my stand-alone version of SAS.

At this point, you can run the program or continue to write more SAS code. By the way, the tilde (~) used in the INFILE statement is a shortcut for your home directory. Follow it with the folder name and the file name.
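To make the tilde shortcut concrete, here is a minimal sketch that simply echoes a text file to the SAS log. It assumes the MyBookFiles folder sits directly under your home directory and reuses the Blank_Delimiter.txt file name mentioned earlier; adjust the path to match your own account.

```sas
/* Sketch only: the path below is an assumption based on the folder and */
/* file names used in this post. It reads no variables, so it works     */
/* regardless of how the file is laid out.                              */
data _null_;
   infile '~/MyBookFiles/Blank_Delimiter.txt';
   input;            /* read one raw line into the input buffer */
   put _infile_;     /* write that line to the SAS log          */
run;
```

Once the lines appear in the log as expected, you can replace the bare INPUT statement with one that names your variables.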

You can read more about SAS Studio in the cloud in my latest book, Getting Started with SAS Programming: Using SAS Studio in the Cloud.

Viewing files, programs, and data sets in SAS Studio was published on SAS Users.

February 25, 2021
 

The people, the energy, the quality of the content, the demos, the networking opportunities…whew, all of these things combine to make SAS Global Forum great every year. And this year is no exception.

Preparations are in full swing for an unforgettable conference. I hope you’ve seen the notifications that we set the date, actually multiple dates around the world so that you can enjoy the content in your region and in your time zone. No one needs to set their alarm for 1:00am to attend the conference!

Go ahead and save the date(s)…you don’t want to miss this event!

Content, content, content

We are working hard to replicate the energy and excitement of a live conference in the virtual world. But we know content is king, so we have some amazing speakers and content lined up to make the conference relevant for you. There will be more than 150 breakout sessions for business leaders and SAS users, plus the demos will allow you to see firsthand the innovative solutions from SAS, and the people who make them. I, for one, am looking forward to attending live sessions that will allow attendees the opportunity to ask presenters questions and have them respond in real time.

Our keynote speakers, while still under wraps for now, will have you on the edge of your seats (or couches…no judgement here!).

Networking and entertainment

You read that correctly. We will have live entertainment that'll have you glued to the screen. And you’ll be able to network with SAS experts and peers alike. But you don’t have to wait until the conference begins to network; the SAS Global Forum virtual community is up and running. Join the group to start engaging with other attendees, and maybe take a guess or two at who the live entertainment might be.

A big thank you

We are working hard to bring you the best conference possible, but this isn’t a one-woman show. It takes a team, so I would like to introduce and thank the conference teams for 2021. The Content Advisory Team ensures the Users Program sessions meet the needs of our diverse global audience. The Content Delivery Team ensures that conference presenters and authors have the tools and resources needed to provide high-quality presentations and papers. And, finally, the SAS Advisers help us in a multitude of ways. Thank you all for your time and effort so far!

Registration opens in April, so stay tuned for that announcement. I look forward to “seeing” you all in May.

What makes SAS Global Forum great? was published on SAS Users.

February 22, 2021
 

In my previous post, we addressed the problem of inserting substrings into SAS character strings. In this post we will solve the reverse problem of deleting substrings from SAS strings.

These two complementary tasks are commonly used for character data manipulation during data cleansing and preparation to transform data to a shape suitable for analysis, text mining, reporting, modeling and decision making.

As in the previous case of substring insertion, we will cover substring deletion for both character variables and macro variables, as both data objects are strings.

The following diagram illustrates what we are going to achieve by deleting a substring from a string:

[Figure: Removing a substring from a SAS string]

Have you noticed a logical paradox? We take away a “pieceof” cake and get the whole thing as result! 😊

Now, let’s get serious.

Deleting all instances of a substring from a character variable

Let’s suppose we have a variable STR whose values are sprinkled with some undesirable substring ‘<br>’ which we inherited from some HTML code where tag <br> denotes a line break. For our purposes, we want to remove all instances of those pesky <br>’s. First, let’s create a source data set imitating the described “contaminated” data:

data HAVE;
   infile datalines truncover;
   input STR $100.;
   datalines;
Some strings<br> have unwanted sub<br>strings in them<br>
<br>A s<br>entence must not be cont<br>aminated with unwanted subs<br>trings
Several line<br> breaks<br> are inserted here<br><br><br>
<br>Resulting st<br>ring must be n<br>eat and f<br>ree from un<br>desirable substrings
Ugly unwanted substrings<br><br> must <br>be<br> removed
<br>Let's remove them <br>using S<br>A<br>S language
Ex<br>periment is a<br>bout to b<br>egin
<br>Simpli<br>city may sur<br>prise you<br><br>
;

This DATA step creates the WORK.HAVE data set, which looks pretty ugly and is hardly usable:

[Figure: Source data to be cleansed]

The following code, however, cleans it up by removing all those unwanted substrings ‘<br>’:

data WANT (keep=NEW_STR);
   length NEW_STR $100;
   SUB = '<br>';
   set HAVE;
   NEW_STR = transtrn(STR,'<br>',trimn(''));
run;

After this code runs, the data set WANT will look totally clean and usable:

[Figure: Cleaned data]

Code highlights

  • We use the TRANSTRN() function to replace every occurrence of the substring '<br>' in STR with a zero-length string, which effectively deletes it.

The TRANSTRN function is similar to the TRANWRD function, which replaces all occurrences of a substring in a character string. While TRANWRD uses a single blank when the replacement string has a length of zero, TRANSTRN allows the replacement string to have a length of zero, which essentially means removal.

  • We use TRIMN('') to produce that zero-length replacement. The TRIMN function is similar to the TRIM() function, which removes trailing blanks from a character string but returns one blank if the string is missing. When it comes to removing (which is essentially replacement with a zero-length substring), the ability of the TRIMN function to return a zero-length string makes all the difference.
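The difference between TRANWRD and TRANSTRN described above can be seen side by side in a small sketch. The variable names A and B and the sample value are made up for illustration; only the two function calls matter.

```sas
/* Sketch: same source, same target, same zero-length replacement,    */
/* different function. A and B are hypothetical names for this demo.  */
data _null_;
   length A B $40;
   STR = 'line<br>break';
   A = tranwrd (STR, '<br>', trimn(''));  /* zero-length replacement becomes one blank */
   B = transtrn(STR, '<br>', trimn(''));  /* zero-length replacement is kept as is     */
   put A= / B=;
run;
```

Based on the documented behavior summarized above, A should come out as "line break" (one blank where the tag was) and B as "linebreak" (tag removed with nothing left behind).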

Deleting all instances of a substring from a SAS macro variable

For macro variables, I can see two distinct methods of removing all occurrences of undesirable substring.

Method 1: Using SAS data step

Here is a code example:

%let STR = Some strings<br> have unwanted sub<br>strings in them<br>;
%let SUB = <br>;
 
data _null_;
   NEW_STR = transtrn("&STR","&SUB",trimn(''));
   call symputx('NEW',NEW_STR);
run;
 
%put &=STR;
%put &=NEW;

In this code, we place our macro variable value &STR in double quotes as the first argument (source) of the transtrn() function. The macro variable value &SUB, also double quoted, is placed as the second argument. After variable NEW_STR is produced free from the &SUB substrings, we create a macro variable NEW using the CALL SYMPUTX routine.

Method 2: Using the %SYSFUNC macro function

Here is a code example:

%let STR = Some strings<br> have unwanted sub<br>strings in them<br>;
%let SUB = <br>;
 
%let NEW = %sysfunc(transtrn(&STR,&SUB,%sysfunc(trimn(%str()))));
 
%put &=STR;
%put &=NEW;

Deleting selected instance of a substring from a character variable

In many cases we need to remove not all substring instances from a string, but rather a specific occurrence of a substring. For example, in the following sentence (which is a quote by Albert Einstein) “I believe in intuitions and inspirations. I sometimes feel that I am right. I sometimes do not know that I am.” the second word “sometimes” was added by mistake. It needs to be removed. Here is a code example presenting two solutions of how such a deletion can be done:

data A;
   length STR STR1 STR2 $250;
   STR = 'I believe in intuitions and inspirations. I sometimes feel that I am right. I sometimes do not know that I am.';
   SUB = 'sometimes';
   STR_LEN = length(STR);
   SUB_LEN = length(SUB);
   POS = find(STR,SUB,-STR_LEN);
   STR1 = catx(' ', substr(STR,1,POS-1), substr(STR,POS+SUB_LEN)); /* solution 1 */
   STR2 = kupdate(STR,POS,SUB_LEN+1);                              /* solution 2 */
   put STR1= / STR2=;
run;

The code will produce two correct identical values of this quote in the SAS log (notice, that the second instance of word “sometimes” is gone):

STR1=I believe in intuitions and inspirations. I sometimes feel that I am right. I do not know that I am.
STR2=I believe in intuitions and inspirations. I sometimes feel that I am right. I do not know that I am.

Code highlights

  • FIND() function determines position POS of the substring SUB to be deleted in the string STR. In this particular example, we used the fact, that the second occurrence of word “sometimes” is the first occurrence of this word when counted from right to left. That is indicated by the negative 3-rd argument (-STR_LEN) which means that FIND function searches STR for SUB starting from position STR_LEN from right to left.
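The right-to-left search can be tried in isolation with a short sketch. The sample sentence and the variable names FIRST and LAST are made up for illustration; the only technique shown is the negative third argument of FIND.

```sas
/* Sketch: compare a left-to-right search with a right-to-left one.   */
/* FIRST and LAST are hypothetical names for this demo.               */
data _null_;
   STR   = 'to be or not to be, that is the question';
   FIRST = find(STR, 'be');                 /* search from position 1, left to right       */
   LAST  = find(STR, 'be', -length(STR));   /* search from the end of STR, right to left   */
   put FIRST= LAST=;
run;
```

Here the first "be" starts at position 4 and the second at position 17, so the log should show FIRST=4 and LAST=17. In both cases FIND reports the position where the match begins, counted from the left.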

Solution 1

This is the most traditional solution that cuts out two pieces of the string – before and after the substring being deleted – and then concatenates them together thus removing that substring:

  • substr(STR,1,POS-1) extracts the first part of the source string STR before the substring to be deleted: from position 1 to position POS-1.
  • substr(STR,POS+SUB_LEN) extracts the second part of the source string STR after the substring to be deleted: from position POS+SUB_LEN till the end of STR value (since the third argument, length, is not specified).
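
Solution 2

The KUPDATE() function deletes SUB_LEN+1 characters from STR starting at position POS: the substring itself plus the blank that follows it. Because no replacement string is supplied, nothing is inserted in their place.

To delete an occurrence other than the first or the last one, see the post Finding n-th instance of a substring within a string.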

Deleting selected instance of a substring from a SAS macro variable

Here is a code example of how to solve the same problem as it relates to SAS macro variables. For brevity, we provide just one solution using %sysfunc and KUPDATE() function:

%let STR = I believe in intuitions and inspirations. I sometimes feel that I am right. I sometimes do not know that I am.;
%let SUB = sometimes;
%let POS = %sysfunc(find(&STR,&SUB,-%length(&STR)));
%let STR2 = %sysfunc(kupdate(&STR,&POS,%eval(%length(&SUB)+1)));
%put "&STR2";

This should produce the following corrected Einstein’s quote in the SAS log:

"I believe in intuitions and inspirations. I sometimes feel that I am right. I do not know that I am."

Your thoughts?

Have you found this blog post useful? Please share your thoughts and feedback in the comments section below.

Deleting a substring from a SAS string was published on SAS Users.

February 16, 2021
 

SAS provides an extensive set of tools for data cleansing and preparation – transforming data to a shape suitable for analysis, text mining, reporting, modeling and ultimately decision making.

In this post we will cover one of the common tasks of character data manipulation – inserting a substring into a SAS character string.

A diagram below illustrates what we are going to achieve:

[Figure: Inserting a substring into a string]

SAS character strings come in two different incarnations: character variables and macro variables. Since these two are quite different SAS language objects, let’s cover them one by one separately.

Inserting a substring into a character variable

Here is our task: we have a SAS character variable (string) and we want to insert in it a value of another character variable (substring) starting at a particular specified position.

Let’s say we have a string BASE in which we want to insert a COUNTRY name right before the word "stays" to make different variations of the resulting phrase. Here is an example of how this can be easily done:

data COUNTRIES;
   length COUNTRY $20;
   input COUNTRY;
   datalines;
Spain
Argentina
Slovenia
Romania
USA
Luxembourg
Egypt
Switzerland
;
 
data NEW (keep=COUNTRY PHRASE);
   BASE = 'The rain in stays mainly in the plain';
   INSPOS = find(BASE,'stays');
   set COUNTRIES;
   length PHRASE $50;
   PHRASE = catx(' ',substr(BASE,1,INSPOS-1),COUNTRY,substr(BASE,INSPOS));
run;

This code dynamically creates variable PHRASE out of values of variable BASE and the values of variable COUNTRY, thus making it data-driven.

After this code runs, the data set NEW will look like this:

[Figure: Results after inserting a substring into a character string]

Here are the code highlights:

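  • find(BASE,'stays') returns INSPOS, the position of the word before which the substring is inserted.
  • substr(BASE,1,INSPOS-1) extracts the first part of BASE (before the insertion point), and substr(BASE,INSPOS) extracts the second part, from position INSPOS to the end of BASE.
  • catx(' ', ...) concatenates the three pieces, removing leading and trailing blanks from each and separating them with a single blank.
  • The LENGTH statement sets the length of PHRASE to 50 bytes. The maximum length of varying-length character variables is 536,870,911 characters (UTF-8 encoding).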

    Inserting a substring into a SAS macro variable

    Let’s solve a similar task, but now instead of SAS variables we will operate with SAS macro variables, since they are strings too.

    Here is our problem to solve: we have a SAS macro variable (string) and we want to insert in it a value of another macro variable (substring) starting at a particular specified position.

    Let’s say we have a macro variable BASE with value of The rain in stays mainly in the plain in which we want to insert a country name defined by macro variable COUNTRY with value of Spain right before word stays. Here is an example of how this can be done:

    %let BASE = The rain in stays mainly in the plain;
    %let COUNTRY = Spain;
    %let W = stays;
     
    %let INSPOS = %index(&BASE,&W);
    %let PHRASE = %substr(&BASE,1,%eval(&INSPOS-1))&COUNTRY %substr(&BASE,&INSPOS);
    %put ***&PHRASE***;

    This code will insert the country name in the appropriate place within the BASE macro variable which will be printed in the SAS log by %put statement:

    ***The rain in Spain stays mainly in the plain***

    Here are the code highlights:

    • %substr() macro function to extract two parts of its first argument (&BASE) - before and after insertion:
      • %substr(&BASE,1,%eval(&INSPOS-1)) captures the first part of &BASE (before insertion): substring of &BASE starting from the position 1 with a length of %eval(&INSPOS-1).
      • %substr(&BASE,&INSPOS) captures the second part of &BASE (after insertion): substring of &BASE starting from the position &INSPOS till the end of &BASE (since the third argument is not specified).
    • In case of macro variables, we don’t need any concatenation functions – we just list the component pieces of the macro variable value in a proper order with desired separators (blanks in this case).

    NOTE: Unlike for SAS variables, you don’t need to assign the length of SAS macro variables which are automatically defined by their assigned values. The maximum length of SAS macro variables is 65,534 bytes.

    Inserting multiple instances of a substring into a SAS character string

    Sometimes you need to insert a substring into several places (positions p1, p2, …, pn) of a character string. In this case you can use the above strategy repeatedly or iteratively with one little caveat: start inserting from the highest position and move backwards to the lowest position. This will preserve your pre-determined positions because positions are counted from left to right, and inserting a substring at a higher position won’t change the lower position numbers. Otherwise, after insertion of a substring into a lower position, all your higher positions will shift by the length of the inserted substring.
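The highest-position-first strategy can be sketched as a simple loop. The sample string, the substring '--' and the positions 3, 6 and 9 are made up for illustration. This sketch uses CATS for brevity, which strips leading and trailing blanks from each piece, so it assumes the text around each insertion point contains no blanks that must be kept.

```sas
/* Sketch: insert SUB before positions 3, 6 and 9 of the ORIGINAL     */
/* string. Working from the highest position down keeps the lower     */
/* positions valid. All values here are hypothetical.                 */
data _null_;
   length STR $60;
   STR = 'abcdefghij';
   SUB = '--';
   array P[3] _temporary_ (3 6 9);   /* insertion positions, ascending */
   do I = dim(P) to 1 by -1;         /* highest position first         */
      STR = cats(substr(STR,1,P[I]-1), SUB, substr(STR,P[I]));
   end;
   put STR=;
run;
```

With these values the log should show STR=ab--cde--fgh--ij. Running the loop in ascending order instead would shift the later insertion points by two characters per earlier insertion.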

    Your thoughts?

    Have you found this blog post useful? Please share your thoughts and feedback in the comments section below.

    Inserting a substring into a SAS string was published on SAS Users.

    February 1, 2021
     

    Do you want to spend less time on the tedious task of preparing your data? I want to tell you about a magical and revolutionary SAS macro called %TK_codebook. Not only does this macro create an amazing codebook showcasing your data, it also automatically performs quality control checks on each variable. You will easily uncover potential problems lurking in your data including variables that have:

    • Incomplete formats
    • Out of range values
    • No variation in response values
    • Variables missing an assigned user-defined format
    • Variables that are missing labels

    All you need is a SAS data set with labels and formats assigned to each variable and the accompanying format catalogue. Not only will this macro change the way you clean and prepare your data, but it also gives you an effortless way to evaluate the quality of data you obtain from others before you start your analysis. Look how easy it is to create a codebook if you have a SAS data set with labels and formats:

    title height=12pt 'Master Codebook for Study A Preliminary Data';
    title2 height=10pt 'Simulated Data for Participants in a Health Study';
    title3 height=10pt 'Data simulated to include anomalies illustrating the power of %TK_codebook';
     
    libname library "/Data_Detective/Formats/Blog_1_Codebooks";
     
    %TK_codebook(lib=work,
    	file1=STUDYA_PRELIM,
    	fmtlib=LIBRARY,
    	cb_type=RTF,
    	cb_file=/Data_Detective/Book/Blog/SAS_Programs/My_Codebook.rtf,
    	var_order=INTERNAL,
    	organization = One record per CASEID,
    	include_warn=YES);
    run;

    Six steps create your codebook

    After creating titles for your codebook, this simple program provides %TK_codebook with the following instructions:

    1. Create a codebook for SAS data set STUDYA_PRELIM located in the temporary Work library automatically defined by SAS
    2. Find the formats assigned to the STUDYA_PRELIM in a format catalogue located in the folder assigned to the libref LIBRARY
    3. Write your codebook in a file named /Data_Detective/Book/Blog/SAS_Programs/My_Codebook.rtf
    4. List variables in the codebook by their INTERNAL order (order stored in the data set)
    5. Add “One record per CASEID” indicating which variable(s) uniquely identify each observation to codebook header
    6. Include reports identifying potential problems in the data

    Just these few lines of code will create the unbelievably useful codebook shown below.

    The data set used has many problems that can interfere with analysis. %TK_codebook creates reports showing a concise summary of only those problem variables needing close examination. These reports save you an incredible amount of time.

    Using assigned formats, %TK_codebook identifies unexpected values occurring in each variable and provides a summary in the first two reports.

    Values occurring outside those defined by the assigned format indicate two possible problems:

    1. A value was omitted from the format definition (Report 1 – Incomplete formats)
    2. The variable has unexpected values needing mitigation before the data is analyzed (Report 2 – Out of Range Value)

    The next report lists numeric variables that have no variation in their values.

    These variables need examining to uncover problems with preparing the data set.

    The next two reports warn you about variables missing an assigned user-defined format. These variables will be excluded from screening for out-of-range values and incomplete format definitions.

    The last report informs you about variables that are missing a label or have a label that matches the variable name.
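
    A similar check can be sketched with the DICTIONARY.COLUMNS table. This is only an illustration of the idea, not the macro's own code, and it assumes that STUDYA_PRELIM exists in the Work library as in the example above.

    ```sas
    /* List variables with no label or a label equal to the variable name */
    proc sql;
       select name, label
          from dictionary.columns
          where libname = 'WORK'
            and memname = 'STUDYA_PRELIM'
            and (label = ' ' or upcase(label) = upcase(name));
    quit;
    ```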

    It is easy to use %TK_codebook to resolve problems in your data and create an awesome codebook. Instead of spending your time preparing your data, you will be using your data to change the world!

    Create your codebook

    Download %TK_codebook from my author page, then learn to use it from my new book, The Data Detective’s Toolkit: Cutting-Edge Techniques and SAS Macros to Clean, Prepare, and Manage Data.


    Creating codebooks with SAS macros was published on SAS Users.

    January 27, 2021
     

    SAS offers myriad ways to level up your SAS skills (scroll to the bottom to see a list of SAS-provided learning paths and channels). In this post, I introduce you to SASensei, an independent, third-party online SAS learning resource that I enjoy a lot.

    Learning: dull or fun?

    Learning is not always associated with fun. Sometimes it feels difficult and exhausting. New concepts often contradict our prior knowledge and experience, compelling us to rethink, adjust, change and adapt to new paradigms.

    Learning new ideas, skills and technologies can be intimidating, challenging and demanding. While learning, you are stretching out of your comfort zone. But that feeling is only transient. As a matter of attitude, learning is not about pushing yourself out of your comfort zone, it’s about expanding your comfort zone. And that is long lasting. The more you learn, the more comfortable and self-confident you become.

    Learning does not have to be tedious. Look at pre-school kids. They learn basic life skills like walking (rolling, crawling), talking (in one or more languages), asking questions (a lot) – all without taking classes, just through their natural curiosity and ... playing games.

    What is SASensei? Gamified SAS learning

    When I first discovered the SASensei online SAS learning game/application I was pleasantly surprised by its non-traditional approach to learning such a serious and well-established platform as SAS.

    As stated on its website, “Sasensei is a question based learning system. You must demonstrate your command of SAS® to earn Tokens - which should be wisely invested, to enable you to unlock new levels within the game...”

    The following screenshot shows the main page of the SASensei website that displays a dashboard of the top players (they call it leaderboard). You can filter it geographically - by Country, Continent, or World, as well as by the timeline – by Past Month, Past Year, or All Time.

    SASensei leaderboard

    Privacy or prominence

    Users have full control of their privacy or prominence. As you can see in the screenshot above, registered users are displayed by their screen names. This allows the users to either remain anonymous by selecting some fictitious obscure screen name or use their real name. Users can change their screen name at any time.

    Rules of the game

    In this blog post I provide just an overview of the main functionality and features of the SASensei learning platform. For detailed rules of the game, see SASensei Documentation.

    Play and learn

    Users are offered a variety of learning activities:

    • Viewing, reviewing and submitting SAS-learning flashcards;
    • Playing, reviewing and submitting questions by different SAS-related topics;
    • Taking and creating public, private, multi-player and custom quizzes;
    • Providing feedback on questions and flashcards by voting and commenting.

    Users can challenge themselves by delving into different topics. Your successes and failures will provide you an honest and objective estimation of your SAS strengths as well as weaknesses. A healthy competition with other users encourages you to learn more and hone your SAS skills. When you fail a question, you can review the explanation of the correct answer and thus learn why you failed and acquire new knowledge, tips and tricks quickly and efficiently.

    Invest, score, win and build a reputation

    To play, you will need to earn and spend tokens, which are essentially the game’s currency. To motivate you further, you also earn reputation points, which make up your ultimate score, a measure of your achievement in demonstrating SAS skills. Your reputation score is prominently displayed in your public profile. As you progress in the game and your reputation grows, additional functionality unlocks and becomes available to you. Your reputation score determines your SASensei standing level, whose names are derived from those used in martial arts:


    • White Belt (new players)
    • Yellow Belt (reputation ≥ 50)
    • Green Belt (reputation ≥ 100)
    • Black Belt (reputation ≥ 200)
    • Sasamurai (reputation ≥ 500)
    • Assassin (reputation ≥ 1000)
    • Sasensei (reputation ≥ 5000)

    Sample SASensei question

    When you play a question, you select a topic, and then you are presented with a randomly selected multiple-choice question with a specified time limit (30, 60, 90 or 120 seconds). Here is a sample question:

    Question:

    What is wrong with the following LIBNAME statement?
    libname fruits (apples oranges tomatoes);

    Answers:

    • Incorrect syntax
    • You cannot mix apples and oranges in LIBNAME statement
    • Nothing is wrong, valid LIBNAME statement
    • Tomatoes are not fruits, therefore the statement is not correct

    Correct answer:

    Nothing is wrong, valid LIBNAME statement

    Explanation:

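    This is valid syntax for concatenating SAS libraries: the statement assigns the libref fruits to the logical concatenation of the previously assigned librefs apples, oranges and tomatoes. See Combine and conquer with SAS for examples of usage.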
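
    A minimal sketch of the library concatenation used in the sample question, assuming three hypothetical folders that already exist on your machine (the paths are placeholders, not part of the original question):

    ```sas
    /* Hypothetical folders; each must exist before it is assigned */
    libname apples   'C:\temp\apples';
    libname oranges  'C:\temp\oranges';
    libname tomatoes 'C:\temp\tomatoes';

    /* Concatenate the three libraries under a single libref */
    libname fruits (apples oranges tomatoes);

    /* List the members of the combined library */
    proc datasets library=fruits;
    quit;
    ```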

    Try tackling a question on your own in the SASensei environment to get real life experience: Sample Question.

    Take a SASensei sample quiz

    There are various quizzes available at SASensei: public quizzes, multiplayer quiz games, private quizzes (tests) for students.

    A public quiz contains 12 questions with a total time cap of 12 minutes, and costs eight tokens to play. You can choose a single topic (SAS statements, SAS macro, procedures, options, etc.), and if you pass (75% or more) you get 12 tokens back, plus 20 points added to your reputation. If you get 100%, you get 30 reputation points plus the Top Student badge. A count of passed sessions (by topic) is displayed on your public profile.

    Although public quizzes are unlocked at the SASamurai level, for the readers of this blog, I have created a special custom quiz sample so you can experience it firsthand right here, right now. Just click on this link, Sample Quiz, register, and enjoy your ride.

    See you at the top of the SASensei dashboard!

    Credit

    Big THANKS to Allan Bowe (United Kingdom), the SAS innovator and entrepreneur who created and founded the SASensei learning platform.

    Other SAS learning resources

    Game on! SASensei: a fun way to learn SAS was published on SAS Users.

    January 13, 2021
     

    Running SAS programs in parallel reduces run time

    As earth completes its routine annual circle around the sun and a new (and hopefully better) year kicks in, it is a perfect occasion to reflect on the idiosyncrasy of time.

    While it is customary to think that 3+2=5, it is only true in a sequential world. In a parallel world, however, 3+2=3. Think about it: if you have two SAS programs, one of which runs for 3 hours and the other for 2 hours, their total duration will be 5 hours if you run them one after another, sequentially, but only 3 hours if you run them simultaneously, in parallel.

    I am sure you remember those “filling up a swimming pool” math problems from elementary school. They clearly and convincingly demonstrate that two pipes will fill up a swimming pool faster than one. That’s the power of running water in parallel.

    The same principle of parallel processing (or parallel computing) is applicable to SAS programs (or non-SAS programs) by running their different independent pieces in separate SAS sessions at the same time (in parallel).  Divide and conquer.

    You might be surprised at how easily this can be done, and at the same time how powerful it is. Let’s take a look.

    SAS/CONNECT

    Multi-Process Connect (MP CONNECT) parallel processing functionality was added to SAS/CONNECT, enabling you to execute multiple SAS sessions asynchronously. When a remote SAS session kicks off asynchronously, a portion of your SAS program is sent to the server session for execution and control is immediately returned to the client session. The client session can continue with its own processing or spawn one or more additional asynchronous remote server sessions.

    Running programs in parallel on a single machine

    Sometimes, what comes across as new is just the well-forgotten old. They used to be called Central Processing Units (CPUs), but now they are called just processors. Nowadays, practically every computer is a “multi-machine” (or, to be precise, “multi-processor”) device. Even your laptop. Just open Task Manager (Ctrl-Alt-Delete), click on the Performance tab and you will see how many physical processors (or cores) and logical processors your laptop has:

    Parallel processing on a single machine

    The laptop in the screenshot has eight logical processors, which means that it can run eight independent SAS processes (sessions) at the same time. All you need to do is to say nicely “Dear Mr. & Mrs. SAS/CONNECT, my SAS program consists of several independent pieces. Would you please run each piece in its own SAS session, and run them all at the same time?” And believe me, SAS/CONNECT does not care how many logical processors you have, or whether your logical processors are far-away “remote machines” or are situated in a single laptop or even in a single chip.
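
    If you would rather check the processor count from within SAS than from Task Manager, the following sketch is one way to do it. It relies on the automatic macro variable SYSNCPU and the CPUCOUNT= system option; the values you see depend on your machine and SAS configuration.

    ```sas
    /* Number of processors available to this SAS session */
    %put &=SYSNCPU;

    /* Current value of the CPUCOUNT= system option */
    proc options option=cpucount;
    run;
    ```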

    Here is how you communicate your request to SAS/CONNECT in SAS language.

    Spawning multiple SAS sessions using MP Connect

    Suppose you have SAS code that consists of several pieces – DATA or PROC steps that are independent of each other, i.e. they do not need to run in a specific sequence. For example, each of the two pieces generates its own data set.

    Then we can create these two data sets in two separate “remote” SAS sessions that run in parallel. Here is how you do this.  (For illustration purposes, I just create two dummy data sets.)

    options sascmd="sas";
     
    /* Current datetime */
    %let _start_dt = %sysfunc(datetime());
     
    /* Process 1 */
    signon task1;
    rsubmit task1 wait=no;
     
       libname SASDL 'C:\temp';
     
       data SASDL.DATA_A (keep=str);
          length str $1000;
          do i=1 to 1150000;
             str = '';
             do j=1 to 1000;
                str = cats(str,'A');
             end;
             output;
          end;
       run;
     
    endrsubmit;
     
    /* Process 2 */
    signon task2;
    rsubmit task2 wait=no;
     
       libname SASDL 'C:\temp';
     
       data SASDL.DATA_B (keep=str);
          length str $1000;
          do i=1 to 750000;
             str = '';
             do j=1 to 1000;
                str = cats(str,'B');
             end;
             output;
          end;
       run;
     
    endrsubmit;
     
    waitfor _all_;
    signoff _all_;
     
    /* Print total duration */
    data _null_;
       dur = datetime() - &_start_dt;
       put 30*'-' / ' TOTAL DURATION:' dur time13.2 / 30*'-';
    run;

    In this code, the key elements are:

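    SASCMD= system option - specifies the command that starts a server session on the same multiprocessor machine.

    SIGNON statement - initiates a connection between a client session and a server session.

    RSUBMIT statement - marks the beginning of a block of statements that a client session submits to a server session for execution; WAIT=NO makes the submission asynchronous, so control returns to the client session immediately.

    ENDRSUBMIT statement - marks the end of a block of statements that a client session submits to a server session for execution.

    WAITFOR statement - makes the client session wait for asynchronous tasks to complete; _ALL_ waits for all of them.

    SIGNOFF statement - ends the connection between a client session and a server session.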

    Parallel processing vs. threaded processing

    There is a distinction between parallel processing described above and threaded processing (aka multithreading). Parallel processing is achieved by running several independent SAS sessions, each processing its own unit of SAS code.

    Threaded processing, on the other hand, is achieved by developing special algorithms and implementing executable code that runs on multiple processors (threads) within the same SAS session. See also Amdahl's law, which provides the theoretical background and an estimate of the potential time savings achievable by parallel computing on multiple processors.
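
    For reference, Amdahl's law estimates the speedup as 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of processors. Here is a minimal sketch that prints the estimate to the SAS log for a few values of n; the value p = 0.9 is an assumption chosen only for illustration.

    ```sas
    data _null_;
       p = 0.9;                              /* assumed parallelizable fraction */
       do n = 1, 2, 4, 8;                    /* number of processors */
          speedup = 1 / ((1 - p) + p / n);   /* Amdahl's law */
          put n= speedup= 6.2;
       end;
    run;
    ```

    The estimate flattens out as n grows, because the sequential part of the work (1 - p) does not get any faster.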

    Passing information to and from “remote” SAS sessions

    Besides passing pieces of SAS code from client sessions to server sessions, MP CONNECT allows you to pass some other SAS objects.

    Passing data library definitions

    For example, if you have a data library defined in your client session, you may pass that library definition on to multiple server sessions without re-defining them in each server session.

    Let’s say you have two data libraries defined in your client session:

    libname SRCLIB oracle user=myusr1 password=mypwd1 path=mysrv1;
    libname TGTLIB '/sas/data/datastore1';

    To make these data libraries available in the remote session, all you need is to add the INHERITLIB= option to the RSUBMIT statement:

    rsubmit task1 wait=no inheritlib=(SRCLIB TGTLIB);

    This will allow libraries that are defined in the client session to be inherited by and available in the server session. As an option, each client libref can be associated with a libref that is named differently in the server session:

    rsubmit task1 wait=no inheritlib=(SRCLIB=NEWSRC TGTLIB=NEWTGT);
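
    Put together, a complete round trip might look like the sketch below. It assumes the TGTLIB path shown earlier exists and is writable; the output data set name CLASS_COPY is hypothetical, and SASHELP.CLASS is used only as a convenient sample input.

    ```sas
    options sascmd="sas";

    libname TGTLIB '/sas/data/datastore1';

    signon task1;

    /* The server session sees TGTLIB under the libref NEWTGT */
    rsubmit task1 wait=no inheritlib=(TGTLIB=NEWTGT);

       data NEWTGT.CLASS_COPY;
          set SASHELP.CLASS;
       run;

    endrsubmit;

    waitfor task1;
    signoff task1;
    ```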

    Passing macro variables from client to server session

    The %SYSLPUT statement creates a macro variable in the server session from within the client session. The following example passes the client macro variable run_dt to the server session as rem_run_dt:

    options sascmd="sas";
    %let run_dt = %sysfunc(datetime());
    signon task1;
    %syslput rem_run_dt=&run_dt / remote=task1;
    rsubmit task1 wait=no;
       %put &=rem_run_dt;
    endrsubmit;
     
    waitfor task1;
    signoff task1;

    Passing macro variables from server to client session

    Similarly, the %SYSRPUT statement passes macro variables from the server session back to the client session. Its forms include:

    • %SYSRPUT macro-variable=value;
    • %SYSRPUT _USER_ </LIKE='character-string'>;
      (/LIKE='character-string' specifies a subset of macro variables whose names match a user-specified character sequence, or pattern.)

    Here is a code example that passes two macro variables, rem_start and rem_year, from the remote session and outputs them to the SAS log in the client session:

    options sascmd="sas";
    signon task1;
    rsubmit task1 wait=no;
       %let start_dt = %sysfunc(datetime());
       %sysrput rem_start=&start_dt;
       %sysrput rem_year=2021;
    endrsubmit;
     
    waitfor task1;
    signoff task1;
    %put &=rem_start &=rem_year;

    Summary

    SAS’ Multi-Process Connect is a simple and efficient tool enabling parallel execution of independent programming units. Compared to sequential processing of time-intensive programs, it can substantially reduce the overall duration of your program execution.

    Additional resources

    Running SAS programs in parallel using SAS/CONNECT® was published on SAS Users.

    January 7, 2021
     

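    This is the last of three posts on our hot-fix process, aimed at helping you better manage your SAS®9 environment through tips and best practices. The first two installments are The SAS Hot Fix Analysis, Download and Deployment (SASHFADD) Tool and Understanding the SAS Hot Fix Analysis, Download and Deployment Tool Report.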

    Having a good understanding of the hot-fix process can help you keep your SAS environment running smoothly. This last installment aims to help you get on a schedule with your hot-fix installations and provides an example spreadsheet (available for download on GitHub) to manage hot fixes.

    Schedule hot fixes

    As an administrator, sometimes applying outstanding hot fixes can be a daunting task. However, the longer you wait, the worse your situation becomes—with a potentially unstable environment and a growing backlog of hot fixes to apply. With a little careful planning, the task can become routine and everyone involved will be much happier. The next sections outline a strategy for getting on a quarterly schedule.

    Run the SASHFADD tool

    The first step of getting on a quarterly schedule to apply hot fixes is to run the SAS Hot Fix Analysis, Download and Deployment (SASHFADD) Tool. For information about running this tool and analyzing the report it generates, see the first two installments in this blog series, The SAS Hot Fix Analysis, Download and Deployment (SASHFADD) Tool and Understanding the SAS Hot Fix Analysis, Download and Deployment Tool Report.

    Once you review the SASHFADD report, you will have a better understanding of what resources will be needed to apply the outstanding hot fixes. You also need to decide which philosophy of installing hot fixes you want to follow. For more information, see Which hot fixes should I apply?

    Coordinate with IT

    The second step is to coordinate the process with your IT department. Before you take the system offline to apply hot fixes, IT typically wants to do the following:

    • Perform a full system backup.
    • Check for scheduled jobs and make necessary adjustments.
    • Decide the best time to stop SAS services.
    • Evaluate how long the system will need to be offline.

    After the first session of applying your hot-fix backlog, all these tasks can run on a regular (preferably at least quarterly) schedule that won't require as much analysis time from IT.

    Communicate with end users

    Before you implement the plan that you and the IT department devised, you need to communicate with your end users. Let them know ahead of time (maybe by a week) when the outage will occur, what they need to do to prepare for it, and how long it will take. It's a best practice to perform the update outside of regular business hours.

    Reap the benefits

    When you follow a quarterly schedule of applying hot fixes, there are many benefits:

    • The administrator is more experienced from installing hot fixes regularly, so the process goes more smoothly and takes less time.
    • The IT department has an established process in place for backing up the system and taking it offline for the maintenance.
    • End users know what to expect and are not surprised by the outage.
    • The system runs better and is protected from vulnerabilities due to the regular schedule of updates.
    • There is less downtime because there are fewer hot fixes to install with each update.

    Manage hot fixes

    Applying hot fixes can often be a complicated process with multiple steps before and after you install them. So, a key aspect of successfully applying hot fixes is ensuring that you follow all the steps that are included in the SASHFADD report. A great tool for managing this complexity is a spreadsheet!

    Download one I created and customize it:

    SANDY'S SPREADSHEET | DOWNLOAD IT NOW

    This tool allows you to see and then check off (through highlighting, color coding, or notes) each of the steps to get the best results.

    Administrators will have different approaches to their spreadsheets. Mine, linked above, is the result of much trial and error. Here are the items that I keep track of in my spreadsheet:

    • Hot-fix number
    • Hot-fix dependencies
    • Pre-installation steps
    • Post-installation steps
    • Special notes

    Another benefit of the spreadsheet is that you can group steps together so that you can do them all at once. Here are some examples of when you can group steps to save time:

    • When performing the Rebuild Web Applications step, you can select the SAS® Marketing Automation, SAS® Marketing Optimization, and/or SAS® Deployment Manager web applications to be rebuilt in one iteration.
    • When performing the Deploy Web Applications step, you can select the SAS® Marketing Automation, SAS® Marketing Optimization, and/or SAS® Deployment Manager web applications to be deployed in one iteration.

    Helpful resources

    See the following links for the detailed and thorough documentation:

    I hope that this blog series has been helpful to you! Have a terrific day!

    READ PART ONE | The SAS Hot Fix Analysis, Download and Deployment (SASHFADD) Tool

    READ PART TWO | Understanding the SAS Hot Fix Analysis, Download and Deployment Tool Report

    How to schedule and manage your SAS hot fixes was published on SAS Users.