SAS programmers

6月 182021

SAS Global Certification is pleased to announce two (yes two!) new SAS Viya programming certifications. Traditional SAS programmers need to migrate their code and data to SAS Viya environments, and there are important skills required to make a successful transition. SAS Viya includes Cloud Analytic Services (CAS), where data is stored in-memory and programs execute in parallel on distributed nodes. To leverage CAS, you may need to update your DATA step and SAS SQL code. Also, there are new techniques to manage data stored in CAS. Once you have mastered these skills, you can take an exam and add the credential to your resumes, cv, and LinkedIn profiles!

Behind the Scenes

A cross functional team here at SAS worked since the beginning of 2021 to design, write, and validate these new credentials. Drawing from their experience working with SAS customers, this diverse team from Professional Services, Technical Support, Documentation, and Education overcame the challenges of distance and time-zones (and virtual meetings!) to create these credentials. We appreciate the work that this team accomplished and we had a great time working with you!

SAS Viya Programming Associate

The first credential, the SAS Certified Associate: Fundamentals of Programming Using SAS Viya is designed for SAS programmers who simply need to migrate their code and data to SAS Viya. They plan to work with CAS tables while still using traditional SAS programming techniques. One helpful tool is the CASUTIL procedure, which provides a familiar SAS procedure framework to help SAS programmers manage data in CAS. Also, we cover important skills to update DATA step code and convert PROC SQL code to PROC FEDSQL to execute in CAS. To learn these skills and prepare for the exam, the recommended training is the one-day Programming for SAS Viya course.

SAS Viya Programming Specialist

The second credential, the SAS Certified Specialist: Intermediate Programming Using SAS Viya builds on the skills introduced in the Associate credential, adding assessment of CAS Language (CASL) programming and CAS actions. With CASL programming skills, you can more precisely control accessing and processing of CAS data. There is also extensive coverage of CAS actions which is the smallest unit of work for the CAS server. CAS actions can load data, transform data, compute statistics, perform analytics, and create output. In addition to the Programming for SAS Viya course, the second recommended training class is the new three-day High-Performance Data Processing with CASL in SAS Viya.

Next Steps

You can learn more about these new credentials at the links above. There you will find detailed exam content guides, free sample questions, and free practice exams (yes - free full length practice exams!). If you have questions, please ask in the comments below. I'll start with a few easy ones:

  • Are there pre-requisites for these credentials? No. While they assume good understanding of traditional programming in a SAS 9 environment, these credentials do not require a prior SAS certification. You also can take the SAS Viya Programming Specialist without taking the SAS Viya Programming Associate.
  • Are these exams performance-based? No, these exams are not performance-based but are rather traditional format exams with multiple choice and short-answer fill-in-the-blank questions.
  • What exactly is on the exam? For detailed exam topics, hit the "Exam Content Guide" links on each credential's web page. The content guides contain exam sections, objectives, and expanded detail about the important topics.
  • What are the exam numbers? The Associate exam is A00-415 and the Specialist exam is A00-420.

We hope you tackle the recommended training, develop your skills, and register soon to be the first candidates to earn these credentials. Good luck with your preparations!

Two New SAS® Viya® Programming Certifications was published on SAS Users.

6月 152021

Dealing with big dataIn this fast-paced data age, when the sheer volume of data (generated, collected, and waiting to be processed and analyzed) grows at a breathtaking rate, the speed of data processing becomes critically important. In many cases, if data is not processed within an allotted time frame, we lose all its value as it becomes obsolete and ultimately irrelevant. That is why computing power becomes of the essence.

However, computing power itself does not guarantee timely processing. How we use that power makes all the difference. Way too often good old sequential processing just does not cut it anymore and different computing methods are required. One  such method is parallel processing.

In my previous post Using shell scripts for massively parallel processing I demonstrated a script-centered technique of running in parallel multiple independent SAS processes in SAS environments lacking SAS/CONNECT.

In this post, we will take a shot at a slightly different task and solution. Instead of having several totally independent processes, now we have some common “pre-processing” part, then we run several independent processes in parallel, and then we combine the results of parallel processing in the “post-processing” portion of our program.

Problem: monthly data ingestion use case

For simplification, we are going to use a scenario similar to one in the previous blog post:

Each month, shortly after the end of the previous month we needed to ingest a number of CSV files pertinent to transactions during the previous month and produce daily SAS data tables for each day of the previous month. Only now, we will go a step further: combining all those daily tables into a monthly table.

Solution: combining sequential and parallel processing

The solution is comprised of the three major components:

  • Shell script running the main SAS program.
  • Main SAS program, consisting of three parts: pre-parallel processing, parallel processing, and post-parallel processing.
  • Single thread SAS program responsible for a single day data ingestion.

1. Shell script running main SAS program

Below shell script runs the main SAS program

# nohup sh /path/ YYYYMM &
now=$(date +%Y.%m.%d_%H.%M.%S)
# getting YYYYMM as a parameter in script call
sas $pgmname -log $logname -set inDate $ym -set logname $logname

The script is run a background mode as indicated by the ampersand at the end of its invocation command:

nohup sh /path/ YYYYMM &

We pass a parameter YYYYMM (e.g. 202106) indicating year and month of our request.

When we call SAS program within the script we indicate the name of the SAS log file to be created (-log $logname) and also pass on inDate parameter (-set inDate $ym, which has the same value YYYYMM as parameter specified in the script calling command), and logname parameter (-set logname $logname). As you will see further, we are going to use these two parameters within program.

2. Main SAS program

Here is an abridged version of the program:

/* ======= pre-processing ======= */
/* parameters passed from shell script */
%let inDate = %sysget(inDate);
%let logname = %sysget(logname);
/* year and month */
%let yyyy = %substr(inDate,1,4);
%let mm = %substr(inDate,5,2);
/* output data library */
libname SASDL '/data/target';
/* number of days in month mm of year yyyy */
%let days = %sysfunc(day(%sysfunc(mdy(&mm+1,1,&yyyy))-1));
/* ======= parallel processing ======= */
%macro loop;
   %local threadprog looplogdir logdt workpath tasklist i z threadlog cmd;
   %let threadprog = /path/;
   %let looplogdir = %substr(&logname,1,%length(&logname)-4)_logs;
   x "mkdir &looplogdir"; *<- directory for loop logs;
   %let logdt = %substr(&logname,%length(&logname)-22,19);
   %let workpath = %sysfunc(pathname(WORK));
   %let tasklist=;
   %do i=1 %to &days;
      %let z = %sysfunc(putn(&i,z2.));
      %let threadlog = &looplogdir/thread_&z._&logdt..log;
      %let tasklist = &tasklist DAY&i;
      %let cmd = sas &threadprog -log &threadlog -set i &i -set workpath &workpath -set inDate &inDate;
      systask command "&cmd" taskname=DAY&i;
   waitfor _all_ &tasklist;
%mend loop;
/* ======= post-processing ======= */
/* combine daily tables into one monthly table */
data SASDL.TARGET_&inDate;
   set WORK.TARGET_&inDate._1 - WORK.TARGET_&inDate._&days;

The key highlights of this program are:

  • We capture values of the parameters passed to the program (inDate and logname).
  • Based on these parameters, assign source directory and target data library SASDL.
  • Calculate number of days in a specific month defined by year and month.
  • Create a directory to hold SAS logs of all parallel threads; the directory name is matching the log name of the
  • Capture the WORK library location of the main SAS session running as:

    %let workpath = %sysfunc(pathname(WORK));We use that location in the thread sessions to pass back to the main session data produced by the thread sessions.

  • Macro %do-loop generates a series of SYSTASK statements to spawn additional SAS sessions in the background mode, each ingesting data for a single day of a month:

    systask command "&cmd" taskname=DAY&i;The SYSTASK statement enables you to execute host-specific commands from within your SAS session or application. Unlike the X statement, the SYSTASK statement runs these commands as asynchronous tasks, which means that these tasks execute independently of all other tasks that are currently running. Asynchronous tasks run in the background, so you can perform additional tasks (including launching other asynchronous tasks) while the asynchronous task is still running.

    Restriction: SYSTASK statement is not supported on the CAS server.

  • Also, we generate a cumulative list of all tasknames assigned to each thread sessions:

    %let tasklist = &tasklist DAY&i;

  • Outside the macro %do-loop we use WAITFOR statement which suspends execution of the main SAS session until the specified tasks finish executing. Since we created a list of all daily thread sessions (&tasklist), this will synchronize all our parallel threads and continue session only when all threads finished executing.
  • At the end of the main SAS session we concatenate all our daily data tables that have been created by parallel threads in the location of the WORK library of the main SAS session.

Using SAS macro loop to generate a series of SYSTASK statements for parallel processing is not the only method available. Alternatively, you can achieve this within a data step using CALL EXECUTE. In this case, each data step iteration will generate a single global SYSTASK statement and push it out of the data step boundaries where they will be sequentially executed (just like in the case of macro implementation). Since option NOWAIT is the default for SYSTASK statements, despite all of them being launched sequentially, their corresponding OS commands will be still running in parallel.

3. Single thread SAS program

Here is an abridged version of the program:

/* inDate parameter */
%let inDate = %sysget(inDate);
/* parent program's WORK library */
%let workpath = %sysget(workpath);
libname MAINWORK "&workpath";
/* thread number */
%let i = %sysget(i);
/* year and month */
%let yyyy = %substr(inDate,1,4);
%let mm = %substr(inDate,5,2);
/* source data directory */
%let srcdir = /datapath/&yyyy/&mm;
/* create varlist macro variable to list all input variable names */
proc sql noprint;
   select name into :varlist separated by ' ' from SASHELP.VCOLUMN
   where libname='PARMSDL' and memname='DATA_TEMPLATE';
/* create fileref inf for the source file */
filename inf "&srcdir/source_data_&inDate._day&i..cvs";
/* create daily output data set */
data MAINWORK.TARGET_&inDate._&i; 
   if 0 then set PARMSDL.DATA_TEMPLATE;
   infile inf missover dsd encoding='UTF-8' firstobs=2 obs=max;
   input &varlist;

This program ingests a single .csv file corresponding to the &i-th day of &inDate (year and month) and creates a SAS data table MAINWORK.TARGET_&inDate._&i. To be available in the main SAS session the MAINWORK library is defined here in the same physical location as the WORK library of the main parental SAS session.

We also use a pre-created SAS data template PARMSDL.DATA_TEMPLATE - a zero-observations data set that contains descriptions of all the variables and their attributes.

Additional resources

Thoughts? Comments?

Do you find this post useful? Do you have processes that may benefit from parallelization? Please share with us below.

Using SYSTASK and SAS macro loops for massively parallel processing was published on SAS Users.

5月 112021

It’s safe to say that SAS Global Forum is a conference designed for users, by users. As your conference chair, I am excited by this year’s top-notch user sessions. More than 150 sessions are available, many by SAS users just like you. Wherever you work or whatever you do, you’ll find sessions relevant to your industry or job role. New to SAS? Been using SAS forever and want to learn something new? Managing SAS users? We have you covered. Search for sessions by industry or topic, then add those sessions to your agenda and personal calendar.

Creating a customizable agenda and experience

Besides two full days of amazing sessions, networking opportunities and more, many user sessions will be available on the SAS Users YouTube channel on May 20, 2021 at 10:00am ET. After you register, build your agenda and attend the sessions that most interest you when the conference begins. Once you’ve viewed a session, you can chat with the presenter. Don’t know where to start? Sample agendas are available in the Help Desk.

For the first time, proceedings will live on SAS Support Communities. Presenters have been busy adding their papers to the community. Everything is there, including full paper content, video presentations, and code on GitHub. It all premiers on “Day 3” of the conference, May 20. Have a question about the paper or code? You’ll be able to post a question on the community and ask the presenter.

Want training or help with your code?

Code Doctors are back this year. Check out the agenda for the specific times they’re available and make your appointment, so you’ll be sure to catch them and get their diagnosis of code errors. If you’re looking for training, you’ll be quite happy. Training is also back this year and it’s free! SAS instructor-led demos will be available on May 20, along with the user presentations on the SAS Users YouTube channel.

Chat with attendees and SAS

It is hard to replicate the buzz of a live conference, but we’ve tried our best to make you feel like you’re walking the conference floor. And we know networking is always an important component to any conference. We’ve made it possible for you to network with colleagues and SAS employees. Simply make your profile visible (by clicking on your photo) to connect with others, and you can schedule a meeting right from the attendee page. That’s almost easier than tracking down someone during the in-person event.

We know the exhibit hall is also a big draw for many attendees. This year’s Innovation Hub (formerly known as The Quad) has industry-focused booths and technology booths, where you can interact in real-time with SAS experts. There will also be a SAS Lounge where you can learn more about various SAS services and platforms such as SAS Support Communities and SAS Analytics Explorers.

Get started now

I’ve highlighted a lot in this blog post, but I encourage you to view this 7-minute Innovation Hub video. It goes in depth on the Hub and all its features.

This year there is no reason not to register for SAS Global Forum…and attend as few or as many sessions as you want. Why? Because the conference is FREE!

Where else can you get such quality SAS content and learning opportunities? Nowhere, which is why I encourage you to register today. See you soon!

SAS Global Forum: Your experience, your way was published on SAS Users.

4月 202021

I can’t believe it’s true, but SAS Global Forum is just over a month away. I have some exciting news to share with you, so let’s start with the theme for this year:

New Day. New Answers. Inspired by Curiosity.

What a fitting theme for this year! Technology continues to evolve, so each new day is a chance to seek new answers to what can sometimes feel like impossible challenges. Our curiosity as humans drives us to seek out better ways to do things. And I hope your curiosity will drive you to register for this year’s SAS Global Forum.

We are excited to offer a global event across three regions. If you’re in the Americas, the conference is May 18-20. In Asia Pacific? Then we’ll see you May 19-20. And we didn’t forget about Europe. Your dates are May 25-26. We hope these region-specific dates and the virtual nature of the conference means more SAS users than ever will join us for an inspiring event. Curious about the exciting agenda? It’s all on the website, so check it out.

Keynotes speakers that you’ll talk about for months to come

Want to be inspired to chase your “impossible” dreams? Or hear more about the future of AI? How about learning about work-life balance and your mental health? We have you covered. SAS executives are gearing up to host an exciting lineup of extremely smart, engaging and thought-provoking keynote speakers like Adam Grant, Ayesha Khanna and Hakeem Oluseyi.

And who knows, we might have a few more surprises up our sleeve. You’ll just have to register and attend to find out.

Papers and proceedings: simplified and easy to find

Have you joined the SAS Global Forum online community? You should, because that’s where you’ll find all the discussion around the conference…before, during and after. It’s also where you’ll find a link to the 2021 proceedings, when they become available. Authors are busy preparing their presentations now and they are hard at work staging their proceedings in the community. Join the community so you can connect with other attendees and know when the proceedings become available.

Stay tuned for even more details

SAS Global Forum is the place where creativity meets curiosity, and amazing analytics happens! I encourage you to regularly check the conference website, as we’re continually adding new sessions and events. You don’t want to miss this year’s conference, so don’t forget to register for SAS Global Forum. See you soon!

Registration is open for a truly inspiring SAS Global Forum 2021 was published on SAS Users.

4月 142021

Improving programming jobs performance with massively parallel processingUntil recently, I used UNIX/Linux shell scripts in a very limited capacity, mostly as vehicle of submitting SAS batch jobs. All heavy lifting (conditional processing logic, looping, macro processing, etc.) was done in SAS and by SAS.  If there was a need for parallel processing and synchronization, it was also implemented in SAS. I even wrote a blog post  Running SAS programs in parallel using SAS/CONNECT®, which I proudly shared with my customers.

The post caught their attention and I was asked if I could implement the same approach to speed up processes that were taking too long to run.

However, it turned out that SAS/CONNECT was not licensed at their site and procuring the license wasn’t going to happen any time soon. Bummer!

Or boon? You should never be discouraged by obstacles. In fact, encountering an obstacle might be a stroke of luck. Just add a mixture of curiosity, creativity, and tenacity – and you get a recipe for new opportunity and success. That’s exactly what happened when I turned to exploring shell scripting as an alternative way of implementing parallel processing.

Running several batch jobs in parallel

UNIX/Linux OS allows running several scripts in parallel. Let’s say we have three SAS batch jobs controlled by their own scripts,, and We can run them concurrently (in parallel) by submitting these shell scripts one after another in background mode using & at the end. Just put them in a wrapper “parent” script and run it in background mode as:

$ nohup &

Here what is inside the 

#!/bin/sh & & &

With such an arrangement, “parent” script starts all three background tasks (and corresponding SAS programs) that will run by the server concurrently (as far as resources would allow.) Depending on the server capacity (mainly, the number of CPU’s) these jobs will run in parallel, or quasi parallel competing for the server shared resources with the Operating System taking charge for orchestrating their co-existence and load balancing.

The wait command at the end is responsible for the “parent” script’s synchronization. Since no process id or job id is specified with wait command, it will wait for all current “child” processes to complete. Once all three tasks completed, the parent script will continue past the wait command.

Get the UNIX/Linux server information

To evaluate server capabilities as it relates to the parallel processing, we would like to know the number of CPU’s.

To get this information we can ran the the lscpu command as it provides an overview of the CPU architectural characteristics such as number of CPU’s, number of CPU cores, vendor ID, model, model name, speed of each core, and lots more. Here is what I got:

Ha! 56 CPUs! This is not bad, not bad at all! I don’t even have to usurp the whole server after all. I can just grab about 50% of its capacity and be a nice guy leaving another 50% to all other users.

Problem: monthly data ingestion use case

Here is a simplified description of the problem I was facing.

Each month, shortly after the end of the previous month we needed to ingest a number of CSV files pertinent to transactions during the previous month and produce daily SAS data tables for each day of the previous month.  The existing process sequentially looped through all the CSV files, which (given the data volume) took about an hour to run.

This task was a perfect candidate for parallel processing since data ingestions of individual days were fully independent of each other.

Solution: massively parallel process

The solution is comprised of the two parts:

  • Single thread SAS program responsible for a single day data ingestion.
  • Shell script running multiple instances of this SAS program concurrently.

Single thread SAS process

The first thing I did was re-writing the SAS program from looping through all of the days to ingesting just a single day of a month-year. Here is a bare-bones version of the SAS program:

/* capture parameter &sysparm passed from OS command */ 
%let YYYYMMDD = &sysparm;
/* create varlist macro variable to list all input variable names */
proc sql noprint;
   select name into :varlist separated by ' ' from SASHELP.VCOLUMN
   where libname='PARMSDL' and memname='DATA_TEMPLATE';
/* create fileref inf for the source file */
filename inf "/cvspath/rawdata&YYYYMMDD..cvs";
/* create daily output data set */
   if 0 then set PARMSDL.DATA_TEMPLATE;
   infile inf missover dsd encoding='UTF-8' firstobs=2 obs=max;
   input &varlist;

This SAS program (let’s call it can be run in batch using the following OS command:

sas -log oneday.log -sysparm 202103

Note, that we pass a parameter (e.g. 202103 means year 2021, month 03) defining the requested year and month YYYYMM as -sysparm value.

That value becomes available in the SAS program as a macro variable reference &sysparm.

We also use a pre-created data template PARMSDL.DATA_TEMPLATE - a zero-observations data set that contains descriptions of all the variables and their attributes (see Simplify data preparation using SAS data templates).

Shell script running the whole process in parallel

Below shell script puts everything together. It spawns and runs concurrently as many daily processes as there are days in a specified month-of-year and synchronizes all single day processes (threads) at the end by waiting them all to complete. It logs all its treads and calculates (and prints) the total processing duration. As you can see, shell script as a programming language is a quite versatile and powerful. Here it is:

# cd /projpath/scripts
# nohup sh &
# Project path
# Program file name
# Current date/time stamp
now=$(date +%Y.%m.%d_%H.%M.%S)
echo 'Start time:'$now
# Reset timer
# Get YYYYMM as the script parameter
# Extract year and month from $par
# Get number of days in month $m of year $y
days=$(cal $m $y | awk 'NF {DAYS = $NF}; END {print DAYS}')
# Create log directory
mkdir $logdir
# Loop through all days of month $m of year $y
for i in $(seq -f "%02g" 1 $days)
   # Assign log name for a single day thread
   # Run single day thread
   /SASHome/SASFoundation/9.4/sas $pgmname -log $logname -sysparm $par$i &
# Wait until all threads are finished
# Calculate and print duration
end=$(date +%Y.%m.%d_%H.%M.%S)
echo 'End time:'$end
mm=$(( $(($SECONDS - $hh * 3600)) / 60 ))
ss=$(($SECONDS - $hh * 3600 - $mm * 60))
printf " Total Duration: %02d:%02d:%02d\n" $hh $mm $ss
echo '------- End of job -------'

This script is self-described by detail comments and can be run as:

cd /projpath/scripts
nohup sh &


The results were as expected as they were stunning. The overall duration was cut roughly by a factor of 25, so now this whole task completes in about two minutes vs. one hour before. Actually, now it is even fun to watch how SAS logs and output data sets are being updated in real time.

What is more, this script-centric approach can be used for running not just SAS processes, but non-SAS, open source and/or hybrid processes as well. This makes it a powerful amplifier and integrator for heterogeneous software applications development.

SAS Consulting Services

The solution presented in this post is a stripped-down version of the original production quality solution. This better serves our educational objective of communicating the key concepts and coding techniques. If you believe your organization’s computational powers are underutilized and may benefit from a SAS Consulting Services engagement, please reach out to us through your SAS representative, and we will be happy to help.

Additional resources

Thoughts? Comments?

Do you find this post useful? Do you have processes that may benefit from parallelization? Please share with us below.

Using shell scripts for massively parallel processing was published on SAS Users.

4月 012021

Uncertainty Principle blackboard

If I were to say that we live in uncertain times, that would probably be an understatement. Therefore, I won’t say that. Oops, I already did. Or did I?

For centuries, people around the world have been busy scratching their heads in search of a meaningful answer to Shakespeare’s profoundly elementary question: “To be or not to be?”

Have we succeeded? Sure. And in pursuit of even further greatness, we have progressed beyond the simple binary choice. Thanks to human ingenuity, it is now possible to have it all: to be and not to be.

But doesn’t this contradict human logic? Not at all, according to the Heisenberg uncertainty principle – a cornerstone of quantum mechanics asserting a fundamental limit to the certainty of knowledge.

According to the uncertainty principle, it is not possible to determine both the momentum and position of particles (bosons, electrons, quarks, etc.) simultaneously. Here is the famous formula:

Δx = uncertainty in position.
Δp = uncertainty in momentum.
h = Planck’s constant (a rare and precious number equal to 6.62607015×10−34 representing how much the energy of a photon increases, when the frequency of its electromagnetic wave increases by 1).
4π = π π π π (4 pi’s; no mathematical formula of any scientific significance can do without at least one of them!)

In addition, every particle or quantum entity may be defined as either a particle or a wave depending on how you feel about it according to the wave-particle duality principle. But let’s not let the dual meaning inconvenience us. Let’s just call them matters, or things for simplicity.

Then we can formulate the uncertainty principle in plain and clear terms:

Since it is impossible to know whether the position of a thing is X or not X, then that thing can be in position X and not be in position X simultaneously. Thus “to be and not to be”.


There is an abundance of examples of the uncertainty principle in SAS software. Let’s consider several of them.

History of the present and present of the history

Some of you may remember SAS version 7.0. It’s remarkable in a way that it was the shortest-lived SAS version that lasted roughly one year. It was released in October 1998 and was replaced by SAS 8.0 in November 1999. There were no 7.1 or 7.2 sub-versions, only 7.0.

But (and this is a big BUT), have you noticed that even today the latest SAS products (9.4 and Viya) use the following 9
Physical Name: c:\temp

Notice how it’s SAS Engine V9, but SAS datasets created with it have .sas7bdat extensions.

Where do you think that digit “7” came from? Obviously, even almost two decades after version 7.0’s demise it is still alive and kicking. How can you explain that other than by the uncertainty principle: “it is while it is not”!

Transience and permanence

Let’s take another example. How long have you known the fact that in order to create a permanent SAS data set you need to specify its name as a two-level name, e.g. LIBREF.DATASETNAME, while for temporary data sets you can specify a one-level name, e.g. DATASETNAME, or you can use a two-level name where the first level is WORK to explicitly signify the temporary library. Now, equipped with that “settled science” knowledge, what do you think the following code will create, a temporary or a permanent data set?

options user='c:\temp';
data MYDATA;
   x = 22371;

Just run this code and check your c:\temp folder to make sure that data set MYDATA is permanent. Credit for this shortcut goes to the

options user='c:\temp';
   x = 22371;

SAS Log will show:
NOTE: The data set USER.DATA1 has 1 observations and 1 variables.

Isn’t an ultimate proof of the “to be and not to be” principle (sponsored by create a data set by defining its physical pathname without even relying on SAS data set names, whether one or two-level:

data "c:\temp\aaa";
   x = 22371;
   format x date9.;

This code runs perfectly fine, creating a SAS data set as a file named aaa.sas7bdat in the c:\temp folder.

And I am not even talking  about the Uncertainty principle: Final Exam

And now, ladies and gentlemen, you will have to pass your final exam to receive an official April Fools diploma from SAS University.

Problem to solve

You know that every SAS data step creates automatic variables, _N_ and _ERROR_, which are available during the data step execution. Is it possible to save those automatic variables on the output data set?

In other words, will the following code create 3 variables on the output data set ABC?

data ABC (keep=MODEL _N_ _ERROR_);
   set SASHELP.CARS(keep=MODEL);

If you answered “No” you get 1 credit. If you answered “Yes” you get 0 credit. But that’s only if you answered the second question (I assume you noticed that I asked two questions in a row). If your “Yes”/ ”No” answer relates to the first question your credits are in reverse.

Bonus for creativity

However, if you not only answered “Yes” to the first question, but also provided a “how-to” code example, you get a bonus in the amount of 10 credits. Here is your bonus for creativity:

data BBC;
   set SASHELP.CARS(keep=MODEL);
   x = _n_;
   e = _error_;
   rename x=_n_ e=_error_;

You still have to run this code to make sure it creates data set BBC with 3 variables: MODEL, _N_, and _ERROR_ in order to get your 10 credits vested.

Problem solved = problem created

And lastly, the final curiosity test and exercise where you find out about SAS’ no-nonsense solution in the face of uncertainty. What happens in the following data step when the SAS-created automatic data step variables, _N_ and _ERRROR_, collide with the same-name variables brought in by the previously created BBC data set?

data CBC;
   set BBC;

After you complete this test/exercise and find out the answer, you can grab your diploma below and proudly brag about it and display it anywhere.

SAS Institute diploma

WAIT! Before you leave, please do not forget to provide your answers, questions, code examples, and comments below.

More April Fools’ Day SAS articles

April 1, 2020: Theory of relativity in SAS programming
April 1, 2019: Dividing by zero with SAS
April 1, 2018: SAS discovers a new planet in the Solar System
April 1, 2017: SAS code to prove Fermat's Last Theorem

To be and not to be – the uncertainty principle in SAS was published on SAS Users.

3月 252021

French leave, English style leave, Irish goodbyeIn many SAS applications, there is a need to conditionally stop SAS code execution and gracefully (without generating an ERROR or a WARNING) terminate SAS session when no further processing is required. For example, your program processes large data and flags certain transactions as suspicious. If any suspicious transactions are found, then you continue further processing to gather more information about those transactions. However, if no transactions were flagged, you just want to stop your SAS job and augment the SAS log with a nice NOTE (not ERROR or WARNING), for example:

NOTE: No suspicious transactions are found. Further processing canceled. Exiting program execution.

Graceful termination techniques described in this post primarily apply to batch processing scenarios when SAS programs run unattended without human intervention. However, development of these programs is usually done in interactive SAS sessions, and we need to make sure SAS log is captured before our interactive application is “terminated”. Therefore, before we proceed reviewing and experimenting with SAS termination techniques, let’s make arrangements for capturing the SAS log.

Capturing SAS log

When you run a SAS program in batch you would usually submit it using OS command. For example, in UNIX/Linux you may submit or place in a shell script the following command:

sas /code/proj1/ -log /code/proj1/job1.log

The log file name is specified in the command itself (-log /code/proj1/job1.log), and it will record our “goodbye NOTE” generated before SAS session is terminated.

However, when you run/debug your program in an interactive application, e.g. Enterprise Guide, the SAS log is captured in the Log Window. When SAS terminates its session, it will not only terminate your program execution, it will also terminate the application itself. That means your Enterprise Guide will close, along with its Log Window leaving you with no Log to see and inspect.

To capture SAS log to a file while debugging a SAS program containing session termination feature in interactive environment/application you can use

/* Beginning of SAS program */
proc printto log='C:\PROJECT1\program1.log';
/* End of SAS program */
proc printto; run;

In this code ABORT Statement is an executable statement that can be used as part of IF-THEN/ELSE conditional processing in a DATA step. Its action, however, extends beyond the DATA step as it not only stops executing the current DATA step, but also stops executing SAS session.

There are several flavors of the ABORT statement in SAS 9.4 – ABORT, ABORT ABEND, ABORT CANCEL, ABORT RETURN, and ABORT <number>. All of them are useful when something goes bad in your program (e.g. database library is not available at the time of run, or a data set is locked preventing you from writing to it, etc.), and you need to kill your program to avoid an even bigger snafu. All such situations are real, and ABORT statement handles them properly – stopping further program execution, killing SAS session, and generating an ERROR message in the SAS log.

But that ERROR message in the SAS log is what effectively disqualifies ABORT statements from graceful termination status. Think about it: “no suspicious transactions” is an occasion to celebrate, not a reason to cry an ERROR. Besides, in many mission-critical production-quality applications having an ERROR or even a WARNING in the SAS log is not an option.

Fortunately, for SAS® Viya users, there is a new global statement, it takes effect as soon as it is encountered in a SAS program. It can be placed anywhere in a SAS program, except where only executable statements are allowed. For example, if you place it as part of the IF-THEN/ELSE statement you get a syntax ERROR. Try running this code:

data _null_;
   if 1=1 then endsas;

SAS log will show a syntax error:

2    data _null_;
3       if 1=1 then endsas;
ERROR 180-322: Statement is not valid or it is used out of proper order.

However, if you place it in a conditionally executed DO-block it will not generate a syntax ERROR, but it will not produce what we wanted either because of the following.

First, it will execute even when the condition if FALSE. Second, since it executes inside the DO-block, it will end SAS data step and SAS session right there, without even giving DO-block a chance of completing its compilation, thus generating an ERROR. Here is the code illustration:

data _null_;
   if 0 then
      put 'Nooo!';

If you dare to run this code, here is what you will see in the SAS log after your SAS session gets killed:

2    data _null_;
3       if 0 then
4       do;
5          put 'Nooo!';
6          endsas;
ERROR 117-185: There was 1 unclosed DO block.

No, it is not what we are after.

ENDSAS statement – data step solution

Using coding technique described in my previous post How to conditionally execute SAS global statements, we can make ENDSAS to be conditionally generated within data step and executed after the data step.

Suppose we have a data set SUSPICIOUS_CASES which may have either zero or some positive number of observations. We want to stop further processing and terminate SAS session in case it has 0 observations.

Here is how we can achieve this:

data _null_;
   set SUSPICIOUS_CASES nobs=n;
   if n=0 then call execute('endsas;');

Here we conditionally (if n=0) invoke CALL EXECUTE routine which un-quotes its character argument and pushes it outside the data step boundaries, thus generating the following code after RUN statement (as shown in the SAS log):

NOTE: CALL EXECUTE generated line.
1   + endsas;

Thus, we conditionally generated global ENDSAS statement and placed it after the data step. This global statement will terminate SAS session without a fuss; no ERROR, no WARNING, and even no NOTE.

If you want it to be not so “silent goodbye”, you can add some informative NOTES using PUT statement executed under the same condition as CALL EXECUTE (we combine them in a DO-block):

data _null_;
   set SUSPICIOUS_CASES nobs=n;
   if n=0 then
      put 'NOTE: No suspicious cases were found. Further processing is terminated.';
      call execute('endsas;');

This code will conditionally output NOTE to the SAS log during data step. Obviously, you can generate any number of NOTE lines making your exit more verbose. We can place PUT statement either before or after CALL EXECUTE within the DO-block, because it will be executed within the data step, while generated ENDSAS statement will be executed after the data step.

ENDSAS statement – SAS macro solution

Another way of conditionally generating SASEND statement is by using

%if 1 %then
   %put NOTE: Ending SAS session gracefully.;

SAS log will show:

2    %if 1 %then
3    %do;
4       %put NOTE: Ending SAS session gracefully.;
NOTE: Ending SAS session gracefully.
5       endsas;
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414

Additional resources

Thoughts? Comments?

Do you find this blog post useful? How do you handle graceful termination of your SAS programs? Please share with us below.

How to conditionally stop SAS code execution and gracefully terminate SAS session was published on SAS Users.

3月 162021

SAS global statementsSAS IF-THEN/ELSE statement that executes DATA step statements depending on specified conditions:

IF expression THEN executable-statement1;
<ELSE executable-statement2;>

Try sticking it in there and SAS will slap you with an ERROR:

data _null_;
   set SASHELP.CARS nobs=n;
   if n=0 then libname outlib 'c:\temp';

SAS log will show:

3       if n=0 then libname outlib 'c:\temp';
ERROR 180-322: Statement is not valid or it is used out of proper order.

But global statements’ “not executable” status only means that they cannot be executed as part of a DATA step execution. Otherwise, “they take effect” (in my mind that equates to “they execute”) right after the compilation phase but before DATA step executes (or processes) its data reads, writes, logic and iterations.

Here is another illustration. Let’s get a little creative and tack a LIBNAME global statement within conditionally executed DO-group of the IF-THEN statement:

   set SASHELP.CARS nobs=n;
   if n=0 then
      libname OUTLIB 'c:\temp';
In this case, SAS log will show:
NOTE: Libref OUTLIB was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\temp
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set OUTLIB.CARS has 428 observations and 15 variables.

As you can see, not only our LIBNAME statement “executed” (or “took effect”) despite the IF-THEN condition was FALSE, it successfully assigned the OUTLIB library and applied it to the data OUTLIB.CARS; statement that appears earlier in the code. That is because the LIBNAME global statement took effect (executed) right after the DATA step compilation before its execution.

For the same reason, you can place global statement TITLE either in open code before PROC that produces output with a title or within that PROC. In the first case, the stand-alone TITLE statement is compiled on its own and immediately executed thus setting the title for the PROCs that follow. In the latter case, it is compiled with the PROC step, then immediately executed before PROC step’s execution.

Now, when we have a solid grasp of the global statements timing habits, let’s look at the coding techniques allowing us to take full control of when and whether global statements take effect (executed).

Macro language to conditionally execute SAS global statements


%let dsname = SASHELP.CARS;
/*%let dsname = SASHELP.CLASS;*/
%let name = %scan(&dsname,2);
%if (&name eq CARS) or (&name eq CLASS) %then
   options DLCREATEDIR;
   libname outlib "c:\temp\&name";
   libname outlib "c:\temp";
data OUTLIB.&name;
   set &dsname;

In this code, if name is either CARS or CLASS the following global statements will be generated and passed on to the SAS compiler:

   options DLCREATEDIR;
   libname outlib "c:\temp\&name";

This will create a directory c:\temp\&name (if it does not exist) and assign libref OUTLIB to that directory.

Otherwise, the following global statement will be generated and passed on to the SAS compiler:

   libname outlib "c:\temp";

The DATA step then creates data set OUTLIB.&name in the corresponding dynamically assigned library. Using this technique, you can conditionally generate global statements for SAS system options, librefs, filerefs, titles, footnotes, etc. SAS compiler will pick up those generated global statements and execute (activate, put in effect) them.

CALL EXECUTE to conditionally execute SAS global statements

Sometimes, it is necessary to conditionally execute global statements based on values contained in data, whether in raw data or SAS data sets. Such a data-driven approach can be easily implemented using CALL EXECUTE routine in a DATA step.

data _null_;
   by MAKE;
   if first.MAKE then
      call execute('title "'||trim(MAKE)||' models";');
      call execute('proc print noobs data=SASHELP.CARS(where=(MAKE="'||trim(MAKE)||'"));');
      call execute('   var MAKE MODEL TYPE;');
      call execute('run;');

In this code, for every block of unique MAKE values (identified by first.MAKE) we have CALL EXECUTE generating lines of SAS code and pushing them outside the DATA step boundary where they compile and execute. The code snippets for TITLE and WHERE clause are data-driven and generated dynamically. The SAS log will show a series of the generated statements:

NOTE: CALL EXECUTE generated line.
1   + title "Acura models";
2   + proc print noobs data=SASHELP.CARS(where=(MAKE="Acura"));
3   +    var MAKE MODEL TYPE;
4   + run;
5   + title "Audi models";
6   + proc print noobs data=SASHELP.CARS(where=(MAKE="Audi"));
7   +    var MAKE MODEL TYPE;
8   + run;

. . . and so forth.

In this implementation, global statement TITLE is prepared (“pre-cooked”) conditionally (if first.MAKE is TRUE) within the DATA step in a form of a character value. It’s still not a global statement until CALL EXECUTE pushes it out of the DATA step. There it becomes a global statement as part of SAS code stream. There it gets compiled and executed, setting a nice data-driven title for the PROC PRINT output (individually for each Make):

PROC PRINT outputs with dynamically generated titles

Additional resources

Your thoughts?

Have you found this blog post useful? Do you have any questions? Please feel free to ask and share your thoughts and feedback in the comments section below.

How to conditionally execute SAS global statements was published on SAS Users.

2月 252021

The people, the energy, the quality of the content, the demos, the networking opportunities…whew, all of these things combine to make SAS Global Forum great every year. And that is no exception this year.

Preparations are in full swing for an unforgettable conference. I hope you’ve seen the notifications that we set the date, actually multiple dates around the world so that you can enjoy the content in your region and in your time zone. No one needs to set their alarm for 1:00am to attend the conference!

Go ahead and save the date(s)…you don’t want to miss this event!

Content, content, content

We are working hard to replicate the energy and excitement of a live conference in the virtual world. But we know content is king, so we have some amazing speakers and content lined up to make the conference relevant for you. There will be more than 150 breakout sessions for business leaders and SAS users, plus the demos will allow you to see firsthand the innovative solutions from SAS, and the people who make them. I, for one, am looking forward to attending live sessions that will allow attendees the opportunity to ask presenters questions and have them respond in real time.

Our keynote speakers, while still under wraps for now, will have you on the edge of your seats (or couches…no judgement here!).

Networking and entertainment

You read that correctly. We will have live entertainment that'll have you glued to the screen. And you’ll be able to network with SAS experts and peers alike. But you don’t have to wait until the conference begins to network, the SAS Global Forum virtual community is up and running. Join the group to start engaging with other attendees, and maybe take a guess or two at who the live entertainment might be.

A big thank you

We are working hard to bring you the best conference possible, but this isn’t a one-woman show. It takes a team, so I would like to introduce and thank the conference teams for 2021. The Content Advisory Team ensures the Users Program sessions meet the needs of our diverse global audience. The Content Delivery Team ensures that conference presenters and authors have the tools and resources needed to provide high-quality presentations and papers. And, finally, the SAS Advisers help us in a multitude of ways. Thank you all for your time and effort so far!

Registration opens in April, so stay tuned for that announcement. I look forward to “seeing” you all in May.

What makes SAS Global Forum great? was published on SAS Users.

2月 222021

Removing a piece from character string In my previous post, we addressed the problem of inserting substrings into SAS character strings. In this post we will solve a reverse problem of deleting substrings from SAS strings.

These two complementary tasks are commonly used for character data manipulation during data cleansing and preparation to transform data to a shape suitable for analysis, text mining, reporting, modeling and decision making.

As in the previous case of substring insertion, we will cover substring deletion for both, character variables and macro variables as both data objects are strings.

The following diagram illustrates what we are going to achieve by deleting a substring from a string:

Removing a substring from SAS string illustration

Have you noticed a logical paradox? We take away a “pieceof” cake and get the whole thing as result! 😊

Now, let’s get serious.

Deleting all instances of a substring from a character variable

Let’s suppose we have a variable STR whose values are sprinkled with some undesirable substring ‘<br>’ which we inherited from some HTML code where tag <br> denotes a line break. For our purposes, we want to remove all instances of those pesky <br>’s. First, let’s create a source data set imitating the described “contaminated” data:

data HAVE;
   infile datalines truncover;
   input STR $100.;
Some strings<br> have unwanted sub<br>strings in them<br>
<br>A s<br>entence must not be cont<br>aminated with unwanted subs<br>trings
Several line<br> breaks<br> are inserted here<br><br><br>
<br>Resulting st<br>ring must be n<br>eat and f<br>ree from un<br>desirable substrings
Ugly unwanted substrings<br><br> must <br>be<br> removed
<br>Let's remove them <br>using S<br>A<br>S language
Ex<br>periment is a<br>bout to b<br>egin
<br>Simpli<br>city may sur<br>prise you<br><br>

This DATA step creates WORK.HAVE data set that looks pretty ugly and is hardly usable:
Source data to be cleansed
The following code, however, cleans it up removing all those unwanted substrings ‘<br>’:

data WANT (keep=NEW_STR);
   length NEW_STR $100;
   SUB = '<br>';
   set HAVE;
   NEW_STR = transtrn(STR,'<br>',trimn(''));

After this code runs, the data set WANT will look totally clean and usable:
Cleaned data

Code highlights

  • We use .

The TRANSTRN function is similar to TRANWRD function which replaces all occurrences of a substring in a character string. While TRANWRD uses a single blank when the replacement string has a length of zero, TRANSTRN does allow the replacement string to have a length of zero which essentially means removing.

  • TRIM() function which removes trailing blanks from a character string and returns one blank if the string is missing. However, when it comes to removing (which is essentially replacement with zero length substring) the ability of TRIMN function to return a zero-length string makes all the difference.

Deleting all instances of a substring from a SAS macro variable

For macro variables, I can see two distinct methods of removing all occurrences of undesirable substring.

Method 1: Using SAS data step

Here is a code example:

%let STR = Some strings<br> have unwanted sub<br>strings in them<br>;
%let SUB = <br>;
data _null_;
   NEW_STR = transtrn("&STR","&SUB",trimn(''));
   call symputx('NEW',NEW_STR);
%put &=STR;
%put &=NEW;

In this code, we stick our macro variable value &STR in double quotes in the transtrn() function as the first argument (source). The macro variable value &SUB, also double quoted, is placed as a second argument. After variable NEW_STR is produced free from the &SUB substrings, we create a macro variable NEW using

%let STR = Some strings<br> have unwanted sub<br>strings in them<br>;
%let SUB = <br>;
%let NEW = %sysfunc(transtrn(&STR,&SUB,%sysfunc(trimn(%str()))));
%put &=STR;
%put &=NEW;

Deleting selected instance of a substring from a character variable

In many cases we need to remove not all substring instances form a string, but rather a specific occurrence of a substring. For example, in the following sentence (which is a quote by Albert Einstein) “I believe in intuitions and inspirations. I sometimes feel that I am right. I sometimes do not know that I am.” the second word “sometimes” was added by mistake. It needs to be removed. Here is a code example presenting two solutions of how such a deletion can be done:

data A;
   length STR STR1 STR2 $250;
   STR = 'I believe in intuitions and inspirations. I sometimes feel that I am right. I sometimes do not know that I am.';
   SUB = 'sometimes';
   STR_LEN = length(STR);
   SUB_LEN = length(SUB);
   POS = find(STR,SUB,-STR_LEN);
   STR1 = catx(' ', substr(STR,1,POS-1), substr(STR,POS+SUB_LEN)); /* solution 1 */
   STR2 = kupdate(STR,POS,SUB_LEN+1);                              /* solution 2 */
   put STR1= / STR2=;

The code will produce two correct identical values of this quote in the SAS log (notice, that the second instance of word “sometimes” is gone):

STR1=I believe in intuitions and inspirations. I sometimes feel that I am right. I do not know that I am.
STR2=I believe in intuitions and inspirations. I sometimes feel that I am right. I do not know that I am.

Code highlights

Solution 1

This is the most traditional solution that cuts out two pieces of the string – before and after the substring being deleted – and then concatenates them together thus removing that substring:

  • substr(STR,1,POS-1) extracts the first part of the source string STR before the substring to be deleted: from position 1 to position POS-1.
  • substr(STR,POS+SUB_LEN) extracts the second part of the source string STR after the substring to be deleted: from position POS+SUB_LEN till the end of STR value (since the third argument, length, is not specified).
  • Solution 2

    Finding n-th instance of a substring within a string .

Deleting selected instance of a substring from a SAS macro variable

Here is a code example of how to solve the same problem as it relates to SAS macro variables. For brevity, we provide just one solution using %sysfunc and KUPDATE() function:

%let STR = I believe in intuitions and inspirations. I sometimes feel that I am right. I sometimes do not know that I am.;
%let SUB = sometimes;
%let POS = %sysfunc(find(&STR,&SUB,-%length(&STR)));
%let STR2 = %sysfunc(kupdate(&STR,&POS,%eval(%length(&SUB)+1)));
%put "&STR2";

This should produce the following corrected Einstein’s quote in the SAS log:

"I believe in intuitions and inspirations. I sometimes feel that I am right. I do not know that I am."

Additional Resources for SAS character strings processing

Your thoughts?

Have you found this blog post useful? Please share your thoughts and feedback in the comments section below.

Deleting a substring from a SAS string was published on SAS Users.