Data Management

5月 172022
 

Here at SAS, we understand the importance of having access to cutting-edge professional resources. That’s why, for more than 40 years, we’ve provided individuals in programming, data management and analytics fields with low-cost and no-cost materials that promote success in their educational and professional journeys. And today, as the demand for employees with advanced skill sets and global certifications grows, we get it – having the ability to easily access the tools you need to succeed is more important than ever.

We’ve got you covered.

As part of our ongoing commitment to helping individuals enhance their skills, further their careers and increase their chance of success in the field, we’re now offering SAS Certification Practice Exams for free. Yes, free.

Over the years, candidates who have taken advantage of our practice exams have found them to be a valuable, effective tool for gauging their preparedness for SAS Certification Exams. When combined with other SAS training resources – like webinars, content guides, training courses and web tutorials – these free exams greatly increase candidates’ chances of success. Exams are currently available in:

    • Programming
    • Advanced Analytics
    • Data Management
    • Visual Analytics
    • Administration

Representative of the live exams, our online practice exams go through the same rigorous development process and are designed to give candidates an idea of what they should expect in the actual test questions. SAS practice exams also provide the rationale behind correct and incorrect answers, giving participants even more insight and opportunity for exam success.

And the numbers speak for themselves.*

    • Those who passed the practice exam had a 17.5% higher pass rate on the live exam than those who did not take or pass the practice version.
    • Those who took the practice exam – regardless of score – had an 8.15% higher pass rate on the live exam than those who opted not to take it.

Not to mention taking a practice test just might make the difference between passing the SAS Certification Exam on the first try or having to retake it. We’ve found that many who don’t pass the Certification Exam miss the mark by only a few questions, which they could have avoided with a bit more preparation.

Keep in mind, in order to make the best use of this resource, we recommend taking a practice exam as your final method of study in order to test your preparedness before diving into the actual exam. If you are curious about the types of questions typically on the certification exams, we have sample questions available for you to review.

So, what are you waiting for? A free resource to prepare you for a globally-recognized certification and make your resume stand out from the rest? That’s a no-brainer.

For more information about free SAS Certification Practice Exams, visit: https://www.sas.com/en_us/certification/resources/sas-practice-exams.html

*Based on data from 20,000 practice exams taken since 2020.

Access for success: SAS Certification practice exams now offered for free was published on SAS Users.

2月 112022
 

Ever had to work with data that had crazy column names, custom formats you didn’t have access to, or text columns sized way too big for the actual data they contained? Annoying, isn’t it? Well, you’re not alone and, as it turns out, it’s not that hard to fix. Let’s take a peek at renaming, reformatting, relabeling and resizing columns in your data using PROC SQL and PROC DATASETS. PROC SQL could handle most of these tasks, but you can’t rename a column in SQL without creating a new copy of the table. PROC DATASETS can do it all, but the syntax can be trickier. When we resize a column, we're going to have to re-write the data anyway, so I'll use PROC SQL for resizing, and PROC DATASETS for everything else. For example, we have this existing table:

Year Make Model MSRP Invoice Engine Size (L)
2004 Acura MDX $36,945 $33,337 3.5
2004 Audi A4 1.8T 4dr $25,940 $23,508 1.8
2004 Buick Rainier $37,895 $34,357 4.2

 

with this descriptor portion:

 

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 13
Model char 40
MSRP num 8 DOLLAR8.
Invoice num 8 DOLLAR8.
EngineSize Engine Size (L) num 8

 
I want to modify the table to use European formats for the numeric values, to rename the EngineSize column as Size leaving its label unchanged, and to add descriptive labels for MSRP and Invoice.
 
The general syntax for this in PROC DATASETS is:
 
PROC DATASETS LIBRARY=;
MODIFY sas-dataset <(options)>;
FORMAT variable-1 <format-1> <variable-2 <format-2> ...>;
LABEL variable-1=<'label-1' | ' '> <variable-2=<'label-2' | ' '> ...>;
RENAME variable-1=new-name-1 <variable-2=new-name-2 ...>;

After running this code:

proc datasets library=work nolist nodetails;
   modify cars;
      format MSRP eurox8. Invoice eurox8. EngineSize commax5.1;
      label MSRP="Sticker Price" Invoice="Wholesale Price" ;
      rename EngineSize=Size;
run; quit;

 
the data now looks like this:
 

Year Make Model Sticker Price Wholesale Price Engine Size
(L)
2004 Acura MDX €36.945 €33.337 3,5
2004 Audi A4 1.8T 4dr €25.940 €23.508 1,8
2004 Buick Rainier €37.895 €34.357 4,2

 

with this descriptor portion:

 

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 13
Model char 40
MSRP Sticker Price num 8 EUROX8.
Invoice Wholesale Price num 8 EUROX8.
Size Engine Size (L) num 8 COMMAX5.1

 
In the report, the longest value in the Make column is 5 characters long but the table metadata shows a column width of 13 characters. I can shorten the Make column to 5 characters without truncating the values, making my table size smaller on disk and in memory.

PROC SQL with an ALTER TABLE statement and a MODIFY clause will change a table’s descriptor portion (metadata) for one or more columns. Changing the length of a column will automatically rewrite the data portion of the table, too. The syntax looks like this:

ALTER TABLE table-name MODIFY column-definition-1 <, column-definition-2, ..., column-definition-n>;

After running this code:

proc sql;
alter table work.cars
   modify Make char(5)
;
quit;

The data is still correct:

Year Make Model Sticker Price Wholesale Price Engine Size
(L)
2004 Acura MDX €36.945 €33.337 3,5
2004 Audi A4 1.8T 4dr €25.940 €23.508 1,8
2004 Buick Rainier €37.895 €34.357 4,2

 
and the column size is now smaller:

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 5
Model char 40
MSRP Sticker Price num 8 EUROX8.
Invoice Wholesale Price num 8 EUROX8.
Size Engine Size (L) num 8 COMMAX5.1

 
Now, unless you want to write this type of code by hand every time you need it, you’ll want macros to do the work for you. And macros we’ve got! Check out my SAS tutorial on YouTube titled “Resize, Rename and Reformat Data with SAS Macro” for more details on how this works. The video is also embedded below. Links below the embedded video lead to the data and programs used in the video and to my personal macro trove on GitHub, including a macro that quickly strips labels, formats and informats from a data set.

Until next time, may the SAS be with you!
 
Mark

 

DOWNLOAD NOW | VIDEO DATA PACKAGE ON GITHUB
GET MACROS NOW | MY MACRO STASH ON GITHUB

Jedi SAS tricks: resizing, renaming and reformatting your data was published on SAS Users.

2月 112022
 

Ever had to work with data that had crazy column names, custom formats you didn’t have access to, or text columns sized way too big for the actual data they contained? Annoying, isn’t it? Well, you’re not alone and, as it turns out, it’s not that hard to fix. Let’s take a peek at renaming, reformatting, relabeling and resizing columns in your data using PROC SQL and PROC DATASETS. PROC SQL could handle most of these tasks, but you can’t rename a column in SQL without creating a new copy of the table. PROC DATASETS can do it all, but the syntax can be trickier. When we resize a column, we're going to have to re-write the data anyway, so I'll use PROC SQL for resizing, and PROC DATASETS for everything else. For example, we have this existing table:

Year Make Model MSRP Invoice Engine Size (L)
2004 Acura MDX $36,945 $33,337 3.5
2004 Audi A4 1.8T 4dr $25,940 $23,508 1.8
2004 Buick Rainier $37,895 $34,357 4.2

 

with this descriptor portion:

 

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 13
Model char 40
MSRP num 8 DOLLAR8.
Invoice num 8 DOLLAR8.
EngineSize Engine Size (L) num 8

 
I want to modify the table to use European formats for the numeric values, to rename the EngineSize column as Size leaving its label unchanged, and to add descriptive labels for MSRP and Invoice.
 
The general syntax for this in PROC DATASETS is:
 
PROC DATASETS LIBRARY=;
MODIFY sas-dataset <(options)>;
FORMAT variable-1 <format-1> <variable-2 <format-2> ...>;
LABEL variable-1=<'label-1' | ' '> <variable-2=<'label-2' | ' '> ...>;
RENAME variable-1=new-name-1 <variable-2=new-name-2 ...>;

After running this code:

proc datasets library=work nolist nodetails;
   modify cars;
      format MSRP eurox8. Invoice eurox8. EngineSize commax5.1;
      label MSRP="Sticker Price" Invoice="Wholesale Price" ;
      rename EngineSize=Size;
run; quit;

 
the data now looks like this:
 

Year Make Model Sticker Price Wholesale Price Engine Size
(L)
2004 Acura MDX €36.945 €33.337 3,5
2004 Audi A4 1.8T 4dr €25.940 €23.508 1,8
2004 Buick Rainier €37.895 €34.357 4,2

 

with this descriptor portion:

 

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 13
Model char 40
MSRP Sticker Price num 8 EUROX8.
Invoice Wholesale Price num 8 EUROX8.
Size Engine Size (L) num 8 COMMAX5.1

 
In the report, the longest value in the Make column is 5 characters long but the table metadata shows a column width of 13 characters. I can shorten the Make column to 5 characters without truncating the values, making my table size smaller on disk and in memory.

PROC SQL with an ALTER TABLE statement and a MODIFY clause will change a table’s descriptor portion (metadata) for one or more columns. Changing the length of a column will automatically rewrite the data portion of the table, too. The syntax looks like this:

ALTER TABLE table-name MODIFY column-definition-1 <, column-definition-2, ..., column-definition-n>;

After running this code:

proc sql;
alter table work.cars
   modify Make char(5)
;
quit;

The data is still correct:

Year Make Model Sticker Price Wholesale Price Engine Size
(L)
2004 Acura MDX €36.945 €33.337 3,5
2004 Audi A4 1.8T 4dr €25.940 €23.508 1,8
2004 Buick Rainier €37.895 €34.357 4,2

 
and the column size is now smaller:

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 5
Model char 40
MSRP Sticker Price num 8 EUROX8.
Invoice Wholesale Price num 8 EUROX8.
Size Engine Size (L) num 8 COMMAX5.1

 
Now, unless you want to write this type of code by hand every time you need it, you’ll want macros to do the work for you. And macros we’ve got! Check out my SAS tutorial on YouTube titled “Resize, Rename and Reformat Data with SAS Macro” for more details on how this works. The video is also embedded below. Links below the embedded video lead to the data and programs used in the video and to my personal macro trove on GitHub, including a macro that quickly strips labels, formats and informats from a data set.

Until next time, may the SAS be with you!
 
Mark

 

DOWNLOAD NOW | VIDEO DATA PACKAGE ON GITHUB
GET MACROS NOW | MY MACRO STASH ON GITHUB

Jedi SAS tricks: resizing, renaming and reformatting your data was published on SAS Users.

6月 152021
 

Dealing with big dataIn this fast-paced data age, when the sheer volume of data (generated, collected, and waiting to be processed and analyzed) grows at a breathtaking rate, the speed of data processing becomes critically important. In many cases, if data is not processed within an allotted time frame, we lose all its value as it becomes obsolete and ultimately irrelevant. That is why computing power becomes of the essence.

However, computing power itself does not guarantee timely processing. How we use that power makes all the difference. Way too often good old sequential processing just does not cut it anymore and different computing methods are required. One  such method is parallel processing.

In my previous post Using shell scripts for massively parallel processing I demonstrated a script-centered technique of running in parallel multiple independent SAS processes in SAS environments lacking SAS/CONNECT.

In this post, we will take a shot at a slightly different task and solution. Instead of having several totally independent processes, now we have some common “pre-processing” part, then we run several independent processes in parallel, and then we combine the results of parallel processing in the “post-processing” portion of our program.

Problem: monthly data ingestion use case

For simplification, we are going to use a scenario similar to one in the previous blog post:

Each month, shortly after the end of the previous month we needed to ingest a number of CSV files pertinent to transactions during the previous month and produce daily SAS data tables for each day of the previous month. Only now, we will go a step further: combining all those daily tables into a monthly table.

Solution: combining sequential and parallel processing

The solution is comprised of the three major components:

  • Shell script running the main SAS program.
  • Main SAS program, consisting of three parts: pre-parallel processing, parallel processing, and post-parallel processing.
  • Single thread SAS program responsible for a single day data ingestion.

1. Shell script running main SAS program

Below shell script mainprog.sh runs the main SAS program mainprog.sas:

#!/bin/sh
 
# HOW TO CALL:
# nohup sh /path/mainprog.sh YYYYMM &
 
now=$(date +%Y.%m.%d_%H.%M.%S)
 
# getting YYYYMM as a parameter in script call
ym=$1
 
pgmname=/path/mainprog.sas
logname=/path/saslogs/mainprog_$now.log
sas $pgmname -log $logname -set inDate $ym -set logname $logname

The script is run a background mode as indicated by the ampersand at the end of its invocation command:

nohup sh /path/mainprog.sh YYYYMM &

We pass a parameter YYYYMM (e.g. 202106) indicating year and month of our request.

When we call SAS program mainprog.sas within the script we indicate the name of the SAS log file to be created (-log $logname) and also pass on inDate parameter (-set inDate $ym, which has the same value YYYYMM as parameter specified in the script calling command), and logname parameter (-set logname $logname). As you will see further, we are going to use these two parameters within mainprog.sas program.

2. Main SAS program

Here is an abridged version of the mainprog.sas program:

/* ======= pre-processing ======= */
 
/* parameters passed from shell script */
%let inDate = %sysget(inDate);
%let logname = %sysget(logname);
 
/* year and month */
%let yyyy = %substr(inDate,1,4);
%let mm = %substr(inDate,5,2);
 
/* output data library */
libname SASDL '/data/target';
 
/* number of days in month mm of year yyyy */
%let days = %sysfunc(day(%sysfunc(mdy(&mm+1,1,&yyyy))-1));
 
/* ======= parallel processing ======= */
%macro loop;
   %local threadprog looplogdir logdt workpath tasklist i z threadlog cmd;
   %let threadprog = /path/thread.sas;
   %let looplogdir = %substr(&logname,1,%length(&logname)-4)_logs;
   x "mkdir &looplogdir"; *<- directory for loop logs;
   %let logdt = %substr(&logname,%length(&logname)-22,19);
   %let workpath = %sysfunc(pathname(WORK));
   %let tasklist=;
   %do i=1 %to &days;
      %let z = %sysfunc(putn(&i,z2.));
      %let threadlog = &looplogdir/thread_&z._&logdt..log;
      %let tasklist = &tasklist DAY&i;
      %let cmd = sas &threadprog -log &threadlog -set i &i -set workpath &workpath -set inDate &inDate;
      systask command "&cmd" taskname=DAY&i;
   %end;
 
   waitfor _all_ &tasklist;
 
%mend loop;
%loop
 
/* ======= post-processing ======= */
 
/* combine daily tables into one monthly table */
data SASDL.TARGET_&inDate;
   set WORK.TARGET_&inDate._1 - WORK.TARGET_&inDate._&days;
run;

The key highlights of this program are:

  • We capture values of the parameters passed to the program (inDate and logname).
  • Based on these parameters, assign source directory and target data library SASDL.
  • Calculate number of days in a specific month defined by year and month.
  • Create a directory to hold SAS logs of all parallel threads; the directory name is matching the log name of the mainprog.sas.
  • Capture the WORK library location of the main SAS session running mainprog.sas as:

    %let workpath = %sysfunc(pathname(WORK));We use that location in the thread sessions to pass back to the main session data produced by the thread sessions.

  • Macro %do-loop generates a series of SYSTASK statements to spawn additional SAS sessions in the background mode, each ingesting data for a single day of a month:

    systask command "&cmd" taskname=DAY&i;The SYSTASK statement enables you to execute host-specific commands from within your SAS session or application. Unlike the X statement, the SYSTASK statement runs these commands as asynchronous tasks, which means that these tasks execute independently of all other tasks that are currently running. Asynchronous tasks run in the background, so you can perform additional tasks (including launching other asynchronous tasks) while the asynchronous task is still running.

    Restriction: SYSTASK statement is not supported on the CAS server.

  • Also, we generate a cumulative list of all tasknames assigned to each thread sessions:

    %let tasklist = &tasklist DAY&i;

  • Outside the macro %do-loop we use WAITFOR statement which suspends execution of the main SAS session until the specified tasks finish executing. Since we created a list of all daily thread sessions (&tasklist), this will synchronize all our parallel threads and continue mainprog.sas session only when all threads finished executing.
  • At the end of the main SAS session we concatenate all our daily data tables that have been created by parallel threads in the location of the WORK library of the main SAS session.

Using SAS macro loop to generate a series of SYSTASK statements for parallel processing is not the only method available. Alternatively, you can achieve this within a data step using CALL EXECUTE. In this case, each data step iteration will generate a single global SYSTASK statement and push it out of the data step boundaries where they will be sequentially executed (just like in the case of macro implementation). Since option NOWAIT is the default for SYSTASK statements, despite all of them being launched sequentially, their corresponding OS commands will be still running in parallel.

3. Single thread SAS program

Here is an abridged version of the thread.sas program:

/* inDate parameter */
%let inDate = %sysget(inDate);
 
/* parent program's WORK library */
%let workpath = %sysget(workpath);
libname MAINWORK "&workpath";
 
/* thread number */
%let i = %sysget(i);
 
/* year and month */
%let yyyy = %substr(inDate,1,4);
%let mm = %substr(inDate,5,2);
 
/* source data directory */
%let srcdir = /datapath/&yyyy/&mm;
 
/* create varlist macro variable to list all input variable names */
proc sql noprint;
   select name into :varlist separated by ' ' from SASHELP.VCOLUMN
   where libname='PARMSDL' and memname='DATA_TEMPLATE';
quit;
 
/* create fileref inf for the source file */
filename inf "&srcdir/source_data_&inDate._day&i..cvs";
 
/* create daily output data set */
data MAINWORK.TARGET_&inDate._&i; 
   if 0 then set PARMSDL.DATA_TEMPLATE;
   infile inf missover dsd encoding='UTF-8' firstobs=2 obs=max;
   input &varlist;
run;

This program ingests a single .csv file corresponding to the &i-th day of &inDate (year and month) and creates a SAS data table MAINWORK.TARGET_&inDate._&i. To be available in the main SAS session the MAINWORK library is defined here in the same physical location as the WORK library of the main parental SAS session.

We also use a pre-created SAS data template PARMSDL.DATA_TEMPLATE - a zero-observations data set that contains descriptions of all the variables and their attributes.

Additional resources

Thoughts? Comments?

Do you find this post useful? Do you have processes that may benefit from parallelization? Please share with us below.

Using SYSTASK and SAS macro loops for massively parallel processing was published on SAS Users.

5月 112021
 

It’s safe to say that SAS Global Forum is a conference designed for users, by users. As your conference chair, I am excited by this year’s top-notch user sessions. More than 150 sessions are available, many by SAS users just like you. Wherever you work or whatever you do, you’ll find sessions relevant to your industry or job role. New to SAS? Been using SAS forever and want to learn something new? Managing SAS users? We have you covered. Search for sessions by industry or topic, then add those sessions to your agenda and personal calendar.

Creating a customizable agenda and experience

Besides two full days of amazing sessions, networking opportunities and more, many user sessions will be available on the SAS Users YouTube channel on May 20, 2021 at 10:00am ET. After you register, build your agenda and attend the sessions that most interest you when the conference begins. Once you’ve viewed a session, you can chat with the presenter. Don’t know where to start? Sample agendas are available in the Help Desk.

For the first time, proceedings will live on SAS Support Communities. Presenters have been busy adding their papers to the community. Everything is there, including full paper content, video presentations, and code on GitHub. It all premiers on “Day 3” of the conference, May 20. Have a question about the paper or code? You’ll be able to post a question on the community and ask the presenter.

Want training or help with your code?

Code Doctors are back this year. Check out the agenda for the specific times they’re available and make your appointment, so you’ll be sure to catch them and get their diagnosis of code errors. If you’re looking for training, you’ll be quite happy. Training is also back this year and it’s free! SAS instructor-led demos will be available on May 20, along with the user presentations on the SAS Users YouTube channel.

Chat with attendees and SAS

It is hard to replicate the buzz of a live conference, but we’ve tried our best to make you feel like you’re walking the conference floor. And we know networking is always an important component to any conference. We’ve made it possible for you to network with colleagues and SAS employees. Simply make your profile visible (by clicking on your photo) to connect with others, and you can schedule a meeting right from the attendee page. That’s almost easier than tracking down someone during the in-person event.

We know the exhibit hall is also a big draw for many attendees. This year’s Innovation Hub (formerly known as The Quad) has industry-focused booths and technology booths, where you can interact in real-time with SAS experts. There will also be a SAS Lounge where you can learn more about various SAS services and platforms such as SAS Support Communities and SAS Analytics Explorers.

Get started now

I’ve highlighted a lot in this blog post, but I encourage you to view this 7-minute Innovation Hub video. It goes in depth on the Hub and all its features.

This year there is no reason not to register for SAS Global Forum…and attend as few or as many sessions as you want. Why? Because the conference is FREE!

Where else can you get such quality SAS content and learning opportunities? Nowhere, which is why I encourage you to register today. See you soon!

SAS Global Forum: Your experience, your way was published on SAS Users.

4月 202021
 

I can’t believe it’s true, but SAS Global Forum is just over a month away. I have some exciting news to share with you, so let’s start with the theme for this year:

New Day. New Answers. Inspired by Curiosity.

What a fitting theme for this year! Technology continues to evolve, so each new day is a chance to seek new answers to what can sometimes feel like impossible challenges. Our curiosity as humans drives us to seek out better ways to do things. And I hope your curiosity will drive you to register for this year’s SAS Global Forum.

We are excited to offer a global event across three regions. If you’re in the Americas, the conference is May 18-20. In Asia Pacific? Then we’ll see you May 19-20. And we didn’t forget about Europe. Your dates are May 25-26. We hope these region-specific dates and the virtual nature of the conference means more SAS users than ever will join us for an inspiring event. Curious about the exciting agenda? It’s all on the website, so check it out.

Keynotes speakers that you’ll talk about for months to come

Want to be inspired to chase your “impossible” dreams? Or hear more about the future of AI? How about learning about work-life balance and your mental health? We have you covered. SAS executives are gearing up to host an exciting lineup of extremely smart, engaging and thought-provoking keynote speakers like Adam Grant, Ayesha Khanna and Hakeem Oluseyi.

And who knows, we might have a few more surprises up our sleeve. You’ll just have to register and attend to find out.

Papers and proceedings: simplified and easy to find

Have you joined the SAS Global Forum online community? You should, because that’s where you’ll find all the discussion around the conference…before, during and after. It’s also where you’ll find a link to the 2021 proceedings, when they become available. Authors are busy preparing their presentations now and they are hard at work staging their proceedings in the community. Join the community so you can connect with other attendees and know when the proceedings become available.

Stay tuned for even more details

SAS Global Forum is the place where creativity meets curiosity, and amazing analytics happens! I encourage you to regularly check the conference website, as we’re continually adding new sessions and events. You don’t want to miss this year’s conference, so don’t forget to register for SAS Global Forum. See you soon!

Registration is open for a truly inspiring SAS Global Forum 2021 was published on SAS Users.

2月 252021
 

The people, the energy, the quality of the content, the demos, the networking opportunities…whew, all of these things combine to make SAS Global Forum great every year. And that is no exception this year.

Preparations are in full swing for an unforgettable conference. I hope you’ve seen the notifications that we set the date, actually multiple dates around the world so that you can enjoy the content in your region and in your time zone. No one needs to set their alarm for 1:00am to attend the conference!

Go ahead and save the date(s)…you don’t want to miss this event!

Content, content, content

We are working hard to replicate the energy and excitement of a live conference in the virtual world. But we know content is king, so we have some amazing speakers and content lined up to make the conference relevant for you. There will be more than 150 breakout sessions for business leaders and SAS users, plus the demos will allow you to see firsthand the innovative solutions from SAS, and the people who make them. I, for one, am looking forward to attending live sessions that will allow attendees the opportunity to ask presenters questions and have them respond in real time.

Our keynote speakers, while still under wraps for now, will have you on the edge of your seats (or couches…no judgement here!).

Networking and entertainment

You read that correctly. We will have live entertainment that'll have you glued to the screen. And you’ll be able to network with SAS experts and peers alike. But you don’t have to wait until the conference begins to network, the SAS Global Forum virtual community is up and running. Join the group to start engaging with other attendees, and maybe take a guess or two at who the live entertainment might be.

A big thank you

We are working hard to bring you the best conference possible, but this isn’t a one-woman show. It takes a team, so I would like to introduce and thank the conference teams for 2021. The Content Advisory Team ensures the Users Program sessions meet the needs of our diverse global audience. The Content Delivery Team ensures that conference presenters and authors have the tools and resources needed to provide high-quality presentations and papers. And, finally, the SAS Advisers help us in a multitude of ways. Thank you all for your time and effort so far!

Registration opens in April, so stay tuned for that announcement. I look forward to “seeing” you all in May.

What makes SAS Global Forum great? was published on SAS Users.

12月 172020
 

There’s nothing worse than being in the middle of a task and getting stuck. Being able to find quick tips and tricks to help you solve the task at hand, or simply entertain your curiosity, is key to maintaining your efficiency and building everyday skills. But how do you get quick information that’s ALSO engaging? By adding some personality to traditionally routine tutorials, you can learn and may even have fun at the same time. Cue the SAS Users YouTube channel.

With more than 50 videos that show personality published to-date and over 10,000 hours watched, there’s no shortage of learning going on. Our team of experts love to share their knowledge and passion (with personal flavor!) to give you solutions to those everyday tasks.

What better way to round out the year than provide a roundup of our most popular videos from 2020? Check out these crowd favorites:

Most viewed

  1. How to convert character to numeric in SAS
  2. How to import data from Excel to SAS
  3. How to export SAS data to Excel

Most hours watched

  1. How to import data from Excel to SAS
  2. How to convert character to numeric in SAS
  3. Simple Linear Regression in SAS
  4. How to export SAS data to Excel
  5. How to Create Macro Variables and Use Macro Functions
  6. The SAS Exam Experience | See a Performance-Based Question in Action
  7. How it Import CSV files into SAS
  8. SAS Certification Exam: 4 tips for success
  9. SAS Date Functions FAQs
  10. Merging Data Sets in SAS Using SQL

Latest hits

  1. Combining Data in SAS: DATA Step vs SQL
  2. How to Concatenate Values in SAS
  3. How to Market to Customers Based on Online Behavior
  4. How to Plan an Optimal Tour of London Using Network Optimization
  5. Multiple Linear Regression in SAS
  6. How to Build Customized Object Detection Models

Looking forward to 2021

We’ve got you covered! SAS will continue to publish videos throughout 2021. Subscribe now to the SAS Users YouTube channel, so you can be notified when we’re publishing new videos. Be on the lookout for some of the following topics:

  • Transforming variables in SAS
  • Tips for working with SAS Technical Support
  • How to use Git with SAS

2020 roundup: SAS Users YouTube channel how to tutorials was published on SAS Users.

12月 072020
 

In my previous blog post, I talked about using PROC CAS to accomplish various data preparation tasks. Since then, my colleague Todd Braswell and I worked through some interesting challenges implementing an Extract, Transform, Load (ETL) process that continuously updates data in CAS. (Todd is really the brains behind getting this to work – I am just along for the ride.)

In a nutshell, the process goes like this:

  1. PDF documents drop into a "receiving" directory on the server. The documents have a unique SubmissionID. Some documents are very large – thousands of pages long.
  2. A Python job runs and converts PDFs into plain text. Python calls an API that performs Optical Character Recognition (OCR) and saves off the output as a CSV file, one row per page, in the PDF document.
  3. A SAS program, running in batch, loads the CSV file with new records into a CAS table. SubmissionID is passed to the batch program as a macro variable, which is used as part of the CAS table name.
  4. Records loaded from the CSV file are appended to the Main table. If records with the current SubmissionID already exist in the Main table, they are deleted and replaced with new records.
    The Main table is queried by downstream processes, which extract subsets of data, apply model score code, and generate results for the customer.

Continuously update data process flow

Due to the volume of data and the amount of time it takes to OCR large PDFs, the ETL process runs in multiple sessions simultaneously. And here is a key requirement: the Main table is always available, in a promoted state, with all up-to-date records, in order for the model score code to pick up the needed records.

What does "promoted state" mean?

The concept of table scope, which was introduced with the first release of SAS Viya, presents a challenge. CAS tables are in-memory tables that can have one of two "scopes":

  • Session scope – the table exists within the scope of your current CAS session and drops from memory as soon as the session ends. Functionally, this is somewhat similar to the data you write to the WORK library in SAS 9 – once you disconnect, the data drops from the WORK library.
  • Global scope – the table is available to all sessions. If your session ends, you will still have access to it when you start a new session. Your colleagues also maintain access, assuming they have the necessary permissions. For the table to assume global scope, it needs to be promoted.

Common promotion techniques for a table are the DATA STEP, PROC CAS, or PROC CASUTIL. For example:

/*promote a table using DATA STEP*/
*this snippet copies a SAS 9 table into CAS and promotes in one step;
data mycaslib.my_table (promote=yes);
     set mylib.my_table;
     run;
 
/*promote using PROC CASUTIL*/
*this snippet promotes a session-scope CAS table to global scope;
proc casutil incaslib='mycaslib' outcaslib='mycaslib';
     promote casdata='my_table';
     quit;
 
/*Promote using PROC CAS*/
*same as above, this snippet promotes a session-scope table to global scope;
proc cas;
table.promote / 
     caslib='mycaslib'
     targetcaslib='mycaslib' 
     name='my_table' 
     target='my_table';
    quit;

Fun Facts About Table Promotion

You cannot promote a table that has already been promoted. If you need to promote a new version of the same table, you need to first drop the existing table and promote the new version.

To discover the current table state, use the

proc cas;
     table.tableinfo / 
     caslib='mycaslib' 
     name='main';
     quit;

If you append rows to a promoted table using the DATA STEP append option, the new rows are automatically promoted. For example, in this snippet the mycaslib.main table, which is promoted, remains promoted when the rows from mycaslib.new_rows are appended to it:

data mycaslib.main(append=yes);
     set mycaslib.new_rows;
     run;

When you manipulate a promoted table using the DATA STEP apart from appending rows, it creates a new, session-scope version of the same table. You will have two versions of the table: the global-scope table, which remains unchanged, and the session-scope version which has the changes you implemented. Even if you don't change anything in the table and simply run:

data mycaslib.my_table;
     set mycaslib.my_table;
     run;

in which mycaslib.my_table is promoted, you end up with a promoted and an unpromoted version of this table in the mycaslib library – a somewhat unexpected and hardly desired result. Appendix 1 walks through a quick exercise you can try to verify this.

As you probably guessed, this is where we ran into trouble with our ETL process: the key requirement was for the Main table to remain promoted, yet we needed to continuously update it. The task was simple if we just needed to append the rows; however, we also needed to replace the rows if they already existed. If we tried to delete the existing rows using the DATA STEP, we would have to deal with the changes applied to a session-scope copy of the global-scope table.

Initially, we designed the flow to save off the session-scope table with changes, then drop the (original) global-scope version, and finally reload the up-to-date version. This was an acceptable workaround, but errors started to pop up when a session looked for the Main table to score data, while a different concurrent session reloaded the most up-to-date data. We were uncertain how this would scale as our data grew.

PROC CAS to the rescue!

After much research, we learned the deleteRows, which allows you to delete rows directly from a global-scope table. The data is never dropped to session-scope – exactly what we needed. Here's an example:

proc cas;
     table.deleteRows /
     table={caslib="mycaslib", name="Main", where="SubmissionID = 12345"};
     quit;

In case you are wondering, the Tables action set also has an

/*1. Load new rows. SubmissionID macro variable is a parameter passed to the batch program*/
/*New rows are written to the casuser library, but it does not really matter which caslib you choose – 
   we are not persisting them across sessions*/
proc casutil;
       load file="/path_to_new_rows/New_rows_&SubmissionID..csv" outcaslib="casuser" casout="new_rows";
       quit;
/*2. Delete rows with the current SubmissionID */
proc cas;
       table.deleteRows /
       table={caslib="prod", name="Main", where="SubmissionID = &SubmissionID."};
       quit;
/*3. Append new rows*/
data mycaslib.main(append=yes);
	set mycaslib.new_rows;
	run;
/*4. Save the main table to ensure we have a disk backup of in-memory data*/
proc casutil incaslib="prod" outcaslib="prod";
	save casdata="main" replace;
	quit;

Conclusion

We learned how to continuously update data in CAS while ensuring the data remains available to all sessions accessing it asynchronously. We learned the append option in DATA STEP automatically promotes new rows but manipulating the data in a global-scope table through DATA STEP in other ways leads to data being copied to session-scope. Finally, we learned that to ensure the table remains promoted while it is updated, we can fall back on PROC CAS.

Together, these techniques enabled implementation of a robust data flow that overcomes concurrency problems due to multiple processes updating and querying the data.

Acknowledgement

We thank Brian Kinnebrew for his generous help in investigating this topic and the technical review.

Appendix 1

Try the following exercise to verify that manipulating a promoted table in DATA STEP leads to two copies of the table – session- AND global-scope.

/*copy and promote a sample SAS 9 table*/
data casuser.cars(promote=yes);
     set sashelp.cars;
     run;
/*check the number of rows and confirm that the table is promoted*/
proc cas;
     table.tableinfo / caslib='casuser' name='cars';
     quit; /*The table is promoted and has 428 rows*/
 
/*delete some rows in the promoted table*/
data casuser.cars;
     set casuser.cars;
     if make='Acura' then delete;
     run;
/*check again – how may rows does the table have? Is it promoted?*/
proc cas;
     table.tableinfo / caslib='casuser' name='cars';
     quit;  /*The table is has 421 rows but it is no longer promoted*/
 
/*reset your CAS session*/
/*kill your current CAS session */
cas _all_ terminate;
/*start a new CAS session and assign caslibs*/
cas; 
caslib _all_ assign;
 
/*check again – how may rows does the table have? Is it promoted?*/
proc cas;
     table.tableinfo / caslib='casuser' name='cars';
     quit;  /*The table is promoted and has 428 rows*/

What we see here is, manipulating a global-scope table in DATA STEP leads to duplication of data. CAS copies the table to session-scope and applies the changes there. The changes go away if you terminate the session. One way to get around this issue is instead of trying to overwrite the promoted table, create a new table, then drop the old table and promote the new table under the old table's name. Otherwise, use table.Update actions to update/delete rows in place, as described in this post.

Append and Replace Records in a CAS Table was published on SAS Users.

11月 202020
 

If you’re like me and the rest of the conference team, you’ve probably attended more virtual events this year than you ever thought possible. You can see the general evolution of virtual events by watching the early ones from April or May and compare them to the recent ones. We at SAS Global Forum are studying the virtual event world, and we’re learning what works and what needs to be tweaked. We’re using that knowledge to plan the best possible virtual SAS Global Forum 2021.

Everything is virtual these days, so what do we mean by virtual?

Planning a good virtual event takes time, and we’re working through the process now. One thing is certain -- we know the importance of providing quality content and an engaging experience for our attendees. We want to provide attendees with the opportunity as always, but virtually, to continue to learn from other SAS users, hear about new and exciting developments from SAS, and connect and network with experts, peers, partners and SAS. Yes, I said network. We realize it won’t be the same as a live event, but we are hopeful we can provide attendees with an incredible experience where you connect, learn and share with others.

Call for content is open

One of the differences between SAS Global Forum and other conferences is that SAS users are front and center, and the soul of the conference. We can’t have an event without user content. And that’s where you come in! The call for content opened November 17 and lasts through December 21, 2020. Selected presenters will be notified in January 2021. Presentations will be different in 2021; they will be 30 minutes in length, including time for Q&A when able. And since everything is virtual, video is a key component to your content submission. We ask for a 3-minute video along with your title and abstract.

The Student Symposium is back

Calling all postsecondary students -- there’s still time to build a team for the Student Symposium. If you are interested in data science and want to showcase your skills, grab a teammate or two and a faculty advisor and put your thinking caps on. Applications are due by December 21, 2020.

Learn more

I encourage you to visit the SAS Global Forum website for up-to-date information, follow #SASGF on social channels and join the SAS communities group to engage with the conference team and other attendees.

Connect, learn and share during virtual SAS Global Forum 2021 was published on SAS Users.