SAS Studio

March 2, 2016

I recently received a call from a colleague who is using parallel processing in a grid environment. He lamented that SAS Enterprise Guide did not show, in the WORK library, any of the tables that had been successfully created in his project.

The issue was very clear in my mind, but I could not find any simple description or picture to show him, so why not write it all down in a blog post so everyone can benefit?

Parallel processing can speed up your projects dramatically, especially when programs consist of subtasks that are independent units of work and can be distributed across a grid and executed in parallel. But when these parallel execution environments are not kept in sync, parallelism can also introduce unforeseen problems.

This specific issue of “disappearing” temporary tables can happen with different client interfaces, because it does not depend on the software you use, but rather on the business logic that is implemented. Let’s look at two practical examples.

SAS Studio

We want to run an analysis (here, a simple PROC PRINT) on two independent subsets of the same table. We decide to use two parallel grid sessions to partition the data, and then we run the analysis in the parent session. The code we submit in SAS Studio could be similar to the following:

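/* Enable grid execution for all subsequent SIGNONs, using the SASApp server */
%let rc = %sysfunc(grdsvc_enable(_all_, server=SASApp));
signon grid1;
signon grid2;

/* Remove any leftover copies from the parent session's WORK library */
proc datasets library=work noprint;
   delete sedan SUV;
run;

/* First subset, created asynchronously in grid session grid1 */
rsubmit grid1 wait=no;
   data sedan;
      set sashelp.cars;
      where Type="Sedan";
   run;
endrsubmit;

/* Second subset, created asynchronously in grid session grid2 */
rsubmit grid2 wait=no;
   data SUV;
      set sashelp.cars;
      where Type="SUV";
   run;
endrsubmit;

/* Wait until both grid sessions have finished */
waitfor _ALL_ grid1 grid2;

/* Analysis in the parent session: these steps fail, because SEDAN and
   SUV were written to the grid sessions' WORK libraries, not this one */
proc print data=sedan;
run;
proc print data=SUV;
run;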

After submitting the code by pressing F3 or clicking Run, we do not get the expected output in the RESULTS window, and the LOG window shows some errors:

[Screenshot: the SAS Studio log showing the errors]

SAS Enterprise Guide

Suppose we have a project similar to the following.

[Screenshot: the sample process flow, with orange arrows marking the tasks that can run in parallel]

Many items can be created independently of others. The orange arrows illustrate the potential tasks which can be executed in parallel.

Let’s try to run these project tasks in parallel. Select File, Project Properties. Then select Code Submission and check “Allow parallel execution on the same server.”

[Screenshot: the Code Submission page of the Project Properties dialog]

This property enables SAS Enterprise Guide to create one or more additional workspace server connections so that parallel process flow paths can be run in parallel.

Note: Despite the option's name, in a grid environment the additional workspace server sessions do not always execute on the same server. The grid master server decides where these sessions start.

After we select Run, Run Process Flow to submit the code, SAS Enterprise Guide submits the tasks in parallel while respecting the required dependencies. Unfortunately, some tasks fail, and red X’s appear in the top right corner of their icons. In the log summary we may find the following:

ERROR: File WORK.QUERY_FOR_PRODUCTS.DATA does not exist.

What's going on?

The WORK library is the temporary library that SAS automatically defines at the beginning of each SAS session or job. The WORK library stores temporary SAS files that are written by a DATA step or a procedure and then read as input by subsequent steps. After we enable parallel execution on the grid, tasks run in multiple SAS sessions, and each grid session has its own dedicated WORK library that is not shared with any other grid session or with the parent session that started it.
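One way to see this for yourself is to print the physical location of WORK in the parent session and in a grid session. The sketch below assumes the same SASApp grid setup used in the SAS Studio example; the &SYSHOSTNAME automatic macro variable requires SAS 9.4 or later.

```sas
/* Sketch: compare WORK in the parent session with WORK in a grid session.
   Assumes the SASApp grid setup from the SAS Studio example. */
%let rc = %sysfunc(grdsvc_enable(_all_, server=SASApp));
signon grid1;

/* Values for the parent session */
%put Parent host: &syshostname;
%put Parent WORK: %sysfunc(pathname(work));

rsubmit grid1;
   /* Code in an RSUBMIT block runs in the grid session, so these
      values describe grid1, not the parent */
   %put Grid1 host: &syshostname;
   %put Grid1 WORK: %sysfunc(pathname(work));
endrsubmit;

signoff grid1;
```

The two WORK paths are always different, and on a grid the host names may differ as well. A table written to one of those directories simply cannot be seen from the other session.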

In the SAS Studio example, the DATA steps write their results (the SEDAN and SUV tables) to the WORK libraries of the grid sessions, and then PROC PRINT tries to read those tables from the WORK library of the parent session. The tables are not there, so the step fails.

[Diagram: each SAS session writes to and reads from its own WORK library]

This is quite a common issue when dealing with multiple sessions, even without a grid. One simple solution is to avoid the WORK library and any other non-shared resources, and to write the tables to a library that every session can access. A common library can be assigned in many ways, such as in autoexec files or in metadata.
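A minimal sketch of the SAS Studio example rewritten along these lines is shown here. It assumes a directory that the parent session and every grid node can read and write under the same path; the SHARED libref and the /shared/project/data path are hypothetical. In a real environment the library would more likely be assigned once, in an autoexec file or in metadata, rather than in each RSUBMIT block.

```sas
/* Hypothetical shared location: it must be reachable, with the same
   path, from the parent session and from every grid node */
libname shared "/shared/project/data";

%let rc = %sysfunc(grdsvc_enable(_all_, server=SASApp));
signon grid1;
signon grid2;

rsubmit grid1 wait=no;
   /* Each grid session assigns the same shared library */
   libname shared "/shared/project/data";
   data shared.sedan;
      set sashelp.cars;
      where Type="Sedan";
   run;
endrsubmit;

rsubmit grid2 wait=no;
   libname shared "/shared/project/data";
   data shared.suv;
      set sashelp.cars;
      where Type="SUV";
   run;
endrsubmit;

waitfor _all_ grid1 grid2;

/* The parent session now reads the tables from the shared library,
   not from its own WORK library */
proc print data=shared.sedan;
run;
proc print data=shared.suv;
run;

signoff _all_;
```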
The issue is solved:

[Screenshot: the program running successfully with a shared library]

Only shared resources, am I safe?

Well, maybe not. Coming back to the original issue presented at the opening of this post, sometimes we overlook what ‘shared’ means. I’ll show you with a very simple SAS Enterprise Guide project: just one query that writes a result table to the WORK library. You can test it on your laptop, without any grid.

[Screenshot: the sample project, a single query that writes to the WORK library]

After running it, the FILTER_FOR_AIR table appears in the server’s pane:

[Screenshot: the Servers pane listing FILTER_FOR_AIR]

Now, let’s say we have to prepare for a more complex project, so we follow the same steps again to enable “Allow parallel execution on the same server.” Just to be safe, we resubmit the project to see what happens. Nothing seems to change, so we save the project and close SAS Enterprise Guide. Now suppose I forgot to ask you to write down something about the results: we reopen the project and, knowing that the result table in the WORK library was temporary, we rerun the project to recreate it.

This time something is wrong.

[Screenshot: the error shown after rerunning the project]

The Output Data pane shows an error, and the Servers pane does not list the FILTER_FOR_AIR table anymore.
Even if we rerun the project, the table will not reappear.

The reason lies, again, in the realm of shared vs. local libraries.

As soon as we enable “Allow parallel execution on the same server,” SAS Enterprise Guide starts at least one additional SAS session to process the code, even if there is nothing to parallelize. The results are saved only in that additional session, but SAS Enterprise Guide always tries to read them from the original, parent session. So we have fallen, once again, into the trap of local WORK libraries.

[Diagram: the query runs in an additional session, while SAS Enterprise Guide reads from the parent session]

Why didn’t we uncover the issue the first time we ran the project? If you run your code, at least once, without the “Allow parallel execution on the same server” option, the results are saved in the parent session. And, they remain there even after enabling parallelization. As a result, we actually have two copies of the FILTER_FOR_AIR table!
As soon as we close SAS Enterprise Guide, both tables are deleted. So, on the next run, there is nothing in the parent session WORK library to send to SAS Enterprise Guide!

The solution? Same as before – only use shared libraries.

Is this all?

As you might have guessed, the answer is no. Libraries are not the only objects that should be shared across sessions. Every local setting – be it the value of an option, a macro, a format – has to be shared across all parallel sessions. Not difficult, but we have to remember to do it!
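The following is a minimal sketch of what sharing each kind of setting can look like with SAS/CONNECT. The SHARED library, its path, the CUTOFF macro variable, and the PRICEFMT format are hypothetical names used only for illustration. %SYSLPUT copies a macro variable into a remote session, the format catalog is stored in a shared library instead of WORK, and the LIBNAME and OPTIONS statements are repeated in the grid session because it does not inherit them from the parent.

```sas
/* Sketch only: SHARED and its path are hypothetical; the path must be
   reachable from the parent session and from every grid node */
%let rc = %sysfunc(grdsvc_enable(_all_, server=SASApp));
signon grid1;

libname shared "/shared/project/data";
options fmtsearch=(shared.formats work library);

/* Macro variables: copy the local value into the grid session */
%let cutoff = 30000;
%syslput cutoff=&cutoff /remote=grid1;

/* Formats: store them in a shared catalog rather than in WORK.FORMATS,
   which exists only in the session that created it */
proc format library=shared.formats;
   value pricefmt low -< 30000 = 'Budget'
                  30000 - high = 'Premium';
run;

rsubmit grid1;
   /* Libraries and options are not inherited from the parent,
      so they are set again in the grid session */
   libname shared "/shared/project/data";
   options fmtsearch=(shared.formats work library);

   data shared.expensive;
      set sashelp.cars;
      where MSRP > &cutoff;   /* resolved in grid1, thanks to %SYSLPUT */
      format MSRP pricefmt.;
   run;
endrsubmit;

signoff grid1;
```

Putting the LIBNAME and OPTIONS statements in an autoexec file or in metadata, as suggested earlier, avoids repeating them in every RSUBMIT block.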

 

 

 

 

 

 

 

tags: parallel processing, sas enterprise guide, SAS Grid Manager, SAS Professional Services, SAS Programmers, sas studio

Avoid the pitfalls of parallel jobs was published on SAS Users.

November 14, 2015

I brushed aside some sawdust on the workbench and set my laptop down. It wasn’t really mine. SAS Library Services had kindly lent me a new laptop for the “Making Sense of Sensor Data” workshop at UNC’s BEaM Makerspace. I had just set the laptop down…in sawdust. Like any normal […]

Sawdust and SAS Studio: Thoughts on a liberal arts education during an IoT workshop was published on SAS Voices.

September 1, 2015

Will the Internet of Things (IoT) create a web of connected devices that make our lives better or an infinite infestation of annoying devices invading our privacy for no good reason? I don't know. I do know that the answer is going to depend less on the technology and more […]

The Internet of Things at your local library was published on SAS Voices.

May 28, 2015

SAS software is used around the world in some of the most sophisticated ways, like ATM fraud detection and cancer research. But recently, I used it for a practical, and much needed, task -- replacing our break room coffee machine. Now, this is no ordinary coffee machine. It also makes […]

The post How analytics saved the break room coffee machine appeared first on The SAS Training Post.

May 30, 2014

I wish I had a nickel for every time I heard this question at SAS Global Forum:

"So, does this SAS Studio thing replace SAS Enterprise Guide?"

SAS Studio is a pretty big deal. It's groundbreaking in several ways:

  • It's a web-based programming interface to SAS. It runs in your browser, which means that end users don't have to install anything (when connecting to a remote SAS session).
  • It's an HTML5-based application, so there are no browser plugins needed. It runs on Windows, Macs, and even the iPad.
  • It's the basis for new offerings from SAS, most notably the SAS University Edition. This offering is free to just about any learner for non-commercial use. The SAS University Edition includes SAS, running in a virtual machine, packaged with SAS Studio as the user interface. Since its launch earlier this week, people have been downloading it like crazy.

You're going to be hearing a lot about SAS Studio. It was even the theme for this month's SAS Tech Report.
If you haven't seen SAS Studio, take a few minutes and watch my SAS Tech Talk interview with Shannon Smith, the SAS R&D testing manager for the product:

 

So what does this mean for those of us who have invested our skills and processes in SAS Enterprise Guide? If you read this blog regularly, you know that includes me! Does this "new app on the block" replace our beloved SAS Enterprise Guide? The answer is No -- and Yes.

No, SAS Studio isn't a direct SAS Enterprise Guide replacement. SAS Enterprise Guide continues to get new features, mostly targeting productivity enhancements and integration with other SAS offerings, such as SAS Visual Analytics. Many thousands of users around the world use SAS Enterprise Guide to manage process flows, reporting and analytics, database access, and custom processes. SAS Studio doesn't have all of that infrastructure (at least, not yet), and cannot step in to replace all of that.

But also, Yes: SAS Studio can replace some uses of SAS Enterprise Guide. If you use SAS Enterprise Guide simply as a way to manage SAS programs in your SAS environment, then you can certainly use SAS Studio instead (or as well) to develop and maintain those programs. SAS Studio also includes some tasks for non-programmers, similar to those found in SAS Enterprise Guide -- but for now the library isn't as rich as what you'll find in SAS Enterprise Guide. And with the SAS University Edition, SAS Studio will represent the first SAS experience for the next generation of SAS programmers.

Sometimes SAS users ask me (usually in a hushed tone): Why does SAS create these different applications that seem to compete with each other? Is there some sort of contest in SAS R&D to see which teams can outdo the others? My answer: while these apps might have a certain amount of overlap, they really do serve different purposes and different audiences. Our goal is to enable SAS users -- regardless of discipline, industry, or expertise -- with the tools that are most fit for their particular purpose. One size does not fit all (though some diehard PC SAS fans might disagree with me).

Plus, here's another secret: the same developers have built all of these applications. The SAS Studio development team includes people who worked on SAS Display Manager (you know, "PC SAS") and SAS Enterprise Guide. This is a direct benefit of SAS being such a great workplace: nobody leaves. That means that the lessons learned from customers and developers are carried over and applied in each successive "app generation". If developers are competing, then they are mostly competing with the proven work they've done in the past. But since the teams always have new technology and techniques at their disposal, it's the end users who win.

tags: SAS Enterprise Guide, SAS Studio, SASAnalyticsU, Tech Talk