One of the hidden gems of SAS Studio is the ability to run process flows in parallel. This feature really shines when used in a grid environment. Let’s discuss this one step at a time.
First, what is a process flow? When working in the Visual Programmer perspective, you have access to process flows. A process flow is a graphical representation of a process, where each object, be it a SAS program, a SAS Studio task, a query, and so on, is represented by a node. Nodes are connected by links that instruct SAS Studio how to move from one node to the next.
Note: Click on the images to enlarge them.
On the Properties tab of the current process flow, you can set the execution mode of the nodes. With the default setting, SAS Studio runs the nodes in the order in which they were added to the process flow. If node 2 is dependent on node 1, node 1 must run completely before node 2 will run.
You can change the execution mode to Parallel as shown in Display 2. When this value is set, SAS Studio uses multiple workspace servers to run the nodes concurrently, always enforcing the correct dependencies.
When you use this feature in a SAS Grid environment, if the administrator has configured the workspace server sessions to be grid launched, you can achieve the benefits and the performance improvements of multi-machine parallel load balancing, without having to code any SAS/CONNECT statement. It’s a real point-and-click parallel execution engine!
Display 3 shows the process flow presented in Display 1 running in parallel execution mode. The pane is grayed out because it is not possible to interact with it until the execution is complete.
We can see that the List Data node is still running, while the Partition Data node has already finished. Thus, the two Filter Data nodes were able to start in parallel.
In this scenario, we would guess three workspace server sessions are concurrently running our code. However, if we monitor what is happening on the back-end hosts, we notice something unexpected. There are actually five workspace server sessions running. Why? As soon as you sign in to SAS Studio, it starts two SAS sessions. These are used only for the default execution mode. If a process flow is run in parallel mode, up to three additional SAS session are started, for a total of five. Once the process flow is finished, the three additional SAS processes terminate, if there is no further activity for 30 seconds, in order to release resources.
An administrator can use a configuration property, webdms.maxParallelWorkspaces, to specify the maximum number of workspaces that can be used when SAS is running in parallel mode. The default value is 3. The maximum value is 8.
I hope you enjoy running multiple tasks concurrently. If you have already started using parallel processing, you might want to check out my earlier blog, How to avoid the pitfalls of parallel jobs!