SAS Studio

September 9, 2016
 

If you’re doing data processing in the cloud or using container-enabled infrastructures to deploy your software, you’ll want to learn more about SAS Analytics for Containers. This new solution puts SAS into your existing container-enabled environment – think Docker or Kubernetes – giving data scientists and analysts the ability to perform sophisticated analyses using SAS, and all from the cloud.

The product’s coming-out party is the Analytics Experience 2016 conference, September 12-14, 2016 at the Bellagio in Las Vegas. In advance of that event, I sat down with Product Manager Donna DeCapite to learn a little more about SAS Analytics for Containers and find out why it’s such a big deal for organizations that use containers for their applications.

Larry LaRusso: Before we get into details around the solution, and with apologies for my ignorance, let me start out with a really basic question. What are containers?

Donna DeCapite: Cloud containers are all the rage in the IT world. They’re an alternative to virtual machines. They allow an application and all of its dependencies to be deployed and run in an isolated space. Organizations build and deploy in a container environment because it lets them package only the system libraries and functions needed to run a given piece of software. IT prefers containers because they’re easy to replicate, and they’re faster and easier to deploy.

LL: And SAS Analytics for Containers will allow organizations to run SAS’ analytics in this environment, the containers?

DD: In short, yes. SAS Analytics for Containers provides a powerful set of data access, analysis and graphical tools to organizations within a container-based infrastructure like Docker. This takes advantage of the build once, run anywhere flexibility of the container environment, making it easier and faster to use SAS Analytics in the cloud.

LL: Who, within an organization, will be the primary users?

DD: Really anyone working with containers in the cloud or anyone working with DevOps. Data scientists will embrace SAS Analytics for Containers because it allows them to access data from nearly any source and easily perform sophisticated analyses using SAS Studio, our browser-based interface, or Jupyter Notebook, an open source notebook-style interface. For SAS developers, the product allows them to quickly provision IT resources to sandbox development ideas. And as I mentioned earlier, IT will appreciate the ease with which applications can be deployed, distributed and managed.

LL: You mentioned data scientists, so now I’m curious; we’re talking complex analyses here, yes?

DD: Definitely. Regressions, decision trees, Bayesian analysis, spatial point pattern analysis, missing data analysis, and many additional statistical analysis techniques can be performed with SAS Analytics for Containers. And in addition to sophisticated statistical and predictive analytics, there are a ton of prebuilt SAS procedures included to handle common tasks like data manipulation, information storage, and report writing, all available via SAS Studio and its assistive nature.

LL: What about organizations that have massive amounts of data? Can they use SAS Analytics for Containers as well?

DD: Yes. SAS Analytics for Containers lets you take advantage of the processing power of your Hadoop cluster by leveraging the SAS accelerators for Hadoop, such as the Code Accelerator and the Scoring Accelerator.

LL: Thank you so much for your time, Donna. I know you’ve educated me a great deal! How about more information? Where can individuals learn more about SAS Analytics for Containers?

DD: If you’re going to Analytics Experience 2016, consider coming to Michael Ames’ table talk, Cloud Computing: How Does It Work? It’s scheduled for Monday, September 12 from 4:30-5:00pm Vegas time. If you’re not going to the event, the SAS website is the best place for more information. Probably the best place to start is the SAS Analytics for Containers home page.

tags: analytics conference, analytics experience, cloud computing, SAS Analytics for Containers, sas studio

Analytics in the Cloud gets a whole lot easier with SAS Analytics for Containers was published on SAS Users.

August 31, 2016
 

Update-in-place supports the ability to update a SAS Deployment within a major SAS release. Updates often provide new versions of SAS products. However, when using the SAS Deployment Wizard to perform an update-in-place, you cannot selectively update a machine or product. As a general rule, if you want to update one product in a SAS Deployment, you have to update the whole deployment. With the latest version of SAS Studio, that’s not the case. You can now update from version 3.4 to version 3.5 of SAS Studio without updating any other part of your SAS deployment.

SAS Studio 3.5 contains some interesting new functionality:

  • A new batch submit feature.
  • The ability to create global settings for all SAS Studio users at your site.
  • A new Messages window that displays information about the programs, tasks, queries, and process flows that you run.
  • A new table of contents in results.
  • New keyboard shortcuts to add and insert code snippets.
  • Many new tasks for statistical process control, multivariate analysis, econometric analysis, and power and sample size analysis. For more information, see SAS Studio Tasks.

For my purposes, I was really interested in using the batch submit feature. Using “Batch Submit” a user can run a saved SAS program in batch mode, which means that the program will run in the background while you continue to use SAS Studio. When you run a program in batch mode, you can view the status of programs that have been submitted, and you can cancel programs that are currently running.

So how does this “selective update” work? Somewhat unusually for a product update, it is available via a hot fix documented in note 57898: Upgrade SAS® Studio 3.4 to SAS® Studio 3.5 without upgrading other products.

SAS Studio is available in three different deployment flavors: SAS Studio Mid-Tier (the enterprise edition), SAS Studio Basic, and SAS Studio Single-User. The hot fix is available for the enterprise and basic editions. In addition, in order to apply the hot fix, the current deployment must be at SAS 9.4 M3. For SAS Studio Single-User, an MSI file has been added to the downloads section of support.sas.com to allow users to download SAS Studio 3.5 to run against their existing Windows desktop SAS for releases 9.4M1 and higher.

The hot fix is a container hot fix, meaning the hot fix delivers one or more “MEMBER” hot fixes in one downloadable unit. Container hot fixes have some special rules you must follow when applying them.

  • They must be applied separately to each machine. The installation process will apply only those MEMBER hot fixes which are applicable based on the SAS Deployment Registry for each specific machine.
  • They may contain MEMBER hot fixes for multiple operating systems. The SAS Deployment Manager will apply only those MEMBER hot fixes which are applicable for the operating system on each specific machine.
  • They often contain pre and/or post installation steps outlined in the instructions provided.

A review of the hot fix instructions shows that to complete the update for the SAS Studio Mid-Tier the web application must be rebuilt and redeployed.

To apply the container hot fix on my three-tier deployment, which has a Windows metadata server, a Linux compute tier and a Linux middle tier, I downloaded the hot fix to a network-accessible location and followed the process documented in the hot fix instructions. To summarize:

Create a deployment registry report on each machine. The reports showed that:

SAS Studio Basic is installed on the Linux compute tier.


SAS Studio Enterprise is installed on the Linux middle-tier.


Update SAS Studio Basic

Stop all SAS servers in the deployment. Run the SAS Deployment Manager on the Linux compute tier, select Apply Hot fixes, and then select the directory where the hot fix was downloaded. The Wizard updates SAS Studio Basic. A review of the hot fix documentation shows no post-deployment steps are required for SAS Studio Basic.


Update SAS Studio Mid-Tier (Enterprise)

Run the SAS Deployment Manager on the Linux middle tier, select Apply Hot fixes, and then select the directory where the hot fix was downloaded. The Wizard updates SAS Studio Mid-Tier.


A review of the hot fix documentation shows that, to complete the update, the SAS Studio Web Application must be rebuilt and redeployed.


Start the SAS Metadata Server and use the SAS Deployment Manager on the middle-tier to rebuild just the SAS Studio Middle-Tier. Start all SAS Servers and use the SAS Deployment Manager on the middle-tier machine to redeploy just the SAS Studio Middle Tier.

When the redeploy is completed, I log on to SAS Studio. Selecting Help > About shows that I now have SAS Studio 3.5.


If I navigate the folder tree and select a SAS program I can now right-click on the program and select “Batch Submit” to run the program in the background.


If you are excited about the new functionality of SAS Studio 3.5, I think you will agree that the hot fix provides an easy path to update the software.

tags: deployment, SAS Administrators, SAS architecture, SAS Professional Services, sas studio

A quick way to update to SAS Studio 3.5 was published on SAS Users.

May 12, 2016
 

These days SAS programmers have more choices than ever before about how to run SAS.  They can use the old Display Manager interface, or SAS Enterprise Guide, or the new kid on the block: SAS Studio.  All of these are included with Base SAS.


SAS Display Manager


SAS Enterprise Guide


SAS Studio

Once upon a time, the only choices were Display Manager (officially named the SAS windowing environment), or batch.  Then along came SAS Enterprise Guide.  (Ok, I know there were a few others, but I don’t count SAS/ASSIST which was rightly spurned by SAS users, or the Analyst application which was just a stopover on the highway to SAS Enterprise Guide.)

I recently asked a SAS user, “Which interface do you use for SAS programming?”

She replied, “Interface?  I just install SAS and use it.”

“You’re using Display Manager,” I explained, but she had no idea what I was talking about.

Trust me.  This person is an extremely sophisticated SAS user who does a lot of leading-edge mathematical programming, but she didn’t realize that Display Manager is not SAS.  It is just an interface to SAS.

This is where old timers like me have an advantage.  If you can remember running SAS in batch, then you know that Display Manager, SAS Enterprise Guide, and SAS Studio are just interfaces to SAS–wonderful, manna from heaven–but still just interfaces.  They are optional.  You could write SAS programs in Word or Notepad or some other editor, and submit them in batch–but why would you?  (I know someone is going to tell me that they do, in fact, do that, but the point is that it is not mainstream.  Only mega-nerds with the instincts of a true hacker do that these days.)

Each of these interfaces has advantages and disadvantages.  I’m not going to list them all here, because this is a blog not an encyclopedia, but the tweet would be

“DM is the simplest, EG has projects, SS runs in browsers.”

Personally, I think all of these interfaces are keepers.  At least for the near future, all three of these interfaces will continue to be used.  What we are seeing here is a proliferation of choices, not displacement of one with another.

So what’s your SAS interface?

 


April 6, 2016
 

One of the hidden gems of SAS Studio is the ability to run process flows in parallel. This feature really shines when used in a grid environment. Let’s discuss this one step at a time.

First, what is a process flow? When working in the Visual Programmer perspective, you have access to process flows. A process flow is a graphical representation of a process, where each object, be it a SAS program, a SAS Studio task, a query, and so on, is represented by a node. Nodes are connected by links that instruct SAS Studio how to move from one node to the next.

Note: Click on the images to enlarge them.


Display 1. SAS Studio Process Flow

On the Properties tab of the current process flow, you can set the execution mode of the nodes.  With the default setting, SAS Studio runs the nodes in the order in which they were added to the process flow. If node 2 is dependent on node 1, node 1 must run completely before node 2 will run.

You can change the execution mode to Parallel as shown in Display 2. When this value is set, SAS Studio uses multiple workspace servers to run the nodes concurrently, always enforcing the correct dependencies.


Display 2. Setting the Execution Mode to Parallel

When you use this feature in a SAS Grid environment, if the administrator has configured the workspace server sessions to be grid launched, you can achieve the benefits and the performance improvements of multi-machine parallel load balancing, without having to code any SAS/CONNECT statement. It’s a real point-and-click parallel execution engine!

Display 3 shows the process flow presented in Display 1 running in parallel execution mode. The pane is grayed out because it is not possible to interact with it until the execution is complete.

We can see that the List Data node is still running, while the Partition Data node has already finished. Thus, the two Filter Data nodes were able to start in parallel.


Display 3. Tasks Running in Parallel in a Process Flow

In this scenario, we would guess that three workspace server sessions are concurrently running our code. However, if we monitor what is happening on the back-end hosts, we notice something unexpected. There are actually five workspace server sessions running. Why? As soon as you sign in to SAS Studio, it starts two SAS sessions. These are used only for the default execution mode. If a process flow is run in parallel mode, up to three additional SAS sessions are started, for a total of five. Once the process flow is finished, the three additional SAS processes terminate if there is no further activity for 30 seconds, in order to release resources.

An administrator can use a configuration property, webdms.maxParallelWorkspaces, to specify the maximum number of workspaces that can be used when SAS is running in parallel mode. The default value is 3. The maximum value is 8.

I hope you enjoy running multiple tasks concurrently. If you have already started using parallel processing, you might want to check out my earlier blog, How to avoid the pitfalls of parallel jobs!

 

 

tags: parallel processing, SAS Professional Services, SAS Programmers, sas studio

SAS Studio Parallel Process Flows was published on SAS Users.

March 2, 2016
 

I recently received a call from a colleague who is using parallel processing in a grid environment; he lamented that SAS Enterprise Guide did not show in the work library any of the tables that were successfully created in his project.

The issue was very clear in my mind, but I was not able to find any simple description or picture to show him: so why not put it all down in a blog post so everyone can benefit?

Parallel processing can speed up your projects by an incredible factor, especially when programs consist of subtasks that are independent units of work and can be distributed across a grid and executed in parallel. But when these parallel execution environments are not kept in sync, it can also introduce unforeseen problems.

This specific issue of “disappearing” temporary tables can happen using different client interfaces, because it does not depend upon using a certain software, but rather on the business logic that is implemented. Let’s look at two practical examples.

SAS Studio

We want to run an analysis – here a simple proc print – on two independent subsets of the same table. We decide to use two parallel grid sessions to partition the data and then we run the analysis in the parent session.  The code we submit in SAS Studio could be similar to the following:

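/* Enable grid-launched sessions and start two remote sessions */
%let rc = %sysfunc(grdsvc_enable(_all_, server=SASApp));
signon grid1;
signon grid2;

/* Clear any leftover tables from the parent session's WORK library */
proc datasets library=work noprint;
  delete sedan SUV;
run;
quit;

/* Partition the data asynchronously, one subset per grid session */
rsubmit grid1 wait=no;
  data sedan;
    set sashelp.cars;
    where Type="Sedan";
  run;
endrsubmit;

rsubmit grid2 wait=no;
  data SUV;
    set sashelp.cars;
    where Type="SUV";
  run;
endrsubmit;

/* Wait until both remote sessions have finished */
waitfor _ALL_ grid1 grid2;

/* Run the analysis in the parent session */
proc print data=sedan;
run;
proc print data=SUV;
run;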

After submitting the code by pressing F3 or clicking Run, we do not get the expected RESULT window and the LOG window shows some errors:

ThePitfallsofParallelJobs

SAS Enterprise Guide

Suppose we have a project similar to the following.

ThePitfallsofParallelJobs2

Many items can be created independently of others. The orange arrows illustrate the potential tasks which can be executed in parallel.

Let’s try to run these project tasks in parallel. Open File, Project Properties. Select Code Submission and flag “Allow parallel execution on the same server.”

ThePitfallsofParallelJobs3

This property enables SAS Enterprise Guide to create one or more additional workspace server connections so that parallel process flow paths can be run in parallel.

Note: Despite the description, when used in a grid environment the additional workspace server sessions do not always execute on the same server. The grid master server decides where these sessions start.

After we select Run, Run Process Flow to submit the code, SAS Enterprise Guide submits the tasks in parallel respecting the required dependencies.  Unfortunately, some tasks fail and red X’s appear over the top right corner of the task icons. In the log summary we may find the following:

ERROR: File WORK.QUERY_FOR_PRODUCTS.DATA does not exist.

What's going on?

The WORK library is the temporary library that is automatically defined by SAS at the beginning of each SAS session or job. The WORK  library stores temporary SAS files that are written by a data step or a procedure and then read as input of subsequent steps. After enabling the parallel execution on the grid, tasks run in multiple SAS sessions, and each grid session has its own dedicated WORK library that is not shared with any other grid session, or with the parent work session which started it.

In the SAS Studio example, the DATA steps write their results – the SEDAN and SUV tables – to the WORK library of one SAS session, then the PROC PRINT steps try to read those tables from the WORK library of a different SAS session. Obviously the tables are not there, and the task fails.

ThePitfallsofParallelJobs4
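One way to see this separation for yourself is to print the physical path behind WORK in each session. The following is a minimal sketch, assuming SAS/CONNECT is available and that a grid-enabled server named SASApp exists as in the example above; the session name grid1 is only illustrative.

```sas
/* Assumption: grid-launched sessions are available on server SASApp */
%let rc = %sysfunc(grdsvc_enable(_all_, server=SASApp));
signon grid1;

/* Path of the WORK library in the parent session */
%put NOTE: Parent WORK is %sysfunc(pathname(work));

rsubmit grid1;
  /* This statement runs in the remote session, so it reports that session's WORK */
  %put NOTE: grid1 WORK is %sysfunc(pathname(work));
endrsubmit;

signoff grid1;
```

The two paths written to the log should differ, which is why a table written to WORK in one session is invisible in the other.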

This is quite a common issue when dealing with multiple sessions – even without a grid. One simple solution is to avoid using the WORK library and any other non-shared resources. It is possible to assign a common library in many ways, such as in autoexec files or in metadata.
The issue is solved:

ThePitfallsofParallelJobs5
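As a rough sketch of that approach applied to the SAS Studio example, the code below writes the subsets to a library that every session assigns to the same location. The path /shared/sasdata/project and the libref SHARED are hypothetical; the path must be on storage that all grid nodes and the parent session can reach. Assigning the library in an autoexec file or in metadata, as mentioned above, would replace the explicit LIBNAME statements.

```sas
/* Hypothetical location on storage visible to every grid node */
%let shared_path = /shared/sasdata/project;

%let rc = %sysfunc(grdsvc_enable(_all_, server=SASApp));
signon grid1;
signon grid2;

/* Assign the shared library in the parent session */
libname shared "&shared_path";

/* Make the path available as a macro variable in each remote session */
%syslput shared_path=&shared_path / remote=grid1;
%syslput shared_path=&shared_path / remote=grid2;

rsubmit grid1 wait=no;
  libname shared "&shared_path";
  data shared.sedan;
    set sashelp.cars;
    where Type="Sedan";
  run;
endrsubmit;

rsubmit grid2 wait=no;
  libname shared "&shared_path";
  data shared.suv;
    set sashelp.cars;
    where Type="SUV";
  run;
endrsubmit;

waitfor _all_ grid1 grid2;

/* The parent session reads the tables from the shared library, not from WORK */
proc print data=shared.sedan;
run;
proc print data=shared.suv;
run;

signoff _all_;
```

Because the tables are now permanent, remember to delete them when they are no longer needed; WORK cleaned up after itself, a shared library does not.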

No shared resources, am I safe?

Well, maybe not. Coming back to the original issue presented at the opening of this post, sometimes we overlook what ‘shared’ means. I’ll show you with this very simple SAS Enterprise Guide project: it’s just a simple query that writes a result table in the WORK library. You can test it on your laptop, without any grid.

ThePitfallsofParallelJobs6

After running it, the FILTER_FOR_AIR table appears in the server’s pane:

ThePitfallsofParallelJobs7

Now, let’s say we have to prepare for a more complex project and we again follow the steps to “Allow parallel execution on the same server.” Just to be safe, we resubmit the project to test what happens. All seems unchanged, so we save everything and close SAS Enterprise Guide. Say I forgot to ask you to write down something about the results. We reopen the project and, knowing the result table in the WORK library was temporary, we rerun the project to recreate it.

This time something is wrong.

ThePitfallsofParallelJobs8

The Output Data pane shows an error, and the Servers pane does not list the FILTER_FOR_AIR table anymore.
Even if we rerun the project, the table will not reappear.

The reason lies, again, in the realm of shared vs. local libraries.

As soon as we enable “Allow parallel execution on the same server,” SAS Enterprise Guide starts at least one additional SAS session to process the code, even if there is nothing to parallelize. Results are saved only there, but SAS Enterprise Guide always tries to read them from the original, parent session. So we are again in the trap of local WORK libraries.

ThePitfallsofParallelJobs9

Why didn’t we uncover the issue the first time we ran the project? If you run your code, at least once, without the “Allow parallel execution on the same server” option, the results are saved in the parent session. And, they remain there even after enabling parallelization. As a result, we actually have two copies of the FILTER_FOR_AIR table!
As soon as we close SAS Enterprise Guide, both tables are deleted. So, on the next run, there is nothing in the parent session WORK library to send to SAS Enterprise Guide!

The solution? Same as before – only use shared libraries.

Is this all?

As you might have guessed, the answer is no. Libraries are not the only objects that should be shared across sessions. Every local setting – be it the value of an option, a macro, a format – has to be shared across all parallel sessions. Not difficult, but we have to remember to do it!
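A minimal sketch of what sharing those settings can look like with SAS/CONNECT follows. Everything specific here is an assumption made for illustration: the shared path, the libref SHARED, the macro variable max_price, and the format PRICEFMT are all hypothetical.

```sas
%let rc = %sysfunc(grdsvc_enable(_all_, server=SASApp));
signon grid1;

/* Hypothetical values used only for illustration */
%let shared_path = /shared/sasdata/project;
%let max_price   = 30000;

/* Formats: store them in a catalog in the shared library, not in WORK */
libname shared "&shared_path";
proc format library=shared;
  value pricefmt low-<30000  = 'Budget'
                 30000-high  = 'Premium';
run;

/* Macro variables: copy the ones the remote code needs */
%syslput shared_path=&shared_path / remote=grid1;
%syslput max_price=&max_price / remote=grid1;

rsubmit grid1 wait=no;
  libname shared "&shared_path";
  /* Options: a remote session starts with its own defaults, so set them again */
  options fmtsearch=(shared work);
  data shared.affordable;
    set sashelp.cars;
    where MSRP < &max_price;
    format MSRP pricefmt.;
  run;
endrsubmit;

waitfor _all_ grid1;
signoff grid1;
```

Without the %SYSLPUT statements the remote session would not resolve &max_price, and without FMTSEARCH it would not find PRICEFMT, even though both work fine in the parent session.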

 

 

 

 

 

 

 

tags: parallel processing, sas enterprise guide, SAS Grid Manager, SAS Professional Services, SAS Programmers, sas studio

Avoid the pitfalls of parallel jobs was published on SAS Users.

November 14, 2015
 

I brushed aside some sawdust on the workbench and set my laptop down. It wasn’t really mine. SAS Library Services had kindly lent me a new laptop for the “Making Sense of Sensor Data” workshop at UNC’s BEaM Makerspace. I had just set the laptop down…in sawdust. Like any normal […]

Sawdust and SAS Studio: Thoughts on a liberal arts education during an IoT workshop was published on SAS Voices.

September 1, 2015
 

Will the Internet of Things (IoT) create a web of connected devices that make our lives better or an infinite infestation of annoying devices invading our privacy for no good reason? I don't know. I do know that the answer is going to depend less on the technology and more […]

The Internet of Things at your local library was published on SAS Voices.

May 28, 2015
 

SAS software is used around the world in some of the most sophisticated ways, like ATM fraud detection and cancer research. But recently, I used it for a practical, and much needed, task -- replacing our break room coffee machine. Now, this is no ordinary coffee machine. It also makes […]

The post How analytics saved the break room coffee machine appeared first on The SAS Training Post.

May 30, 2014
 

I wish I had a nickel for every time I heard this question at SAS Global Forum:

"So, does this SAS Studio thing replace SAS Enterprise Guide?"

SAS Studio is a pretty big deal. It's groundbreaking in several ways:

  • It's a web-based programming interface to SAS. It runs in your browser, which means that end users don't have to install anything (when connecting to a remote SAS session).
  • It's an HTML5-based application, so there are no browser plugins needed. It runs on Windows, Macs, and even the iPad.
  • It's the basis for new offerings from SAS, most notably the SAS University Edition. This offering is free to just about any learner for non-commercial use. The SAS University Edition includes SAS, running in a virtual machine, packaged with SAS Studio as the user interface. Since its launch earlier this week, people have been downloading it like crazy.

You're going to be hearing a lot about SAS Studio. It was even the theme for this month's SAS Tech Report.
If you haven't seen SAS Studio, take a few minutes and watch my SAS Tech Talk interview with Shannon Smith, the SAS R&D testing manager for the product:

 

So what does this mean for those of us who have invested our skills and processes in SAS Enterprise Guide? If you read this blog regularly, you know that includes me! Does this "new app on the block" replace our beloved SAS Enterprise Guide? The answer is No -- and Yes.

No, SAS Studio isn't a direct SAS Enterprise Guide replacement. SAS Enterprise Guide continues to get new features, mostly targeting productivity enhancements and integration with other SAS offerings, such as SAS Visual Analytics. Many thousands of users around the world use SAS Enterprise Guide to manage process flows, reporting and analytics, database access, and custom processes. SAS Studio doesn't have all of that infrastructure (at least, not yet), and cannot step in to replace all of that.

But also, Yes: SAS Studio can replace some uses of SAS Enterprise Guide. If you use SAS Enterprise Guide simply as a way to manage SAS programs in your SAS environment, then you can certainly use SAS Studio instead (or as well) to develop and maintain those programs. SAS Studio also includes some tasks for non-programmers, similar to those found in SAS Enterprise Guide -- but for now the library isn't as rich as what you'll find in SAS Enterprise Guide. And with the SAS University Edition, SAS Studio will represent the first SAS experience for the next generation of SAS programmers.

Sometimes SAS users ask me (usually in a hushed tone): Why does SAS create these different applications that seem to compete with each other? Is there some sort of contest in SAS R&D to see which teams can outdo the others? My answer: while these apps might have a certain amount of overlap, they really do serve different purposes and different audiences. Our goal is to enable SAS users -- regardless of discipline, industry, or expertise -- with the tools that are most fit for their particular purpose. One size does not fit all (though some diehard PC SAS fans might disagree with me).

Plus, here's another secret: the same developers have built all of these applications. The SAS Studio development team includes people who worked on SAS Display Manager (you know, "PC SAS") and SAS Enterprise Guide. This is a direct benefit of SAS being such a great workplace: nobody leaves. That means that the lessons learned from customers and developers are carried over and applied in each successive "app generation". If developers are competing, then they are mostly competing with the proven work they've done in the past. But since the teams always have new technology and techniques at their disposal, it's the end users who win.

tags: SAS Enterprise Guide, SAS Studio, SASAnalyticsU, Tech Talk