Jupyter

9月 192019
 

Recent updates to SAS Grid Manager introduced many interesting new features, including the ability to handle more diverse workloads. In this post, we'll take a look at the steps required to get your SAS Grid Manager environment set up accept jobs from outside of traditional SAS clients. We'll demonstrate the process of submitting Python code for execution in the SAS Grid.

Preparing your SAS Grid

Obviously, we need a SAS Grid Manager (SAS 9.4 Maintenance 6 or later) environment to be installed and configured. Once the grid is deployed, there's not a whole lot more to do on the SAS side. SAS Workspace Servers need to be configured for launching on the grid by converting them to load balanced Workspace Servers, changing the algorithm to 'Grid', and then selecting the relevant checkbox in SASApp – Logical Workspace Server properties to launch on the grid as shown below.

The only other things that might need configuring are, if applicable, .authinfo files and Grid Option Sets. Keep reading for more information on these.

Preparing your client machine

In this example scenario, the client is the Windows desktop machine where I will write my Python code. SAS is not installed here. Rather, we will deploy and use Jupyter Notebook as our IDE to write Python code, which we'll then submit to the SAS Grid. We can get Jupyter Notebook by installing Anaconda, which is a free, bundled distribution of Python and R that comes with a solid package management system. (The following steps are courtesy of my colleague. Allan Tham.)

First, we need to download and install Anaconda.

Once deployed, we can open Anaconda Navigator from the Start menu and from there, we can launch Jupyter Notebook and create a notebook.

Now it's time to configure the connection from Python to our SAS environment. To do this, we use the SAS-developed open-source SASPy module for Python, which is responsible for converting Python code to SAS code and running it in SAS. Installing SASPy is simple. First, we need to download and install a Java Runtime Environment as a prerequisite. After that, we can launch Anaconda Prompt (a command line interface) from the Start menu and use pip to install the SASPy package.

pip install saspy

Connection parameters are defined in a file named sascfg.py. In fact, the best practice is to use sascfg.py as a template, and create a copy of it, which should be named sascfg_personal.py. SASPy will read and look for connections in both files. For the actual connection parameters, we need to specify the connection method. This depends largely on your topology.

In my environment, I used a Windows client to connect to a Linux server using a standard IOM connection. The most appropriate SASPy configuration definition is therefore 'winiomlinux', which relies on a Java-based connection method to talk to the SAS workspace. This needs to be defined in the sascfg_personal.py file.

SAS_config_names=['winiomlinux']

We also need to specify the parameters for this connection definition as shown below.

# build out a local classpath variable to use below for Windows clients   CHANGE THE PATHS TO BE CORRECT FOR YOUR INSTALLATION 
cpW  =  "C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\sas.svc.connection.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\log4j.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\sas.security.sspi.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\sas.core.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\saspyiom.jar"

Note that we are referencing a number of JAR files in the classpath that normally come with a SAS deployment. Since we don't have SAS on the client machine, we can copy these from the Linux SAS server. In my lab, these were copied from /SASDeploymentManager/9.4/products/deploywiz__94526__prt__xx__sp0__1/deploywiz/.

Further down in the file, we can see the connection parameters to the SAS environment. Update the path to the Java executable, the SAS compute server host, and the port for the Workspace Server.

winiomlinux = {'java'   : 'C:\\Program Files\\Java\\jre1.8.0_221\\bin\\java',
            'iomhost'   : 'sasserver01.mydemo.sas.com',
            'iomport'   : 8591,
            'encoding'  : 'latin1',
            'classpath' : cpW
            }

Note: thanks to a recent contribution from the community, SASPy now also supports a native Windows connection method that doesn't require Java. Instead, it uses the SAS Integration Technologies client -- a component that is free to download from SAS. You'll already have this component if you've installed SAS Enterprise Guide, SAS Add-In for Microsoft Office, or Base SAS for Windows on your client machine. See instructions for this configuration in the SASPy documentation.

With the configuration set, we can import the SASPy module in our Jupyter Notebook.

Although we're importing sascfg_personal.py explicitly here, we can just call sascfg.py, as SASPy actually checks for sascfg_personal.py and uses that if it finds it. If not, it uses the default sascfg.py.

To connect to the grid from Jupyter Notebook using SASPy, we need to instantiate a SASsession object and specify the configuration definition (e.g. winiomlinux). You'll get prompted for credentials, and then a message indicating a successful connection will be displayed.

It's worth nothing that we could also specify the configuration file to use when we start a session here by specifying:

sas = saspy.SASsession(cfgfile='/sascfg_personal.py')

Behind the scenes, SAS will find an appropriate grid node on which to launch a Workspace Server session. The SASPy module works in any SAS 9.4 environment, regardless of whether it is a grid environment or not. It simply runs in a Workspace Server; in a SAS Grid environment, it just happens to be a Grid-launched Workspace Server.

Running in the Grid

Now let's execute some code in the Workspace Server session we have launched. Note that not all Python methods are supported by SASPy, but because the module is open source, anyone can add to or update the available methods. Refer to the API doc for more information.

In taking another example from Allan, let's view some of the content in a SASHELP data set.

The output is in ODS, which is converted (by SASPy) to a Pandas data frame to display it in Jupyter Notebook.

Monitoring workloads from SAS Grid Manager

SAS Workload Orchestrator Web interface allows us to view the IOM connections SASPy established from our client machine. We can see the grid node on which the Workspace Server was launched, and view some basic information. The job will remain in a RUNNING state until the connection is terminated (i.e. the SASsession is closed by calling the endsas() method. By the same token, creating multiple SASsessions will result in multiple grid jobs running.

To see more details about what actually runs, Workspace Server logs need to first be enabled. When we run myclass.head(), which will display the first 5 rows of the data set, we see the following written to Workspace Server log.

2019-08-14T02:47:34,713 INFO  [00000011] :sasdemo - 239        ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;
2019-08-14T02:47:34,713 INFO  [00000011] :sasdemo - 239      ! ods graphics on / outputfmt=png;
2019-08-14T02:47:34,779 INFO  [00000011] :sasdemo - NOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1
2019-08-14T02:47:34,780 INFO  [00000011] :sasdemo - 240        ;*';*";*/;
2019-08-14T02:47:34,782 INFO  [00000045] :sasdemo - 241        data _head ; set sashelp.prdsale (obs=5 ); run;
2019-08-14T02:47:34,782 INFO  [00000045] :sasdemo -
2019-08-14T02:47:34,783 INFO  [00000045] :sasdemo - NOTE: There were 5 observations read from the data set SASHELP.PRDSALE.
2019-08-14T02:47:34,784 INFO  [00000045] :sasdemo - NOTE: The data set WORK._HEAD has 5 observations and 5 variables.
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo - NOTE: DATA statement used (Total process time):
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo -       real time           0.00 seconds
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo -       cpu time            0.00 seconds
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo -

We can see the converted code, which in this case was a data step which creates a work table based on PRDSALE table with obs=5. Scrolling down, we also see the code that prints the output to ODS and converts it to a data frame.

Additional Options

Authinfo files

The sascfg_personal.py file has an option for specifying an authkey, which is an identifier that maps to a set of credentials in an .authinfo file (or _authinfo on Windows). This can be leveraged to eliminate the prompting for credentials. For example, if your authinfo file looks like:

IOM_GELGrid_SASDemo user sasdemo password lnxsas

your configuration defintion in your sascfg_personal.py should look like:

winiomlinux = {'java'   : 'C:\\Program Files\\Java\\jre1.8.0_221\\bin\\java',
            'iomhost'   : 'sasserver01.mydemo.sas.com',
            'iomport'   : 8591,
            'encoding'  : 'latin1',
            'authkey' : 'IOM_GELGrid_SASDemo'
            'classpath' : cpW
            }

There are special rules for how to secure the authinfo file (making it readable only by you), so be sure to refer to the instructions.

Grid Option Sets

What if you want your code to run in the grid with certain parameters or options by default? For instance, say you want all Python code to be executed in a particular grid queue. SASPy can do this by leveraging Grid Option Sets. The process is outlined here, but in short, a new SASPy 'SAS Application' has to be configured in SAS Management Console, which is then used to the map to the Grid Options Set (created using the standard process).

More Information

My sincere thanks to Allan Tham and Greg Wootton for their valued contributions to this post.

Please refer to the official SAS Grid documentation for more information on SAS Grid Manager in SAS 9.4M6.

Thank you for reading. I hope the information provided in this post has been helpful. Please feel free to comment below to share your own experiences.

Using Python to run jobs in your SAS Grid was published on SAS Users.

2月 062018
 

Good news learners! SAS University Edition has gone back to school and learned some new tricks.

With the December 2017 update, SAS University Edition now includes the SASPy package, available in its Jupyter Notebook interface. If you're keeping track, you know that SAS University Edition has long had support for Jupyter Notebook. With that, you can write and run SAS programs in a notebook-style environment. But until now, you could not use that Jupyter Notebook to run Python programs. With the latest update, you can -- and you can use the SASPy library to drive SAS features like a Python coder.

Oh, and there's another new trick that you'll find in this version: you can now use SAS (and Python) to access data from HTTPS websites -- that is, sites that use SSL encryption. Previous releases of SAS University Edition did not include the components that are needed to support these encrypted connections. That's going to make downloading web data much easier, not to mention using REST APIs. I'll show one HTTPS-enabled example in this post.

How to create a Python notebook in SAS University Edition

When you first access SAS University Edition in your web browser, you'll see a colorful "Welcome" window. From here, you can (A) start SAS Studio or (B) start Jupyter Notebook. For this article, I'll assume that you select choice (B). However, if you want to learn to use SAS and all of its capabilities, SAS Studio remains the best method for doing that in SAS University Edition.

When you start the notebook interface, you're brought into the Jupyter Home page. To get started with Python, select New->Python 3 from the menu on the right. You'll get a new empty Untitled notebook. I'm going to assume that you know how to work with the notebook interface and that you want to use those skills in a new way...with SAS. That is why you're reading this, right?

Move data from a pandas data frame to SAS

pandas is the standard for Python programmers who work with data. The pandas module is included in SAS University Edition -- you can use it to read and manipulate data frames (which you can think of like a table). Here's an example of retrieving a data file from GitHub and loading it into a data frame. (Read more about this particular file in this article. Note that GitHub uses HTTPS -- now possible to access in SAS University Edition!)

import saspy
import pandas as pd
 
df = pd.read_csv('https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv')
df.describe()

Here's the result. This is all straight Python stuff; we haven't started using any SAS yet.

Before we can use SAS features with this data, we need to move the data into a SAS data set. SASPy provides a dataframe2sasdata() method (shorter alias: df2sd) that can import your Python pandas data frame into a SAS library and data set. The method returns a SASdata object. This example copies the data into WORK.PROBLY in the SAS session:

sas = saspy.SASsession()
probly = sas.df2sd(df,'PROBLY')
probly.describe()

The SASdata object also includes a describe() method that yields a result that's similar to what you get from pandas:

Drive SAS procedures with Python

SASPy includes a collection of built-in objects and methods that provide APIs to the most commonly used SAS procedures. The APIs present a simple "Python-ic" style approach to the work you're trying to accomplish. For example, to create a SAS-based histogram for a variable in a data set, simply use the hist() method.

SASPy offers dozens of simple API methods that represent statistics, machine learning, time series, and more. You can find them documented on the GitHub project page. Note that since SAS University Edition does not include all SAS products, some of these API methods might not work for you. For example, the SASml.forest() method (representing

In SASPy, all methods generate SAS program code behind the scenes. If you like the results you see and want to learn the SAS code that was used, you can flip on the "teach me SAS" mode in SASPy.

sas.teach_me_sas('true')

Here's what SASPy reveals about the describe() and hist() methods we've already seen:

Interesting code, right? Does it make you want to learn more about SCALE= option on PROC SGPLOT?

If you want to experiment with SAS statements that you've learned, you don't need to leave the current notebook and start over. There's also a built-in %%SAS "magic command" that you can use to try out a few of these SAS statements.

%%SAS
proc means data=sashelp.cars stackodsoutput n nmiss median mean std min p25 p50 p75 max;run;

Python limitations in SAS University Edition

SAS University Edition includes over 300 Python modules to support your work in Jupyter Notebook. To see a complete list, run the help('modules') command from within a Python notebook. This list includes the common Python packages required to work with data, such as pandas and NumPy. However, it does not include any of the popular Python-based machine learning modules, nor any modules to support data visualization. Of course, SASPy has support for most of this within its APIs, so why would you need anything else...right?

Because SAS University Edition is packaged in a virtual machine that you cannot alter, you don't have the option of installing additional Python modules. You also don't have access to the Jupyter terminal, which would allow you to control the system from a shell-like interface. All of this is possible (and encouraged) when you have your own SAS installation with your own instance of SASPy. It's all waiting for you when you've outgrown the learning environment of SAS University Edition and you're ready to apply your SAS skills and tech to your official work!

Learn more

The post Coding in Python with SAS University Edition appeared first on The SAS Dummy.

11月 152016
 

Rick Wicklin showed us how to visualize the ages of US Presidents at the time of their inaugurations. That's a pretty relevant thing to do, as the age of the incoming president can indirectly influence aspects of the president's term, thanks to health and generational factors.

As part of his post, Rick supplied the complete data set for US Presidents and their birthdays. He challenged his readers to create their own interesting visualizations, and that's what I'm going to do here. I'm going to show you the distribution of US Presidents by their astrological signs.

Now, you might think that "your sign" is not as relevant of a factor as Age, and I certainly hope that you're correct about that. But past presidents have sought the advice of astrologers, and zodiac signs can influence the counsel such astrologers might offer. (Famously, Richard Nixon took advice from celebrity psychic Jeane Dixon. First Lady Nancy Reagan also sought her advice, and we know that Mrs. Reagan in turn influenced President Reagan.)

Like any good analyst, I mostly reused existing work to produce my results. First, I used the DATA step that Rick provided to create the data set of presidents and birthdays. Next, I reused my own work to create a SAS format that displays a zodiac sign for each date. And finally, I wrote write a tiny bit of PROC FREQ code to create my table and frequency plot.

data signs;
 /* So this column appears first */
 retain President;
 length sign 8;
 /* SIGN. format created earlier with PROC FORMAT */
 format sign sign.;
 set presidents (keep=President BirthDate InaugurationDate);
 /* convert birthday to our normalized SIGN date */
 sign = mdy(month(birthdate),day(birthdate),2000);
run;
 
ods graphics on;
proc freq data=signs order=freq;
tables sign / plots=freqplot;
run;

To keep things a bit fresh, I did all of this work in SAS University Edition using the Jupyter Notebook interface. Here's a glimpse of what it looks like:

procprintsigns
And here's the distribution you've all been waiting to see. When he takes office, Donald Trump will join George H. W. Bush and JFK in the Gemini column.

signspres
I've shared the Jupyter Notebook file as a public gist on GitHub. You can download and import into your own instance if you have SAS and Jupyter Notebook working together. (Having trouble rendering the notebook file? Try looking at it through the nbviewer service. That usually works.)

tags: Jupyter, proc format, SAS University Edition, zodiac

The post Zodiac signs of US Presidents appeared first on The SAS Dummy.

8月 072016
 

A few months ago I shared the news about Jupyter notebook support for SAS. If you have SAS for Linux, you can install a free open-source project called sas-kernel and begin running SAS code within your Jupyter notebooks. In my post, I hinted that support for this might be coming in the SAS University Edition. I'm pleased to say that this is one time where my crystal ball actually worked -- Jupyter support has arrived!

(Need to learn more about SAS and Jupyter? Watch this 7-minute video from SAS Global Forum.)

Start coding in the notebook format

If you download or update your instance of SAS University Edition, you'll be able to point your browser to a slightly different URL and begin running SAS programs in Jupyter. Of course, you can continue to use SAS Studio to learn SAS programming skills. Having trouble deciding which to use? You don't have to choose: you can use both!

jupyter_ue
If you've started SAS University Edition within Oracle Virtual Box, you can find SAS Studio at its familiar address: http://localhost:10080/. And you can find the Jupyter notebook environment at: http://localhost:18888/. (If you're using VMWare, the URLs are slightly different. Check the documentation.)

Why did SAS add support for Jupyter notebooks? The answer is simple: you asked for it. While we believe that SAS Studio provides a better environment for producing and managing SAS code, Jupyter notebooks are widely used by students and data scientists who want to package their code, results, and documentation in the convenient notebook format. Notebook files (*.ipynb format) are even supported on GitHub, easily shareable and viewable by others.

Now, what are the limitations?

jupyter_uemenuWithin SAS University Edition, the Jupyter environment supports only SAS programs. The Jupyter project can support other languages, including Python, Julia, and R (the namesake languages) and dozens of others with published language kernels. However, because of the virtual-machine core of the SAS University Edition, those other languages are not available.

Support for other languages (as well as for the Jupyter console) is available when you use Jupyter in a standalone SAS environment. In fact, the sas_kernel project recently received some exciting updates. You can now host the Jupyter environment on a different machine than your SAS server (although Linux is still the only supported SAS host), and the installation process has been streamlined. See more on the sassoftware GitHub home for the sas_kernel project.

Where can you learn more about Jupyter in SAS University Edition?

Check out the help topics for SAS University Edition, beginning with this one: What is Jupyter Notebook in SAS University Edition?

And if you need help or advice about how to make the best use of SAS University Edition, check out the SAS Analytics U community. There are plenty of experts in the forum who would love to help you learn!

tags: github, Jupyter, SAS University Edition

The post Using Jupyter and SAS together with SAS University Edition appeared first on The SAS Dummy.

4月 252016
 

We've just celebrated Earth Day, but I'm here to talk about Jupyter -- and the SAS open source project that opens the door for more learning. With this new project on the github.com/sassoftware page, SAS contributes new support for running SAS from within Jupyter Notebooks -- a popular browser-based environment used by professors and data scientists.

My colleague Amy Peters announced this during a SAS Tech Talk show at SAS Global Forum 2016. If you want to learn more about Jupyter and see the SAS support in action, then you can watch the video here.

Visit the project on GitHub: sas_kernel by sassoftware

Within Jupyter, the sas_kernel provides multiple ways to access SAS programming methods. The most natural method is to create a new SAS notebook, available from the New menu in the Jupyter Home window and from the File menu in an active notebook:

newsasnotebook
From a SAS notebook, you can enter and run SAS code directly from a cell:

sasnotebook
There is even a Notebook extension (./nbextensions/showSASLog) that can show you the SAS log.

The second way that you can run SAS code is by using special Python "magics" supported by the sas_kernel. These magic commands look almost just like SAS macro calls (imagine that!). From within a Python language notebook, you can inject your SAS program code and pull in SAS results. This allows you to move easily between Python and SAS in a single environment. Here's a simple example:

%%SAS
proc means data=sashelp.cars;
run;
ods graphics / height=500 width=800;
proc sgplot data=sashelp.cars;
histogram msrp;
run;

How to get started

Currently, to run SAS with Jupyter you need:

  • SAS 9.4 or later running on Linux
  • Python 3 installed on the same machine (that's basically part of Linux)
  • Admin rights to be able to install/configure the Jupyter Notebook infrastructure and the sas_kernel.

End users of Jupyter Notebook do not need special privileges -- you need those only to install and configure the pieces that make it work. The GitHub project has all of the doc and step-by-step instructions for installation.

What's next for SAS and Jupyter?

This is just the start for SAS in the Jupyter world. Amy says that she has already received lots of interest and feedback, and SAS is working to make the Jupyter Notebook approach available in something like SAS University Edition and SAS OnDemand for Academics. Stay tuned!

tags: Jupyter, open source, Python, SAS global forum

The post How to run SAS programs in Jupyter Notebook appeared first on The SAS Dummy.