Developers

7月 072020
 

Are you looking for a specific CAS action to use in your project? Maybe you need to create a linear or logistic regression and can't seem to find the CAS action? In this post in the Getting Started with Python Integration to SAS® Viya® series, we are going to look at exploring and loading CAS action sets.

In this example, I have already made my connection to CAS using SAS Viya for Learners, and my connection object is named conn. Visit Part 1 of the series to learn more about making a connection to CAS.

Exploring CAS Action Sets

To explore CAS action sets you can use the actionSetInfo CAS action. This should look familiar from Part 2 - Working with CAS Actions and CASResults Objects.

conn.actionsetinfo()['setinfo']

Your results might differ depending on your SAS Viya installation.

The results show the default CAS action sets loaded with SAS Viya. However, many more are available. To see all available CAS action sets use the parameter all=True. The results from running the command with the all=True parameter includes both loaded and available action sets.

conn.actionsetinfo(all=True)['setinfo']

Your results might differ depending on your SAS Viya installation.

As a result of the all=True parameter, we can see over a hundred available CAS action sets. The next question is, how do I load an action set?

Loading a CAS Action Set

I think of loading an action set like importing a package in Python.  To load an action set you need to complete two tasks:

  1. Find the action set
  2. Load the action set

Find the action set

First, let's find the action set. Let's assume we need to complete a logistic regression but are unsure of where the necessary CAS action is located. Instead of of manually scrolling through the CASResults object from the actionSetInfo CAS action, we can instead search the SASDataFrame (summarized data from CAS) for keywords.

First,  I'll set the SASDataFrame from the CASResults object equal to the variable df by calling the setinfo key.

df = conn.actionsetinfo(all=True)['setinfo']

Next, I'll use the loc method on the SASDataFrame to search for any action set that contains the string regression. There are a variety of ways to do this; I'll use the string contains method. I'll make the search case insensitive by using case=False.

df.loc[df['actionset'].str.contains('regression', case=False), :]

Great! In the results I see a regression CAS action set. That looks exactly like what I need. Next, it's time to load the action set.

Load the action set

To load the action use the loadActionSet action with the actionSet='regression' parameter.

conn.loadActionSet(actionSet='regression')


That's it! You have now loaded a CAS action set! Finally, let's explore the actions inside the regression CAS action set.

Exploring CAS Actions in a CAS Action Set

Once a CAS action set is loaded, you can explore the available CAS actions by using the help action with the parameter actionSet='regression'.

conn.help(actionSet='regression')

The results display a list of all the CAS actions in the regression action set. I see the logistic CAS action that I need!

Summary

In conclusion, exploring and loading CAS action sets is important when working on your projects in SAS Viya. A couple of key points to remember:

  • The actionSetInfo CAS action returns all loaded CAS action sets.
  • The actionSetInfo parameter all=True returns all available CAS action sets.
  • The loadActionSet CAS action loads a CAS action set.

In the next post we will talk about exploring data stored in the CAS environment in Exploring Caslibs.

If you enjoyed this blog post, share it with a friend!

Additional Information and Resources

SAS® Viya® 3.5 Actions and Action Sets by Name and Product

Getting Started with Python Integration to SAS® Viya® Index

Getting Started with Python Integration to SAS® Viya® - Part 3 - Loading a CAS Action Set was published on SAS Users.

6月 192020
 

In the second post of the Getting Started with Python Integration to SAS® Viya® series we will learn about Working with CAS Actions and CASResults Objects. CAS actions are commands sent to the CAS server to run a task, and CASResults objects contain information returned from the CAS server. This post will cover a few basic CAS actions, and how to easily work with the information returned.

CAS Actions Overview

First, you need to understand CAS actions. From performing data preparation, modeling, imputing missing values, or even retrieving information about your CAS session, CAS actions perform a single task on the CAS server. CAS actions are organized with other CAS actions in a CAS action set. CAS action sets contain actions that are based on common functionality.

In the end, I like to think of CAS action sets as a package, and all the CAS actions inside an action set as a method.

Getting Started with CAS Actions

We will start with a basic CAS action to view all available loaded action sets in the current CAS session. Before you use any CAS action you must connect to the CAS server. For more information on connecting to the CAS server, visit Part 1 of the series.

I have already made my connection to CAS using SAS Viya for Learners, and my connection object is named conn.

display(conn)
CAS('svflhost.demo.sas.com', 5570, 'peter.test@sas.com', protocol='cas', name='py-session-1', session='efff4323-a862-bd6e-beea737b4249')

Next, to view all available action sets loaded in your CAS session use the actionSetInfo CAS action from the Builtins action set on the CAS connection object. CAS actions and CAS action sets are case insensitive, and you do not need to qualify the CAS action with the action set name (although it is best practice to include both). In the example below I qualify the CAS action.

The actionSetInfo CAS action returns the following output:

Your output information might differ depending on your SAS Viya installation.

The output shows a list of the loaded action sets on the CAS server with additional information for each. With a quick scan, it almost looks like a Pandas DataFrame. However, let's go back and run the actionSetInfo CAS action, but this time use the Python type function to see the data type of the output.

type(conn.builtins.actionsetinfo())
swat.cas.results.CASResults

The actionSetInfo CAS action returns a CASResults object. The million dollar question is, what exactly is a CASResults object?

CASResults Object

CAS actions return a CASResults object. A CASResults object is an ordered Python dictionary with additional methods and attributes. There are no specific rules about how many keys are contained in the CASResults object, or what values the keys return. That information depends on the CAS action used.

One thing to note is that CASResults objects are local on your client machine. That is, the data is not in CAS anymore, it has been processed by CAS and returned locally. This is something to keep in mind when you are working with big data in CAS and try to return more data than your local computer can handle.

Let's continue by examining the output from our previous example. The CASResults object returns one key named setinfo, and one value. You can quickly tell by looking at the output.

However, another method to view all the keys in the CAResults object is to use the Python dictionary keys method. In this example, it will return one key named setinfo.

conn.actionsetinfo().keys()
odict_keys(['setinfo'])

Be aware of how I specified the CAS action in the above example. This time I did not qualify it with the Builtins CAS action set. While you can specify with or without the CAS action set, I recommend to be consistent. I typically do not type the CAS action set. I'll keep that method moving forward.

To view a specific value of a CASResults object you can call the key like a typical Python dictionary. In this example you can call the setinfo key to return the key's value.

conn.actionsetinfo()['setinfo']

As mentioned earlier, the output seems to resemble a Pandas DataFrame. To confirm our suspicion, we can use the type function to see the data type of the value returned by the setinfo key,

type(conn.actionsetinfo()['setinfo'])
swat.dataframe.SASDataFrame

Interesting! The CASResults object contains a SASDataFrame for the setinfo key. What exactly is a SASDataFrame?

Understanding a SASDataFrame

A SASDataFrame is a subclass of a Pandas DataFrame. As a result, you can work with them as you normally would a Pandas DataFrame! Another thing to note is a SASDataFrame is local data. Again, as you use CAS you must be aware CAS can handle more data than your local computer can handle. When bringing data locally make sure it's usable on your local machine. We will discuss this in future posts.

Let's work with the SASDataFrame. First, I will create a new variable named df that holds the SASDataFrame from the actionSetInfo action.

df = conn.actionsetinfo()['setinfo']

Next, I'll use some Pandas methods on the df object. Let's start with the head method to view the first five rows.

df.head()

Or filter the data using the loc method. In this scenario I want to find all action sets with the name simple, and return the actionset and label columns.

df.loc[df['actionset'] == 'simple', ['actionset', 'label']]

Or check unique values of the product_name column by using the value_counts method.

df['product_name'].value_counts()
tkcas        8
crsstat      3
crssearch    1

Or...

No that's it. I think you get my point! You can work with a SASDataFrame object like you would a Pandas DataFrame!

CASResults Object With Multiple Keys

Lastly, let's use another CAS action, but this time the CASResults object contains multiple keys, with multiple value types.

In this example I use the serverStatus CAS action from the Builtins CAS action set. The CAS action returns the status of the server.

conn.serverstatus()

From the looks of the output, I see there are three keys; About, server, and nodestatus. However, let's check our assumption by using the keys method on the CASResults object.

conn.serverstatus().keys()
odict_keys(['About', 'server', 'nodestatus'])

In the output there are three keys as expected. Let's look at the data type of each key by writing a quick Python loop to print the name of the key and the value type returned by that key.

for key,value in conn.serverstatus().items():
      print('Key : {}, Value Type : {}'.format(key,type(value)))
Key : About, Value Type : <class 'dict'>
Key : server, Value Type : <class 'swat.dataframe.SASDataFrame'>
Key : nodestatus, Value Type : <class 'swat.dataframe.SASDataFrame'>

The serverStatus CAS action contains three keys, one key contains a dictionary object, and the other two keys contain a SASDataFrame.

Summary

In conclusion, understanding CAS actions and the CASResults objects are essential when working with CAS. A couple of key points to remember:

  • CAS actions reside in CAS action sets and perform a single task on the CAS server.
  • CAS actions return a CASResults object, which is simply a Python dictionary.
  • A CASResults object contains a single or multiple keys, corresponding to a value or values of any data type.
  • A SASDataFrame resides in a CASResults object, and is a subclass of a Pandas DataFrame.

That was a quick overview of a few basic CAS actions, and how to work with the CASResults objects they return.  You can check out the SAS documentation below for all available CAS Actions.

Additional Resources

SAS® Viya® 3.5 Actions and Action Sets by Name and Product

Getting Started with Python Integration to SAS® Viya® Series Index

Getting Started with Python Integration to SAS® Viya® - Part 2 - Working with CAS Actions and CASResults Objects was published on SAS Users.

6月 192020
 

Index of articles on Getting Started with Python Integration to SAS® Viya®.
Part 1 - Making a Connection
Part 2 - Working with CAS Actions and CASResults Objects
Part 3 - Loading a CAS Action Set
Part 4 - Exploring Caslibs
Part 5 - Uploading Data into CAS
Part 6 - Session vs Global Scope Tables
Part 7 - Exploring Tables? CAS Actions vs SWAT API?

Getting Started with Python Integration to SAS® Viya® - Index was published on SAS Users.

6月 042020
 

Learning never stops. When SAS had to change this year’s SAS Global Forum (SGF) to a virtual event, everyone was disappointed. I am, however, super excited about all of the papers and stream of video releases over the last month (and I encourage you to register for the upcoming live event in June). For now, I made a pact with myself to read or watch one piece of SGF related material per day. While I haven’t hit my goal 100%, I sure have learned a lot from all the reading and viewing. One particular paper, Using Jupyter to Boost Your Data Science Workflow, and its accompanying video by Hunter Glanz caught my eye this week. This post elaborates on one piece of his material: how to save Jupyter notebooks in other file formats.

Hunter’s story

Hunter is a professor who teaches multiple classes using SAS® University Edition, which comes equipped with an integrated Jupyter notebook. His focus is on SAS programming and he requires his students to create notebooks to complete assignments; however he wants to see the results of their work, not to run their raw code. The notebooks include text, code, images, reports, etc. Let's explore how the students can transform their navitve notebooks into other, more consumable formats. We'll also discuss other use cases in which SAS users may want to create a copy of their work from a notebook, to say a .pdf, .html, or .py file, just to name a few.

What you’ll find here and what you won’t

This post will not cover how to use Jupyter notebooks with SAS or other languages. There is a multitude of other resources, starting with Hunter’s work, to explore those topics. This post will cover how to produce other file formats in SAS, Python, and R. I’ll outline multiple methods including a point-and-click method, how to write inline code directly in the notebook, and finally using the command line.

Many of the processes discussed below are language agnostic. When there are distinct differences, I’ll make a note.

A LITTLE about Jupyter notebooks

A Jupyter notebook is a web application allowing clients to run commands, view responses, include images, and write inline text all in one concourse. The all-encompassing notebook supports users to telling complete story without having to use multiple apps. Jupyter notebooks were originally created for the Python language, and are now available for many other programming languages. JupyterLab, the notebooks’ cousin, is a later, more sophisticated version, but for this writing, we’ll focus on the notebook. The functionality in this use case is similar.

Where do we start? First, we need to install the notebook, if you're not working in a SAS University Edition.

Install Anaconda

The easiest way to get started with the Jupyter Notebook App is by installing Anaconda (this will also install JupyterLab). Anaconda is an open source distribution tool for the management and deployment of scientific computing. Out-of-the-box, the notebook from the Anaconda install includes the Python kernel. For use with other languages, you need to install additional kernels.

Install additional language kernels

In this post, we’ll focus on Python, R, and SAS. The Python kernel is readily available after the Anaconda install. For the R language, follow the instructions on the GitHub R kernel repository. I also found the instructions on How to Install R in Jupyter with IRKernel in 3 Steps quite straight forward and useful. Further, here are the official install instructions for the SAS kernel and a supporting SAS Community Library article.

With the additional kernels are in place, you should see all available languages when creating a new notebook as pictured below.

Available kernels list

File conversion methods

Now we’re ready to dive into the export process. Let’s look at three approaches in detail.

Download (Export) option

Once you’ve opened your notebook and run the code, select File-> Download As (appears as Export Notebook As… in JupyterLab).

"Download As"  option in Jupyter notebook

"Export Notebook As" option in JupyterLab

HTML format output

Notice the list of options, some more familiar than others. Select the HTML option and Jupyter converts your entire notebook: text, commands, figures, images, etc, into a file with a .html extension. Opening the resulting file would display in a browser as expected. See the images below for a comparison of the .ipynb and .html files.

SAS code in a Jupyther notebook

Corresponding SAS code notebook in html form

SAS (aka script) format output

Using the Save As-> SAS option renders a .sas file and is depicted in Enterprise Guide below. Note: when using a different kernel, say Python or R, you have the option to save in that language specific script format.

SAS code saved from a notebook displayed in Enterprise Guide

One thing to note here is only the code appears in the output file. The markdown code, figures, etc., from the original notebook, are not display options in EG, so they are removed.

PDF format output

There is one (two actually) special case(s) I need to mention. If you want to create a PDF (or LaTeX, which is used to create pdf files) output of your notebook, you need additional software. For converting to PDF, Jupyter uses the TeX document preparation ecosystem. If you attempt to download without TeX, the conversion fails, and you get a message to download TeX. Depending on your OS the TeX software will have a different name but will include TeX in the name. You may also, in certain instances, need Pandoc for certain formats. I suggest installing both to be safe. Install TeX from its dowload site. And do the same for Pandoc.

Once I’ve completed creating the files, the new files appear in my File Explorer.

New SAS file in Windows File Explorer

Cheaters may never win, but they can create a PDF quickly

Well, now that we’ve covered how to properly convert and download a .pdf file, there may be an easier way. While in the notebook, press the Crtl + P keys. In the Print window, select the Save to PDF option, choose a file destination and save. It works, but I felt less accomplished afterward. Your choice.

Inline code option

Point-and-click is a perfectly valid option, but let’s say you want to introduce automation into your world. The jupyter nbconvert command provides the capability to transform the current notebook into any format mentioned earlier. All you must do is pass the command with a couple of parameters in the notebook.

In Python, the nbconvert command is part of the os library. The following lines are representative of the general structure.

import os
os.system("jupyter nbconvert myNotebook.ipynb --to html")

An example with Python

The example below is from a Python notebook. The "0" out code represents success.

Code to create a PDF file from a Python notebook

An example with SAS

As you see with the Python example, the code is just that: Python. Generally, you cannot run Python code in a Jupyter notebook running the SAS kernel. Luckily we have Jupyter magics, which allow us to write and run Python code inside a SAS kernel. The magics are a two-way street and you can also run SAS code inside a Python shell. See the SASPy documentation for more information.

The code below is from a SAS notebook, but is running Python code (triggered by the %%python magic).

Code to create a PDF file from a SAS notebook

The EmployeeChurnSASCode.pdf file is created in same directory as the original notebook file:

Jupyter file system display in a web browser

An example with R

Things are fairly straight forward in an R notebook. However, you must install and load the nbconvert package.

Code to create an HTML file from an R notebook

The first line installs the package, the second line loads the package, and the third actually does the conversion. Double-check your paths if you run into trouble.

The command line

The last method we look at is the command line. This option is the same regardless of the language with which you’re working. The possibilities are endless for this option. You could include it in a script, use it in code to run and display in a web app, or create the file and email it to a colleague. The examples below were all run on a Windows OS machine using the Anaconda command prompt.

An example with a SAS notebook

Convert sasNotebook.ipynb to a SAS file.

>> ls -la |grep sasNotebook
-rw-r--r-- 1 jofurb 1049089  448185 May 29 14:34 sasNotebook.ipynb
 
>> jupyter nbconvert --to script sasNotebook.ipynb
[NbConvertApp] Converting notebook sasNotebook.ipynb to script
[NbConvertApp] Writing 351 bytes to sasNotebook.sas
 
>> ls -la |grep sasNotebook
-rw-r--r-- 1 jofurb 1049089  448185 May 29 14:34 sasNotebook.ipynb
-rw-r--r-- 1 jofurb 1049089     369 May 29 14:57 sasNotebook.sas

An example with a Python notebook

Convert 1_load_data.ipynb to a PDF file

>> ls -la |grep 1_load
-rw-r--r-- 1 jofurb 1049089   6004 May 29 07:37 1_load_data.ipynb
 
>> jupyter nbconvert 1_load_data.ipynb --to pdf
[NbConvertApp] Converting notebook 1_load_data.ipynb to pdf
[NbConvertApp] Writing 27341 bytes to .\notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times: ['xelatex', '.\\notebook.tex', '-quiet']
[NbConvertApp] Running bibtex 1 time: ['bibtex', '.\\notebook']
[NbConvertApp] WARNING | b had problems, most likely because there were no citations
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 32957 bytes to 1_load_data.pdf
 
>> ls -la |grep 1_load
-rw-r--r-- 1 jofurb 1049089   6004 May 29 07:37 1_load_data.ipynb
-rw-r--r-- 1 jofurb 1049089  32957 May 29 15:23 1_load_data.pdf

An example with an R notebook

Convert HR_R.ipynb to an R file.

>> ls -la | grep HR
-rw-r--r-- 1 jofurb 1049089   5253 Nov 19  2019 HR_R.ipynb
 
>> jupyter nbconvert HR_R.ipynb --to script
[NbConvertApp] Converting notebook HR_R.ipynb to script
[NbConvertApp] Writing 981 bytes to HR_R.r
 
>> ls -la | grep HR
-rw-r--r-- 1 jofurb 1049089   5253 Nov 19  2019 HR_R.ipynb
-rw-r--r-- 1 jofurb 1049089   1021 May 29 15:44 HR_R.r

Wrapping things up

Whether you’re a student of Hunter’s, an analyst creating a report, or a data scientist monitoring data streaming models, you may have the need/requirement to transform you work from Jupyter notebook to a more consumable asset. Regardless of the language of your notebook, you have multiple choices for saving your work including menu options, inline code, and from the command line. This is a great way to show off your creation in a very consumable mode.

How to save Jupyter notebooks in assorted formats was published on SAS Users.

6月 042020
 

Learning never stops. When SAS had to change this year’s SAS Global Forum (SGF) to a virtual event, everyone was disappointed. I am, however, super excited about all of the papers and stream of video releases over the last month (and I encourage you to register for the upcoming live event in June). For now, I made a pact with myself to read or watch one piece of SGF related material per day. While I haven’t hit my goal 100%, I sure have learned a lot from all the reading and viewing. One particular paper, Using Jupyter to Boost Your Data Science Workflow, and its accompanying video by Hunter Glanz caught my eye this week. This post elaborates on one piece of his material: how to save Jupyter notebooks in other file formats.

Hunter’s story

Hunter is a professor who teaches multiple classes using SAS® University Edition, which comes equipped with an integrated Jupyter notebook. His focus is on SAS programming and he requires his students to create notebooks to complete assignments; however he wants to see the results of their work, not to run their raw code. The notebooks include text, code, images, reports, etc. Let's explore how the students can transform their navitve notebooks into other, more consumable formats. We'll also discuss other use cases in which SAS users may want to create a copy of their work from a notebook, to say a .pdf, .html, or .py file, just to name a few.

What you’ll find here and what you won’t

This post will not cover how to use Jupyter notebooks with SAS or other languages. There is a multitude of other resources, starting with Hunter’s work, to explore those topics. This post will cover how to produce other file formats in SAS, Python, and R. I’ll outline multiple methods including a point-and-click method, how to write inline code directly in the notebook, and finally using the command line.

Many of the processes discussed below are language agnostic. When there are distinct differences, I’ll make a note.

A LITTLE about Jupyter notebooks

A Jupyter notebook is a web application allowing clients to run commands, view responses, include images, and write inline text all in one concourse. The all-encompassing notebook supports users to telling complete story without having to use multiple apps. Jupyter notebooks were originally created for the Python language, and are now available for many other programming languages. JupyterLab, the notebooks’ cousin, is a later, more sophisticated version, but for this writing, we’ll focus on the notebook. The functionality in this use case is similar.

Where do we start? First, we need to install the notebook, if you're not working in a SAS University Edition.

Install Anaconda

The easiest way to get started with the Jupyter Notebook App is by installing Anaconda (this will also install JupyterLab). Anaconda is an open source distribution tool for the management and deployment of scientific computing. Out-of-the-box, the notebook from the Anaconda install includes the Python kernel. For use with other languages, you need to install additional kernels.

Install additional language kernels

In this post, we’ll focus on Python, R, and SAS. The Python kernel is readily available after the Anaconda install. For the R language, follow the instructions on the GitHub R kernel repository. I also found the instructions on How to Install R in Jupyter with IRKernel in 3 Steps quite straight forward and useful. Further, here are the official install instructions for the SAS kernel and a supporting SAS Community Library article.

With the additional kernels are in place, you should see all available languages when creating a new notebook as pictured below.

Available kernels list

File conversion methods

Now we’re ready to dive into the export process. Let’s look at three approaches in detail.

Download (Export) option

Once you’ve opened your notebook and run the code, select File-> Download As (appears as Export Notebook As… in JupyterLab).

"Download As"  option in Jupyter notebook

"Export Notebook As" option in JupyterLab

HTML format output

Notice the list of options, some more familiar than others. Select the HTML option and Jupyter converts your entire notebook: text, commands, figures, images, etc, into a file with a .html extension. Opening the resulting file would display in a browser as expected. See the images below for a comparison of the .ipynb and .html files.

SAS code in a Jupyther notebook

Corresponding SAS code notebook in html form

SAS (aka script) format output

Using the Save As-> SAS option renders a .sas file and is depicted in Enterprise Guide below. Note: when using a different kernel, say Python or R, you have the option to save in that language specific script format.

SAS code saved from a notebook displayed in Enterprise Guide

One thing to note here is only the code appears in the output file. The markdown code, figures, etc., from the original notebook, are not display options in EG, so they are removed.

PDF format output

There is one (two actually) special case(s) I need to mention. If you want to create a PDF (or LaTeX, which is used to create pdf files) output of your notebook, you need additional software. For converting to PDF, Jupyter uses the TeX document preparation ecosystem. If you attempt to download without TeX, the conversion fails, and you get a message to download TeX. Depending on your OS the TeX software will have a different name but will include TeX in the name. You may also, in certain instances, need Pandoc for certain formats. I suggest installing both to be safe. Install TeX from its dowload site. And do the same for Pandoc.

Once I’ve completed creating the files, the new files appear in my File Explorer.

New SAS file in Windows File Explorer

Cheaters may never win, but they can create a PDF quickly

Well, now that we’ve covered how to properly convert and download a .pdf file, there may be an easier way. While in the notebook, press the Crtl + P keys. In the Print window, select the Save to PDF option, choose a file destination and save. It works, but I felt less accomplished afterward. Your choice.

Inline code option

Point-and-click is a perfectly valid option, but let’s say you want to introduce automation into your world. The jupyter nbconvert command provides the capability to transform the current notebook into any format mentioned earlier. All you must do is pass the command with a couple of parameters in the notebook.

In Python, the nbconvert command is part of the os library. The following lines are representative of the general structure.

import os
os.system("jupyter nbconvert myNotebook.ipynb --to html")

An example with Python

The example below is from a Python notebook. The "0" out code represents success.

Code to create a PDF file from a Python notebook

An example with SAS

As you see with the Python example, the code is just that: Python. Generally, you cannot run Python code in a Jupyter notebook running the SAS kernel. Luckily we have Jupyter magics, which allow us to write and run Python code inside a SAS kernel. The magics are a two-way street and you can also run SAS code inside a Python shell. See the SASPy documentation for more information.

The code below is from a SAS notebook, but is running Python code (triggered by the %%python magic).

Code to create a PDF file from a SAS notebook

The EmployeeChurnSASCode.pdf file is created in same directory as the original notebook file:

Jupyter file system display in a web browser

An example with R

Things are fairly straight forward in an R notebook. However, you must install and load the nbconvert package.

Code to create an HTML file from an R notebook

The first line installs the package, the second line loads the package, and the third actually does the conversion. Double-check your paths if you run into trouble.

The command line

The last method we look at is the command line. This option is the same regardless of the language with which you’re working. The possibilities are endless for this option. You could include it in a script, use it in code to run and display in a web app, or create the file and email it to a colleague. The examples below were all run on a Windows OS machine using the Anaconda command prompt.

An example with a SAS notebook

Convert sasNotebook.ipynb to a SAS file.

>> ls -la |grep sasNotebook
-rw-r--r-- 1 jofurb 1049089  448185 May 29 14:34 sasNotebook.ipynb
 
>> jupyter nbconvert --to script sasNotebook.ipynb
[NbConvertApp] Converting notebook sasNotebook.ipynb to script
[NbConvertApp] Writing 351 bytes to sasNotebook.sas
 
>> ls -la |grep sasNotebook
-rw-r--r-- 1 jofurb 1049089  448185 May 29 14:34 sasNotebook.ipynb
-rw-r--r-- 1 jofurb 1049089     369 May 29 14:57 sasNotebook.sas

An example with a Python notebook

Convert 1_load_data.ipynb to a PDF file

>> ls -la |grep 1_load
-rw-r--r-- 1 jofurb 1049089   6004 May 29 07:37 1_load_data.ipynb
 
>> jupyter nbconvert 1_load_data.ipynb --to pdf
[NbConvertApp] Converting notebook 1_load_data.ipynb to pdf
[NbConvertApp] Writing 27341 bytes to .\notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times: ['xelatex', '.\\notebook.tex', '-quiet']
[NbConvertApp] Running bibtex 1 time: ['bibtex', '.\\notebook']
[NbConvertApp] WARNING | b had problems, most likely because there were no citations
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 32957 bytes to 1_load_data.pdf
 
>> ls -la |grep 1_load
-rw-r--r-- 1 jofurb 1049089   6004 May 29 07:37 1_load_data.ipynb
-rw-r--r-- 1 jofurb 1049089  32957 May 29 15:23 1_load_data.pdf

An example with an R notebook

Convert HR_R.ipynb to an R file.

>> ls -la | grep HR
-rw-r--r-- 1 jofurb 1049089   5253 Nov 19  2019 HR_R.ipynb
 
>> jupyter nbconvert HR_R.ipynb --to script
[NbConvertApp] Converting notebook HR_R.ipynb to script
[NbConvertApp] Writing 981 bytes to HR_R.r
 
>> ls -la | grep HR
-rw-r--r-- 1 jofurb 1049089   5253 Nov 19  2019 HR_R.ipynb
-rw-r--r-- 1 jofurb 1049089   1021 May 29 15:44 HR_R.r

Wrapping things up

Whether you’re a student of Hunter’s, an analyst creating a report, or a data scientist monitoring data streaming models, you may have the need/requirement to transform you work from Jupyter notebook to a more consumable asset. Regardless of the language of your notebook, you have multiple choices for saving your work including menu options, inline code, and from the command line. This is a great way to show off your creation in a very consumable mode.

How to save Jupyter notebooks in assorted formats was published on SAS Users.

4月 152020
 

Welcome to the first post for the Getting Started with Python Integration to SAS Viya series! With the popularity of the Python programming language for data analysis and SAS Viya's ability to integrate with Python, I thought, why not create tutorials for users integrating the two?

To begin the series I want to talk about the most important step, making a connection to SAS Viya through your favorite Python client. In the examples I will use a Jupyter notebook, but the method would be the same on any Python client interface. Before I begin diving into code and connections, I want to provide a brief, high level overview of SAS Viya.

What is SAS Viya?

SAS Viya extends the SAS Platform, operates in the cloud (as well as in hybrid and on-prem solutions) and is open source-friendly. For better performance, SAS Viya operates on in-memory data, removing the read/write data transfer overhead. Data processing and analytic procedures use the SAS Cloud Analytic Services (CAS), the engine behind SAS Viya. Further, it enables everyone in an organization to collaborate and work with data by providing a variety of products & solutions running in CAS.

What exactly is CAS? Let's consider the image below.

CAS distributes heavy workloads among multiple computing instances for fast and efficient processing. The environment consists of a controller and a set of worker nodes allowing data storage and processing. Let's take a simple example. A table with 300GB of data is uploaded to the CAS environment. The controller parses out 100GB chunks of the data to each of the worker nodes. The data is loaded into memory on each of the worker nodes, and each node processes their 100GBs of data.

Additionally, CAS uses modern, dynamic algorithms to rapidly perform analytical processing on data of any size. Throughout the series I will refer to the CAS distributed environment as the CAS server, or simply CAS.

For more information about the Cloud Analytic Services architecture visit SAS® Cloud Analytic Services 3.5: Fundamentals.

Furthermore, SAS Viya is open. Business analysts and data scientists can explore, prepare and manage data to provide insights, create visualizations or analytical models using the SAS programming language or a variety of open source languages like Python, R, Lua, or Java. Because of this, programmers can easily process data in CAS, using a language of their choice.

Now that the high level overview of SAS Viya is out of the way, let's discuss the why.

Why do I want to integrate Python to SAS Viya?

When working with data on your local computer you are typically constrained to your computer's resources. For example, when you are working with smaller data (generally think around 1GB) you most likely will not have any resource issues. Alternatively, what if your data is 100GB? A terabyte? What do you do then?

The solution is simple; integrate Python to SAS Viya! At the highest level, the SAS Viya architecture is meant to work with large data your client machine cannot handle. You can load your large data into CAS which distributes chunks of the data to each of the worker nodes. The data is loaded into memory on the worker nodes and processed using Python. Sounds great right? I haven't even told you the best part.

Many of the familiar Pandas methods are available through the Scripting Wrapper for Analytics Transfer (SWAT) package. The SWAT package provides functionality and syntax having the feel of open source code, but simply wraps up CAS actions to send to the server. CAS actions are small units of work the CAS server understands. They load and transform data, compute statistics, perform analytics and create output.

Compare the simple code samples below. You see the commands written in SAS, Python, or R native code. When sent from a client to CAS, SWAT translates the command, performs the action on the CAS server, and returns a response to the client.

Another great feature is the ability to transfer summarized data from the CAS server back to the client machine. Having data on the client machine allows you to use any familiar Python packages like Pandas, Matplotlib, Seaborn, scikit-learn and many more!

So how do you get started?

Connecting to the CAS server

To connect to the CAS server, complete the following two steps:

  1. Install the SWAT package
  2. Make a Connection to the CAS Server

Install the SWAT package

First, we need to install the SWAT package. The SWAT package provides a means for Python users to submit code to the CAS server. The SWAT package translates the Python code to CAS actions the server understands. Install the SWAT package using the pip command as follows:

pip install swat

Or, if you are using Anaconda:

conda install -c sas-institute swat

For more information on the installing the SWAT package for a specific platform and version visit sassoftware/python-swat GitHub page or the documentation.

Make a connection to the CAS server

Next, it's time to make a connection from your local client to the CAS server. This step is required and there are some variations in how this is implemented. I have prepared two methods to connect to CAS. The first method employs a username and password. The second uses token authentication with the use of environment variables. Explore other authentication methods from links provided in the Additional Resources section at the end of this post.

Connecting to CAS using a username and password

First we'll import the SWAT package.

import swat

Next we'll use the swat.CAS constructor to create a connection object to the CAS server. I name this new connection object conn. Feel free to name the object whatever you would like. I've seen it named s in some documentation. I prefer conn since it's my connection to CAS.

The CAS constructor requires the host name and the listening port of the CAS controller. We also need to authenticate -- that is, we need to tell CAS who we are. In this example, we use a username and password. If you do not know any of this connection information, speak with your administrator.

conn = swat.CAS(hostname="https//:server.demo.sas.com/cas-shared-default-http/",
              port=8777, username="student", password="MetaData0")

Let's quickly investigate the output of our conn object.

display(conn)
CAS('server.demo.sas.com', 8777, 'student', protocol='http', name='py-session-1', session='c8091979--483f-8cde-3c97a1f372a3')

We are now connected! Notice the conn object holds all of our connection information. We will use this connection object going forward.

Finally, let's look at the new object's type.

type(conn)
swat.cas.connection.CAS

Alert! I hope you noticed something. You should be thinking, "did we just type our authentication information in plain text?". The answer is yes. However, when you are using password authentication, you should NEVER type your information in plain text. In this example I am using a training machine and I want to demonstrate the simplest connection procedure. For real world scenarios, there are options like setting Python environment variables, creating an authinfo file, or creating a hashed version. My recommendation is to follow your company policy for authentication.

Great! Everything looks ready and we are connected to CAS! Let's now look at a second method of connecting.

Connecting to CAS using token authentication and environment variables

In this example, I am using SAS Viya for Learners. SAS Viya for Learners is a SAS Viya implementation created for educators and their students. To access a Jupyter notebook when using SAS Viya for Learners, log in using your username and password and launch the application. Once in the application, look at the bottom of the screen in the orange box and select the Jupyter notebook icon, as seen below.

After Jupyter opens, select the Python kernel and begin.

Now it's time to make the connection to CAS. First, I'm going to import the SWAT and os packages. The os package allows me to obtain environment variable values necessary to connect to CAS.

import swat
import os

Next, I want to obtain the values of a few environment variables. SAS Viya for Learners comes configured with the needed environment variables. To start, I'm going to use the os.environ.get method to obtain the CASHOST, CASPORT and SAS_VIYA_TOKEN values and set them equal to new variables.

hostValue = os.environ.get('CASHOST')
portValue = os.environ.get('CASPORT')
passwordToken = os.environ.get('SAS_VIYA_TOKEN')

The variables hostValue, portValue and passwordToken all contain the necessary values to connect to CAS. Let's use the swat.CAS constructor again with the newly created variables.

conn = swat.CAS(hostname=hostValue, port=portValue, password=passowrdToken)

Let's view the output of the conn object.

display(conn)
CAS('svflhost.demo.sas.com', 5570, email='joe.test@sas.com', protocol='cas', name='py-session-1', session='efff4323-a862-bd6e-beea737b4249')

That's it. We're connected!

Summary

In summary, the most important thing to know is that there are multiple ways to connect to the CAS server. For additional approaches, see links below and also Joe Furbee's blog post, Authentication to SAS Viya: a couple of approaches. If you're not sure of the proper method, I recommend you discuss how to connect to CAS with the administrator of your environment.

That was just the start of the series, but it's the most important step! You can't do anything else, unless you first authenticate and connect. Once you are connected to the CAS server the next question should be, what do I do now? In the next post we will talk about sending commands to the CAS server in Working with CAS Actions and CASResults Objects.

Additional information and resources

Getting Started with Python Integration to SAS® Viya® - Part 1 - Making a Connection was published on SAS Users.

4月 032020
 

Whether you like it or not, Microsoft Excel is still a big hit in the data analysis world. From small to big customers, we still see fit for daily routines such as filtering, generating plots, calculating items on ad-hoc analysis or even running statistical models. Whenever I talk to customers, there is always someone who will either ask: Can this be exported to excel or can we import data from excel?. Recently, other questions started to come up more often: Can we run Python within SAS? How do I allow my team to choose their language of preference? How do I provide an interface that looks like Microsoft Excel, but has SAS functionalities?.

Well… good news is: we can answer YES to all of these questions. With the increase in number of users performing analytics and the number of analytical tools available, for me it was clear that we would end up having lots of disparate processes. For a while this was a problem, but naturally, companies started developing ways to integrate these siloed teams.

In the beginning of last decade, SAS developed SAS Add-in for Microsoft Office. The tool allows customers to run/embed SAS analytic capabilities inside Microsoft Office applications. More recently, SAS released a new version of PROC FCMP allowing users to write Python code and call, if as a function, inside SAS programs.

These advancements provide users the ability to run Python inside Excel. When I say inside, I really mean from within Excel's interface.

Before we jump to how we can do it, you may ask yourself: Why is this relevant to me? If I know SAS, I import the dataset and work with the data in SAS; If I know Python, I open a Jupyter notebook, import the data set and do my thing. Well… you are kind of right, but let me tell you a story.

The use case

Recently I worked with a customer and his business process was like this: I have a team of data scientists that is highly technical and knowledgeable in Python and SAS. Additionally, I have a team of analysts with little Python knowledge, but are always working with Excel to summarize data, create filters, graphs, etc. My teams need to communicate and collaborate. The normal chain of events follows:

  1. the Python team works on the data, and exports the results to Excel
  2. the analytics team picks up the data set, and runs SAS scripts and excel formulas

This is a problem of inefficiency for the customer. Why can't the data scientist pass his or her code to the analyst to execute it on the same project without having to wait on the Python specialist to run the code?

I know this sounds overly complicated, but as my SAS colleague Mike Zizzi concludes in his post SAS or Python? Why not use both? Using Python functions inside SAS programs, at the end of the day what matters is that you get your work done. No matter which language, software or IDE you are using. I highly recommend Mike's article if you want a deep dive on what PROC FCMP has to offer.

The process

Let's walk through a data scoring scenario similar to my customer's story. Imagine I am a SAS programmer using Excel to explore data. I am also part of a team that uses Python, creating scoring data code using analytical models developed in Python. My job is to score and analyze the data on Excel and pass the results to the service representative, so they can forward the response to the customer.

Importing data

The data set we'll work with in this example will help me analyze which customers are more likely to default on a loan. The data and all code used in this article are in the associated GitHub repository. The data dictionary for the data set is located here. First, I open the data set as seen on Sheet1 below in Excel.

Upload data to SAS

Before we jump to the coding part with SAS and Python, I need to send the data to SAS. We'll use the SAS add-in, in Excel to send data to the local server. I cover the steps in detail below.

I start by selecting the cells I want to upload to the library.

Next, I move to the SAS tab and select the Copy to SAS Server task.

A popup shows up where I confirm the selected cells.

After I click OK, I configure column, table, naming and location options.

SAS uploads the table to the requested library. Additionally, a new worksheet with the library.table name displays the results. As you can see on the image below, the sheet created follows the name WORK.IMPORTED_DATA we setup on the previous step. This represents the table in the SAS library memory. Notice, however, we are still working in Excel.

The next step is to incorporate the code sent from my teammate.

The Python code

The code our colleague sent is pure Python. I don't necessarily have to understand the code details, just what it does. The Python code below imports and scores a model and returns a score. Note: if you're attempting this in your own environment, make sure to update the hmeq_model.sav file location in the # Import model pickle file section.

def score_predictions(CLAGE, CLNO, DEBTINC,DELINQ, DEROG, LOAN, MORTDUE, NINQ,VALUE, YOJ):
	"Output: scored"
	# Imporing libraries
	import pandas as pd
	from sklearn.preprocessing import OneHotEncoder
	from sklearn.compose import ColumnTransformer
	from sklearn.externals import joblib
 
	# Create pandas dataframe with input vars
	dataset = pd.DataFrame({'CLAGE':CLAGE, 'CLNO':CLNO, 'DEBTINC':DEBTINC, 'DELINQ':DELINQ, 'DEROG':DEROG, 'LOAN':LOAN, 'MORTDUE':MORTDUE, 'NINQ':NINQ, 'VALUE':VALUE, 'YOJ':YOJ}, index=[0])
 
	X = dataset.values
 
	# Import model pickle file
	loaded_model = joblib.load("C://assets/hmeq_model.sav")
 
	# Score the input dataframe and get 0 or 1 
	scored = int(loaded_model.predict_proba(X)[0,1])
 
	# Return scored dataframe
	return scored

My SAS code calls this Python code from a SAS function defined in the next section.

The SAS code

Turning back to Excel, in the SAS Add-in side of the screen, I click on Programs. This displays a code editor, and as explained on this video, is like any other SAS code editor.

We will use this code editor to write, run and view results from our code.

The code below defines a FCMP function called Score_Python, that imports the Python script from my colleague and calls it from a SAS datastep. The output table, HMEQ_SCORED, is saved on the WORK library in SAS. Note: if you're attempting this in your own environment, make sure to update the script.py file location in the /* Getting Python file */ section.

proc fcmp outlib=work.fcmp.pyfuncs;
 
/* Defining FCMP function */
proc fcmp outlib=work.fcmp.pyfuncs;
	/* Defining name and arguments of the Python function to be called */
 
	function Score_Python(CLAGE, CLNO, DEBTINC, DELINQ, DEROG, LOAN, MORTDUE, NINQ, VALUE, YOJ);
		/* Python object */
		declare object py(python);
 
		/* Getting Python file  */
		rc = py.infile("C:\assets\script.py");
 
		/* Send code to Python interpreter */
		rc = py.publish();
 
		/* Call python function with arguments */
		rc = py.call("score_predictions",CLAGE, CLNO, DEBTINC, DELINQ, DEROG, LOAN, MORTDUE, NINQ, VALUE, YOJ);
 
		/* Pass Python results to SAS variable */
		MyFCMPResult = py.results["scored"];
 
		return(MyFCMPResult);
	endsub;
run;
 
options cmplib=work.fcmp;
 
/* Calling FCMP function from data step */
data work.hmeq_scored;
	set work._excelexport;
	scored_bad = Score_Python(CLAGE, CLNO, DEBTINC, DELINQ, DEROG, LOAN, MORTDUE, NINQ, VALUE, YOJ);
	put scored_bad=;
run;

 

I place my code in the editor.

We're now ready to run the code. Select the 'Running man' icon in the editor.

The output

The result represents the outcome of the Python model scoring code. Once the code completes the WORK.HMEQ_SCORED worksheet updates with a new column, scored_bad.

The binary value represents if the customer is likely (a '1') or unlikely (a '0') to default on his or her loan. I could now use any built in Excel features to filter or further analyze the data. For instance, I could filter all the customers likely to default on their loans and pass a report on to the customer management team.

Final thoughts

In this article we've explored how collaboration between teams with different skills can streamline processes for efficient data analysis. Each team focuses on what they're good at and all of the work is organized and completed in one place. It is a win-win for everyone.

Related resources

Extending Excel with Python and SAS Viya was published on SAS Users.

3月 182020
 

Let’s be honest, there is a lot of SAS content available on the web. Sometimes it gets difficult to navigate through everything to find what you need, especially if you are looking for complimentary resources.

Training budgets can be limited or already used for the year, but you’re still interested in learning a new SAS product or diving deeper into a specific subject to facilitate any current projects you are working on. Or you’re a real over-achiever (go, you!) and you’re looking to expand your personal SAS skills outside of your day-to-day work.

You start asking, “How do I find what I need?”

Don’t worry, SAS has you covered!

SAS learn & support

Let’s start with a favorite resource (in a Customer Success Manager’s opinion) – SAS’ learn and support pages. SAS recently released updated learn and support pages for SAS products. These pages provide a great overview of SAS’ product offerings, and they provide resources for those who are new to SAS or those looking to expand their knowledge. The learn and support pages cover the most current product release, information on getting started, tutorials, training courses, books, and documentation for current and past releases.

Not sure how to locate the learn and support page for the SAS product you are using? Search the SAS Product Support A to Z page and select the product of your choice.

SAS documentation

Browsing the web for resources is a great way to find answers to your SAS questions. But as mentioned previously, it can sometimes get tricky to find what you are looking for.

A great place to start your search is on the SAS documentation site. You can use the search bar to enter what you are looking for, or browse by products, titles or system requirements.

What’s new in SAS

You may have heard the saying, “There are three ways to do anything in SAS.” (Or four, or five or six!) Which raises the question, “How do I know what I’m doing is the most efficient?”

One way to stay on top of the most efficient way to do things is to stay current with your SAS knowledge. Knowing what’s new in SAS helps users know and understand what new features and enhancements are available. When a SAS product release occurs, SAS provides documentation on what’s new.

To know what’s new in the SAS release you’re using, check out the What’s New documentation. The documentation is broken into two parts: SAS 9.4 and SAS Viya 3.5. You can use the ‘Version’ tab on the left-hand side of the page to select the version currently installed at your organization.

If you are not sure what version you are running, you can run PROC PRODUCT_STATUS. This PROC will return what version numbers are running for the SAS products installed.

proc product_status;
run;

Another great resource to stay on top of what’s new from SAS is to check out SAS webinars. SAS offers live and on-demand webinars hosted by SAS experts. There are topics for every level of SAS user and every level of an organization, from SAS programmers to executives.

To attend a live webinar, select the webinar of your choice, register to attend, and you will be sent an email with the calendar invite.

If you’re interested in checking out an on-demand webinar, you can search by topic or industry to find a topic that fits what you’re looking for.

Looking for a webinar that focuses on a SAS tool? Check out the SAS Ask the Expert webinars. These are one-hour live and on-demand webinars for SAS users and administrators. The sessions cover a wide range of topics from what’s new in new releases of SAS products, to overviews on getting started, to tips and tricks that help take your SAS knowledge to the next level.

With SAS’ extensive catalog of webinars to choose from you will be a SAS pro in no time!

SAS training and education

Did you know that SAS offers free e-learning for some of our training courses? These courses are self-paced and cover a wide range of topics. With 180 days of access to these courses, it allows you to work through them at your own speed. It’s also very easy to get started!

Step 1: Select a course from the course library

Step 2: Sign into your SAS profile or create one

Step 3: Activate your product(s) and review the License Agreement

Step 4: Work through the course lessons

Step 5: Complete the course and receive your SAS digital Learn Badge and Course Completion Certificate

Leverage expertise worldwide

SAS recently released SAS Analytics Explorer. This is an interactive way to connect with other SAS professionals, expand your SAS knowledge, and access private SAS events and resource all while earning points that can be exchanged for rewards.

Are you up for the challenge? No really, are you? The SAS Analytics Explorer has fun and educational challenges that allow you to showcase your SAS skills to climb the ranks in the network. Show off your SAS talent and get some cool rewards while you’re at it!

Interested in joining? Fill out the form on the bottom of the SAS Analytics Explorer page to request an invitation.

Don’t forget about the SAS Communities! Connect with other SAS professionals and experts to ask questions, assist other SAS professionals with their questions, connect with users, and see what’s going on at SAS.

You can also connect with SAS on our website using the chat feature. We love SAS users, and we are here to help you!

Tips and resources for making the most of your SAS experience was published on SAS Users.

3月 162020
 

As a long-time SAS 9 programmer, I typically accomplish my data preparation tasks through some combination of the DATA Step, Proc SQL, Proc Transpose and some housekeeping procs like Proc Contents and Proc Datasets. With the introduction of SAS Viya, SAS released a new scripting language called CASL – a language that interacts with SAS Cloud Analytics Services (CAS).

CASL statements include actions, logically organized into action sets based on common functionality. For example, the Table action set allows you to load a table in CAS, view table metadata, change table metadata such as drop or rename a column, fetch (print) sample rows, save or drop a table from CAS, among other things. Steven Sober provides a great overview of CASL in his 2019 SAS Global Forum paper.

Learning CASL is a good idea assuming you want to leverage the power of CAS, because CASL is the language of CAS. While you can continue to use Viya-enabled procs for many of your data processing needs, certain new functionality is only available through CASL. CAS actions also provide a more granular access to options which otherwise may not be available as procedure options. But old habits die hard, and for a while I found myself bouncing between SAS 9.4 and CASL. I'd pull the data down from CAS just to get it to process in the SAS Programming Runtime Environment (SPRE) because it took less effort than figuring out how to get it done properly in CAS.

Then I started a project with a seriously large data set and quickly hit the limit on how much data I could pull down to process in SPRE. And although I could adjust the DATALIMIT option to retrieve more data than the default limit, I was wasting time and server resources unnecessarily moving the data between CAS and SPRE. All this, just so I could process the data “old school.”

I decided to challenge myself to do ALL my data preparation in CASL. I love a good challenge! I started collecting various useful CASL code snippets. In this post, I am sharing the tidbits I’ve accumulated, along with some commentary. Note, you can execute CAS actions from multiple clients, including SAS, Python, R, Lua and Java. Since my objective was to transition from traditional SAS code to CASL, I’ll focus solely on CAS actions from the SAS client perspective. While I used SAS Viya 3.5 for this work, most of the code snippets should work on prior versions as well.

The sections below cover: how to submit CASL code; loading, saving, dropping and deleting data; exploring data; table metadata management; and data transformation. Feel free to jump ahead to any section of interest.

How do you submit CASL code?

You use PROC CAS to submit CASL code from a SAS client. For example:

proc cas;
   <cas action 1>;
   <cas action 2>;
   …;
quit;

Similarly to other interactive procs that use run-group processing, separate CAS actions by run; statements. For example:

proc cas;
   <cas action 1>;
   run;
   <cas action 2>;
   run;
quit;

In fact, you can have the entire data preparation and analysis pipeline wrapped inside a single PROC CAS, passing data and results in the form of CASL variables from one action to the next. It can really be quite elegant.

Moving Data Using PROC CAS

Loading SASHDAT data in CAS

Your data must be in the SASHDAT format for CAS to process it. To load a SASHDAT table into CAS, use the table.loadtable CAS action. The code below assumes your SASHDAT table is saved to a directory on disk associated with your current active caslib, and you are loading it into the same caslib. (This usually occurs when you already performed the conversion to SASHDAT format, but the data has been unloaded. If you are just starting out and are wondering how to get your data into the SASHDAT format in the first place, the next session covers it, so keep reading.)

proc cas; 
     table.loadtable / path="TABLE_NAME.sashdat" casOut="TABLE_NAME"; 
     table.promote /name="TABLE_NAME" drop=true; 
quit;

The table.promote action elevates your newly loaded CAS table to global scope, making it available to other CAS sessions, including any additional sessions you start, or to other users assuming they have the right privileges. I can’t tell you how many times I forgot to promote my data, only to find that my hard-earned output table disappeared because I took a longer coffee break than expected! Don’t forget to promote or save off your data (or both, to be safe).

If you are loading from a directory other than the one associated with your active caslib, modify the path= statement to include the relative path to the source directory – relative to your active caslib. If you are looking to load to a different caslib, modify the casOut= statement by placing the output table name and library in curly brackets. For example:

proc cas;
    table.loadtable / path="TABLE_NAME.sashdat" casOut={name="TABLE_NAME" caslib="CASLIB2"};
    table.promote /name="TABLE_NAME" drop=true;
quit;

You can also place a promote=true option inside the casOut= curly brackets instead of calling the table.promote action, like so:

proc cas;
    table.loadtable / path="TABLE_NAME.sashdat" 
                      casOut={name="TABLE_NAME" caslib="CASLIB2" promote=true};
quit;

Curly brackets are ubiquitous in CASL (and quite unusual for SAS 9.4). If you take away one thing from this post, make it “watch your curly brackets.”

Loading SAS7BDAT, delimited data, and other file formats in CAS

If you have a SAS7BDAT file already on disk, load it in CAS with this code:

proc cas;
    table.loadtable /path="TABLE_NAME.sas7bdat" casout="TABLE_NAME" 
                     importoptions={filetype="basesas"};
quit;

Other file formats load similarly – just use the corresponding filetype= option to indicate the type of data you are loading, such as CSV, Excel, Document (.docx, .pdf, etc.), Image, Video, etc. The impressive list of supported file types is available here.

proc cas;
    table.loadtable / path="TABLE_NAME.csv" casout="TABLE_NAME" 
                      importoptions={filetype="csv"};
    run;
quit;

You can include additional parameters inside the importOptions= curly brackets, which differ by the file type. If you don’t need any additional parameters, use the filetype=”auto” and let CAS determine the best way to load the file.

When loading a table in SAS7BDAT, delimited or some other format, the table.loadtable action automatically converts your data to SASHDAT format.

Loading data in CAS conditionally

Imagine you are building a script to load data conditionally – only if it’s not already loaded. This is handy if you have a reason to believe the data might already be in CAS. To check if the data exists in CAS and load conditionally, you can leverage the table.tableExists action in combination with if-then-else logic. For example:

proc cas;
    table.tableExists result =r / name="TABLE_NAME";
    if r=0  then do;
        table.loadtable / path="TABLE_NAME.sashdat" casOut={name="TABLE_NAME"};
        table.promote /name="YOUR_TABLE_NAME" drop=true;
    end;
    else print("Table already loaded");
quit;

Notice that the result=r syntax captures the result code from the tableExists action, which is evaluated before the loadtable and promote actions are executed. If the table is already loaded in CAS, “Table already loaded” is printed to the log. Otherwise, the loadtable and promote actions are executed.

The ability to output CAS action results to a CASL variable (such as result=r in this example) is an extremely powerful feature of CASL. I include another example of this further down, but you can learn more about this functionality from documentation or this handy blog post.

Saving your CAS data

Let’s pretend you’ve loaded your data, transformed it, and promoted it to global scope. You or your colleagues can access it from other CAS sessions. You finished your data preparation, right? Wrong. As the header of this section suggests, you also need to save your prepared CAS data. Why? Because up to this point, your processed and promoted data exists only in memory. You will lose your work if your SAS administrator reboots the server or restarts the CAS controller. If you need to quickly reload prepared data, you must back it up to a caslib’s data source. See the CAS data lifecycle for more details.

To save off CAS data, naturally, you use the table.save action. For example:

proc cas;
    table.save / table="TABLE_NAME" name="TABLE_NAME.sashdat" replace=true;
quit;

In this example, you save off the CAS table to disk as a SASHDAT file, defaulting to the location associated with your active caslib. You can modify the table.save parameters to save or export the data to an alternative data storage solution with full control over the file format (including but not limited to such popular options as HDFS, Oracle, SQL Server, Salesforce, Snowflake and Teradata), compression, partitioning and other options.

Dropping and deleting data

To drop a table from CAS, execute a table.droptable action. For example:

proc cas;
    table.droptable / name="TABLE_NAME" quiet=true;
quit;

The quiet=true option prevents CAS from generating an error if the table does not exist in CAS. Dropping a table deletes it from memory. It’s a good practice to drop tables you no longer need, particularly the one you have promoted. Local-scope tables disappear on their own when the session expires, whereas global tables will stay in memory until they are unloaded.

Dropping a table does not delete the underlying source data. To delete the source of a CAS table, use the table.deleteSource action. For example:

proc cas;
    table.deletesource / source="TABLE_NAME.sashdat" quiet=true;
quit;

Exploring Data Using PROC CAS

After taking a close look at moving the data using PROC CAS, let’s look at some useful ways to start exploring and manipulating CAS data.

Fetching sample data

When preparing data, I find it useful to look at sample data. The table.fetch action is conceptually similar to PROC PRINT and, by default, outputs the first 20 rows of a CAS table:

proc cas;
table.fetch / table="Table_Name";
quit;

You can modify the table.fetch options to control which observations and variables to display and how to display them. For example:

proc cas;
table.fetch / table={name="TABLE_NAME" where="VAR1 in ('value1','value2')"},              /*1*/
	      orderby={{name="VAR1"},                                                     /*2*/
                             {name="VAR2", order="descending"}
                             },	
	     fetchvars={{name="VAR1", label="Variable 1"},                                /*3*/
	                     {name="VAR2", label="Variable 2"}, 
 		             {name="VAR3", label="Variable 3", format=comma12.1}
                             },
	     to=50,                                                                       /*4*/
	     index=false;							          /*5*/
quit;

In the code snippet above:

  • #1 – where= statement limits the records to those meeting the where criteria.
  • #2 – orderby= option defines the sort order. Ascending is the default and is not required. If sorting by more than one variable, put them in a list inside curly brackets, as shown in this example. If a list item has a subparameter (such as order= here), encase each item in curly brackets.
  • #3 – fetchvars= option defines the variables to print as well as their display labels and formats. If you select more than one variable, put them in a list inside curly brackets, as shown here. And again, if a list item includes a subparmeter, then enclose each list item in curly brackets.
  • #4 – to= option defines the number of rows to print.
  • #5 – index= false option deactivates the index column in the output (the default is index=true). This is similar to the noobs option in PROC PRINT.

As mentioned earlier, make sure to watch your curly brackets!

Descriptive statistics and variable distributions

The next step in data exploration is looking at descriptive statistics and variable distributions. I would need a separate blog post to cover this in detail, so I only touch upon a few of the many useful CAS actions.

To look at statistics for numeric variables, use the simple.summary action, which computes standard descriptive statistics, such as minimum, maximum, mean, standard deviation, number missing, and so on. For example:

proc cas;
    simple.summary / table="TABLE_NAME";
quit;

Among its other features, the simple.summary action allows analysis by one or more group-by variables, as well as define the list of desired descriptive statistics. For example:

proc cas;
simple.summary / table={name="TABLE_NAME", groupBy="VAR1", vars={"NUMVAR1","NUMVAR2”}},
                 subSet={"MAX", "MIN", "MEAN", "NMISS"};
quit;

Another useful action is simple.topK, which selects the top K and bottom K values for variables in a data set, based on a user-specified ranking order. The example below returns the top 5 and bottom 5 values for two variables based on their frequency:

proc cas;
simple.topk / table="TABLE_NAME" 
              aggregator="N",                        
              inputs={"VAR1","VAR2"},
              topk=5,
              bottomk=5;
quit;

Simple is a rich action set with heaps of useful options covered in the documentation.

You may be wondering – what about crosstabs and frequency tables? The simple action set includes freq and crosstab actions. In addition, the action closely imitating the functionality of the beloved PROC FREQ is freqTab.freqTab. For example, the code snippet below creates frequency tables for VAR1, VAR2 and a crosstab of the two.

proc cas;
freqtab.freqtab / table="TABLE_NAME"
                  tabulate={"VAR1","VAR2",
                  {vars={"VAR1","VAR2"}}
                  };
quit;

Managing CAS Table Variables

Changing table metadata

One of the basic tasks after exploring your data is changing table metadata, such as dropping unnecessary variables, renaming tables and columns, and changing variable formats and labels. The table.altertable action helps you with these housekeeping tasks. For example, the code snippet below renames the table, drops two variables and renames and changes labels for two variables:

proc cas;
    table.altertable / table="TABLE_NAME" rename="ANALYTIC_TABLE"
                       drop={"VAR1",”VAR2”}
                       columns={{name="VAR3" rename="ROW_ID" label="Row ID"},
                                {name="VAR4" rename="TARGET" label="Outcome Variable"}
                                }
                       ;
quit;

Outputting variable list to a data set

Another useful trick I frequently use is extracting table columns as a SAS data set. Having a list of variables as values in a data set makes it easy to build data-driven scripts leveraging macro programming. The code snippet below provides an example. Here we encounter another example of capturing action result as a CASL variable and using it in further processing – I can’t stress enough how helpful this is!

proc cas;
    table.columninfo r=collinfo / table={name="TABLE_NAME"};       /*1*/
    collist=collinfo["ColumnInfo"];                                /*2*/
    saveresult collist casout="collist";                           /*3*/
quit;

In the snippet above:

  • #1 - the columninfo action collects column information. The action result is passed to a CASL variable collinfo. Notice, instead of writing out result=, I am using an alias r =.
  • #2 - the portion of the a CASL variable collinfo containing column data is extracted into another CASL variable collist.
  • #3 - the saveresult statement sends the data to a CAS table collist. If you want to send the results to a SAS7BDAT data set, replace casout= with dataout=, and provide the library.table_name information.

Transforming the Data

Lastly, let’s look at some ways to use CAS actions to transform your data. Proc SQL and DATA step are the two swiss-army knives in SAS 9 developers’ toolkit that take care of 90% of the data prep. The good news is you can execute both DATA Step and SQL directly from PROC CAS. In addition, call the transpose action to transpose your data.

Executing DATA Step code

The dataStep.runCode action enables you to run DATA step code directly inside PROC CAS. You must enclose your DATA step code in quotation marks after the code= statement. For example:

proc cas;
    dataStep.runCode /
    code="
        data table_name;
        set table_name;
        run;
        ";
quit;

Running DATA step code in CAS allows access to sophisticated group-by processing and the use of such popular programming techniques as first- and last-dot. Refer to the documentation for important nuances related to processing in a distributed, multi-threaded environment of CAS.

Executing FedSQL

To run SQL in CASL, use the fedSQL.execDirect action. Enclose the SQL query in quotation marks following the query= statement. Optionally, you can use the casout= statement to save the results to a CAS table. For example:

proc cas;
    fedsql.execDirect/
    query=
         "
          select  *
          from TABLE1 a  inner join TABLE2 b
          on a.VAR1 = b.VAR1
         "
    casout={name="TABLE3", replace=True};
quit;

Similarly to DATA step, be aware of the many nuances when executing SQL in CAS via FedSQL. Brian Kinnebrew provides an excellent overview of FedSQL in his SAS Communities article, and the documentation has up-to-date details on the supported functionality.

Transposing data

Transposing data in PROC CAS is a breeze. The example below uses transpose.transpose action to restructure rows into columns.

proc cas;
    transpose.transpose /
          table={name="TABLE_NAME", groupby={"VAR1"}}
          transpose={"VAR2"}
          id={"VAR3"}
          prefix="Prefix"
    casout={name="TRANSPOSED" replace=true};
run;

You can transpose multiple variables in the same transpose action. Simply place additional variables inside the curly brackets following transpose=, in quotes, separated by a comma.

Conclusion

PROC CAS is a wrapper procedure enabling you to leverage SAS’ new programming language - CASL. CASL enables you to submit CAS actions directly to SAS Cloud Analytic Services engine from a SAS client. This post provided examples of loading, managing, exploring and transforming your data through CAS actions. Certain new functionality in CAS is only available through CAS actions, so getting comfortable with CASL makes sense. Fear not, and let the curly brackets guide the way 😊.

Acknowledgement

I would like to thank Brian Kinnebrew for his thoughtful review and generous help on my journey learning CASL.

Challenge accepted: Learning data prep in CASL was published on SAS Users.

12月 122019
 

Bringing the power of SAS to your Python scripts can be a game changer. An easy way to do that is by using SASPy, a Python interface to SAS allowing Python developers to use SAS® procedures within Python. However, not all SAS procedures are included in the SASPy library. So, what do you do if you want to use those excluded procedures? Easy! The SASPy library contains functionality enabling you to add SAS procedures to the SASPy library. In this post, I'll explain the process.

The basics for adding procedures are covered in the Contributing new methods section in the SASPy documentation. To further assist you, this post expands upon the steps, providing step-by-step details for adding the STDIZE procedure to SASPy. For a hands-on application of the use case refer the blog post Machine Learning with SASPy: Exploring and Preparing your data - Part 3.

This is your chance to contribute to the project! Whereas, you can choose to follow the steps below as a one-off solution, you also have the choice to share your work and incorporate it in the SASPy repository.

Prerequisites

Before you add a procedure to SASPy, you need to perform these prerequisite steps:

  1. Identify the SAS product associated with the procedure you want to add, e.g. SAS/STAT, SAS/ETS, SAS Enterprise Miner, etc.
  2. Locate the SASPy file (for example, sasstat.py, sasets.py, and so on) corresponding to the product from step 1.
  3. Ensure you have a current license for the SAS product in question.

Adding a SAS procedure to SASPy

SASPy utilizes Python Decorators to generate the code for adding SAS procedures. Roughly, the process is:

  1. define the procedure
  2. generate the code to add
  3. add the code to the proper SASPy file
  4. (optional)create a pull request to add the procedure to the SASPy repository

Below we'll walk through each step in detail.

Create a set of valid statements

Start a new python session with Jupyter and create a list of valid arguments for the chosen procedure. You determine the arguments for the procedure by searching for your procedure in the appropriate SAS documentation. For example, the PROC STDIZE arguments are documented in the SAS/STAT® 15.1 User's Guide, in the The STDIZE Procedure section, with the contents:

The STDIZE procedure

 
 
 
 
 
 
 
 
 
 

For example, I submitted the following command to create a set of valid arguments for PROC STDIZE:

lset = {'STDIZE', 'BY', 'FREQ', 'LOCATION', 'SCALE', 'VAR', 'WEIGHT'}

Call the doc_convert method

The doc_convert method takes two arguments: a list of valid statements (method_stmt) and the procedure name (stdize).

import saspy
 
print(saspy.sasdecorator.procDecorator.doc_convert(lset, 'STDIZE')['method_stmt'])
print(saspy.sasdecorator.procDecorator.doc_convert(lset, 'STDIZE')['markup_stmt'])

The command generates the method call and the docstring markup like the following:

def STDIZE(self, data: [SASdata', str] = None,
   by: [str, list] = None,
   location: str = None,
   scale: str = None,
   stdize: str = None,
   var: str = None,
   weight: str = None,
   procopts: str = None,
   stmtpassthrough: str = None,
   **kwargs: dict) -> 'SASresults':
   Python method to call the STDIZE procedure.
 
   Documentation link:
 
   :param data: SASdata object or string. This parameter is required.
   :parm by: The by variable can be a string or list type.
   :parm freq: The freq variable can only be a string type.
   :parm location: The location variable can only be a string type.
   :parm scale: The scale variable can only be a string type.
   :parm stdize: The stdize variable can be a string type.
   :parm var: The var variable can only be a string type.
   :parm weight: The weight variable can be a string type.
   :parm procopts: The procopts variable is a generic option avaiable for advanced use It can only be a string type.
   :parm stmtpassthrough: The stmtpassthrough variable is a generic option available for advanced use. It can only be a string type.
   :return: SAS Result Object

Update SASPy product file

We'll take the output and add it to the appropriate product file (sasstat.py in this case). When you open this file, be sure to open it with administrative privileges so you can save the changes. Prior to adding the code to the product file, perform the following tasks:

  1. add @procDecorator.proc_decorator({}) before the function definition
  2. add the proper documentation link from the SAS Programming Documentation site
  3. add triple quotes ("""") to comment out the second section of code
  4. include any additional details others might find helpful

The following output shows the final code to add to the sasstat.py file:

@procDecorator.proc_decorator({})
def STDIZE(self, data: [SASdata', str] = None,
   by: [str, list] = None,
   location: str = None,
   scale: str = None,
   stdize: str = None,
   var: str = None,
   weight: str = None,
   procopts: str = None,
   stmtpassthrough: str = None,
   **kwargs: dict) -> 'SASresults':
   """
   Python method to call the STDIZE procedure.
 
   Documentation link:
   https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statug_stdize_toc.htm&locale=en
   :param data: SASdata object or string. This parameter is required.
   :parm by: The by variable can be a string or list type.
   :parm freq: The freq variable can only be a string type.
   :parm location: The location variable can only be a string type.
   :parm scale: The scale variable can only be a string type.
   :parm stdize: The stdize variable can be a string type.
   :parm var: The var variable can only be a string type.
   :parm weight: The weight variable can be a string type.
   :parm procopts: The procopts variable is a generic option avaiable for advanced use It can only be a string type.
   :parm stmtpassthrough: The stmtpassthrough variable is a generic option available for advanced use. It can only be a string type.
   :return: SAS Result Object
   """

Update sasdecorator file with the new method

Alter the sasdecorator.py file by adding stdize in the code on line 29, as shown below.

if proc in ['hplogistic', 'hpreg', 'stdize']:

Important: The update to the sasdecorator file is only a requirement when you add a procedure with no plot options. The sasstat.py library assumes all procedures produce plots. However, PROC STDIZE does not include them. So, you should perform this step ONLY when your procedure does not include plot options. This will more than likely change in a future release, so please follow the Github page for any updates.

Document a test for your function

Make sure you write at least one test for the procedure. Then, add the test to the appropriate testing file.

Finally

Congratulations! All done. You now have the knowledge to add even more procedures in the future.

After you add your procedure, I highly recommend you contribute your procedure to the SASPy GitHub library. To contribute, follow the outlined instructions on the Contributing Rules GitHub page.

Adding SAS® procedures to the SASPy interface to Python was published on SAS Users.