Open Source

December 12, 2019
 

Parts 1 and 2 of this blog post discussed exploring and preparing your data using SASPy. To recap, Part 1 discussed how to explore data using the SASPy interface with Python. Part 2 continued with an explanation of how to prepare your data to use it with a machine-learning model. This final installment continues the discussion about preparation by explaining techniques for normalizing your numerical data and one-hot encoding categorical variables.

Normalizing your data

Some of the numerical features in the examples from parts 1 and 2 have different ranges. The variable Age spans from 0-100 and Hours_per_week from 0-80. These ranges affect the calculation of each feature when you apply them to a supervised learner. To ensure the equal treatment of each feature, you need to scale the numerical features.

The following example uses the SAS STDIZE procedure to scale the numerical features. PROC STDIZE is not a standard procedure available in the SASPy library. However, the good news is you can add any SAS procedure to SASPy! This feature enables Python users to become a part of the SAS community by contributing to the SASPy library, and it gives them the chance to use the vast number of powerful SAS procedures. To add PROC STDIZE to SASPy, see the instructions in the blog post Adding SAS procedures to the SASPy interface to Python.

After you add PROC STDIZE to SASPy, run the following code to scale the numerical features. The resulting values will be between 0 and 1.

# Create a SASPy stat object
stat = sas.sasstat()
# Use the stdize function that was added to SASPy to scale our features
stat_result = stat.stdize(data=cen_data_logTransform,
                          procopts='method=range out=Sasuser.afternorm',
                          var='age education_num capital_gain capital_loss hours_per_week')

To use the STDIZE procedure in SASPy, we need to specify the method and the output data set in the PROC statement options. For this we use the "procopts" option: we specify range as our method and set the "out" option to a new SAS data set, afternorm.
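To make the effect of method=range concrete, here is a minimal pure-Python sketch of range scaling, (x - min) / (max - min), which is the transformation that maps each variable onto the 0 to 1 interval. The function name and the sample values are made up for illustration; in the workflow above, PROC STDIZE does this work inside SAS.

```python
def range_scale(values):
    """Rescale numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample values for two features with different ranges.
ages = [17, 38, 53, 90]
hours = [1, 40, 60, 99]

scaled_ages = range_scale(ages)
scaled_hours = range_scale(hours)
print(scaled_ages)
print(scaled_hours)
```

After scaling, both features run from 0 to 1, so neither dominates simply because its raw units are larger.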

After running the STDIZE procedure we assign the new data set into a SAS data object.

norm_data = sas.sasdata('afternorm', libref='SASuser')

Now let's verify that we successfully transformed our numerical features:

norm_data.head(obs=5)


 

 

 

 

 
The output looks great! You have normalized the numerical features. So, it's time to tackle the last data-preparation step.

One-Hot Encoding

Now that you have adjusted the numerical features, what do you do with the categorical features? The categories Relationship, Race, Sex, and so on are in string format. Statistical models cannot interpret these values, so you need to transform the values from strings into a numerical representation. The one-hot encoding process provides the transformation you need for your data.
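As a rough illustration of what one-hot encoding produces, the following pure-Python sketch replaces a string column with one 0/1 indicator column per level. The function name, the column naming scheme, and the sample rows are assumptions made for this example; they are not the names SAS generates.

```python
def one_hot(rows, column):
    """Replace `column` in each row dict with one 0/1 indicator per level."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows: one scaled numeric feature and one categorical feature.
people = [
    {"age": 0.30, "sex": "Male"},
    {"age": 0.45, "sex": "Female"},
    {"age": 0.12, "sex": "Female"},
]

encoded_people = one_hot(people, "sex")
for row in encoded_people:
    print(row)
```

Each row ends up with exactly one indicator set to 1 for the encoded feature, which is the numerical representation a model can consume.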

To use one-hot encoding, use the LOGISTIC procedure from the SASPy Stat class. SASPy natively includes the LOGISTIC procedure, so you can go straight to coding. To generate the syntax for the code below, I followed the instructions from Usage Note 23217: Saving the coded design matrix of a model to a data set.

stat_proc_log = stat.logistic(data=norm_data,
                              procopts='outdesign=SASuser.test1 outdesignonly',
                              cls="workclass education_level marital_status occupation relationship race sex native_country / param=glm",
                              model="age = workclass education_level marital_status occupation relationship race sex native_country / noint")

To view the results from this code, create a SAS data object from the newly created data set, as shown in this example:

one_hot_data = sas.sasdata('test1', libref='SASuser')
display(one_hot_data.head(obs=5))

The output:

 

 

 

 

Our data was successfully one-hot encoded! For future reference, due to SAS' analytical power, this step is not required. When you include a categorical feature in a CLASS statement, the procedure automatically generates a design matrix with the one-hot encoded feature. For more information, I recommend reading this post about different ways to create a design matrix in SAS.

Finally

You made it to the end of the journey! I hope everyone who reads these blogs can see the value that SASPy brings to the machine-learning community. Give SASPy a try, and you'll see the power it can bring to your SAS solutions.

Stay curious, keep learning, and (most of all) continue innovating.

Machine Learning with SASPy: Exploring and Preparing your data - Part 3 was published on SAS Users.

December 12, 2019
 

Bringing the power of SAS to your Python scripts can be a game changer. An easy way to do that is by using SASPy, a Python interface to SAS allowing Python developers to use SAS® procedures within Python. However, not all SAS procedures are included in the SASPy library. So, what do you do if you want to use those excluded procedures? Easy! The SASPy library contains functionality enabling you to add SAS procedures to the SASPy library. In this post, I'll explain the process.

The basics for adding procedures are covered in the Contributing new methods section in the SASPy documentation. To further assist you, this post expands upon the steps, providing step-by-step details for adding the STDIZE procedure to SASPy. For a hands-on application of the use case, refer to the blog post Machine Learning with SASPy: Exploring and Preparing your data - Part 3.

This is your chance to contribute to the project! While you can follow the steps below as a one-off solution, you can also share your work and incorporate it in the SASPy repository.

Prerequisites

Before you add a procedure to SASPy, you need to perform these prerequisite steps:

  1. Identify the SAS product associated with the procedure you want to add, e.g. SAS/STAT, SAS/ETS, SAS Enterprise Miner, etc.
  2. Locate the SASPy file (for example, sasstat.py, sasets.py, and so on) corresponding to the product from step 1.
  3. Ensure you have a current license for the SAS product in question.

Adding a SAS procedure to SASPy

SASPy utilizes Python Decorators to generate the code for adding SAS procedures. Roughly, the process is:

  1. define the procedure
  2. generate the code to add
  3. add the code to the proper SASPy file
  4. (optional) create a pull request to add the procedure to the SASPy repository
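For readers new to decorators, here is a toy sketch of the general idea: a decorator that turns a method's name and keyword arguments into procedure code. This is not SASPy's implementation; the decorator name, the argument handling, and the generated text are all simplifications invented for illustration.

```python
import functools


def proc_decorator_sketch(func):
    """Hypothetical decorator: build PROC code from the function name and kwargs."""
    @functools.wraps(func)
    def wrapper(data, procopts="", **stmts):
        # The PROC statement uses the decorated function's name.
        lines = [f"proc {func.__name__} data={data} {procopts}".rstrip() + ";"]
        # Every keyword argument becomes one statement.
        for name, value in stmts.items():
            lines.append(f"   {name} {value};")
        lines.append("run;")
        return "\n".join(lines)
    return wrapper


@proc_decorator_sketch
def stdize(data, procopts="", **stmts):
    """The body is never executed; the decorator does the work."""


generated = stdize("work.census",
                   procopts="method=range out=work.afternorm",
                   var="age hours_per_week")
print(generated)
```

The point of the pattern is that the method itself stays almost empty, so adding a procedure mostly means declaring its signature and documentation.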

Below we'll walk through each step in detail.

Create a set of valid statements

Start a new Python session with Jupyter and create a list of valid arguments for the chosen procedure. You determine the arguments for the procedure by searching for your procedure in the appropriate SAS documentation. For example, the PROC STDIZE arguments are documented in the SAS/STAT® 15.1 User's Guide, in the The STDIZE Procedure section, with the contents:

The STDIZE procedure

 
 
 
 
 
 
 
 
 
 

For example, I submitted the following command to create a set of valid arguments for PROC STDIZE:

lset = {'STDIZE', 'BY', 'FREQ', 'LOCATION', 'SCALE', 'VAR', 'WEIGHT'}

Call the doc_convert method

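The doc_convert method takes two arguments: the collection of valid statements (lset) and the procedure name ('STDIZE'). It returns a dictionary whose method_stmt and markup_stmt entries hold the generated method signature and the docstring markup.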

import saspy
 
print(saspy.sasdecorator.procDecorator.doc_convert(lset, 'STDIZE')['method_stmt'])
print(saspy.sasdecorator.procDecorator.doc_convert(lset, 'STDIZE')['markup_stmt'])

The command generates the method call and the docstring markup like the following:

def STDIZE(self, data: ['SASdata', str] = None,
   by: [str, list] = None,
   freq: str = None,
   location: str = None,
   scale: str = None,
   stdize: str = None,
   var: str = None,
   weight: str = None,
   procopts: str = None,
   stmtpassthrough: str = None,
   **kwargs: dict) -> 'SASresults':
   Python method to call the STDIZE procedure.
 
   Documentation link:
 
   :param data: SASdata object or string. This parameter is required.
   :parm by: The by variable can be a string or list type.
   :parm freq: The freq variable can only be a string type.
   :parm location: The location variable can only be a string type.
   :parm scale: The scale variable can only be a string type.
   :parm stdize: The stdize variable can be a string type.
   :parm var: The var variable can only be a string type.
   :parm weight: The weight variable can be a string type.
   :parm procopts: The procopts variable is a generic option available for advanced use. It can only be a string type.
   :parm stmtpassthrough: The stmtpassthrough variable is a generic option available for advanced use. It can only be a string type.
   :return: SAS Result Object
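To show the kind of transformation involved, here is a simplified sketch that turns a set of statement names into a signature string shaped like the output above. It is not how doc_convert is implemented; the function name is hypothetical, and the rule that only "by" accepts a list is an assumption read off the sample output.

```python
def sketch_signature(statements, proc):
    """Build a method-signature string from a set of statement names."""
    lines = [f"def {proc}(self, data: ['SASdata', str] = None,"]
    # One lowercase keyword parameter per statement, in alphabetical order.
    for name in sorted(s.lower() for s in statements):
        hint = "[str, list]" if name == "by" else "str"  # assumed rule
        lines.append(f"   {name}: {hint} = None,")
    # Generic parameters that appear at the end of the sample output.
    lines += [
        "   procopts: str = None,",
        "   stmtpassthrough: str = None,",
        "   **kwargs: dict) -> 'SASresults':",
    ]
    return "\n".join(lines)


lset = {'STDIZE', 'BY', 'FREQ', 'LOCATION', 'SCALE', 'VAR', 'WEIGHT'}
signature = sketch_signature(lset, 'STDIZE')
print(signature)
```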

Update SASPy product file

We'll take the output and add it to the appropriate product file (sasstat.py in this case). When you open this file, be sure to open it with administrative privileges so you can save the changes. Prior to adding the code to the product file, perform the following tasks:

  1. add @procDecorator.proc_decorator({}) before the function definition
  2. add the proper documentation link from the SAS Programming Documentation site
  3. add triple quotes (""") around the second section of code to comment it out, so that it becomes the method's docstring
  4. include any additional details others might find helpful

The following output shows the final code to add to the sasstat.py file:

@procDecorator.proc_decorator({})
def STDIZE(self, data: ['SASdata', str] = None,
   by: [str, list] = None,
   freq: str = None,
   location: str = None,
   scale: str = None,
   stdize: str = None,
   var: str = None,
   weight: str = None,
   procopts: str = None,
   stmtpassthrough: str = None,
   **kwargs: dict) -> 'SASresults':
   """
   Python method to call the STDIZE procedure.
 
   Documentation link:
   https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statug_stdize_toc.htm&locale=en
   :param data: SASdata object or string. This parameter is required.
   :parm by: The by variable can be a string or list type.
   :parm freq: The freq variable can only be a string type.
   :parm location: The location variable can only be a string type.
   :parm scale: The scale variable can only be a string type.
   :parm stdize: The stdize variable can be a string type.
   :parm var: The var variable can only be a string type.
   :parm weight: The weight variable can be a string type.
   :parm procopts: The procopts variable is a generic option available for advanced use. It can only be a string type.
   :parm stmtpassthrough: The stmtpassthrough variable is a generic option available for advanced use. It can only be a string type.
   :return: SAS Result Object
   """

Update sasdecorator file with the new method

Alter the sasdecorator.py file by adding stdize in the code on line 29, as shown below.

if proc in ['hplogistic', 'hpreg', 'stdize']:

Important: The update to the sasdecorator file is only a requirement when you add a procedure with no plot options. The sasstat.py library assumes all procedures produce plots. However, PROC STDIZE does not include them. So, you should perform this step ONLY when your procedure does not include plot options. This will more than likely change in a future release, so please follow the GitHub page for any updates.

Document a test for your function

Make sure you write at least one test for the procedure. Then, add the test to the appropriate testing file.

Finally

Congratulations! All done. You now have the knowledge to add even more procedures in the future.

After you add your procedure, I highly recommend you contribute your procedure to the SASPy GitHub library. To contribute, follow the outlined instructions on the Contributing Rules GitHub page.

Adding SAS® procedures to the SASPy interface to Python was published on SAS Users.

December 4, 2019
 

Site relaunches with improved content, organization and navigation.

In 2016, a cross-divisional SAS team created developer.sas.com. Their mission: Build a bridge between SAS (and our software) and open source developers.

The initial effort made available basic information about SAS® Viya® and integration with open source technologies. In June 2018, the Developer Advocate role was created to build on that foundation. Collaborating with many of you, the SAS Communities team has improved the site by clarifying its scope and updating it consistently with helpful content.

Design is an iterative process. One idea often builds on another.

-- businessman Mark Parker

The team is happy to report that recently developer.sas.com relaunched, with marked improvements in content, organization and navigation. Please check it out and share with others.

New overview page on developer.sas.com

The developer experience

The developer experience goes beyond the developer.sas.com portal. The Q&A below provides more perspective and background.

What is the developer experience?

Think of the developer experience (DX) as equivalent to the user experience (UX), only the developer interacts with the software through code, not points and clicks. Developers expect and require an easy interface to software code, good documentation, support resources and open communication. All this interaction occurs on the developer portal.

What is a developer portal?

The white paper Developer Portal Components captures the key elements of a developer portal. Without going into detail, the portal must contain (or link to) these resources: an overview page, onboarding pages, guides, API reference, forums and support, and software development kits (SDKs). In conjunction with the Developers Community, the site’s relaunch includes most of these items.

Who are these developers?

Many developers fit somewhere in these categories:

  • Data scientists and analysts who code in open source languages (mainly Python and R in this case).
  • Web application developers who create apps that require data and processing from SAS.
  • IT service admins who manage customer environments.

All need to interact with SAS but may not have written SAS code. We want this population to benefit from our software.

What is open source and how is SAS involved?

Simply put, open source software is just what the name implies: the source code is open to all. Many of the programs in use every day are based on open source technologies: operating systems, programming languages, web browsers and servers, etc. Leveraging open source technologies and integrating them with commercial software is a popular industry trend today. SAS is keeping up with the market by providing tools that allow open source developers to interact with SAS software.

What is an API?

All communications between open source and SAS are possible through APIs, or application programming interfaces. APIs allow software systems to communicate with one another. Software companies expose their APIs so developers can incorporate functionality and send or request data from the software.

Why does SAS care about APIs?

APIs allow the use of SAS analytics outside of SAS software. By allowing developers to communicate with SAS through APIs, customer applications easily incorporate SAS functions. SAS has created various libraries to aid in open source integration. These tools allow developers to code in the language of their choice, yet still interface with SAS. Most of these tools exist on github.com/sassoftware or on the REST API guides page.

A use case for SAS APIs

A classic use of SAS APIs is for a loan default application. A bank creates a model in SAS that determines the likelihood of a customer defaulting on a loan based on multiple factors. The bank also builds an application where a bank representative enters the information for a new potential customer. The bank application code uses APIs to communicate this information to the SAS model and return a credit decision.
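As a sketch of what the bank application's side of that exchange might look like, the snippet below builds (but does not send) an HTTP request carrying an applicant's details. Everything specific here is hypothetical: the host name, the /score path, the payload fields, and the token are placeholders, not a documented SAS endpoint.

```python
import json
from urllib.request import Request


def build_score_request(base_url, applicant, token):
    """Build a POST request that sends applicant data to a scoring service."""
    body = json.dumps({"inputs": applicant}).encode("utf-8")
    return Request(
        url=f"{base_url}/score",          # hypothetical path
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values only; no network call is made in this sketch.
request = build_score_request(
    "https://sas.example.com/loan-model",
    {"income": 52000, "loan_amount": 15000},
    token="PLACEHOLDER",
)
print(request.get_method(), request.full_url)
```

In a real application the request would be sent with an HTTP client and the response parsed for the credit decision; consult the REST API guides for the actual endpoints and payloads.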

What is a developer advocate?

A developer advocate is someone who helps developers succeed with a platform or technology. Their role is to act as a bridge between the engineering team and the developer community. At SAS, the developer advocate fields questions and comments on the Developers Community and works with R&D to provide answers. The administration of developer.sas.com also falls under the responsibility of the developer advocate.

We’re not done

The site will continue to evolve, with additions of other SAS products and offerings, and other initiatives. Check back often to see what’s new.

Now that you are an open source and SAS expert, please check out the new developer.sas.com. We encourage feedback and suggestions for content. Leave comments and questions on the site or contact Joe Furbee: joe.furbee@sas.com.

developer.sas.com 2.0: More than just a pretty interface was published on SAS Users.

September 19, 2019
 

Recent updates to SAS Grid Manager introduced many interesting new features, including the ability to handle more diverse workloads. In this post, we'll take a look at the steps required to get your SAS Grid Manager environment set up to accept jobs from outside of traditional SAS clients. We'll demonstrate the process of submitting Python code for execution in the SAS Grid.

Preparing your SAS Grid

Obviously, we need a SAS Grid Manager (SAS 9.4 Maintenance 6 or later) environment to be installed and configured. Once the grid is deployed, there's not a whole lot more to do on the SAS side. SAS Workspace Servers need to be configured for launching on the grid by converting them to load balanced Workspace Servers, changing the algorithm to 'Grid', and then selecting the relevant checkbox in SASApp – Logical Workspace Server properties to launch on the grid as shown below.

The only other things that might need configuring are, if applicable, .authinfo files and Grid Option Sets. Keep reading for more information on these.

Preparing your client machine

In this example scenario, the client is the Windows desktop machine where I will write my Python code. SAS is not installed here. Rather, we will deploy and use Jupyter Notebook as our IDE to write Python code, which we'll then submit to the SAS Grid. We can get Jupyter Notebook by installing Anaconda, which is a free, bundled distribution of Python and R that comes with a solid package management system. (The following steps are courtesy of my colleague, Allan Tham.)

First, we need to download and install Anaconda.

Once deployed, we can open Anaconda Navigator from the Start menu and from there, we can launch Jupyter Notebook and create a notebook.

Now it's time to configure the connection from Python to our SAS environment. To do this, we use the SAS-developed open-source SASPy module for Python, which is responsible for converting Python code to SAS code and running it in SAS. Installing SASPy is simple. First, we need to download and install a Java Runtime Environment as a prerequisite. After that, we can launch Anaconda Prompt (a command line interface) from the Start menu and use pip to install the SASPy package.

pip install saspy

Connection parameters are defined in a file named sascfg.py. In fact, the best practice is to use sascfg.py as a template, and create a copy of it, which should be named sascfg_personal.py. SASPy will read and look for connections in both files. For the actual connection parameters, we need to specify the connection method. This depends largely on your topology.

In my environment, I used a Windows client to connect to a Linux server using a standard IOM connection. The most appropriate SASPy configuration definition is therefore 'winiomlinux', which relies on a Java-based connection method to talk to the SAS workspace. This needs to be defined in the sascfg_personal.py file.

SAS_config_names=['winiomlinux']

We also need to specify the parameters for this connection definition as shown below.

# build out a local classpath variable to use below for Windows clients   CHANGE THE PATHS TO BE CORRECT FOR YOUR INSTALLATION 
cpW  =  "C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\sas.svc.connection.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\log4j.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\sas.security.sspi.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\sas.core.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\saspyiom.jar"

Note that we are referencing a number of JAR files in the classpath that normally come with a SAS deployment. Since we don't have SAS on the client machine, we can copy these from the Linux SAS server. In my lab, these were copied from /SASDeploymentManager/9.4/products/deploywiz__94526__prt__xx__sp0__1/deploywiz/.
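The classpath definition above repeats the same directory for every JAR. As an optional alternative, here is a minimal sketch that builds the same semicolon-separated string from one directory and a list of file names. The helper name is made up for this example, and the directory shown is the one from this post; change it to match your installation.

```python
import ntpath  # Windows path rules, usable on any platform

# JAR file names taken from the classpath shown above.
SASPY_JARS = [
    "sas.svc.connection.jar",
    "log4j.jar",
    "sas.security.sspi.jar",
    "sas.core.jar",
    "saspyiom.jar",
]


def build_classpath(java_dir, jars=SASPY_JARS):
    """Join each JAR onto java_dir and separate entries with semicolons."""
    return ";".join(ntpath.join(java_dir, jar) for jar in jars)


java_dir = "C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java"
cpW = build_classpath(java_dir)
print(cpW)
```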

Further down in the file, we can see the connection parameters to the SAS environment. Update the path to the Java executable, the SAS compute server host, and the port for the Workspace Server.

winiomlinux = {'java'   : 'C:\\Program Files\\Java\\jre1.8.0_221\\bin\\java',
            'iomhost'   : 'sasserver01.mydemo.sas.com',
            'iomport'   : 8591,
            'encoding'  : 'latin1',
            'classpath' : cpW
            }

Note: thanks to a recent contribution from the community, SASPy now also supports a native Windows connection method that doesn't require Java. Instead, it uses the SAS Integration Technologies client -- a component that is free to download from SAS. You'll already have this component if you've installed SAS Enterprise Guide, SAS Add-In for Microsoft Office, or Base SAS for Windows on your client machine. See instructions for this configuration in the SASPy documentation.

With the configuration set, we can import the SASPy module in our Jupyter Notebook.

Although we're importing sascfg_personal.py explicitly here, we can just call sascfg.py, as SASPy actually checks for sascfg_personal.py and uses that if it finds it. If not, it uses the default sascfg.py.

To connect to the grid from Jupyter Notebook using SASPy, we need to instantiate a SASsession object and specify the configuration definition (e.g. winiomlinux). You'll get prompted for credentials, and then a message indicating a successful connection will be displayed.

It's worth noting that we could also specify the configuration file to use when we start a session here by specifying:

sas = saspy.SASsession(cfgfile='/sascfg_personal.py')

Behind the scenes, SAS will find an appropriate grid node on which to launch a Workspace Server session. The SASPy module works in any SAS 9.4 environment, regardless of whether it is a grid environment or not. It simply runs in a Workspace Server; in a SAS Grid environment, it just happens to be a Grid-launched Workspace Server.

Running in the Grid

Now let's execute some code in the Workspace Server session we have launched. Note that not all Python methods are supported by SASPy, but because the module is open source, anyone can add to or update the available methods. Refer to the API doc for more information.

In taking another example from Allan, let's view some of the content in a SASHELP data set.

The output is in ODS, which is converted (by SASPy) to a Pandas data frame to display it in Jupyter Notebook.

Monitoring workloads from SAS Grid Manager

The SAS Workload Orchestrator web interface allows us to view the IOM connections that SASPy established from our client machine. We can see the grid node on which the Workspace Server was launched, and view some basic information. The job will remain in a RUNNING state until the connection is terminated (i.e., the SASsession is closed by calling the endsas() method). By the same token, creating multiple SASsessions will result in multiple grid jobs running.

To see more details about what actually runs, Workspace Server logs need to first be enabled. When we run myclass.head(), which will display the first 5 rows of the data set, we see the following written to the Workspace Server log.

2019-08-14T02:47:34,713 INFO  [00000011] :sasdemo - 239        ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;
2019-08-14T02:47:34,713 INFO  [00000011] :sasdemo - 239      ! ods graphics on / outputfmt=png;
2019-08-14T02:47:34,779 INFO  [00000011] :sasdemo - NOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1
2019-08-14T02:47:34,780 INFO  [00000011] :sasdemo - 240        ;*';*";*/;
2019-08-14T02:47:34,782 INFO  [00000045] :sasdemo - 241        data _head ; set sashelp.prdsale (obs=5 ); run;
2019-08-14T02:47:34,782 INFO  [00000045] :sasdemo -
2019-08-14T02:47:34,783 INFO  [00000045] :sasdemo - NOTE: There were 5 observations read from the data set SASHELP.PRDSALE.
2019-08-14T02:47:34,784 INFO  [00000045] :sasdemo - NOTE: The data set WORK._HEAD has 5 observations and 5 variables.
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo - NOTE: DATA statement used (Total process time):
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo -       real time           0.00 seconds
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo -       cpu time            0.00 seconds
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo -

We can see the converted code, which in this case is a DATA step that creates a work table based on the PRDSALE table with obs=5. Scrolling down, we also see the code that prints the output to ODS and converts it to a data frame.
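To illustrate the idea of a Python call being translated into SAS code, here is a tiny sketch that produces the same DATA step text seen in the log for a head-style request. It only mimics the logged statement; the function name is invented and this is not SASPy's own code.

```python
def head_to_sas(libref, table, obs=5):
    """Return a DATA step that copies the first `obs` rows into WORK._HEAD."""
    return f"data _head ; set {libref}.{table} (obs={obs} ); run;"


sas_code = head_to_sas("sashelp", "prdsale")
print(sas_code)
```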

Additional Options

Authinfo files

The sascfg_personal.py file has an option for specifying an authkey, which is an identifier that maps to a set of credentials in an .authinfo file (or _authinfo on Windows). This can be leveraged to eliminate the prompting for credentials. For example, if your authinfo file looks like:

IOM_GELGrid_SASDemo user sasdemo password lnxsas

your configuration definition in your sascfg_personal.py should look like:

winiomlinux = {'java'   : 'C:\\Program Files\\Java\\jre1.8.0_221\\bin\\java',
            'iomhost'   : 'sasserver01.mydemo.sas.com',
            'iomport'   : 8591,
            'encoding'  : 'latin1',
            'authkey'   : 'IOM_GELGrid_SASDemo',
            'classpath' : cpW
            }

There are special rules for how to secure the authinfo file (making it readable only by you), so be sure to refer to the instructions.
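To show how an authkey relates to the line in the authinfo file, here is a small sketch that splits such a line into its key and its name/value pairs. It is an illustration only, not SASPy's own parser, and it assumes the simple whitespace-separated layout shown above (for example, it does not handle values that contain spaces).

```python
def parse_authinfo_line(line):
    """Split 'KEY user NAME password SECRET' into (key, {field: value})."""
    tokens = line.split()
    key, rest = tokens[0], tokens[1:]
    if len(rest) % 2:
        raise ValueError("expected 'name value' pairs after the key")
    # Pair up alternating field names and values.
    fields = dict(zip(rest[0::2], rest[1::2]))
    return key, fields


key, fields = parse_authinfo_line("IOM_GELGrid_SASDemo user sasdemo password lnxsas")
print(key, fields["user"])
```

The key returned here is the value that the 'authkey' entry in the configuration definition must match.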

Grid Option Sets

What if you want your code to run in the grid with certain parameters or options by default? For instance, say you want all Python code to be executed in a particular grid queue. SASPy can do this by leveraging Grid Option Sets. The process is outlined here, but in short, a new SASPy 'SAS Application' has to be configured in SAS Management Console, which is then used to map to the Grid Option Set (created using the standard process).

More Information

My sincere thanks to Allan Tham and Greg Wootton for their valued contributions to this post.

Please refer to the official SAS Grid documentation for more information on SAS Grid Manager in SAS 9.4M6.

Thank you for reading. I hope the information provided in this post has been helpful. Please feel free to comment below to share your own experiences.

Using Python to run jobs in your SAS Grid was published on SAS Users.

July 25, 2019
 

Recommendations on SAS Support Communities

If you visit the SAS Support Communities and sign in with your SAS Profile, you'll experience a little bit of SAS AI with every topic that you view.

While it appears to be a simple web site widget, the "Recommended by SAS" sidebar is made possible by an application of the full Analytics Life Cycle. This includes data collection and prep, model building and test, API wrappers with a gateway for monitoring, model deployment in containers with orchestration in Kubernetes, and model assessment using feedback from click actions on the recommendations. We built this by using a combination of SAS analytics and open source tools -- see the SAS Global Forum paper by my colleague, Jared Dean, for the full list of ingredients.

Jared and I have been working for over a year to bring this recommendation engine to life. We discussed it at SAS Global Forum 2018, and finally near the end of 2018 it went into production on communities.sas.com. The engine scores user visits for new recommendations thousands of times per day. The engine is updated each day with new data and a new scoring model.

Now that the recommendation engine is available, Jared and I met again in front of the camera. This time we discussed how the engine is working and the efforts required to get into production. Like many analytics projects, the hardest part of the journey was that "last mile," but we (and the entire company, actually) were very motivated to bring you a live example of SAS analytics in action. You can watch the full video at (where else?) communities.sas.com. The video is 17 minutes long -- longer than most "explainer"-type videos. But there was a lot to unpack here, and I think you'll agree there is much to learn from the experience. Not ready to binge on our video? I'll use the rest of this article to cover some highlights.

Good recommendations begin with clean data

The approach of our recommendation engine is based upon your viewing behavior, especially as compared to the behavior of others in the community. With this approach, we don't need to capture much information about you personally, nor do we need information about the content you're reading. Rather, we just need the unique IDs (numbers) for each topic that is viewed, and the ID (again, a number) for the logged-in user who viewed it. One benefit of this approach is that we don't have to worry about surfacing any personal information in the recommendation API that we'll ultimately build. That makes the conversation with our IT and Legal colleagues much easier.

Our communities platform captures details about every action -- including page views -- that happens on the site. We use SAS and the community platform APIs to fetch this data every day so that we can build reports about community activity and health. We now save off a special subset of this data to feed our recommendation engine. Here's an example of the transactions we're using. It's millions of records, covering nearly 100,000 topics and nearly 150,000 active users.

Sample data records for the model

Building user item recommendations with PROC FACTMAC

Starting with these records, Jared uses SAS DATA step to prep the data for further analysis and a pass through the algorithm he selected: factorization machines. As Jared explains in the video, this algorithm shines when the data are represented in sparse matrices. That's what we have here. We have thousands of topics and thousands of community members, and we have a record for each "view" action of a topic by a member. Most members have not viewed most of the topics, and most of the topics have not been viewed by most members. With today's data, that results in a 13 billion cell matrix, but with only 3.3 million view events. Traditional linear algebra methods don't scale to this type of application.
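A quick back-of-the-envelope calculation shows just how sparse this matrix is, using the two figures quoted above. The small dictionary that follows is a toy example of storing only the observed views; the IDs in it are made up.

```python
# Figures quoted in the text above.
cells = 13_000_000_000      # user x topic combinations
views = 3_300_000           # observed view events

density = views / cells
print(f"{density:.4%} of the cells hold a view event")

# Toy sparse representation: keep only (user_id, topic_id) pairs that occurred.
observed = {
    (101, 9001): 1.0,
    (101, 9002): 1.0,
    (202, 9001): 1.0,
}


def rating(user_id, topic_id):
    """Return the stored value, or 0.0 for the (vast majority of) empty cells."""
    return observed.get((user_id, topic_id), 0.0)


print(rating(101, 9001), rating(202, 9002))
```

Only a few hundredths of one percent of the cells are filled, which is why an algorithm designed for sparse input is a good fit here.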

Jared uses PROC FACTMAC (part of SAS Visual Data Mining and Machine Learning) to create an analytics store (ASTORE) for fast scoring. Using the autotuning feature, PROC FACTMAC selects the best combination of values for factors and iterations. And Jared caps the run time to 3600 seconds (1 hour) -- because we do need this to run in a predictable time window for updating each day.

proc factmac data=mycas.weighted_factmac  outmodel=mycas.factors_out;
   autotune maxtime=3600 objective=MSE 
       TUNINGPARAMETERS=(nfactors(init=20) maxiter(init=200) learnstep(init=0.001) ) ;
   input user_uid conversation_uid /level=nominal;
   target rating /level=interval;
   savestate rstore=mycas.sascomm_rstore;
run;

Using containers to build and containers to score

To update the model with new data each day and then deploy the scoring model as an ASTORE, Jared uses multiple SAS Viya environments. These SAS Viya environments need to "live" only for a short time -- for building the model and then for scoring data. We use Docker containers to spin these up as needed within the cloud environment hosted by SAS IT.

Jared makes the distinction between the "building container," which hosts the full stack of SAS Viya and everything that's needed to prep data and run FACTMAC, and the "scoring container," which contains just the ASTORE and enough code infrastructure (including the SAS Micro Analytics Service, or MAS) to score recommendations. This scoring container is lightweight and is actually run on multiple nodes so that our engine scales to lots of requests. And the fact that it does just the one thing -- score topics for user recommendations -- makes it an easier case for SAS IT to host as a service.

DevOps flow for the recommendation engine

Monitoring API performance and alerting

To access the scoring service, Jared built a simple API using a Python Flask app. The API accepts just one input: the user ID (a number). It returns a list of recommendations and scores. Here's my Postman snippet for testing the engine.
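The request handling behind an endpoint like this can be sketched without any web framework. The function below is a stand-in built on assumptions, not Jared's Flask code: the `recommend` name, the in-memory `SCORES` table, and the JSON field names are all hypothetical.

```python
import json

# Hypothetical stand-in for the scoring container: user ID -> (topic ID, score)
SCORES = {
    101: [(5001, 0.91), (5002, 0.87)],
}

def recommend(raw_user_id):
    """Validate the single input (a numeric user ID) and build a JSON reply."""
    try:
        user_id = int(raw_user_id)
    except (TypeError, ValueError):
        # Reject anything that is not a number
        return 400, json.dumps({"error": "user ID must be a number"})
    items = [{"topic": t, "score": s} for t, s in SCORES.get(user_id, [])]
    return 200, json.dumps({"user": user_id, "recommendations": items})

status, body = recommend("101")
print(status, body)
```

In a Flask app, a route function would wrap logic like this and return the status code and body to the caller.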

To provision this API as a hosted service that can be called from our community web site, we use an API gateway tool called Apigee. Apigee allows us to control access with API keys, and also monitors the performance of the API. Here's a sample performance report for the past 7 days.

In addition to this dashboard for reporting, we have integrated proactive alerts into Microsoft Teams, the tool we use for collaboration on this project. I scheduled a SAS program that tests the recommendations API daily, and the program then posts to a Teams channel (using the Teams API) with the results. I want to share the specific steps for this Microsoft Teams integration -- that's a topic for another article. But I'll tell you this: the process is very similar to the technique I shared about publishing to a Slack channel with SAS.

Are visitors selecting recommended content?

To make it easier to track recommendation clicks, we added special parameters to the recommended topics URLs to capture the clicks as Google Analytics "events." Here's what that data looks like within the Google Analytics web reporting tool:

You might know that I use SAS with the Google Analytics API to collect web metrics. I've added a new use case for that trick, so now I collect data about the "SAS Recommended Click" events. Each click event contains the unique ID of the recommendation score that the engine generated. Here's what that raw data looks like when I collect it with SAS:

With the data in SAS, we can use that to monitor the health/success of the model in SAS Model Manager, and eventually to improve the algorithm.
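Tagging a recommended-topic URL with tracking parameters, as described earlier in this section, takes only a few lines with Python's standard library. This is a sketch under assumptions: the post doesn't say which parameter names we use, so `source` and `rec_id` below are hypothetical, as is the example URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_tracking(url, rec_id):
    """Append hypothetical tracking parameters to a recommended-topic URL."""
    parts = urlsplit(url)
    # Keep any query parameters the URL already has
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("source", "recommendation"), ("rec_id", rec_id)]
    return urlunsplit(parts._replace(query=urlencode(query)))

tagged = add_tracking("https://example.com/topic/123?page=2", "abc123")
print(tagged)
```

Carrying the recommendation's ID on the link is what makes it possible to match a click back to the score that produced it.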

Challenges and rewards

This project has been exciting from Day 1. When Jared and I saw the potential for using our own SAS Viya products to improve visitor experience on our communities, we committed ourselves to see it through. Like many analytics applications, this project required buy-in and cooperation from other stakeholders, especially SAS IT. Our friends in IT helped with the API gateway and it's their cloud infrastructure that hosts and orchestrates the containers for the production models. Putting models into production is often referred to as "the last mile" of an analytics project, and it can represent a difficult stretch. It helps when you have the proper tools to manage the scale and the risks.

We've all learned a lot in the process. We learned how to ask for services from IT and to present our case, with both benefits and risks. And we learned to mitigate those risks by applying security measures to our API, and by limiting the execution scope and data of the API container (which lives outside of our firewall).

Thanks to extensive preparation and planning, the engine has been running almost flawlessly for 8 months. You can experience it yourself by visiting SAS Support Communities and logging in with your SAS Profile. The recommendations that you see will be personal to you (whether they are good recommendations...that's another question). We have plans to expand the engine's use to anonymous visitors as well, which will significantly increase the traffic to our little API. Stay tuned!

The post Building a recommendation engine with SAS appeared first on The SAS Dummy.

April 8, 2019
 

Have you ever wondered if love at first sight really exists? And if it exists, what qualities are people drawn to? Watch any romantic comedy and you’ll see this phenomenon play out on the big screen. Which begs the question, “If it can happen to them why not me?” Let’s [...]

Love at first sight: authentic or absurd? was published on SAS Voices by Melanie Carey

July 25, 2018
 

I recently joined SAS in a brand new role: I'm a Developer Advocate.  My job is to help SAS customers who want to access the power of SAS from within other applications, or who might want to build their own applications that leverage SAS analytics.  For my first contribution, I decided to write an article about a quick task that would interest developers and that isn't already heavily documented. So was born this novice's experience in using R (and RStudio) with SAS Viya. This writing will chronicle my journey from the planning stages, all the way to running commands from RStudio on the data stored in SAS Viya. This is just the beginning; we will discuss at the end where I should go next.

Why use SAS Viya with R?

From the start, I asked myself, "What's the use case here? Why would anyone want to do this?" After a bit of research and discussion with my SAS colleagues, the answer became clear. R is a popular programming language used by data scientists, developers, and analysts – even within organizations that also use SAS. However, R has some well-known limitations when working with big data, and our SAS customers are often challenged to combine the work of a diverse set of tools into a well-governed analytics lifecycle. Combining the developers' familiarity with R programming with the power and flexibility of SAS Viya for data storage, analytical processing, and governance seemed like a perfect exercise. For the purposes of this scenario, think of SAS Viya as the platform and Cloud Analytic Services (CAS) as the place where all the data is stored and processed.

How I got started with SAS Viya

I did not want to start with the task of deploying my own SAS Viya environment. This is a non-trivial activity, and not something an analyst would tackle, so the major pre-req here is you'll need access to an existing SAS Viya setup.  Fortunately for me, here at SAS we have preconfigured SAS Viya environments available on a private cloud that we can use for demos and testing.  So, SAS Viya is my server-side environment. Beyond that, a client is all I needed. I used a generic Windows machine and got busy loading some software.

What documentation did I use/follow?

I started with the official SAS documentation: SAS Scripting Wrapper for Analytics Transfer (SWAT) for R.

The Process

The first two things I installed were R and RStudio, which I found at these locations:

https://cran.r-project.org/
https://www.rstudio.com/products/rstudio/download/

The installs were uneventful, so I won't list all those steps here. Next, I installed a couple of pre-req R packages and attempted to install the SAS Scripting Wrapper for Analytics Transfer (SWAT) package for R. Think of SWAT as what allows R and SAS to work together. In an R command line, I entered the following commands:

> install.packages('httr')
> install.packages('jsonlite')
> install.packages('https://github.com/sassoftware/R-swat/releases/download/v1.2.1/R-swat-1.2.1-linux64.tar.gz', repos=NULL, type='file')

When attempting the last command, I hit an error:

…
ERROR: dependency 'dplyr' is not available for package 'swat'
* removing 'C:/Program Files/R/R-3.5.1/library/swat'
In R CMD INSTALL
Warning message:
In install.packages("https://github.com/sassoftware/R-swat/releases/download/v1.2.1/R-swat-1.2.1-linux64.tar.gz",  :
installation of package 'C:/Users/sas/AppData/Local/Temp/2/RtmpEXUAuC/downloaded_packages/R-swat-1.2.1-linux64.tar.gz'
  had non-zero exit status

The install failed. Based on the error message, it turns out I had forgotten to install another R package:

> install.packages("dplyr")

(This dependency is documented in the R SWAT documentation, but I missed it. Since this could happen to anyone – right? – I decided to come clean here. Perhaps you'll learn from my misstep.)

After installing the dplyr package in the R session, I reran the swat install and was happy to hit a return code of zero. Success!

To keep this post brief, I decided not to configure an authentication file, so I have to pass user credentials each time I make a connection. I will configure authinfo in a follow-up post.

Testing my RStudio->SAS Viya connection

From RStudio, I ran the following command to connect to the CAS server:

> library(swat)
> conn <- CAS("mycas.company.com", 8777, protocol='http', user='user', password='password')

Now that I succeeded in connecting my R client to the CAS server, I was ready to load data and start making API calls.

How did I decide on a use case?

I'm in the process of moving houses, so I decided to find a data set on property values in the area to do some basic analysis, to see if I was getting a good deal. I did a quick Google search and downloaded a .csv file from a local government site. At this point, I was all set up, connected, and had data. All I needed now was to run some CAS actions from RStudio.

CAS actions are commands that you submit through RStudio to tell the CAS server to 'do' something. One or more objects are returned to the client -- for example, a collection of data frames. CAS actions are organized into action sets and are invoked via APIs. You can find

> citydata <- cas.read.csv(conn, "C:\\Users\\sas\\Downloads\\property.csv", sep=';')
NOTE: Cloud Analytic Services made the uploaded file available as table PROPERTY in caslib CASUSER(user).

What analysis did I perform?

I purposefully kept my analysis brief, as I just wanted to make sure that I could connect, run a few commands, and get results back.

My RStudio session, including all of the things I tried

Here is a brief series of CAS action commands that I ran from RStudio:

Get the mean value of a variable:

> cas.mean(citydata$TotalSaleValue)
          Column     Mean
1 TotalSaleValue 343806.5

Get the standard deviation of a variable:

> cas.sd(citydata$TotalSaleValue)
          Column      Std
1 TotalSaleValue 185992.9

Get boxplot data for a variable:

> cas.percentile.boxPlot(citydata$TotalSaleValue)
$`BoxPlot`
          Column     Q1     Q2     Q3     Mean WhiskerLo WhiskerHi Min     Max      Std    N
1 TotalSaleValue 239000 320000 418000 343806.5         0    685000   0 2318000 185992.9 5301

Get boxplot data for another variable:

> cas.percentile.boxPlot(citydata$TotalBldgSqFt)
$`BoxPlot`
         Column   Q1   Q2   Q3     Mean WhiskerLo WhiskerHi Min   Max      Std    N
1 TotalBldgSqFt 2522 2922 3492 3131.446      1072      4943 572 13801 1032.024 5301

Did I succeed?

I think so. Let's say the house I want is 3,000 square feet and costs $258,000. As you can see in the box plot data, I'm getting a good deal. The house size is above the median (2,922 square feet), while the cost is below the median sale value ($320,000). Yes, this is not the most in-depth statistical analysis, but I'll get more into that in a future article.
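The comparison can be made explicit with a few lines of Python. This sketch uses the Q1, Q2, and Q3 values from the two boxPlot results above; the `quarter` helper is a hypothetical name, and it counts quarters of the distribution from 1 (lowest) to 4 (highest).

```python
def quarter(value, q1, q2, q3):
    """Return which quarter of the distribution (1-4) a value falls in."""
    if value <= q1:
        return 1
    if value <= q2:
        return 2
    if value <= q3:
        return 3
    return 4

# Q1, Q2, Q3 copied from the TotalBldgSqFt and TotalSaleValue results above
size_quarter = quarter(3000, 2522, 2922, 3492)
price_quarter = quarter(258000, 239000, 320000, 418000)
print(size_quarter, price_quarter)
```

The size lands in a higher quarter than the price, which is the pattern that suggests a good deal.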

What's next?

This activity has really sparked my interest to learn more, and I will continue to expand my analysis, attempt more complex statistical procedures, and create graphs. A follow-up post is already in the works. If this article has piqued your interest in the subject, I'd like to ask you: What would you like to see next? Please comment and I will turn my focus to those topics for a future post.

Using RStudio with SAS Viya was published on SAS Users.

February 6, 2018
 

Good news learners! SAS University Edition has gone back to school and learned some new tricks.

With the December 2017 update, SAS University Edition now includes the SASPy package, available in its Jupyter Notebook interface. If you're keeping track, you know that SAS University Edition has long had support for Jupyter Notebook. With that, you can write and run SAS programs in a notebook-style environment. But until now, you could not use that Jupyter Notebook to run Python programs. With the latest update, you can -- and you can use the SASPy library to drive SAS features like a Python coder.

Oh, and there's another new trick that you'll find in this version: you can now use SAS (and Python) to access data from HTTPS websites -- that is, sites that use SSL encryption. Previous releases of SAS University Edition did not include the components that are needed to support these encrypted connections. That's going to make downloading web data much easier, not to mention using REST APIs. I'll show one HTTPS-enabled example in this post.

How to create a Python notebook in SAS University Edition

When you first access SAS University Edition in your web browser, you'll see a colorful "Welcome" window. From here, you can (A) start SAS Studio or (B) start Jupyter Notebook. For this article, I'll assume that you select choice (B). However, if you want to learn to use SAS and all of its capabilities, SAS Studio remains the best method for doing that in SAS University Edition.

When you start the notebook interface, you're brought into the Jupyter Home page. To get started with Python, select New->Python 3 from the menu on the right. You'll get a new empty Untitled notebook. I'm going to assume that you know how to work with the notebook interface and that you want to use those skills in a new way...with SAS. That is why you're reading this, right?

Move data from a pandas data frame to SAS

pandas is the standard for Python programmers who work with data. The pandas module is included in SAS University Edition -- you can use it to read and manipulate data frames (which you can think of like a table). Here's an example of retrieving a data file from GitHub and loading it into a data frame. (Read more about this particular file in this article. Note that GitHub uses HTTPS -- now possible to access in SAS University Edition!)

import saspy
import pandas as pd
 
df = pd.read_csv('https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv')
df.describe()

Here's the result. This is all straight Python stuff; we haven't started using any SAS yet.

Before we can use SAS features with this data, we need to move the data into a SAS data set. SASPy provides a dataframe2sasdata() method (shorter alias: df2sd) that can import your Python pandas data frame into a SAS library and data set. The method returns a SASdata object. This example copies the data into WORK.PROBLY in the SAS session:

sas = saspy.SASsession()
probly = sas.df2sd(df,'PROBLY')
probly.describe()

The SASdata object also includes a describe() method that yields a result that's similar to what you get from pandas:

Drive SAS procedures with Python

SASPy includes a collection of built-in objects and methods that provide APIs to the most commonly used SAS procedures. The APIs present a simple "Python-ic" style approach to the work you're trying to accomplish. For example, to create a SAS-based histogram for a variable in a data set, simply use the hist() method.

SASPy offers dozens of simple API methods that represent statistics, machine learning, time series, and more. You can find them documented on the GitHub project page. Note that since SAS University Edition does not include all SAS products, some of these API methods might not work for you. For example, the SASml.forest() method (representing

In SASPy, all methods generate SAS program code behind the scenes. If you like the results you see and want to learn the SAS code that was used, you can flip on the "teach me SAS" mode in SASPy.

# The SASPy method name is case-sensitive and takes a Boolean
sas.teach_me_SAS(True)

Here's what SASPy reveals about the describe() and hist() methods we've already seen:

Interesting code, right? Does it make you want to learn more about the SCALE= option on PROC SGPLOT?

If you want to experiment with SAS statements that you've learned, you don't need to leave the current notebook and start over. There's also a built-in %%SAS "magic command" that you can use to try out a few of these SAS statements.

%%SAS
proc means data=sashelp.cars stackodsoutput n nmiss median mean std min p25 p50 p75 max;run;

Python limitations in SAS University Edition

SAS University Edition includes over 300 Python modules to support your work in Jupyter Notebook. To see a complete list, run the help('modules') command from within a Python notebook. This list includes the common Python packages required to work with data, such as pandas and NumPy. However, it does not include any of the popular Python-based machine learning modules, nor any modules to support data visualization. Of course, SASPy has support for most of this within its APIs, so why would you need anything else...right?
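If you'd rather check for one specific module than scan the whole list, a small sketch like this works in any Python 3 notebook. The module names tested here are only examples, and `is_available` is a hypothetical helper name.

```python
import importlib.util

def is_available(module_name):
    """Return True if the named top-level module can be imported here."""
    return importlib.util.find_spec(module_name) is not None

# Example names only: one standard module and two third-party ones
for name in ("json", "pandas", "sklearn"):
    print(name, is_available(name))
```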

Because SAS University Edition is packaged in a virtual machine that you cannot alter, you don't have the option of installing additional Python modules. You also don't have access to the Jupyter terminal, which would allow you to control the system from a shell-like interface. All of this is possible (and encouraged) when you have your own SAS installation with your own instance of SASPy. It's all waiting for you when you've outgrown the learning environment of SAS University Edition and you're ready to apply your SAS skills and tech to your official work!

Learn more

The post Coding in Python with SAS University Edition appeared first on The SAS Dummy.

January 11, 2018
 

The SAS® platform is now open to be accessed from open-source clients such as Python, Lua, Java, the R language, and REST APIs to leverage the capabilities of SAS® Viya® products and solutions. You can analyze your data in a cloud-enabled environment that handles large amounts of data in a variety of different formats. To find out more about SAS Viya, see the “SAS Viya: What's in it for me? The user.” article.

This blog post focuses on the openness of SAS® 9.4 and discusses features such as the SASPy package and the SAS kernel for Jupyter Notebook and more as clients to SAS. Note: This blog post is relevant for all maintenance releases of SAS 9.4.

SASPy

The SASPy package enables you to connect to SAS 9.4 and run your analysis using the object-oriented methods and objects of the Python language, as well as the Python magic methods. SASPy translates the Python objects and methods that you use into SAS code before executing that code. To use SASPy, you must have SAS 9.4 and Python 3.x or later.
Note: SASPy is an open-source project that encourages your contributions.

After you have completed the installation and configuration of SASPy, you can import the SASPy package as demonstrated below:
Note: I used Jupyter Notebook to run the examples in this blog post.

1.   Import the SASPy package:

2.   Start a new session. The sas object is created as a result of starting a SAS session using a locally installed version of SAS under Microsoft Windows. After this session is successfully established, the following note is generated:

Adding Data

Now that the SAS session is started, you need to add some data to analyze. This example uses SASPy to read a CSV file that provides census data based on the ZIP Codes in Los Angeles County and create a SASdata object named tabl:

To view the attributes of this SASdata object named tabl, use the print() function below, which shows the libref and the SAS data set name. It also shows that results are returned as Pandas, which is the default result output for tables.

Using Methods to Display and Analyze Data

This section provides some examples of how to use different methods to interact with SAS data via SASPy.

Head() Method

After loading the data, you can look at the first few records of the ZIP Code data, which is easy using the familiar head() method in Python. This example uses the head() method on the SASdata object tabl to display the first five records. The output is shown below:

Describe() Method

After verifying that the data is what you expected, you can now analyze the data. To generate a simple summary of the data, use the Python describe() method in conjunction with the index [1:3]. This combination generates a summary of all the numeric fields within the table and displays only the second and third records. The subscript works only when the result is set to Pandas and does not work if set to HTML or Text, which are also valid options.

Teach_me_SAS() Method

The SAS code generated from the object-oriented Python syntax can also be displayed using SASPy with the teach_me_SAS() method. When you set the argument in this method to True, which is done using a Boolean value, the SAS code is displayed without executing the code:

ColumnInfo() Method

In the next cell, use the columnInfo() method to display the information about each variable in the SAS data set. Note: The SAS code is generated as a result of adding the teach_me_SAS() method in the last section:

Submit() Method

Then, use the submit() method to execute the PROC CONTENTS code that is displayed in the cell above directly from Python. The submit() method returns a dictionary with two keys, LST and LOG. The LST key contains the results and the LOG key contains the SAS log. The results are displayed as HTML. The HTML package is imported to display the results.
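Because the result is an ordinary dictionary, you can inspect the log with plain Python. The sketch below shows one way to pull ERROR lines out of the LOG entry. The `log_errors` helper and the `fake_result` dictionary are hypothetical stand-ins, since a real submit() call needs a running SAS session.

```python
def log_errors(result):
    """Return the ERROR lines from the LOG entry of a submit()-style result."""
    return [line for line in result["LOG"].splitlines()
            if line.startswith("ERROR")]

# Hypothetical stand-in for a dictionary with the LST and LOG keys
fake_result = {
    "LST": "<html>...</html>",
    "LOG": "NOTE: PROCEDURE CONTENTS used.\n"
           "ERROR: File WORK.NOPE.DATA does not exist.",
}
print(log_errors(fake_result))
```

A check like this is handy before you display the LST output, because a failed step can leave the results empty.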

The SAS Kernel Using Jupyter Notebook

Jupyter Notebook can run programs in various programming languages including SAS when you install and configure the SAS kernel. Using the SAS kernel is another way to run SAS interactively using a web-based program, which also enables you to save the analysis in a notebook. See the links above for details about installation and configuration of the SAS kernel. To verify that the SAS kernel installed successfully, you can run the following code: jupyter kernelspec list

From the command line, use the following command to start the Jupyter Notebook: jupyter notebook. The screenshot below shows the Jupyter Notebook session that starts when you run the command. To execute SAS syntax from Jupyter Notebook, select SAS from the New drop-down list as shown below:

You can add SAS code to a cell in Jupyter Notebook and execute it. The following code adds a PRINT procedure and a SGPLOT procedure. The output is in HTML5 by default. However, you can specify a different output format if needed.

You can also use magics in the cell such as the %%python magic even though you are using the SAS kernel. You can do this for any kernel that you have installed.

Other SAS Goodness

There are more ways for SAS to interact with other languages as well. For example, you can use the Groovy procedure to run Groovy statements on the Java Virtual Machine (JVM). You can also use the LUA procedure to run Lua code from SAS, along with the ability to call most SAS functions from Lua. For more information, see “Using Lua within your SAS programs.” Another very powerful feature is the DATA step JavaObject, which provides the ability to instantiate Java classes and access fields and methods. The DATA step JavaObject has been available since SAS® 9.2.

Resources

SASPy Documentation

Introducing SASPy: Use Python code to access SAS

Come on in, we're open: The openness of SAS® 9.4 was published on SAS Users.

September 28, 2017
 

If you’re in the field of analytics, you’ve undoubtedly heard about SAS Viya, our new, open analytic platform. Designed for all analytic professionals, regardless of skills or experience, SAS Viya seamlessly handles big, complex, diverse data and can bridge SAS 9.4. It also provides a tool that supports any programming language, allowing analysts to choose the tool that makes them most productive.

Recently a colleague of mine, Leo Sadovy, wrote the blog post SAS Viya: What’s in it for me? The business? This post describes the benefits of SAS Viya for the line of business owner. Spoiler alert: When it comes to analytics, SAS Viya provides the best of all worlds.

But what does SAS Viya mean to me … if I’m a current SAS user? As the communication manager for our existing SAS user base, Leo’s post inspired me to ask a similar question on behalf of our SAS users.

So, I hit the road, found a few smart colleagues (who know a lot more than I do about SAS Viya!) and recorded the Facebook Live video you’ll find attached below.

You’ll learn what SAS Viya is and what motivated us to create it, what it means to you as a SAS user (a new or longtime one), and what learning tools and other resources are available to you to learn even more.

Enjoy!

SAS Viya: What's in it for me? The user

Learn more about SAS Viya

And, if you have any other questions about SAS Viya, feel free to leave them in the comments field. I’ll get back to you if I have the answer… or find someone else who can help, if I don't!

SAS Viya: What’s in it for me? The user. was published on SAS Users.