Developers

October 31, 2019
 

Buy my costume, Georgie!

While growing up in the '80s, I watched The Golden Girls on TV with my Grandma Betty. Now, when my sister visits, we binge watch reruns on TV Land. I was excited to see that for this Halloween, you could buy Golden Girls costumes! Too bad they sold out right away, making them one of this year's most popular costumes.

For the record, I wasn't planning to dress up tonight as a Golden Girl, but the news got me thinking about how Halloween costumes have changed over the years. What was popular when? Hence, this post. I explain how to use SAS REST APIs to append a table containing historic costume data with this year's most popular costumes (including the Golden Girls and Pennywise from It). While this example looks at costume data, consider the append steps a template for updating any table in SAS Viya using REST APIs.

The data

There I am, in the middle

I created a data set containing the most popular Halloween costumes from the last 50 years (1968-2018). I compiled the data from several sources that couldn't seem to agree on the best-selling costume for a given year, so I combined the lists. Many years have two entries. The data here isn't as important as the append table procedure. What fun to review the costumes list! It was not hard to tell in what year certain movies (and their sequels) were released. Only one costume I wore made the list – my 1979 Ace Frehley outfit!

The process

The procedures in this example run on SAS Viya, utilizing the Cloud Analytics Services (CAS) REST APIs. CAS REST APIs use CAS actions to perform statistical methods across a variety of SAS products. You can learn more about CAS REST APIs and CAS Actions in the Using SAS Cloud Analytics Service REST APIs to run CAS Actions post or on developer.sas.com.

Below, I'll detail each REST call along with sample code. I originally used Postman to organize my calls. This allowed me to utilize pre- and post-call scripting to handle responses and create variables. You can find the entire Postman collection here on GitHub. For ease of display in this post, I'll use equivalent cURL commands.

Pre-requisites

I registered my client, obtained an access token, and added it as the environment variable ACCESSTOKEN. For more information on registering a client or getting an access token, see my earlier post Authentication to SAS Viya: a couple of approaches.

Create a CAS session

Before running any CAS actions, we need to establish a connection to the SAS Viya server.

curl -X POST https://sasserver:8777/cas/sessions \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/vnd.sas.cas.session+json'

The result of this call is a session id in the form of a089ce2b-8116-7a40-b3e3-6e39b7b5566d. This id is used in all subsequent REST calls. You could easily capture it in another variable for further use. In the examples below I substitute the actual session id with <session-id>. You will need to substitute this placeholder when attempting the steps on your own.
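If you prefer to script these calls rather than use Postman or cURL, here is a minimal Python sketch using the requests package. The server URL matches the examples below, the token is read from the ACCESSTOKEN environment variable, and the name of the response field holding the session id is an assumption to confirm against your server's response.

import os
import requests

base_url = 'https://sasserver:8777'        # substitute your CAS server
token = os.environ['ACCESSTOKEN']          # access token from the prerequisite step

resp = requests.post(
    f'{base_url}/cas/sessions',
    headers={'Authorization': f'Bearer {token}',
             'Content-Type': 'application/vnd.sas.cas.session+json'}
)
resp.raise_for_status()
session_id = resp.json()['session']        # assumed field name; inspect resp.json() to confirm
print(session_id)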

Create a global Caslib HALLOWEEN

Data in CAS is stored in a Caslib. In the step below, I create a Caslib called HALLOWEEN and link it to a physical server path (/home/sasdemo/halloween), where the table is stored.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.addCaslib \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"name":"HALLOWEEN","path":"/home/sasdemo/halloween","description":"HALLOWEEN","subDirectories":"false","permission":"PUBLICWRITE","session":"false","dataSource":{"srcType":"path"},"createDirectory":"true","hidden":"false","transient":"false"}'

Note that I created the directory ~/halloween and set permissions as needed. Further, since the Caslib is global, other users have access to the data. This step and the next are one-time requests; if you were to repeat this process, you would not need to create the Caslib or upload the data again.

Copy data set costumesByYear into HALLOWEEN's path

Now that we have a Caslib and a path, we load the data table to the server. In this instance, I copy the costumesByYear.xlsx file into /home/sasdemo/halloween. There are multiple ways to upload data to the server. You can read more about the various methods in the SAS documentation.

Create a temporary Caslib TEMP

While our data lives in the HALLOWEEN Caslib, we want to create a temporary Caslib to run the append step. We will then save the appended table back into HALLOWEEN. The following code creates a new Caslib called TEMP.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.addCaslib \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"name":"TEMP","path":"/home/sasdemo/temp","description":"TEMP","subDirectories":"false","permission":"PUBLICWRITE","session":"false","dataSource":{"srcType":"path"},"createDirectory":"true","hidden":"false","transient":"false"}'

Now we're ready to load the data into memory and append the table.

Load costumesByYear into memory

First, we load costumesByYear into memory in the TEMP Caslib.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.loadTable \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"path":"costumesByYear.xlsx","caslib":"HALLOWEEN","importOptions":{"fileType":"EXCEL"},"casOut":{"caslib":"TEMP","name":"costumesByYear","promote":"true"}}'

Create a temporary table data2019 containing the append data

Next, we create a new table called data2019 with costume data for 2019 in TEMP.

curl -X PUT https://sasserver:8777/cas/sessions/<session-id>/actions/upload \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: text/plain' \
  -H 'JSON-Parameters: {"casOut":{"caslib":"TEMP","name":"data2019","promote":"true"},"importOptions":{"fileType":"CSV"}}' \
  --data-binary $'Year,Costume\n2019,The Golden Girls\n2019,Pennywise'

Run data step to append data2019 to costumesByYear table

Finally, we run data step code to append table data2019 to table costumesByYear.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/runCode \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"code": "data temp.costumesbyyear(append=force); set temp.data2019; run;"}'

Save the costumesByYear table back to the HALLOWEEN CASlib

Now that we have successfully appended the costumesByYear table in the TEMP Caslib, we are ready to save back to the HALLOWEEN Caslib.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.save \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"table":{"name":"costumesbyyear","caslib":"TEMP","singlePass":"false"},"name":"costumesbyyear","replace":"true","compress":"false","caslib":"HALLOWEEN","exportOptions":{"fileType":"BASESAS"}}'

Delete TEMP Caslib

The TEMP Caslib is just that, temporary. With the code below we drop the Caslib and all its data.

curl -X POST https://sasserver:8777/cas/sessions/<session-id>/actions/table.dropCaslib \
  -H "Authorization: Bearer $ACCESSTOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"caslib":"TEMP"}'

Delete the CAS session

The final step is to close our connection to the CAS server.

curl -X DELETE https://sasserver:8777/cas/sessions/<session-id> \
  -H "Authorization: Bearer $ACCESSTOKEN"

Wrapping it up

There you have it. With a few simple commands we were able to load, append, and save a table. This example is fairly simple in scope, but it translates to more complex use cases. The steps for my 2 x 50 table are the same as they would be for a 5GB table with hundreds of columns and millions of rows.

I have asked my mother to send the Polaroid photo of me as Ace in 1979. She just has to dig it out of a photo album. Check back in a week so you can gain fodder and poke fun at me.

Additional Resources

developer.sas.com - developers site for SAS
GitHub resources - GitHub repository for code used in this post

Append tables in SAS® Viya® with REST APIs – a treat, no tricks was published on SAS Users.

September 28, 2019
 

This article continues a series that began with Machine learning with SASPy: Exploring and preparing your data (part 1). Part 1 showed you how to explore data using SASPy with Python. Here, in part 2, you will learn how to begin to prepare your data to use it within a machine-learning model.

Review part 1 if needed and ensure you still have the ADULT data set ready to use. (The data set is available from the UCI Machine Learning Repository.) If not, take some time to download and explore the data again, as described in part 1.

Preparing your data

Preparing data is a necessary step before applying the data to a model. There are string values, skewed data, and missing data points to consider. Be sure to handle missing values in the data set first, so you can move on to the other methods.

For this exercise, you will explore how to transform skewed features using SASPy and Pandas.

First, you must separate the income data from the data set, because the income feature will later become your target variable to model.

Drop the income data and turn the pandas data frame back into a SAS data object, with the following code:
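Here is a minimal sketch of that step. It assumes the pandas DataFrame from part 1 is named adult_df, the SASPy session is named sas, and the target column is called INCOME; adjust those names to match your data.

# Assumptions: `adult_df` is the pandas DataFrame from part 1, `sas` is the SASPy
# session, and the target column is named 'INCOME' -- adjust to your data.
income_raw = adult_df['INCOME']
features_df = adult_df.drop('INCOME', axis=1)

# Turn the feature DataFrame back into a SAS data object for use with SASPy
features = sas.df2sd(features_df, table='adult_features')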

Now, let's take a second look at the numerical features. You will use SASPy to create a histogram of all numerical features. Typically, the Matplotlib library is used, but SASPy provides great opportunities to visualize the data.
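One way to do this is with the SASdata.hist() method, looping over the numeric columns; the column names listed here are assumptions based on the ADULT data set.

# Column names are assumptions -- list the numeric features in your table
numeric_cols = ['AGE', 'FNLWGT', 'EDUCATION_NUM',
                'CAPITAL_GAIN', 'CAPITAL_LOSS', 'HOURS_PER_WEEK']

for col in numeric_cols:
    features.hist(col)   # SASdata.hist() renders a histogram for a single variable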

The following graphs represent the expected output.

Taking a look at the numerical features, two stand out: CAPITAL_GAIN and CAPITAL_LOSS are highly skewed. Highly skewed features can affect your model, as many models assume the inputs are approximately normally distributed. To fix this, you will apply a logarithmic transformation using pandas and then visualize the change using SASPy.

Transforming skewed features

First, you need to change the SAS data object back into a pandas data frame and assign the skewed features to a list variable:

Then, use pandas to apply the logarithmic transformation and convert the pandas data frame back into a SAS data object:
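Putting those two steps together, a sketch might look like the following; the skewed column names are assumptions, and np.log(x + 1) is one common choice of transformation.

import numpy as np

# Convert the SAS data object back into a pandas DataFrame
features_df = features.to_df()

# Apply a log transformation to the skewed features (column names are assumptions)
skewed = ['CAPITAL_GAIN', 'CAPITAL_LOSS']
features_df[skewed] = features_df[skewed].apply(lambda x: np.log(x + 1))

# Return the transformed data to SAS as a new SAS data object
features_log = sas.df2sd(features_df, table='adult_log_transformed')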

Display transformed data

Now, you are ready to visualize these changes using SASPy. In the previous section, you used histograms to display the data. To display this transformation, you will use the SASPy SASUTIL class. Specifically, you will use a procedure typically used in SAS, the UNIVARIATE procedure.

To use the SASUTIL class with SASPy, you first need to create a Python object that uses the SASUTIL class:

 

Now, use the univariate function from SASPy:
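A sketch of both steps follows, assuming the sas session from part 1 and the features_log table created above; the var= argument and the exact keywords accepted by the generated univariate() method are assumptions to verify against your SASPy version.

# Create the utility object from the active SASPy session
util = sas.sasutil()

# Run PROC UNIVARIATE on the transformed features (keyword names are assumptions)
uni_results = util.univariate(data=features_log, var='CAPITAL_GAIN CAPITAL_LOSS')

# Verify the submission and list the result objects (tables, plots) returned
print(dir(uni_results))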

 

Using the UNIVARIATE procedure, you can set axis limits to the output histograms so that you can see the data in a clearer format. After running the selected code, you can use the dir() function to verify successful submission:

 

Here is the output:

 

 

 

The function calculates various descriptive statistics and plots. However, for this example, the focus is on the histogram.

 

Here are the results:

Wrapping up

You have now transformed the skewed data. Pandas applied the logarithmic transformation and SASPy displayed the histograms.

Up next

In the next and final article of this series, you will continue preparing your data by normalizing numerical features and one-hot encoding categorical features.

Machine learning with SASPy: Exploring and preparing your data (part 2) was published on SAS Users.

September 9, 2019
 

Editor's Note: This article was translated and edited by SAS USA and was originally written by Makoto Unemi. The original text is here.

SAS previously provided SAS Scripting Wrapper for Analytics Transfer (SWAT), a package for using SAS Viya functions from various general-purpose programming languages such as Python.

In addition to SWAT, SAS launched Deep Learning Python (DLPy), a higher-level API package for Python, making it possible to use SAS Viya functions more efficiently from Python. In this article I outline what DLPy is and how to implement it.

About DLPy

DLPy is a high-level Python API package for the deep learning and image action sets introduced after SAS Viya 3.3. DLPy provides an API similar to Keras to improve the efficiency of deep learning and image processing coding. With just a little rewriting of existing Keras code, it is possible to execute the processing on SAS Viya.

For example, below is an example of a Convolutional Neural Network (CNN) layer definition; you can see that it is very similar to Keras.
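As a rough sketch of what such a definition looks like (the connection details are placeholders, and the layer arguments are illustrative values following the DLPy example patterns rather than the exact code from the original post):

import swat
from dlpy import Sequential
from dlpy.layers import InputLayer, Conv2d, Pooling, Dense, OutputLayer

# Placeholder connection details -- substitute your CAS host, port, and credentials
conn = swat.CAS('cas-server.example.com', 5570, 'username', 'password')

model = Sequential(conn, model_table='simple_cnn')
model.add(InputLayer(n_channels=3, width=224, height=224))
model.add(Conv2d(n_filters=8, width=7, act='relu'))
model.add(Pooling(width=2))
model.add(Conv2d(n_filters=8, width=7, act='relu'))
model.add(Pooling(width=2))
model.add(Dense(n=16))
model.add(OutputLayer(act='softmax', n=2))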

The layers supported by DLPy are: InputLayer, Conv2d, Pooling, Dense, Recurrent, BN, Res, Proj, and OutputLayer. The following is an example of training a model.
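Continuing the sketch above, training might look roughly like this; the image path, resize dimensions, and hyperparameters are placeholders.

from dlpy.images import ImageTable

# Load labeled images (one subdirectory per class) into CAS -- path is a placeholder
train_imgs = ImageTable.load_files(conn, path='/data/dolphins_giraffes')
train_imgs.resize(width=224, height=224)

# Train the CNN defined above (hyperparameters are placeholders)
model.fit(data=train_imgs, mini_batch_size=8, max_epochs=20, lr=0.001)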

DLPy functions

The following introduces some of DLPy's functions (partial excerpts), using as an example a CNN trained on images of dolphins and giraffes, with test images then applied to the model.

Implementation of major deep learning networks

DLPy offers the following pre-built deep learning models: VGG11/13/16/19, ResNet34/50/101/152, wide_resnet, and dense_net.

The following models also offer pre-trained weights using ImageNet data (these weights can be applied to your own tasks through transfer learning): VGG16, VGG19, ResNet50, ResNet101, and ResNet152. The following is an example of transferring ResNet50 pre-trained weights.
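A sketch of that pattern is below; the weights file path is a placeholder, and the keyword arguments reflect the DLPy application API as I understand it, so verify them against your DLPy version.

from dlpy.applications import ResNet50_Caffe

# Build ResNet-50 with ImageNet weights, swapping in a new 2-class output layer
model_resnet = ResNet50_Caffe(
    conn,
    model_table='resnet50_transfer',
    n_classes=2,                         # dolphins vs. giraffes
    n_channels=3, width=224, height=224,
    pre_trained_weights=True,
    pre_trained_weights_file='/data/ResNet-50-model.caffemodel.h5',  # placeholder path
    include_top=False                    # drop the original 1000-class classifier head
)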

CNN judgment basis information

Using the heat_map_analysis() method, you can output a colorful heat map and check where the model focused on the image.

In addition, the get_feature_maps() method retrieves the feature maps of each CNN layer, and the feature_maps.display() method displays the feature maps of a specified layer so you can inspect them.

The following is the output of the layer 1 feature map.

The following is the output of the layer 18 feature map.

Support functions for deep learning and image processing tasks

resize() method: Resizes image data

as_patches() method: Expands image data (generates patches from the original images)

two_way_split() method: Splits data into training and testing sets

plot_network() method: Draws the structure of the defined deep learning layers (network) as a graphical diagram

plot_training_history() method: Displays the iterative training history

predict() method: Displays prediction (scoring) results

plot_predict_res() method: Displays classification results

And of course, you can use DLPy to get data from a SAS Viya in-memory session, pass it to your local client, and convert it to common data formats like numpy arrays and Pandas DataFrames. The converted data can be smoothly supplied to models of other open source packages such as scikit-learn.

Regarding image classification using DLPy, videos are also available in the Deep Learning with Python (DLPy) Demo Series section of the DLPy product page.

SAS Viya: Package for Python API for deep learning and image processing: DLPy was published on SAS Users.

September 6, 2019
 

A few years ago I shared a method to publish content from SAS to a Slack channel. Since that time, our teams at SAS have gone "all in" on collaboration with Microsoft Office 365, including Microsoft Teams. Microsoft Teams is the Office suite's answer to Slack, and it's not a coincidence that it works in nearly the same way.

The lazy method: send e-mail to the channel

Before I cover the "deluxe" method for sending content to a Microsoft Teams channel, I want to make sure you know that there is a simple method that involves no coding, and no need for APIs. The message experience isn't as nice, but it does the job. You can simply "send e-mail" to the channel. If you're automating output from SAS, it's a simple, well-documented process to send e-mail from a SAS program. (Here's an example from me, using FILENAME EMAIL.)

When you send e-mail to a Microsoft Teams channel, the message notice includes the message subject line, sender, and the first bit of the message content. To see the entire message, you must click on the "View original e-mail" link in the notice. This "downloads" the message to your device so that you can open it with a local tool (such as your e-mail reader, Microsoft Outlook). My team uses this method to receive certain alerts from our communities.sas.com platform. Here's an example:

To get the unique e-mail address for a channel, right-click on the channel name and select Get email address. Any message that you send to that e-mail address will be distributed to the team.

Getting started with a Microsoft Teams webhook

In order to provide a richer, more integrated experience with Microsoft Teams, you can publish content using a webhook. A webhook is a REST API endpoint that allows you to post messages and notifications with more control over the appearance and interactive options within the messages. In SAS, you can publish to a webhook by using PROC HTTP.

To get started, you need to add and configure a webhook for your Microsoft Teams channel:

  1. Right-click on the channel name and select Connectors.
  2. Microsoft Teams offers built-in connectors for many different applications. To find the connector for Incoming Webhook, use the search field to narrow the list. Then click Add to add the connector to the channel.
  3. You must grant certain permissions to the connector to interact with your channel. In this case, you need to allow the webhook to send messages and notifications. Review the permissions and click Install.
  4. On the Configuration page, assign a name to this connector and optionally customize the image. The image will be the avatar that's used when the connector posts content to the channel. When you've completed these changes, select Create.
  5. The connector generates a unique (and very long) URL that serves as the REST API endpoint. You can copy the URL from this field -- you will need it later in your SAS program. You can always come back to these configuration settings to change the connector avatar or re-copy the URL.

At this point, it's a good idea to test that you can publish a basic message from SAS. The "payload" for a Teams message is a JSON-formatted structure, and you can find examples in the Microsoft Teams reference doc. Here's a SAS program that publishes the simplest message. Add your webhook URL and run the code to verify the connector is working for your channel.

    filename resp temp;
    options noquotelenmax;
    proc http
      /* Substitute your webhook URL here */
      url="https://outlook.office.com/webhook/your-unique-webhook-address-it-is-very-long"
      method="POST"
      in=
      '{
          "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
          "type": "AdaptiveCard",
          "version": "1.0",
          "summary": "Test message from SAS",
          "text": "This message was sent by **SAS**!"
      }'
      out=resp;
    run;

If successful, this step will post a simple message to your Teams channel:

Design a message card for Microsoft Teams

Now that we have the basic plumbing working, it's time to add some bells and whistles. Microsoft Teams calls these notifications "message cards", which are messages that can include interactive features such as images, data, action buttons, and more.

Designing a simple message

Microsoft Teams supports a large palette of building blocks (expressed in JSON) to create different card experiences. You can experiment with these cards in the MessageCard Playground that Microsoft hosts. The tool provides templates for several card varieties, and you can edit the JSON definitions to tweak and design your own.

For one of my use cases, I designed a simple card to show the status of our recommendation engine on SAS Support Communities. (Read this article for more information about how we built and monitor the recommendation engine.) The engine runs as a service and is accessed with its own API. I wanted a periodic "health check" to post to our internal team that would alert us to any problems. Here's the JSON that I used in the MessageCard Playground to design it.

Much of the JSON is boilerplate for the message. I drew the green blocks to indicate the areas that need to be dynamic -- that is, replaced with values from the real-time API call. Here's what the card looks like when rendered in the Microsoft Teams channel.

Since my API call to the recommendation engine service creates a data set, I can run that data through PROC JSON to create the JSON segment I need:

    /* reading the results from my API call to the engine */
    libname results json fileref=resp;
     
    /* Prep a simple name-value data set with the results */
    data segment (keep=name value);
     set results.root;
     name="Score data updated (UTC)";
     value= astore_creation;
     output;
     name="Topics scored";
     value=left(num_topics);
     output;
     name="Number of users";
     value= left(num_users);
     output;
     name="Process time";
     value= process_time;
     output;
    run;
     
    /* use PROC JSON to create the segment */
    filename segment temp;
    proc json out=segment nosastags pretty;
     export segment;
    run;

I shared a version of the complete program on GitHub. It should run as is -- but you would need to supply your own webhook endpoint for a channel that you can publish to.

Design a message with actions

I also use Microsoft Teams to share updates about the SAS Software GitHub organization. In a previous article I discussed how I use GitHub APIs to gather data from the GitHub service. Each day, my program summarizes the recent activity from github.com/sassoftware and publishes a message card to the team. Here's an example of a daily update:

This card is fancier than my first example. I added action buttons that can direct the team members to the internal reports for more details and to the GitHub site itself. I used the Microsoft Teams documentation and the MessageCard Playground to design the experience:

Messaging apps as part of a DevOps strategy

Like many organizations, we (SAS) invest a considerable amount of time and energy into gathering metrics and building reports about our operations. However, reports are useful only when the intended audience is tuned in and refers to them regularly. With a small additional step, you can use SAS to bring your most interesting data forward to your team -- automatically.

Whether you use Microsoft Teams or Slack, automated alerting and updates are a great opportunity to keep your teams informed. Each of these tools offers fit-for-purpose connectors that can tie in with information from other popular operational systems (Salesforce, GitHub, Yammer, JIRA, and many more). For cases where a built-in connector is not available, the webhook approach allows you to easily create your own.

The post How to publish to a Microsoft Teams channel using SAS appeared first on The SAS Dummy.

July 26, 2019
 

In a previous post, Zero to SAS in 60 Seconds- SAS Machine Learning on SAS Analytics Cloud, I documented my experience with a SAS free trial on the SAS Analytics Cloud. Well, the engineers at SAS have been busy and created another free trial. The new trial covers SAS Event Stream Processing (ESP).

This time last year (when just starting at SAS), I only knew ESP as extrasensory perception. I'm more enlightened now. Working through this exercise introduced me to how event stream processing is a powerful and effective tool for analyzing data using machine learning and streaming analytics to uncover insights for real-time decision making. In a nutshell, you create a model, stream your data, process the results, and make timely decisions based on the results.

The trial uses SAS ESPPy, allowing you to embed an ESP project inside a Python pipeline. To see ESPPy in action take a look at this video. To learn more about ESP and IoT see this article on the SAS Communities Library. In this article I chronicle my journey through the trial while introducing key concepts and operations of ESP.
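To give a feel for what that looks like, here is a minimal ESPPy sketch; the host, port, window name, and schema fields are placeholders rather than the pre-loaded Solar Farm project, and it follows the basic pattern from the ESPPy documentation.

import esppy

# Connect to the ESP server (host and port are placeholders)
esp = esppy.ESP('esp-server.example.com', 9900)

# Define a project with a single source window (schema fields are placeholders)
project = esp.create_project('sandbox')
source = esp.SourceWindow(schema=('id*:int64', 'kw:double', 'kwh:double'))
project.windows['w_data'] = source

# Deploy the project definition to the server
esp.load_project(project)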

Register and get started

The registration and initial login process is identical to the one in the machine learning article. You must have a SAS Profile to participate in the trial. The only difference is you need to follow this link to sign up for the ESP trial. Please refer to the machine learning article for detailed steps for signing up and logging in.

The use case

SAS Solar Farm in Cary

The SAS Solar Farm sits on almost 12 acres of SAS Headquarters property. There are 10,276 solar panels producing more than 3.6 million kilowatt hours annually. That’s enough power for more than 325 average sized U.S. homes.

As part of the environment management, it is important to continuously monitor the operation of the solar panels to optimize configuration parameters, detect potential equipment failure, and accurately forecast the amount of energy generated. Factors considered include panel angles, time of day, seasons, and weather patterns, as the energy generated depends directly on the amount of sun available to the panels.

The ESP project in this demo is pre-loaded in the trial and is run through a Jupyter notebook. The project shows the monitoring of energy (kWh) and power (kW) generated during a specific time interval, eliminating localized outlier effects and triggering alerts when there is a pre-defined difference in the energy generated between subsequent time intervals.

Solar Farm Data represented as digital art

Take two minutes and watch this video on how SAS uses SAS software to create a work of art with solar farm data.

Disclaimer: no sheep were harmed during data collection or writing of this article.

Navigating the trial

Once logged into the trial, you see the Applications screen.

ESP trial Applications screen

The Data and Team options in the left pane behave exactly as those in the machine learning trial. These sections allow you to access data and manage a multi-user system. Select the SAS Event Stream Processing icon to start a JupyterLab session.

JupyterLab home screen

I will not go into the details of JupyterLab here. The left pane contains menus, file management, and other options. The pane on the right displays three options:

Python 3 Notebook - a blank Jupyter notebook - documents that combine live, runnable code with narrative text (Markdown), equations (LaTeX), images, interactive visualizations and other rich output
Python 3 Console - a blank Python console - code consoles enable you to run code interactively in a kernel
Text File - basic text editor - enables you to edit text files in JupyterLab

For this article we're going to follow along and interact with the pre-loaded demo Solar Farm ESP project. To locate the Jupyter notebook double click the demo directory from the left pane.

Select the demo directory from the left pane

Next select Event_Stream_Processing. Before proceeding with the demo, I'd highly suggest opening the README.ipynb file.

Contents of the README notebook

Here you will find overview and environment organization information for the trial. The trial uses SAS ESPPy for designing, testing, and deploying projects on ESP Servers.

Step through the demo

Before starting the trial, I needed a little background on event stream processing. I located the SAS ESP product documentation. I recommend referring to it for details on the ESP model, objects, and workflow.

To access the demo, double click the demo directory from the left pane. The trial comes with five pre-loaded demos. Feel free to try any/all of them. Double click on ESP Basic Project - Solar Farm.ipynb to display the Solar Farm notebook. The notebook walks you through the ESP model creation and execution. To run a command place the cursor in a command cell and select the 'Run' button (triangle-shaped button at the top of the notebook). If no response returns when running the cell block, assume the commands ran successfully.

Below is a brief description of the steps in the project:

  1. Create the project and query used - this creates dedicated space and objects where the ESP process takes place
  2. Create input and aggregate windows - this action extracts desired data and creates data subsets from the stream
  3. Add a join window - this brings together lag and current values into the project
  4. Add a compute window - this calculates the difference between the previous and current event
  5. Add a filter window - this action filters occurrences outside a threshold value; this creates an alert for potential mechanical issues
  6. Define workflow connections - this defines the workflow between the various windows in the project
  7. Save the project - this generates an XML file for the project
  8. Load the project to the ESP Server - this loads the project and produces a graphical representation of the workflow

    Solar Farm project workflow

  9. Start streaming data - in this example, rather than streaming data in real time, the stream derives from the solar farm table data
  10. View solar farm data - this creates a graphical representation of streaming data

    Solar Farm graph for kW and kWh

While not included in the demo, the streaming data would pass through the filter and if a threshold breach occurs, an alert is created. Considering the graph above, alerts could very well have occurred just before 1:15 pm (IntkW drops from 185 to 150) and just before 2:30 pm (IntkW drops from 125 to 35).

Your turn

Now that you have a taste of ESP, feel free to step through the rest of the demos. You may also load your own data and create your own ESP models. Feel free to share your experience and what you create by leaving a comment.

SAS Event Stream Processing on SAS Analytics Cloud - my journey was published on SAS Users.

July 25, 2019
 

Recommendations on SAS Support Communities

If you visit the SAS Support Communities and sign in with your SAS Profile, you'll experience a little bit of SAS AI with every topic that you view.

While it appears to be a simple web site widget, the "Recommended by SAS" sidebar is made possible by an application of the full Analytics Life Cycle. This includes data collection and prep, model building and test, API wrappers with a gateway for monitoring, model deployment in containers with orchestration in Kubernetes, and model assessment using feedback from click actions on the recommendations. We built this by using a combination of SAS analytics and open source tools -- see the SAS Global Forum paper by my colleague, Jared Dean, for the full list of ingredients.

Jared and I have been working for over a year to bring this recommendation engine to life. We discussed it at SAS Global Forum 2018, and finally near the end of 2018 it went into production on communities.sas.com. The engine scores user visits for new recommendations thousands of times per day. The engine is updated each day with new data and a new scoring model.

Now that the recommendation engine is available, Jared and I met again in front of the camera. This time we discussed how the engine is working and the efforts required to get into production. Like many analytics projects, the hardest part of the journey was that "last mile," but we (and the entire company, actually) were very motivated to bring you a live example of SAS analytics in action. You can watch the full video at (where else?) communities.sas.com. The video is 17 minutes long -- longer than most "explainer"-type videos. But there was a lot to unpack here, and I think you'll agree there is much to learn from the experience. Not ready to binge on our video? I'll use the rest of this article to cover some highlights.

Good recommendations begin with clean data

The approach of our recommendation engine is based upon your viewing behavior, especially as compared to the behavior of others in the community. With this approach, we don't need to capture much information about you personally, nor do we need information about the content you're reading. Rather, we just need the unique IDs (numbers) for each topic that is viewed, and the ID (again, a number) for the logged-in user who viewed it. One benefit of this approach is that we don't have to worry about surfacing any personal information in the recommendation API that we'll ultimately build. That makes the conversation with our IT and Legal colleagues much easier.

Our communities platform captures details about every action -- including page views -- that happens on the site. We use SAS and the community platform APIs to fetch this data every day so that we can build reports about community activity and health. We now save off a special subset of this data to feed our recommendation engine. Here's an example of the transactions we're using. It's millions of records, covering nearly 100,000 topics and nearly 150,000 active users.

Sample data records for the model

Building user item recommendations with PROC FACTMAC

Starting with these records, Jared uses SAS DATA step to prep the data for further analysis and a pass through the algorithm he selected: factorization machines. As Jared explains in the video, this algorithm shines when the data are represented in sparse matrices. That's what we have here. We have thousands of topics and thousands of community members, and we have a record for each "view" action of a topic by a member. Most members have not viewed most of the topics, and most of the topics have not been viewed by most members. With today's data, that results in a 13 billion cell matrix, but with only 3.3 million view events. Traditional linear algebra methods don't scale to this type of application.

Jared uses PROC FACTMAC (part of SAS Visual Data Mining and Machine Learning) to create an analytics store (ASTORE) for fast scoring. Using the autotuning feature, FACTMAC selects the best combination of values for factors and iterations. And Jared caps the run time at 3600 seconds (1 hour) -- because we do need this to run in a predictable time window for updating each day.

proc factmac data=mycas.weighted_factmac  outmodel=mycas.factors_out;
   autotune maxtime=3600 objective=MSE 
       TUNINGPARAMETERS=(nfactors(init=20) maxiter(init=200) learnstep(init=0.001) ) ;
   input user_uid conversation_uid /level=nominal;
   target rating /level=interval;
   savestate rstore=mycas.sascomm_rstore;
run;

Using containers to build and containers to score

To update the model with new data each day and then deploy the scoring model as an ASTORE, Jared uses multiple SAS Viya environments. These SAS Viya environments need to "live" only for a short time -- for building the model and then for scoring data. We use Docker containers to spin these up as needed within the cloud environment hosted by SAS IT.

Jared makes the distinction between the "building container," which hosts the full stack of SAS Viya and everything that's needed to prep data and run FACTMAC, and the "scoring container," which contains just the ASTORE and enough code infrastructure (including the SAS Micro Analytic Service, or MAS) to score recommendations. This scoring container is lightweight and is actually run on multiple nodes so that our engine scales to lots of requests. And the fact that it does just the one thing -- score topics for user recommendations -- makes it an easier case for SAS IT to host as a service.

DevOps flow for the recommendation engine

Monitoring API performance and alerting

To access the scoring service, Jared built a simple API using a Python Flask app. The API accepts just one input: the user ID (a number). It returns a list of recommendations and scores. Here's my Postman snippet for testing the engine.
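For illustration, a stripped-down sketch of such an endpoint is shown below; the route, port, and the placeholder score_recommendations() function are assumptions, not the actual service code.

from flask import Flask, jsonify

app = Flask(__name__)

def score_recommendations(user_id):
    # Placeholder: call the deployed ASTORE (for example, via the MAS scoring
    # service) and return a list of {"conversation_uid": ..., "score": ...} items
    return []

@app.route('/recommendations/<int:user_id>')
def recommendations(user_id):
    # Single input: the numeric user ID; output: recommendations with scores
    return jsonify(score_recommendations(user_id))

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)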

To provision this API as a hosted service that can be called from our community web site, we use an API gateway tool called Apigee. Apigee allows us to control access with API keys, and also monitors the performance of the API. Here's a sample performance report for the past 7 days.

In addition to this dashboard for reporting, we have integrated proactive alerts into Microsoft Teams, the tool we use for collaboration on this project. I scheduled a SAS program that tests the recommendations API daily, and the program then posts to a Teams channel (using the Teams API) with the results. I want to share the specific steps for this Microsoft Teams integration -- that's a topic for another article. But I'll tell you this: the process is very similar to the technique I shared about publishing to a Slack channel with SAS.

Are visitors selecting recommended content?

To make it easier to track recommendation clicks, we added special parameters to the recommended topics URLs to capture the clicks as Google Analytics "events." Here's what that data looks like within the Google Analytics web reporting tool:

You might know that I use SAS with the Google Analytics API to collect web metrics. I've added a new use case for that trick, so now I collect data about the "SAS Recommended Click" events. Each click event contains the unique ID of the recommendation score that the engine generated. Here's what that raw data looks like when I collect it with SAS:

With the data in SAS, we can use that to monitor the health/success of the model in SAS Model Manager, and eventually to improve the algorithm.

Challenges and rewards

This project has been exciting from Day 1. When Jared and I saw the potential for using our own SAS Viya products to improve visitor experience on our communities, we committed ourselves to see it through. Like many analytics applications, this project required buy-in and cooperation from other stakeholders, especially SAS IT. Our friends in IT helped with the API gateway and it's their cloud infrastructure that hosts and orchestrates the containers for the production models. Putting models into production is often referred to as "the last mile" of an analytics project, and it can represent a difficult stretch. It helps when you have the proper tools to manage the scale and the risks.

We've all learned a lot in the process. We learned how to ask for services from IT and to present our case, with both benefits and risks. And we learned to mitigate those risks by applying security measures to our API, and by limiting the execution scope and data of the API container (which lives outside of our firewall).

Thanks to extensive preparation and planning, the engine has been running almost flawlessly for 8 months. You can experience it yourself by visiting SAS Support Communities and logging in with your SAS Profile. The recommendations that you see will be personal to you (whether they are good recommendations...that's another question). We have plans to expand the engine's use to anonymous visitors as well, which will significantly increase the traffic to our little API. Stay tuned!

The post Building a recommendation engine with SAS appeared first on The SAS Dummy.

June 26, 2019
 

In his article How to use CASL to develop and work with user-defined CAS actions, Brian Kinnebrew defines CASL as "a language specification used by the SAS client to interact with and provide easy access to Cloud Analytic Services (CAS). CASL is a statement-based scripting language with many uses and strengths." I can't come up with a better definition, so if you'd like to learn more about the basics of CASL, I encourage you to read Brian's post.

In a SAS Stored Process (or any traditional SAS program) the code has multiple DATA steps and procedures with a good dose of macros.

Over the last couple of years, the focus of my projects is the use of CAS actions with no traditional procedures in the mix. Most of these involved web applications calling multiple CAS actions. My initial approach was to make multiple http calls - one per action. This could get tedious.

Then I met my action hero: sccasl.runCasl.

My favorite SAS CASL action

This action executes a CASL "script" on the CAS server analogous to executing a SAS Stored Process. Running a CASL program with a mix of CAS actions and CASL statements on the CAS server has these benefits:

  1. Reduces the number of http calls to the server
  2. The client-side code is much easier to reason about
  3. The returned values can be a dictionary that is suited for further consumption by the client, simplifying the client code

My personal name for these scripts is "CAS stored process".

Where art thou Macro?

In many applications, user input is passed to the code running on the server. In a SAS Stored Process, macros pass the parameters. CASL has no macros. My initial approaches to passing parameters to CASL programs were:

  1. Generate the final CASL program in JavaScript with the user input values inserted into the code. Sample code is available here.
    • Drawback: Debugging requires cutting and pasting the generated code into SAS Studio.
  2. Load the data into a CAS table using table.upload action for programs needing an input table. Sample code is available here.
    • Drawback: This requires an additional http call to the server.

In developing the GraphQL approach to writing applications - GraphQL and SAS Viya applications - a good match - I addressed the two drawbacks listed above by creating two functions and putting them in my utility belt.

    Superfriend functions

  • jsonToDict.js - generates a string containing the CASL dictionary version of a JavaScript object
  • argsToTable - creates a CAS table from a dictionary

The remainder of this article discusses these two functions. To demonstrate the functions' usage, I use code listings from the scoring example, covered in the GraphQL example.

The jsonToDict function

This function produces a string containing a CASL dictionary suitable for inclusion in CASL code.

Function definition

jsonToDict(obj, name) ⇒ string

Returns: string - the string containing the CASL dictionary

Parameters:
  obj (object) - the JavaScript object of interest
  name (string) - the name to assign to the dictionary

 

Example function code

The code below outlines the jsonToDict function usage.

obj = {x:1, b:2, c:['a', 'b']};
let r = jsonToDict(obj, '_appEnv_');
The result is:
r = `_appEnv_ = {x=1, b=2, c={"a", "b"}}'`;

The following lists the input parameters passed to the CASL code for scoring.

let input = {
        JOB    : 'J1',
        CLAGE  : 100, 
        CLNO   : 20, 
        DEBTINC: 20, 
        DELINQ : 2, 
        DEROG  : 0, 
        MORTDUE: 4000, 
        NINQ   : 1,
        YOJ    : 10,
        LOAN: 1000,
        ASSET: 100000
    };

Below is the JavaScript code and the result.

let _args_ = jsonToDict(input, '_args_');
results in
 
_args_ = `_args_ = {  JOB= "J1" ,CLAGE=100  ,CLNO=20  ,DEBTINC=20  ,DELINQ=2  ,DEROG=0  ,
MORTDUE=4000  ,NINQ=1  ,YOJ=10  ,LOAN=1000  ,ASSET=100000  };`;

To allow the use of different versions of the model, the name of the scoring model is passed in as a parameter. The JavaScript for the model follows.

let env = {
     astore: {
            caslib: 'Public',
            name  : 'GRADIENT_BOOSTING___BAD_2'
        }
};
let _appEnv_= jsonToDict(env, '_appEnv_');
 
resulting in:
let _appEnv_ = `_appEnv_ = { astore = { caslib="Public", name="GRADIENT_BOOSTING___BAD_2"}};`;

Next, I prepend the strings to the CASL program in the client code with the following code snippets.

let code = _args_ + _appEnv_ + `the CASL code shown below`;
loadactionset "astore";
 
/* convert arguments to a cas table */
argsToTable(_args_, 'casuser', 'INPUTDATA' );
 
/* score */
action astore.score r=rc/
    table  = { caslib= 'casuser', name = 'INPUTDATA' } 
    rstore = { caslib= _appEnv_.astore.caslib,  name=_appEnv_.astore.name }
    casout  = { caslib = 'casuser', name = 'OUTPUTDATA' replace= TRUE};
 
/* fetch results */
action table.fetch r = result /
    table = {  caslib = 'casuser' name = 'OUTPUTDATA' } ;
 
/* extract the score and send it as a dictionary */
score = result.Fetch[1].P_BAD;
send_response({score = score});

Now the CASL program can access the incoming information using the two dictionaries _args_ and _appEnv_. Note: As a personal choice, I use a convention of _args_ for user input and _appEnv_ for application specific information. I use the restaf application framework to make the http calls as shown below.

let payload = {
        action: 'sccasl.runCasl',
        data  : { code: code}
    }
 let result = await store.runAction(session, payload); 
let score = result.items('results', 'score');

CASL coding - easier than saying Kilp-ill-skim

Notice the absence of string substitutions that look strange. Just simple, straight forward coding. Easier than saying your name backwards, forcing you back to the fifth dimension.

The argsToTable.casl function

The sample code above used the function argsToTable. As you may guess, it converts the _args_ dictionary into a CAS table used in the scoring action. argsToTable is the CASL function handling this task.

Function definition

argsToTable(input, caslib, name) ⇒ Load dictionary into a CAS Table

Parameters:
  input (dictionary) - the data to load
  caslib (string) - caslib of output table
  name (string) - name of output table

 

The relevant CASL code from Step 1 is reproduced here:

argsToTable(_args_, 'casuser', 'INPUTDATA' );

The argsToTable function is either stored on a server or prepended to the CASL code sent to the runCasl action. This function removes the need for a separate http call to load the data into the CAS table.

Returning data from CAS - the send_response function

The ultimate sidekick function

Any good super hero has a sidekick. The function send_response in CASL is very versatile - it allows one to return data in the form the application needs and allows more than one result. In many programs I return data in a form easily consumed by the client code.

For example, if you wanted to return just the rows of a table, you can do the following:

function resultsToDict(r);
    casResults = {};
    i = 1;
  do row over r;
     casResults[i] = row;
     i = i + 1;
   end;
  return casResults;
end;
 
/* and use it as follows: */
 
action table.fetch r = result /
    table = {  caslib = 'casuser',  name = 'mydata' };
/* extract the data and return it as a dictionary */
casResults = resultsToDict(result.Fetch);
send_response({casResults: casResults});

Finally

Using a combination of CASL, a couple of utility functions, and the runCasl action, you can develop some very efficient programs with minimal traffic between your client and the server. If you run multiple actions in sequence, you should consider grouping them into a CASL program and executing them on the CAS server using the runCasl action.

Next

In my next article, which I hope to finish in a flash, I will discuss using the runCasl action to create a browser for CAS tables with support for pagination.

All comments are welcome. Please feel free to clone the code, make it better.

Cheers...
Deva

Let runCasl be your BFF and favorite action hero

"CAS Stored Process" with my Favorite Action Hero runCasl was published on SAS Users.

June 25, 2019
 

Everyone is talking about artificial intelligence. Unfortunately, a lot of what you hear about AI in the movies and on the TV is sensationalized for entertainment. Indeed, AI is overhyped. But AI is also real and powerful. Consider this: engineers worked for years on hand-crafted models for object detection, facial [...]

Meet 5 data pioneers developing AI solutions for the real world was published on SAS Voices by Oliver Schabenberger

June 19, 2019
 

As a data scientist, did you ever come to the point where you felt the need for an evolved analytics platform bringing together the disparate skills of open source and commercial software? A system that enables advanced analytic capabilities is now possible and easy to implement. With many deployment possibilities, SAS Viya allows you to choose where your data is stored, where compute happens, and how models are deployed.

Let’s say you want to expand your model development process with SAS Viya analytical capabilities, but you don’t want to wait to get such an environment up and running. Unfortunately, you have no infrastructure, nor the experience to install SAS Viya. Going the traditional route, you would be looking at:

  • Protracted hardware procurement and provisioning
  • Deployment planning and coordination with IT
  • Effort and time required for software installation/configuration

This solution may be the right path for many organizations, but I think we all recognize this: the traditional approach could take days, weeks, and, yes, sometimes months.

What if you could get up and running with a full SAS Viya platform in two hours? If you have some affinity for cloud-based solutions, SAS offers you the AWS SAS Viya Cloud Rapid Deployment tool. SAS released this AWS Quick Start as a rapid deployment architecture for SAS Viya on AWS. Deployable products include SAS Visual Data Mining and Machine Learning, SAS Visual Statistics and SAS Visual Analytics.

The goal of this article is to brief you on how I launched such an AWS SAS Viya Quick Start. I strongly advise you to watch this related video by my colleague Erwan Granger. Much of what is covered here appears in Erwan's video. The recording predates the SAS Viya 3.4 release, but the main concepts are still the same.

What you will need

The following is a list of items you need to complete this task.

  • AWS Account with appropriate creation privileges
  • A valid SAS Viya License; this means you will need a SAS Software Order Confirmation e-mail
  • Optional: To deploy with your own DNS name and SSL certificate, you need to register a domain managed by Amazon Route 53. For instructions on registering the domain, see the Route 53 documentation. You can request and register a certificate with AWS Certificate Manager.

Furthermore, it’s good to know this Quick Start provides two deployment options. You can deploy SAS Viya into a new Virtual Private Cloud (VPC) or into an existing VPC. The first option builds a new AWS environment consisting of the VPC, private and public subnets, NAT gateways, security groups, Ansible controllers, and other infrastructure components, and then deploys SAS Viya into this new VPC. The second option provisions SAS Viya in your existing AWS infrastructure. I decided to go for the first option.

What you will build

Here's an architectural overview of what we will build:

SAS Viya architecture on the AWS Cloud

You can find exactly the same architecture on the SAS Viya AWS Quick Start landing page.

Configure the build

We’ll be following the build process outlined in the Quick Start guide. On the landing page, next to the "What you’ll build" tab, click "How to deploy". From there, launch the "Deploy into a new VPC" wizard.

Deploy into a new VPC wizard

Prerequisite prep

Make sure you sign in with your AWS account and have chosen the region where you want to deploy. On that first screen you can leave the Amazon S3 template URL at its default. That template is the basis for the AWS CloudFormation stack we are launching. CloudFormation is a tool from AWS that allows you to spin up resources in the right order, and the template is the blueprint document for your CloudFormation stack. By keeping the default template, we will build exactly the architecture displayed above.

Pre-req prep template

Now click "Next" and move to the page where we can specify more details and the required parameters of the CloudFormation parameters.

CloudFormation parameters

The first parameter is the SAS Viya Software Order file, which is the Amazon S3 location of the Software Order e-mail attachment.

SAS Viya install package location

In the Administration section, you provide parameters to configure your AWS architecture. That way, you control access, instance types, and whether you will use a SAS Viya mirror repository.

CloudFormation administration parameters

Administration parameter definitions:

  • The name of an Amazon EC2 key pair, so you can access the Ansible controller
  • The Amazon Availability zone for the public and private subnet
  • Allowable IP range for HTTP traffic; must be a valid IP CIDR range
  • Allowable IP Range for SSH traffic to the Ansible controller; must be a valid IP CIDR range
  • SAS Administrator password
  • Password for Default (sasuser) user
  • Amazon EC2 Instance type for CAS Compute VM
  • Amazon EC2 Instance type for SAS Viya Services VM
  • (Optional) Location of SAS Viya Deployment Repository data
  • (Optional) Operator Email

If you want to work with custom DNS names and SSL, you will need to provide the next three parameters as well.

DNS and SSL configuration (optional)

DNS and SSL parameters:

You may accept the defaults on the remaining parameters.

Optional parameters

After clicking "Next" another set of optional parameters are available. I mostly go with accepting the default parameters provided. The lone exception is the Rollback on failure.

Optional administration parameters

Based on what I’ve learned from Erwan's video, the safer choice is "No" on the Rollback option. This way, if the deployment process encounters issues, the log will identify in which step the error occurred. Of course, this means you are responsible for manually deleting AWS-created resources that are no longer necessary. The easiest way to do this is by deleting the CloudFormation stacks afterward.

Kick off the build

To conclude the deployment wizard, click "Next" once more and acknowledge the AWS resources that will be created. Clicking "Create stack" starts the deployment process.

Start the build process
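If you ever want to script the same launch instead of clicking through the wizard, a rough boto3 sketch follows; the stack name, template URL, and parameter keys are placeholders to replace with the values from the Quick Start.

import boto3

cf = boto3.client('cloudformation', region_name='us-east-1')

cf.create_stack(
    StackName='sas-viya-quickstart',
    TemplateURL='https://s3.amazonaws.com/your-bucket/sas-viya-master.template',  # placeholder
    Parameters=[
        # Hypothetical parameter keys -- copy the real ones from the wizard/template
        {'ParameterKey': 'KeyPairName', 'ParameterValue': 'my-ec2-keypair'},
        {'ParameterKey': 'SASAdminPass', 'ParameterValue': 'change-me'},
    ],
    Capabilities=['CAPABILITY_IAM'],   # likely required if the template creates IAM resources
    DisableRollback=True,              # mirrors the "No" rollback choice discussed above
)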

You can monitor the deployment log using AWS CloudWatch. In his video, Erwan demonstrates this at around minute 23.

After a successful formation, you will find two AWS CloudFormation stacks created. The Outputs section gives you the direct links to SAStudioV and SASDrive.

SAS Studio and SAS Drive stacks

That’s it. You are deployed and ready to begin using your SAS Viya environment!

Additional Reference

Alexander Koller writes about SAS on AWS and takeaways for preparing for the AWS associate solution architect exam.

Your experiences and opinion matter

New forces are shaping the analytics ecosystem. Increased competition, rising customer expectations, and new, emerging technologies such as AI and machine learning are challenging IT departments to evolve their analytics ecosystems to meet the demands of their business partners.

How is your organization doing this? How does your Analytics Cloud strategy compare to the market? And what do your peers think about migrating Analytics to the cloud? We can give you some insights and an industry benchmark on the topic.

Tell us about your experience in this 5 minute survey and we will be happy to share a detailed industry insight report with you, to answer these questions.

Deploy SAS Viya on AWS - Quick Start was published on SAS Users.

June 11, 2019
 

This article is a follow-on to a recent post from Jeff Owens, Getting started with SAS Containers. In that post, Jeff discussed building and running a single container for a SAS Viya runtime/IDE. Today we will go through how to build and run the full SAS Viya stack - visual components and all - in Kubernetes. Step 1 is building the container images and Step 2 is running the containers. For both steps, you can go to the sas-container-recipes GitHub repo for more detail and to obtain the tools needed to accomplish this task. An in-depth guide and more information is located on the wiki page in the repository.

The project development team at SAS has done an incredible job of making this new and intuitive way to dynamically create large collections of containers easy and foolproof, despite my long-winded explanation...

Building the Container Images

Keeping with the recipes theme, we are going to need to prepare a few ingredients to make this work. Of course, you will need a valid SAS_Viya_deployment_data.zip file containing your ordered products.

Build Machine

First, you need a Build Machine. This can be a lightweight server, but it needs to be running Linux. The build machine in this example is 2cpu x 8GB RAM, running RHEL 7.6. Hint – 2 cores is the minimum but the more you use for the build the better (faster). I have installed Docker version 18.09.5 here and I have a 100GB volume attached to my docker root (by default this is /var/lib/docker but you can easily change the location in your /etc/docker/daemon.json file).

You can review full system requirements in the GitHub repository here. This article covers the "multiple" or "full" deployment types so focus on that column in the table.
This build machine is going to execute the build script, which builds each one of your containers, pushes them to your Docker registry, and creates the corresponding Kubernetes manifest files needed to launch your deployment.

Make sure you have cloned the sas-container-recipes repository to this machine.
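If you haven't already, the clone looks like this (the GitHub project is public; adjust the URL if your site mirrors it internally):

git clone https://github.com/sassoftware/sas-container-recipes.git
cd sas-container-recipes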

Docker Registry

You will need access to a Docker registry. Your build machine must be able to push images into it, and your Kubernetes machines must be able to pull images from it. Prior to building, make sure you run the docker login myregistry.com command as the build uid. This docker login ensures a file is present at $HOME/.docker/config.json. This is a requirement whether or not you secure the registry with authentication. Note, if your registry does not respond to pings you will need to add the --skip-docker-url-validation parameter to the build command.
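For example, logging in as the build user and confirming the credentials file was written might look like the following; the registry hostname is a placeholder.

docker login myregistry.com
cat $HOME/.docker/config.json    # should contain an "auths" entry for myregistry.com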

Mirror Repo (Optional)

Similar to the single-container build, it is a good idea to create a mirror repository to host your SAS rpms. A local mirror gives you consistent performance during installation and a consistent build. However, if your containers are able to connect to ses.sas.download then you can skip the mirror step. Beware of the network implications and the fluid nature of these repos.

LDAP

Just like any other SAS Viya environment, all users/groups/authentication/authorization are managed by connecting to an external LDAP. This could be a quick-and-dirty OpenLDAP server we stand up ourselves, or a corporate Active Directory server. Regardless, we will have to be able to make this connection if we want to use SAS Viya's visual interfaces. The easiest and best way to handle this connection is with a sitedefault.yml file. Below is a sample sitedefault.yml that would hypothetically connect to host.com's corporate LDAP. You need to construct your own sitedefault file using values for your LDAP. Consult SAS documentation (linked above) for further information.

config:
    application:
        sas.logon.initial.password: sasboot
        sas.identities.providers.ldap.connection:
            host: myldap.host.com
            port: 389
            userDN: 'CN=ldapadmin,DC=host,DC=com'
            password: ldappassword
        sas.identities.providers.ldap.group:
            baseDN: OU=Groups,DC=host,DC=com
        sas.identities.providers.ldap.user:
            baseDN: DC=host,DC=com
        sas.identities:
            administrator: youruserid

Additionally, we will need to make sure a few of our containers have "host integration" with this same LDAP (specifically, the CAS container and the programming container). The way we do that is with a standard sssd.conf file. You should hopefully be able to track down a valid sssd.conf file for your site from an administrator. Hint – it may be necessary to add homedir (/home/%u) and default shell (/bin/bash) overrides to this file depending on your LDAP configuration.
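For illustration only, those overrides typically go in the [domain/...] section of sssd.conf; the domain name below is a placeholder and the rest of the file should come from your administrator.

[domain/host.com]
# ...provider settings supplied by your LDAP administrator...
override_homedir = /home/%u
default_shell = /bin/bash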

The way one would apply these two files here is:

  1. Place sssd.conf in the add-ons/auth-sssd directory and include the --addons "addons/auth-sssd" option when you run build.sh, as we do in the example later.
  2. Place sitedefault.yml in the top level of sas-container-recipes. If the recipe sees a sitedefault.yml file here, it will base64 encode it and embed it as a value in the consul.yml config map. If you did not include sitedefault.yml before the build, you can still add it afterward with the optional post-build step below:
    cat sitedefault.yml | base64 --wrap=0

    Next, copy and paste the output into your consul.yml configmap (by default you can find this in builds/full/manifests/kubernetes/configmaps/consul.yml). You want to add a new key/value similar to the following:

    consul_key_value_data_enc: Y29uZmlnOgogICAgYXBwbGlj......XNvZW1zaXRlLERDPWNvbQo=

Ingress

Ingress is a crucial component to make this come together because the only way to access your SAS Viya environment is through your Ingress. The recipe gives us an Ingress resource (one of the generated Kubernetes manifests files); however, an Ingress resource is simply an internal HTTP routing rule. We will need to make sure we have manually installed a valid Ingress controller inside of our Kubernetes environment which can be a little tricky if you are new to Kubernetes. The Ingress controller reads and applies routing rules (Ingress resources) such as the ones created by the recipes.

Traefik and NGINX are the two most popular industry options. Or you might use the native Ingress controllers offered by AWS, Azure, or GCP if you are running your Kubernetes cluster in the cloud. But to reiterate, you will need an Ingress controller up and running.

Once your Ingress controller is up, you need to edit the provided manifests_usermods.yml. Set SAS_K8S_INGRESS_DOMAIN to the DNS name that resolves to a Kubernetes node that can reach your Ingress controller. While you have this file open, you can also set a unique name for the Kubernetes namespace you want these resources deployed into (the default is "sas-viya"). The manifests_usermods.yml template lives in the util/ directory, so if you are going to use it, first copy that file to the top-level sas-container-recipes directory and edit it there.
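As a sketch, the copied manifests_usermods.yml might end up with entries like the ones below. SAS_K8S_INGRESS_DOMAIN comes from the discussion above, while the namespace key name is an assumption you should verify against the template in util/.

SAS_K8S_NAMESPACE: sas-viya-prod
SAS_K8S_INGRESS_DOMAIN: viya.mycompany.com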

Kubernetes namespace

Build.sh

With all this in place we are ready to build. To summarize, the “pre-build” configuration needed here consists of the files we touched in this sas-container-recipes project:

Relevant pre-build files

So, we can go ahead and launch the build script. I prefer using environment variables for easier readability along with copying and pasting when things change - new registries, mirrors, tags, etc.

SAS_VIYA_DEPLOYMENT_DATA_ZIP=/path/to/SAS_Viya_deployment_data.zip
MIRROR_URL=mymirror.com/myrepo #optional
DOCKER_REGISTRY_URL=myregistry.com
SAS_RECIPE_TYPE=full
DOCKER_REGISTRY_NAMESPACE=viya
SAS_DOCKER_TAG=prod
 
./build.sh --type $SAS_RECIPE_TYPE \
--mirror-url $MIRROR_URL \
--docker-registry-url $DOCKER_REGISTRY_URL \
--docker-registry-namespace $DOCKER_REGISTRY_NAMESPACE \
--zip $SAS_VIYA_DEPLOYMENT_DATA_ZIP \
--tag $SAS_DOCKER_TAG \
--addons "addons/auth-sssd"

Once complete:

  1. We store container images (30-40 of them depending on the software you have ordered) locally in the build host's docker images directory.
  2. All these images are also tagged and pushed to our Docker registry. For reference, the naming convention used is:
    $DOCKER_REGISTRY_URL/$DOCKER_REGISTRY_NAMESPACE/<image-name>:$SAS_DOCKER_TAG
  3. All our Kubernetes manifest files are available on the build machine in sas-container-recipes/builds/full/manifests/kubernetes. These fully configured manifest files are ready to use; they reference the images we have built and pushed.
  4. The build log gives us instructions for how to apply these resources to Kubernetes. These are simple commands you should be able to copy and paste to stand up your SAS Viya environment.

Build log instructions
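As a quick sanity check once the build finishes, you can list the generated manifests and confirm the tagged images, reusing the same variables from the build command:

ls builds/full/manifests/kubernetes/
docker images | grep "$DOCKER_REGISTRY_URL/$DOCKER_REGISTRY_NAMESPACE"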

For the curious
The list below describes what happens during the build process. Feel free to skip this section; you do not need to know how any of this works to use the recipes:

  1. You, the builder, invoke build.sh. This is a wrapper script around the greater build framework. The script creates a "builder container." Check out the Dockerfile in the top level of the recipes directory. The builder container builds from a golang base image because the build process itself is written in a few Go files (new as of April 2019). Several files from the sas-container-recipes project are copied into this container, including those Go files.
    • Note, we did not have to install Go on our build machine since Go is running inside a container.
    • If you are interested in seeing what the builder container looks like, you can run this command: docker run -it --rm --entrypoint /bin/bash sas-container-recipes-builder:$SAS_DOCKER_TAG.
    • A 'sas' user is created inside of this container - this user has the same uid as the user who invoked build.sh on the host.
  2. build.sh also creates a new subdirectory on the host called 'builds/<buildtype>-<timestamp>'. This contains the logs, manifests, and various templates used during this specific build.
  3. build.sh then runs that builder container and the real work gets underway. The entry point for the builder is: go run main.go container.go order.go. All the arguments you specified when invoking build.sh pass straight into this Go program. Also, the newly created "builds" directory is mounted into the container at /sas-container-recipes/builds.
    • The host's /var/run/docker.sock file mounts into this container - this allows the builder container to run docker (docker in docker)
  4. This Go program then:
    • Generates a playbook from your deployment data file (SOE zip) using the sas-orchestration tool (https://support.sas.com/en/documentation/install-center/viya/deployment-tools/34/command-line-interface.html).
    • Creates Kubernetes manifests for the images set to build.
    • Gathers sets of Ansible roles to install in each container, based on the entitlement of your software order.
    • Generates a Dockerfile for each container, where each applicable Ansible role installs in a new Docker layer.
    • Creates a "build context" for each container with the generated Dockerfile and the Ansible role files.
    • Starts a docker build process for each container. The Dockerfile installs ansible and executes the playbook "locally" (inside of each container).
    • Pushes these images into your registry as each build finishes.
    • Note, this happens inside of containers, and the builds execute concurrently. Recall this build machine has 2 cores, so only 2 containers build at a time, and the build took several hours. On a 16-core machine, the whole build would go much faster. In another terminal, look at docker stats during the build. Another significant “performance” factor is the network bandwidth between your build machine and your registry.

Running the Containers

We are going to run these containers inside of a Kubernetes environment. Here are the finishing touches needed to give us a completely containerized SAS Viya environment running in Kubernetes. Note that by default this deploys into a new namespace inside of your Kubernetes cluster and isolates the resources from anything else running.

Kubernetes Environment

Since we built the full stack, we'll need to make sure we have sufficient resources to run all of these containers at the same time. We'll need a minimum of 8 cores and 80GB RAM available. Remember CAS is a multithreaded, in-memory runtime, so the more cores and RAM you provide, the more horsepower you'll have for doing actual analytical work with SAS and CAS.

Kubectl

Hopefully, if you've gotten this far, you are familiar with kubectl, the client tool/interface for a Kubernetes cluster; consider it a CLI wrapper around the Kubernetes API. For thoroughness: you launch your SAS Viya deployment from whatever machine you run kubectl on. If this happens to be the same machine you built on, you can stay inside the sas-container-recipes directory you started in and copy and paste those kubectl apply -f... commands. Or you can copy your manifest files somewhere else and modify those commands accordingly. In either case, once those commands run, your environment is up, and you should be able to access SAS Environment Manager and the other SAS web apps. If you added your userid as an administrator in the sitedefault.yml file, you can log in as yourself with admin access.
Apply the manifests:

Apply the manifests
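Your build log has the exact commands, but they generally follow the pattern below; the subdirectory names are illustrative, and the namespace matches whatever you set in manifests_usermods.yml.

kubectl apply -f builds/full/manifests/kubernetes/namespace/
kubectl apply -f builds/full/manifests/kubernetes/configmaps/ -n sas-viya-prod
kubectl apply -f builds/full/manifests/kubernetes/secrets/ -n sas-viya-prod
kubectl apply -f builds/full/manifests/kubernetes/deployments/ -n sas-viya-prod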

And after a few minutes your pods should be up (the first time takes the longest since images must be pulled). Note that a running pod does not mean all of your SAS Viya services are running; it may take up to 30 minutes for all services to come up and stabilize.

Pods list
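Standard kubectl queries are enough to watch the environment come up; the namespace below matches the one used earlier.

kubectl get pods -n sas-viya-prod
kubectl get pods -n sas-viya-prod --watch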

With your Ingress and DNS rules set up correctly, you should be able to reach your environment:

SAS login screen

Based on properly configured sitedefault.yml and sssd.conf files, you should be able to log in as an LDAP user.

Miscellaneous Notes

Scaling

Once your SAS Viya environment is up and running in Kubernetes, the following kubectl command adds CAS worker nodes to scale out the capacity of our CAS server.

kubectl scale deployment sas-viya-cas-worker --replicas=5 -n sas-viya-prod

Note, there isn’t any value in adding any more workers than you have physical nodes in your cluster.
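You can confirm the scale-out with a quick query against the same namespace:

kubectl get deployment sas-viya-cas-worker -n sas-viya-prod
kubectl get pods -n sas-viya-prod | grep cas-worker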

Performance

SAS is a powerful programming language designed to handle heavy workloads on large data. General hardware performance has historically been a chief concern for customers implementing SAS. Containers add a new wrinkle to the concept of performance given the general notion of hardware abstraction. One performance-related question is: how can we guarantee the IO provided by the underlying filesystem (SASWORK, CAS_DISK_CACHE)? As with Kubernetes and storage/state in general, no easy answer exists. It falls to the Kubernetes operator to make high-performance filesystems (e.g. local SSD) available on all nodes where a SAS programming or CAS container might land, and to manually edit the corresponding manifest files to leverage those host disks. Alternatively, we can try to limit the burden on these scratch disk spaces. For CAS, this means ensuring we have more RAM available than data in use.
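As a rough sketch only, steering scratch space to a fast host disk usually means adding a hostPath volume and mount to the CAS (or programming) container spec in the generated manifest; the paths, names, and mount point below are hypothetical, so take the real ones from your own manifests.

# hypothetical excerpt from a CAS deployment manifest
      volumes:
        - name: cas-disk-cache
          hostPath:
            path: /mnt/fast-ssd/cas-cache
      containers:
        - name: sas-viya-cas           # container name is an assumption
          volumeMounts:
            - name: cas-disk-cache
              mountPath: /cas/cache    # point CAS_DISK_CACHE at this path in the container's environment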

Amnesia

See the summary section below for a caveat about this deployment methodology – this is not quite a complete implementation for “production” types of environments. At least not without an understanding of customer configuration requirements. You should have a discussion with your sales team about some of these details. But please be aware that building and deploying as we did here leaves us with an “Amnesiac Viya” (a useful term coined by an astute SAS employee). That is, there is no state here. If and when you take your environment down, or scale pods to 0 across services, you get a "brand new" or "fresh" environment once it comes back up. The good news is this also means if we run into any issues, we can easily delete the whole namespace and restart. If you want to persist any user data, config, reports, code, etc., you will have to manually attach storage to a few locations.

Full vs Multiple

Note, here we used SAS_DEPLOYMENT_TYPE=full. This built the entire Viya stack, visual interfaces, microservices and all. Alternatively, if we set the deployment type to "multiple" we get three container images – programming, httpproxy, and cas. This would be all we need if we wanted to write SAS code, whether we wanted to use SAS Studio or an external IDE like Jupyter. And we could still scale out our CAS cluster the same way as we did in our full environment.
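For reference, switching to the programming-only build is just a change to the type argument; everything else stays as in the full example above.

SAS_RECIPE_TYPE=multiple
 
./build.sh --type $SAS_RECIPE_TYPE \
--docker-registry-url $DOCKER_REGISTRY_URL \
--docker-registry-namespace $DOCKER_REGISTRY_NAMESPACE \
--zip $SAS_VIYA_DEPLOYMENT_DATA_ZIP \
--tag $SAS_DOCKER_TAG \
--addons "addons/auth-sssd"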

Summary

Like the rest of the industry, the SAS container strategy is quickly evolving. SAS Viya, as a scalable, highly available, services-oriented architecture, is a perfect fit to run in containers inside of the Kubernetes orchestration framework. Kubernetes brings tremendous operational benefits to the table for this type of software: smoother deployments, higher uptime, instant scale, and much more efficient hardware usage, to name a few.

As you will see in the build log when running the recipe, this is an "EXPERIMENTAL" deployment process. The recipes are an excellent way to get your hands on a Kubernetes version of SAS Viya early. Future releases of SAS Viya will be fully "containerized" and "kubernetes-ized" so customers won’t be building their own containers in this manner. Rather, SAS will provide a Helm chart to customers that will pull container images straight from SAS and apply them into their Kubernetes environments appropriately. Further, many aspects of SAS Viya’s infrastructure will be redesigned to be more "Kubernetes native," but the general feel of this model is what sysadmins/operators should see from SAS going forward.

Deploying the Full SAS Viya Stack in Kubernetes was published on SAS Users.