
February 12, 2019
 

As one of SAS' newest systems engineers, recently joining the Americas Artificial Intelligence Team, I’m incredibly excited to gain expertise in artificial intelligence and machine learning. I also look forward to applying my knowledge to enable others to leverage the advanced technologies that SAS offers.

However, as a recent graduate with no prior experience coding in SAS, I expected a steep learning curve and a slow introduction into the world of AI. When I first arrived on campus right after ringing in 2019, I had no idea how fast I would employ SAS technology to create a tangible AI application.

During my second week on the job, I learned about the development and deployment of effective computer vision models. Not only did I create a demo to distinguish between a valid company logo and a counterfeit version, but I was also amazed by the ease and speed at which it could be done.

Read on for a look at the model and how it works.

Model Development

For this model, I wanted to showcase the potential of using computer vision to protect corporate identity — in this case, a company’s logo. I used the familiar SAS logo as an example of a valid company logo, and for demonstration purposes, I employed the logo used by the Scandinavian Airlines System to represent what a counterfeit could look like. Although this airline is a legitimate business that isn’t knocking off SAS (they are actually one of our customers), the two logos showcase the technology’s ability to distinguish between similar branding. Thus, the model could easily be adapted to detect actual counterfeits.

I first assembled a collection of sample images of both companies’ logos to serve as training data. For each image, I drew a bounding box around the logo to label it as either a “SASlogo” or a “SAS_counterfeit.”

Use SAS to spot a counterfeit logo

This data was then used to train the model to identify the two logos through machine learning. The model was written with the SAS Deep Learning Python Interface (DLPy), using a YOLO (You Only Look Once) algorithm.
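
For readers who want a concrete starting point, here is a minimal sketch of what a DLPy object-detection workflow can look like. It is not the actual code behind this demo: the CAS connection details, the table name trainData, and the training parameters are illustrative assumptions, and argument names can vary across DLPy releases.

# Minimal sketch: train a Tiny YOLOv2 object detector with DLPy.
# Assumes a running CAS server and a CAS table "trainData" that already
# holds the images and the bounding-box labels described above.
from swat import CAS
from dlpy.applications import Tiny_YoloV2
from dlpy.utils import get_anchors

conn = CAS('cas-server.example.com', 5570)   # hypothetical host and port

# Estimate anchor boxes from the labeled training data.
anchors = get_anchors(conn, data='trainData', coord_type='yolo')

# Two classes: the valid SAS logo and the counterfeit.
model = Tiny_YoloV2(conn, anchors=anchors, n_classes=2,
                    width=416, height=416, coord_type='yolo')

# Train on the labeled images.
model.fit(data='trainData', max_epochs=60, lr=0.001)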

Creating a computer vision model

Creating a computer vision model with SAS

To test the model’s effectiveness in object detection, additional validation images were supplied to verify its ability to identify both valid and counterfeit versions of the logo. In the following image of SAS headquarters in Cary, the model correctly identified the displayed logo as the valid version with a confidence level of 0.61.

Model Deployment

One of the key advantages of using the DLPy interface is the ability to easily deploy the model to various SAS engines. I simply created an ASTORE file as the model output, which can be deployed with SAS Event Stream Processing (ESP), SAS Cloud Analytic Services (CAS), or SAS Micro Analytic Service (MAS).

However, the model can also be deployed when SAS technology is not available by creating an ONNX file as the output. This type of file can be used to integrate the model into an iOS application, for example.
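
As a rough sketch only (not the demo's actual code), exporting both formats from a fitted DLPy model can look like the following; the output paths are placeholders.

# Export the fitted DLPy model for deployment; paths are placeholders.
# ASTORE output for scoring in CAS, ESP, or MAS.
model.deploy(path='/models/logo_detector', output_format='astore')

# ONNX output for use outside of SAS, for example embedded in an iOS app.
model.deploy(path='/models/logo_detector', output_format='onnx')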

Astore and ONNX of YOLO model

As I conclude my first weeks at SAS, I am thrilled by the opportunity to continue to build expertise in SAS technologies and programming. For the next several months, I will be attending the Customer Advisory Academy in Cary, and I look forward to applying the knowledge and skills I gain to the creation of other AI applications in the future!

Want to learn more about object detection in computer vision? Read this blog post!

How to spot counterfeit company logos with AI – no SAS programming experience needed was published on SAS Users.

February 8, 2019
 

Creating a map with SAS Visual Analytics begins with the geographic variable.  The geographic variable is a special type of data variable where each item has a latitude and longitude value.  For maximum flexibility, VA supports three types of geography variables:

  1. Predefined
  2. Custom coordinates
  3. Custom polygons

This is the first in a series of posts that will discuss each type of geography variable and their creation. The predefined geography variable is the easiest and quickest way to begin and will be the focus of this post.

SAS Visual Analytics comes with nine (9) predefined geographic lookup types.  This lookup method requires that your data contains a variable matching one of these nine data types:

  • Country or Region Names – Full proper name of a country or region (ISO 3166-1)
  • Country or Region ISO 2-Letter Codes – Alpha-2 country code (ISO 3166-1)
  • Country or Region ISO Numeric Codes – Numeric-3 country code (ISO 3166-1)
  • Country or Region SAS Map ID Values – SAS ID values from MAPSGFK continent data sets
  • Subdivision (State, Province) Names – Full proper name for level 2 admin regions (ISO 3166-2)
  • Subdivision (State, Province) SAS Map ID Values – SAS ID values from MAPSGFK continent data sets (Level 1)
  • US State Names – Full proper name for US State
  • US State Abbreviations – Two letter US State abbreviation
  • US Zip Codes – A 5-digit US zip code (no regions)

Once you have identified a variable in your dataset matching one of these types, you are ready to begin.  For our example map, the dataset 'Crime' and variable 'State name' will be used.  Let’s get started.

Creating a predefined geography variable in SAS Visual Analytics

  1. Begin by opening VA and navigate to the Data panel on the left of the application.
  2. Select the desired dataset and locate a variable that matches one of the predefined lookup types discussed above. Click the down arrow to the right of the variable and select ‘Geography’ from the Classification dropdown menu.
  3. The ‘Edit Geography Item’ window will open. Depending upon the type of geography variable selected, some of the options on this dialog will vary.  The 'Name' textbox is common for all types and will contain the variable selected from your dataset.  Edit this label as needed to make it more user friendly for your intended audience.
  4. The ‘Geography data type’ drop down list is where you select the desired type of geography variable.  In this example, we are using the default predefined option.
  5. Locate the 'Name or code context' dropdown list.  Select the type of predefined variable that matches the data type of the variable chosen from your data.  Once selected, VA scans your data and does an internal lookup on each data item.  This process identifies latitude and longitude values for each item of your dataset.  Lookup results are shown on the right of the window as a percentage and a thumbnail size map.  The thumbnail map displays the first 100 matches.
  6. If there are any unmatched data items, the first 5 will be displayed.  This may provide a better understanding of your data.  In this example, it is clear from the variable name what type should be selected (US State Names).  However, in most cases the choice will not be this obvious.  The lesson here: know your data!

Unmatched data items indicators

Once you are satisfied with the matched results, click the OK button to continue.  You should see a new section in the Data panel labeled ‘Geography’.  The name of the variable will be displayed beside a globe icon. This icon represents the geography variable and provides confirmation it was created successfully.

Icon change for geography variable

Now that the geography variable has been created, we are ready to create a map.  To do this, simply drag it from the Data panel and drop it on the VA report canvas.  The auto-map feature of VA will recognize the geography variable and create a bubble map with an OpenStreetMap background.  Congratulations!  You have just created your first map in VA.

Bubble map created with predefined geography variable

The concept of a geography variable was introduced in this post as the foundation for creating all maps in VA.  Using the predefined geography variable is the quickest way to get started with Geo maps.  In situations when the predefined type is not possible, using one of VA's custom geography types becomes necessary.  These scenarios will be discussed in future blog posts.

Fundamentals of SAS Visual Analytics geo maps was published on SAS Users.

February 6, 2019
 

Splitting external text data files into multiple files

Recently, I worked on a cybersecurity project that entailed processing a staggering number of raw text files about web traffic. Millions of rows had to be read and parsed to extract variable values.

The problem was complicated by the varying composition of the records. Each external raw file was a collection of records with different structures that required different parsing logic. Moreover, those heterogeneous records could not all belong to a single rectangular data table with a fixed set of columns.

Solving the problem

To solve the problem, I decided to employ a "divide and conquer" strategy: to split the external file into many files, each with a homogeneous structure, then parse them separately to create as many output SAS data sets.

My plan was to use a SAS DATA Step for looping through the rows (records) of the external file, read each row, identify its type, and based on that, write it to a corresponding output file.

This is similar to the way we would split a data set into many:

 
data CARS_ASIA CARS_EUROPE CARS_USA;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   output CARS_ASIA;
      when('Europe') output CARS_EUROPE;
      when('USA')    output CARS_USA;
   end;   
run;

But how do you switch between the output files? The idea came from SAS' Chris Hemedinger, who suggested using multiple FILE statements to redirect output to different external files.

Splitting an external raw file into many

As you know, one can use the PUT statement in a SAS DATA Step to output a character string or a combination of character strings and variable values into an external file. That external file (the destination) is defined by a FILENAME statement and selected with the FILE statement, as in the following example:

 
filename inf  'c:\temp\input_file.txt';
filename out1 'c:\temp\traffic.txt';
filename out2 'c:\temp\system.txt';
filename out3 'c:\temp\threat.txt';
filename out4 'c:\temp\other.txt';
 
data _null_;
   infile inf;
   input REC_TYPE $10. @;
   input;
   select(REC_TYPE);
      when('TRAFFIC') file out1;
      when('SYSTEM')  file out2;
      when('THREAT')  file out3;
      otherwise       file out4;
   end;
   put _infile_;
run;

In this code, the first INPUT statement retrieves the value of REC_TYPE. The trailing @ line-hold specifier ensures that the input record is held for the execution of the next INPUT statement within the same iteration of the DATA Step. Your data may not match this layout exactly, but the point is that you need to capture the field(s) of interest and stay on the same row.

The second INPUT statement reads the whole raw file record into the _infile_ DATA Step automatic variable.

Depending on the value of the REC_TYPE variable assigned in the first INPUT statement, the SELECT block toggles the FILE definition between one of the four filerefs: out1, out2, out3, or out4.

Then the PUT statement outputs the _infile_ automatic variable value to the output file defined in the SELECT block.

Splitting a data set into several external files

A similar technique can be used to split a data table into several external raw files. Let’s combine the two code samples above to demonstrate:

 
filename outasi 'c:\temp\cars_asia.txt';
filename outeur 'c:\temp\cars_europe.txt';
filename outusa 'c:\temp\cars_usa.txt';
 
data _null_;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   file outasi;
      when('Europe') file outeur;
      when('USA')    file outusa;
   end;
   put _all_; 
run;

This code reads observations of the SASHELP.CARS data table and, depending on the value of the ORIGIN variable, the PUT _ALL_ statement outputs all the variables (including the automatic variables _ERROR_ and _N_) as named values (VARIABLE_NAME=VARIABLE_VALUE pairs) to one of the three external raw files specified by their respective file references (outasi, outeur, or outusa).

You can modify this code to produce delimited files with full control over which variables and in what order to output. For example, the following code sample produces 3 files with comma-separated values:

 
data _null_;
   set SASHELP.CARS;
   select(origin);
      when('Asia')   file outasi dlm=',';
      when('Europe') file outeur dlm=',';
      when('USA')    file outusa dlm=',';
   end;
   put make model type origin msrp invoice; 
run;

You may use different delimiters for your output files. In addition, rather than using mutually exclusive SELECT, you may use different logic for re-directing your output to different external files.

Bonus: How to zip your output files as you create them

For those readers who are patient enough to read to this point, here is another tip. As described in this series of blog posts by Chris Hemedinger, in SAS you can read your external raw files directly from zipped files without unzipping them first, as well as write your output raw files directly into zipped files. You just need to specify that in your filename statement. For example:

UNIX/Linux

 
filename outusa ZIP '/sas/data/temp/cars_usa.txt.gz' GZIP;

Windows

 
filename outusa ZIP 'c:\temp\cars.zip' member='cars_usa.txt';

Your turn

What is your experience with creating multiple external raw files? Could you please share with the rest of us?

How to split a raw file or a data set into many external raw files was published on SAS Users.

February 2, 2019
 

SAS Visual Analytics

I don't know about you, but when I read challenges like:

  • Detecting hidden heart failure before it harms an individual
  • Can SAS Viya AI help to digitalize pension management?
  • How to recommend your next adventure based on travel data
  • How to use advanced analytics in building a relevant next best action
  • Can SAS help you find your future home?
  • When does a customer have their travel mood on, and to which destination will they travel?
  • How can SAS Viya, Machine Learning and Face Recognition help find missing people?

…I can continue with the list of ideas provided by the teams participating in the SAS Nordics User Group’s Hackathon. But one thing is for sure: I become enthusiastic, and I'm eager to discover the answers and how analytics can help solve these questions.

When the Nordics team asked for support for providing SAS Viya infrastructure on Azure Cloud platform, I didn't hesitate to agree and started planning the environment.

Environment needs

Colleagues from the Nordic countries informed us that their Hackathon currently included fourteen registered teams. Hence, they needed at least fourteen different environments with the latest and greatest SAS Viya tools like SAS Visual Analytics, SAS VDMML and SAS Text Analytics. In addition, participants wanted to get the chance to use open source technologies with SAS and asked us to install R-Studio and Jupyter. This would allow data scientists to develop models in a programming language of choice and provide access to SAS predictive modeling capabilities.

The challenge I faced was how to automate this installation process. We didn't want to repeat an exact installation fourteen times! Also, in case of a failure we needed a way to quickly reinstall a fresh virtual machine in our environment. We wanted to create the virtual machines on the Azure Cloud platform. The goal was to quickly get SAS Viya instances up and running on Azure, with little user interaction. We ended up with a single script expecting one parameter: the name of the instance. Next, I provide an overview of how we accomplished our task.

The setup

As we need to deploy fourteen identical copies of the same SAS Viya software, we decided to make use of the SAS Mirror Manager, which is a utility for synchronizing SAS software repositories. After downloading the mirror repository, we moved the complete file structure to a web server hosted on a separate Nordics Hackathon repository virtual machine within the same private network where the SAS Viya instances will run. This guarantees low latency when downloading the software.

Once the repository server is up and running, we have what we needed to create a SAS Viya base image. Within that image, we first need to make sure to meet the requirements described in the SAS Viya Deployment Guide. To complete this task, we turned to the Viya Infrastructure Resource Kit (VIRK). The VIRK is a collection of tools, created by Erwan Granger, that assist in infrastructure and readiness-verification tasks. The script is located in a repository on SAS software’s GitHub page. By running the VIRK script before creation of the base image, we guarantee all virtual machines based on the image meet the necessary requirements.

Next, we create the SAS Viya Playbook within the base image, as described in the SAS Viya Deployment Guide. That allows us to kick off a SAS Viya installation later, during the initial launch of a new VM based on that image. We cannot install SAS Viya beforehand because one of the requirements is a static IP address and a static hostname, which are different for each VM we launch. However, we can install R-Studio server on the base image. Another important file we make available on this base image is a script to initiate the Ansible installations of OpenLDAP, SAS Viya and Jupyter.

Deployment

After the common components are in place, we follow the instructions from Azure on how to create a custom image of an Azure VM. This capability is available on other public cloud providers as well. Now all the prerequisites to create working Viya environments for the Hackathon are complete. Finally, we create a launch script to install a full SAS Viya environment with a single command and one parameter, the hostname, from the Azure CLI.

$ ./launchscript.sh viya01
$ ./launchscript.sh viya02
$ ./launchscript.sh viya03
...
$ ./launchscript.sh viya12
$ ./launchscript.sh viya13
$ ./launchscript.sh viya14

The script

The main parts of this launch script are listed below; a rough sketch of the flow follows the list:

  1. Testing if the Nordics Hackathon Repository VM is running because we must download software from our own locally created repository.
  2. Launch a new VM, based on the SAS Viya Image we created during preparation, assign a public static IP address, and choose a Standard_E32-16s_v3 Azure VM.
  3. Launch our own Viya-install script to perform the following three sub-steps:
    • Install OpenLDAP as the identity provider
    • Install SAS Viya just as you would do by following the SAS Viya Deployment Guide.
    • Install Jupyter with a customized Ansible script made by my colleague Alexander Koller.
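
As a rough illustration only, the following Python sketch mimics that flow by wrapping the Azure CLI with subprocess. The resource group, image name, repository VM name, and install-script path are hypothetical, not the actual Nordics setup, and error handling is kept to a minimum.

# Rough illustration of the launch flow: check the repository VM, create a
# VM from the custom SAS Viya image, then run the install script on it.
# All names here are hypothetical.
import subprocess
import sys

RESOURCE_GROUP = 'hackathon-rg'
IMAGE = 'sas-viya-base-image'
REPO_VM = 'nordics-repo-vm'

def az(*args):
    """Run an Azure CLI command and return its stdout."""
    result = subprocess.run(['az', *args], capture_output=True, text=True, check=True)
    return result.stdout

def launch(hostname):
    # 1. Make sure the repository VM holding the software mirror is running.
    state = az('vm', 'get-instance-view', '-g', RESOURCE_GROUP, '-n', REPO_VM,
               '--query', 'instanceView.statuses[1].displayStatus', '-o', 'tsv')
    if 'running' not in state.lower():
        sys.exit('Repository VM is not running - aborting.')

    # 2. Create the new VM from the prepared SAS Viya base image.
    az('vm', 'create', '-g', RESOURCE_GROUP, '-n', hostname,
       '--image', IMAGE, '--size', 'Standard_E32-16s_v3')

    # 3. Trigger the install script (OpenLDAP, SAS Viya, Jupyter) on the new VM.
    az('vm', 'run-command', 'invoke', '-g', RESOURCE_GROUP, '-n', hostname,
       '--command-id', 'RunShellScript', '--scripts', '/opt/install/viya-install.sh')

if __name__ == '__main__':
    launch(sys.argv[1])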

The result is that we have fourteen full SAS Viya installations ready in about one hour and 45 minutes. We recently posted a LinkedIn video describing the entire process.

Final thoughts

I am planning to write a blog on SAS Communities to share more technical insight on how we created the script. I am honored I was asked to be part of the jury for the Hackathon. I am looking forward to the analytical insights that the different teams will discover and how they will make use of SAS Viya running on the Azure Cloud platform.

Additional resources

Series of Webinars supporting the Nordic Hackathon

Installing SAS Viya Azure virtual machines with a single click was published on SAS Users.

January 25, 2019
 

Need to authenticate on REST API calls

In my blog series regarding SAS REST APIs (article 1, article 2, article 3) I outlined how to integrate SAS analytical capabilities into applications. I detailed how to construct REST calls, build body parameters and interpret the responses. I've not yet covered authentication for the operations, purposefully putting the cart before the horse. If you're not authenticated, you can't do much, so this post will help to get the horse and cart in the right order.

While researching authentication I ran into multiple, informative articles and papers on SAS and OAuth. A non-exhaustive list includes Stuart Rogers' article on SAS Viya authentication options, one of which is OAuth. Also, I found several resources on connecting to external applications from SAS with explanations of OAuth. For example, Joseph Henry provides an overview of OAuth and using it with PROC HTTP and Chris Hemedinger explains securing REST API credentials in SAS programs in this article. Finally, the SAS Viya REST API documentation covers details on application registration and access token generation.

Consider this post a quick guide to summarize these resources and shed light on authenticating via authorization code and passwords.

What OAuth grant type should I use?

Choosing the grant method to get an access token with OAuth depends entirely on your application. You can get more information on which grant type to choose here. This post covers two grant methods: authorization code and password. Authorization code grants are generally used with web applications and considered the safest choice. Password grants are most often used by mobile apps and applied in more trusted environments.

The process, briefly

Getting an external application connected to the SAS Viya platform requires the following steps:

  1. Use the SAS Viya configuration server's Consul token to obtain an ID Token to register a new Client ID
  2. Use the ID Token to register the new client ID and secret
  3. Obtain the authorization code
  4. Acquire the access OAuth token of the Client ID using the authorization code
  5. Call the SAS Viya API using the access token for authentication.

Registering the client (steps 1 and 2) is a one-time process. You will need a new authorization code (step 3) if the access token is revoked. The access and refresh tokens (step 4) are created once and only need to be refreshed if/when the access token expires. Once you have the access token, you can call any API (step 5) for as long as the token remains valid.

Get an access token using an authorization code

Step 1: Get the SAS Viya Consul token to register a new client

The first step to register the client is to get the consul token from the SAS server. As a SAS administrator (sudo user), access the consul token using the following command:

$ export CONSUL_TOKEN=`cat /opt/sas/viya/config/etc/SASSecurityCertificateFramework/tokens/consul/default/client.token`
64e01b03-7dab-41be-a104-2231f99d7dd8

The Consul token is returned and is then used to obtain an ID token for registering the new application. Use the following cURL command to obtain that ID token:

$ curl -k -X POST "https://sasserver.demo.sas.com/SASLogon/oauth/clients/consul?callback=false&serviceId=app" \
     -H "X-Consul-Token: 64e01b03-7dab-41be-a104-2231f99d7dd8"
 {"access_token":"eyJhbGciOiJSUzI1NiIsIm...","token_type":"bearer","expires_in":35999,"scope":"uaa.admin","jti":"de81c7f3cca645ac807f18dc0d186331"}

The returned token can be lengthy. To assist in later use, create an environment variable from the returned token:

$ export IDTOKEN="eyJhbGciOiJSUzI1NiIsIm..."

Step 2: Register the new client

Change the client_id, client_secret, and scopes in the code below. Scopes should always include "openid" along with any other groups this client needs to get in the access tokens. You can specify "*" but then the user gets prompted for all their groups, which is tedious. The example below just uses one group named "group1".

$ curl -k -X POST "https://sasserver.demo.sas.com/SASLogon/oauth/clients" \
       -H "Content-Type: application/json" \
       -H "Authorization: Bearer $IDTOKEN" \
       -d '{
        "client_id": "myclientid", 
        "client_secret": "myclientsecret",
        "scope": ["openid", "group1"],
        "authorized_grant_types": ["authorization_code","refresh_token"],
        "redirect_uri": "urn:ietf:wg:oauth:2.0:oob"
       }'
{"scope":["openid","group1"],"client_id":"app","resource_ids":["none"],"authorized_grant_types":["refresh_token","authorization_code"],"redirect_uri":["urn:ietf:wg:oauth:2.0:oob"],"autoapprove":[],"authorities":["uaa.none"],"lastModified":1547138692523,"required_user_groups":[]}

Step 3: Approve access to get authentication code

Place the following URL in a browser. Change the hostname and myclientid in the URL as needed.

https://sasserver.demo.sas.com/SASLogon/oauth/authorize?client_id=myclientid&response_type=code

The browser redirects to the SAS login screen. Log in with your SAS user credentials.

SAS Login Screen

On the Authorize Access screen, select the openid checkbox (and any other required groups) and click the Authorize Access button.

Authorize Access form

After submitting the form, you'll see an authorization code. For example, "lB1sxkaCfg". You will use this code in the next step.

Authorization Code displays

Step 4: Get an access token using the authorization code

Now we have the authorization code and we'll use it in the following cURL command to get the access token to SAS.

$ curl -k https://sasserver.demo.sas.com/SASLogon/oauth/token -H "Accept: application/json" -H "Content-Type: application/x-www-form-urlencoded" \
     -u "app:appclientsecret" -d "grant_type=authorization_code&code=YZuKQUg10Z"
{"access_token":"eyJhbGciOiJSUzI1NiIsImtpZ...","token_type":"bearer","refresh_token":"eyJhbGciOiJSUzI1NiIsImtpZC...","expires_in":35999,"scope":"openid","jti":"b35f26197fa849b6a1856eea1c722933"}

We use the returned token to authenticate and authorize the calls made between the client and SAS. We also get a refresh token we use to issue a new token when the current one expires. This way we can avoid repeating all the previous steps. I explain the refresh process further down.

We will again create environment variables for the tokens.

$ export ACCESS_TOKEN="eyJhbGciOiJSUzI1NiIsImtpZCI6ImxlZ..."
$ export REFRESH_TOKEN="eyJhbGciOiJSUzI1NiIsImtpZC..."

Step 5: Use the access token to call SAS Viya APIs

The prep work is complete. We can now send requests to SAS Viya and get some work done. Below is an example REST call that returns user preferences.

$ curl -k https://sasserver.demo.sas.com/preferences/ -H "Authorization: Bearer $ACCESS_TOKEN"
{"version":1,"links":[{"method":"GET","rel":"preferences","href":"/preferences/preferences/stpweb1","uri":"/preferences/preferences/stpweb1","type":"application/vnd.sas.collection","itemType":"application/vnd.sas.preference"},{"method":"PUT","rel":"createPreferences","href":"/preferences/preferences/stpweb1","uri":"/preferences/preferences/stpweb1","type":"application/vnd.sas.preference","responseType":"application/vnd.sas.collection","responseItemType":"application/vnd.sas.preference"},{"method":"POST","rel":"newPreferences","href":"/preferences/preferences/stpweb1","uri":"/preferences/preferences/stpweb1","type":"application/vnd.sas.collection","responseType":"application/vnd.sas.collection","itemType":"application/vnd.sas.preference","responseItemType":"application/vnd.sas.preference"},{"method":"DELETE","rel":"deletePreferences","href":"/preferences/preferences/stpweb1","uri":"/preferences/preferences/stpweb1","type":"application/vnd.sas.collection","itemType":"application/vnd.sas.preference"},{"method":"PUT","rel":"createPreference","href":"/preferences/preferences/stpweb1/{preferenceId}","uri":"/preferences/preferences/stpweb1/{preferenceId}","type":"application/vnd.sas.preference"}]}

Use the refresh token to get a new access token

To use the refresh token to get a new access token, simply send a cURL command like the following:

$ curl -k https://sasserver.demo.sas.com/SASLogon/oauth/token -H "Accept: application/json" \
     -H "Content-Type: application/x-www-form-urlencoded" -u "app:appclientsecret" -d "grant_type=refresh_token&refresh_token=$REFRESH_TOKEN"
{"access_token":"eyJhbGciOiJSUzI1NiIsImtpZCI6ImxlZ...","token_type":"bearer","refresh_token":"eyJhbGciOiJSUzI1NiIsImtpZCSjYxrrNRCF7h0oLhd0Y","expires_in":35999,"scope":"openid","jti":"a5c4456b5beb4493918c389cd5186f02"}

Note the access token is new, and the refresh token remains static. Use the new token for future REST calls. Make sure to replace the ACCESS_TOKEN variable with the new token. Also, the access token has a default life of ten hours before it expires. Most applications deal with expiring and refreshing tokens programmatically. If you wish to change the default expiry of an access token in SAS, make a configuration change in the JWT properties in SAS.
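
For applications that handle the refresh programmatically, here is a minimal Python sketch of the same call using the requests library; the host name, client ID, and client secret mirror the illustrative values used in the cURL examples above.

# Minimal sketch: exchange a refresh token for a new access token and use it.
# Host, client id, and client secret mirror the cURL examples above.
import requests

SAS_HOST = 'https://sasserver.demo.sas.com'

def refresh_access_token(refresh_token):
    response = requests.post(
        f'{SAS_HOST}/SASLogon/oauth/token',
        auth=('app', 'appclientsecret'),          # client id and secret
        headers={'Accept': 'application/json'},
        data={'grant_type': 'refresh_token', 'refresh_token': refresh_token},
        verify=False,                             # mirrors curl -k; use real certificates in production
    )
    response.raise_for_status()
    return response.json()['access_token']

# Refresh, then call an API with the new token.
new_token = refresh_access_token('eyJhbGciOiJSUzI1NiIsImtpZC...')
prefs = requests.get(f'{SAS_HOST}/preferences/',
                     headers={'Authorization': f'Bearer {new_token}'},
                     verify=False)
print(prefs.status_code)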

Get an access token using a password

The steps to obtain an access token with a password are the same as with the authorization code. I highlight the differences below, without repeating all the steps.
The process for obtaining the ID token and using it to register the client is the same as described earlier. The first difference when using password authentication appears when registering the client. In the code below, notice that the key authorized_grant_types has a value of password, not authorization_code.

$ curl -k -X POST https://sasserver.demo.sas.com/SASLogon/oauth/clients -H "Content-Type: application/json" \
       -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZ..." \
       -d '{
        "client_id": "myclientid", 
        "client_secret": "myclientsecret",
        "scope": ["openid", "group1"],
        "authorized_grant_types": ["password","refresh_token"],
        "redirect_uri": "urn:ietf:wg:oauth:2.0:oob"
        }'
{"scope":["openid","group1"],"client_id":"myclientid","resource_ids":["none"],"authorized_grant_types":["refresh_token","authorization_code"],"redirect_uri":["urn:ietf:wg:oauth:2.0:oob"],"autoapprove":[],"authorities":["uaa.none"],"lastModified":1547801596527,"required_user_groups":[]}

The client is now registered on the SAS Viya server. To get the access token, we send a command like we did when using the authorization code, just using the username and password.

curl -k https://sasserver.demo.sas.com/SASLogon/oauth/token \
     -H "Content-Type: application/x-www-form-urlencoded" -u "sas.cli:" -d "grant_type=password&username=sasdemo&password=mypassword"
{"access_token":"eyJhbGciOiJSUzI1NiIsImtpZCI6Imx...","token_type":"bearer","refresh_token":"eyJhbGciOiJSUzI1NiIsImtpZ...","expires_in":43199,"scope":"DataBuilders ApplicationAdministrators SASScoreUsers clients.read clients.secret uaa.resource openid PlanningAdministrators uaa.admin clients.admin EsriUsers scim.read SASAdministrators PlanningUsers clients.write scim.write","jti":"073bdcbc6dc94384bcf9b47dc8b7e479"}

From here, sending requests and refreshing the token steps are identical to the method explained in the authorization code example.

Final thoughts

At first, OAuth seems a little intimidating; however, after registering the client and creating the access and refresh tokens, the application will handle all authentication components. This process runs smoothly if you plan and make decisions up front. I hope this guide clears up any questions you may have on securing your application with SAS. Please leave questions or comments below.

Authentication to SAS Viya: a couple of approaches was published on SAS Users.

January 22, 2019
 

You'll notice several changes in SAS Grid Manager with the release of SAS 9.4M6.
 
For the first time, you can get a grid solution entirely written by SAS, with no dependence on any external or third-party grid provider.
 
This post gives a brief architectural description of the new SAS grid provider, including all major components and their role. The “traditional” SAS Grid Manager for Platform has seen some architectural changes too; they are detailed at the bottom.

A new kid in town

SAS Grid Manager is a complex offering, composed of different layers of software. The following picture shows a very simple, high-level view. SAS Infrastructure here represents the SAS Platform, for example the SAS Metadata Server, SAS Middle Tier, etc. These components service the execution of computing tasks, whether a batch process, a SAS Workspace Server session, and so on. In a grid environment these computing tasks are distributed across multiple hosts, and orchestrated/managed/coordinated by a layer of software that we can generically call Grid Infrastructure or Grid Middleware. That’s basically a set of lower-level components that sit between computing processes and the operating system.

Since its initial design more than a decade ago, the SAS Grid Manager offering has always been able to leverage different grid infrastructure providers, thanks to an abstraction layer that makes them transparent to end-user client software.
 
Our strategic grid middleware has been, since the beginning, Platform Suite for SAS, provided by Platform Computing (now part of IBM).
 
A few years ago, with the release of SAS 9.4M3, SAS started delivering an additional grid provider, SAS Grid Manager for Hadoop, tailored to grid environments co-located with Hadoop.
 
The latest version, SAS 9.4M6, opens up choices with the introduction of a new, totally SAS-developed grid provider. What’s its name? Well, since it’s SAS’s grid provider, we use the simplest one: SAS Grid Manager. To avoid confusion, what we used to call SAS Grid Manager has been renamed SAS Grid Manager for Platform.

The reasons for a choice

The SAS-developed provider for SAS Grid Manager:
 
• Is streamlined specifically for SAS workloads.
 
• Is easier to install (simply use the SAS Deployment Wizard (SDW)) and administer.
 
• Extends workload management and scheduling capabilities into other technologies, such as
 
    o Third-party compute workloads like open source.
 
    o SAS Viya (in a future release).
 
• Reduces dependence of SAS Grid Manager on third party technologies.

So what are the new components?

The SAS-developed provider for SAS Grid Manager includes:
 
• SAS Workload Orchestrator
 
• SAS Job Flow Scheduler
 
• SAS Workload Orchestrator Web Interface
 
• SAS Workload Orchestrator Administration Utility

 
These are the new components, delivered together with others also available in previous releases and with other providers, such as the Grid Manager Thin Client Utility (a.k.a. SASGSUB), the SAS Grid Manager Agent Plug-in, etc. Let’s see these new components in more detail.

SAS Workload Orchestrator

The SAS Workload Orchestrator is your grid controller – just like Platform’s LSF is with SAS Grid Manager for Platform, it:
 
• Dispatches jobs.
 
• Monitors hosts and spreads the load.
 
• Is installed and runs on all machines in the cluster (but is not required on dedicated Metadata Server or Middle-tier hosts).
 
A notable difference, when compared to LSF, is that the SAS Workload Orchestrator is a single daemon, with its configuration stored in a single text file in json format.
 
Redeveloped for modern workloads, the new grid provider can schedule more types of jobs, beyond just SAS jobs. In fact, you can use it to schedule ANY job, including open source code running in Python, R or any other language.

SAS Job Flow Scheduler

SAS Job Flow Scheduler is the flow scheduler for the grid (just as Platform Process Manager is with SAS Grid Manager for Platform):
 
• It passes commands to the SAS Workload Orchestrator at certain times or events.
 
• Flows can be used to run many tasks in parallel on the grid.
 
• Flows can also be used to determine the sequence of events for multiple related jobs.
 
• It only determines when jobs are submitted to the grid; they may not run immediately if the right conditions are not met (hosts too busy, closed, etc.).
 
The SAS Job Flow Scheduler provides flow orchestration of batch jobs. It uses operating system services to trigger the flow to handle impersonation of the user when it is time for the flow to start execution.
 
A flow can be built using the SAS Management Console or other SAS products such as SAS Data Integration Studio.
 
SAS Job Flow Scheduler includes the ability to run a flow immediately (a.k.a. “Run Now”), or to schedule the flow for some future time/recurrence.
 
SAS Job Flow Scheduler consists of different components that cooperate to execute flows:
 
SASJFS service is the main running service that handles the requests to schedule a flow. It runs on the middle tier as a dedicated thread in the Web Infrastructure Platform, deployed inside sasserver1. It uses services provided by the data store (SAS Content Server) and Metadata Server to read/write the configuration options of the scheduler, the content of the scheduled flows and the history records of executed flows.
 
Launcher acts as a gateway between SASJFS and OS Trigger. It is a daemon that accepts HTTP connections using basic authentication (username/password) to start the OS Trigger program as the scheduled user. This avoids the requirement to save end-users’ passwords in the grid provider, for both Windows and Unix.
 
OS Trigger is a stand-alone Java program that uses the services of the operating system to handle the triggering of the scheduled flow by providing a call to the Job Flow Orchestrator. On Windows, it uses the Windows Task Scheduler; on UNIX, it uses cron or crontab.
 
Job Flow Orchestrator is a stand-alone program that manages the flow orchestration. It is invoked by the OS scheduler (as configured by the OS Trigger) with the id of the flow to execute, then it connects to the SASJFS service to read the flow information, the job execution configuration and the credentials to connect to the grid. With that information, it sends jobs for execution to the SAS Workload Orchestrator. Finally, it is responsible for providing the history record for the flow back to SASJFS service.

Additional components

SAS Grid Manager provides additional components to administer the SAS Workload Orchestrator:
 
• SAS Workload Orchestrator Web Interface
 
• SAS Workload Orchestrator Administration Utility
 
Both can monitor jobs, queues, hosts, services, and logs, and configure hosts, queues, services, user groups, and user resources.
 
The SAS Workload Orchestrator Web Interface is a web application hosted by the SAS Workload Orchestrator process on the grid master host; it can be proxied by the SAS Web Server to always point to the current master in case of failover.

The SAS Workload Orchestrator Administration Utility is an administration command-line interface; it has a similar syntax to SAS Viya CLIs and is located in the directory /Lev1/Applications/GridAdminUtility. A sample invocation to list all running jobs is:
sas-grid-cli show-jobs --state RUNNING

What has not changed

Describing what has not changed with the new grid provider is an easy task: everything else.
 
Obviously, this is a very generic statement, so let’s call out a few noteworthy items that have not changed:
 
• User experience is unchanged. SAS programming interfaces to grid have not changed, apart from the lower-level libraries to connect to the new provider. As such, you still have the traditional SAS grid control server, SAS grid nodes, SAS thin client (aka SASGSUB) and the full SAS client (SAS Display Manager). Users can submit jobs or start grid-launched sessions from SAS Enterprise Guide, SAS Studio, SAS Enterprise Miner, etc.
 
• A directory shared among all grid hosts is still required to share the grid configuration files.
 
• A high-performance, clustered file system for the SASWORK area and for data libraries is mandatory to guarantee satisfactory performance.

What about SAS Grid Manager for Platform?

The traditional grid provider, now rebranded as SAS Grid Manager for Platform, has seen some changes as well with SAS 9.4M6:
 
• The existing management interface, SAS Grid Manager for Platform Module for SAS Environment Manager, has been completely re-designed. The user interface has completely changed, although the functions provided remain the same.
 
• Grid Management Services (GMS) is not updated to work with the latest release of LSF. Therefore, the SAS Grid Manager plug-in for SAS Management Console is no longer supported. However, the plug-in is included with SAS 9.4M6 if you want to upgrade to SAS 9.4M6 without also upgrading Platform Suite for SAS.

You can find more comprehensive information in these doc pages:
 
• What’s New in SAS Grid Manager 9.4
 
• Grid Computing for SAS Using SAS Grid Manager (Part 2) section of Grid Computing in SAS 9.4

Native scheduler, new types of workloads, and more: introducing the new SAS Grid Manager was published on SAS Users.

January 18, 2019
 
Interested in learning about what's new in a software release? What if you want to know whether anything has changed in a SAS product? Or whether there are steps that you need to take before you upgrade to a new software release?
 
The online SAS product documentation for SAS® 9.4 and SAS® Viya® contains new sections that provide these answers. The following sections are usually listed on the left-hand side of the table of contents for the online Help document:
 
“What’s New in Base SAS: Details”
“What’s New in SAS 9.4 and SAS Viya”
“SAS Guide to Software Updates and Product Changes”
Note: To make the product-change information easier to find, this section was retitled for SAS® 9.4M6 and SAS® Viya® 3.4. For documentation about previous releases of SAS 9.4, this section is called “SAS Guide to Software Updates.” The information about product changes is included in a subsection called “Product Details and Requirements.” Although the title is different in newer documentation, the content remains the same.

What's available in each section?

• “What’s New” contains information about new features. For example, in SAS 9.4M6, “What’s New” discusses a new ODS destination for Word and a new procedure called PROC SGPIE.
 
• The “SAS Guide to Software Updates and Product Changes” includes the following subsections:
      o A section on software updates, which is for customers who are upgrading to a new release. (FYI: A software update is any modification that SAS provides for existing software. An upgrade is a new release of SAS. A maintenance release is a collection of updates that are applied to the currently installed software.)
      o Another subsection discusses product details and requirements. In it, you will find information about values or settings that have changed from one release to the next. For example, for SAS 9.4M6, the default style for HTML5 output changed from HTMLBlue to HTMLEncore. Another example is for SAS® 9.4M0, when the YEARCUTOFF= system option changed from 1920 to 1926.

Other links to these resources

In the “What’s New” documentation, there is a link in each section to the corresponding product topic in “SAS Guide to Software Updates and Product Changes.”

For example, when you scroll through this SAS Studio page, you see references both to various “What’s New” pages for different versions of SAS Studio and to “SAS Guide to Software Updates and Product Changes.”

In “What's New in Base SAS: Details,” you can search by the software and maintenance release to find new features. Beginning with SAS 9.4, new features for maintenance releases are introduced using the SAS 9.4Mx notation. For example, in the search box on the page, you can enter 9.4M6.

Final thoughts

With these new online Help sections, you can find information quickly about new features of the current SAS release, as well as what has changed from the previous release. As always, we welcome your feedback and suggestions for improving the documentation.

Special thanks

Elizabeth Downes and Marie Dexter in SAS Documentation Development were very willing to make the requested wording changes in the documentation. They also contributed to the content of this article. Thanks to both for their time and effort!

Navigating SAS documentation to find out about new, modified, and updated features was published on SAS Users.

January 16, 2019
 

If you've ever wanted to apply modern machine learning techniques for text analysis, but didn't have enough labeled training data, you're not alone. This is a common scenario in domains that use specialized terminology, or for use cases where customized entities of interest won't be well detected by standard, off-the-shelf entity models.

For example, manufacturers often analyze engineer, technician, or consumer comments to identify the name of specific components which have failed, along with the associated cause of failure or symptoms exhibited. These specialized terms and contextual phrases are highly unlikely to be tagged in a useful way by a pre-trained, all-purpose entity model. The same is true for any types of texts which contain diverse mentions of chemical compounds, medical conditions, regulatory statutes, lab results, suspicious groups, legal jargon…the list goes on.

For many real-world applications, users find themselves at an impasse: it is incredibly impractical for experts to manually label hundreds of thousands of documents. This post will discuss an analytical approach for Named Entity Recognition (NER) which uses rules-based text models to efficiently generate large amounts of training data suitable for supervised learning methods.

Putting NER to work

In this example, we used documents produced by the United States Department of State (DOS) on the subject of assessing and preventing human trafficking. Each year, the DOS releases publicly-facing Trafficking in Persons (TIP) reports for more than 200 countries, each containing a wealth of information expressed through freeform text. The simple question we pursued for this project was: who are the vulnerable groups most likely to be victimized by trafficking?

Sample answers include "Argentine women and girls," "Ghanaian children," "Dominican citizens," "Afghan and Pakistani men," "Chinese migrant workers," and so forth. Although these entities follow a predictable pattern (nationality + group), note that the context must also be that of a victimized population. For example, “French citizens” in a sentence such as "French citizens are working to combat the threats of human trafficking" are not a valid match to our "Targeted Groups" entity.

For more contextually-complex entities, or fluid entities such as People or Organizations where every possible instance is unknown, the value that machine learning provides is that the algorithm can learn the pattern of a valid match without the programmer having to anticipate and explicitly state every possible variation. In short, we expect the machine to increase our recall, while maintaining a reasonable level of precision.

For this case study, here is the method we used:

1. Using SAS Visual Text Analytics, create a rules-based, contextual extraction model on a sample of data to detect and extract the "Targeted Groups" custom entity. Next, apply this rules-based model to a much larger number of observations, which will form our training corpus for a machine learning algorithm. In this case, we used Conditional Random Fields (CRF), a sequence modeling algorithm also included with SAS Visual Text Analytics.
 
2. Re-format the training data to reflect the json input structure needed for CRF, where each token in the sentence is assigned a corresponding target label and part of speech.
 
3. Train the CRF model to detect our custom entity and predict the correct boundaries for each match.
 
4. Manually annotate a set of documents to use as a holdout sample for validation purposes. For each document, our manual label captures the matched text of the Targeted Groups entity as well as the start and end offsets where that string occurs within the larger body of text.
 
5. Score the validation “gold” dataset, assess recall and precision metrics, and inspect differences between the results of the linguistic vs machine learning model.

Let's explore each of these steps in more detail.

1. Create a rules-based, contextual extraction model

In SAS Visual Text Analytics, we created a simple model consisting of a few intermediate, "helper" concepts and the main Targeted Groups concept, which combines these entities to generate our final output.

The Nationalities List and Affected Parties concepts are simple CLASSIFIER lists of nationalities and vulnerable groups that are known a priori. The Targeted Group is a predicate rule which only returns a match if the aforementioned two entities are found in that order, separated by no more than 7 tokens, AND if there is not a verb intervening between the two entities (the verb "trafficking" being the only exception). This verb exclusion clause was added to the rule to prevent false matches such as "Turkish Cypriots lacked shelters for victims" and "Bahraini government officials stated that they encouraged victims to participate in the investigation and prosecution of traffickers." We then applied this linguistic model to all the TIP reports leading up to 2017, which would form the basis for our CRF training data.

Nationalities List Helper Concept:

Affected Parties Helper Concept:

Verb Exclusions Helper Concept:

Targeted Group Concept (Final Fact Rule):

2. Re-format the training data

The SAS Visual Text Analytics score code produces a transactional-style output for predicate rules, where each fact argument and the full match are captured in a separate row. Note that a single document may have more than one match, which are then listed according to _result_id_.

Using code, we joined these results back to the original table and the underlying parsing tables to transform the native output you see above to this, the json format required to train a CRF model:

Notice how every single token in each sentence is broken out separately and has both a corresponding label and a part of speech. For all the tokens that are not part of our Targeted Groups entity of interest, the label is simply "O", for "Other". But for matches such as "Afghan women and girls," the first token in the match has a label of "B-vic" for "Beginning of the Victim entity," and subsequent tokens in that match are labeled "I-vic" for "Inside the Victim entity."

Note that part of speech tags are not required for CRF, but we have found that including them as an input improves the accuracy of this model type. These three fields are all we will use to train our CRF model.
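
To make that token-level structure concrete, here is a small Python sketch (not the project's actual conversion code) that turns one sentence and the character offsets of a matched entity into the B-vic/I-vic/O labels described above; the part-of-speech tags are hard-coded placeholders.

# Illustrative only: convert a sentence plus the character offsets of a
# "Targeted Groups" match into the token/label/POS rows used to train the CRF.
def to_bio(sentence, match_start, match_end, pos_tags):
    rows, offset, inside = [], 0, False
    for token, pos in zip(sentence.split(), pos_tags):
        start = sentence.index(token, offset)
        end = start + len(token)
        offset = end
        if start >= match_start and end <= match_end:
            label = 'I-vic' if inside else 'B-vic'
            inside = True
        else:
            label = 'O'
        rows.append({'token': token, 'label': label, 'pos': pos})
    return rows

sentence = 'Traffickers exploited Afghan women and girls in forced labor .'
start = sentence.index('Afghan')                      # offsets of the entity match
end = start + len('Afghan women and girls')
pos = ['NN', 'VBD', 'JJ', 'NNS', 'CC', 'NNS', 'IN', 'JJ', 'NN', '.']
for row in to_bio(sentence, start, end, pos):
    print(row)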

3. Train the CRF model

Because the Conditional Random Fields algorithm predicts a label for every single token, it is often used for base-level Natural Language Processing tasks such as Part of Speech detection. However, we already have part of speech tags, so the task we are giving it in this case is signal detection. Most of the words are "Other," meaning not of interest, and therefore noise. Can the CRF model detect our Targeted Groups entity and assign the correct boundaries for the match using the B-vic and I-vic labels?
 
After loading the training data to CAS using SAS Studio, we applied the crfTrain action set as follows:

After it runs successfully, we have a number of underlying tables which will be used in the scoring step.

4. Manually annotate a set of documents

For ease of annotation and interpretability, we tokenized and saved the original data by sentence. Using a purpose-built web application which enables a user to highlight entities and save the relevant text string and its offsets to a file, we then hand-scored approximately 2,200 sentences from 2017 TIP documents. Remember, these documents have not yet been "seen" by either the linguistic model or the CRF model. This hand-scored data will serve as our validation dataset.

5. Score the validation “gold” dataset by both models and assess results

Finally, we scored the validation set in SAS Studio with the CRF model, so we could compare human versus machine outcomes.

In a perfect world, we would hope that all the matches found by humans are also found by the model and, moreover, that the model detected even more valid matches than the humans. For example, perhaps we did not include "Rohingyan" or "Tajik" (versus Tajikistani) as nationalities in the CLASSIFIER list of our rules-based model, but the machine learning model nonetheless detected victims from these groups as a valid pattern. This would be a big success, and one of the compelling reasons to use machine learning for NER use cases.

In a future blog, I'll detail the results, including modeling considerations such as:
 
  o The format of the CRF training template
 
  o The relative impact of including inputs such as part of speech tags
 
  o Precision and recall metrics
 
  o Performance and train times by volumes of training documents

Machine markup provides scale and agility

In summary, although human experts might produce the highest-quality annotations for NER, machine markup can be produced much more cheaply and efficiently, and, even more importantly, it scales to far greater data volumes in a fraction of the time. Using a rules-based model to generate large amounts of "good enough" labeled data is an excellent way to take advantage of these economies of scale, reduce the cost barrier to exploring new use cases, and improve your ability to quickly adapt to evolving business objectives.

Reduce the cost-barrier of generating labeled text data for machine learning algorithms was published on SAS Users.

January 14, 2019
 

In the second of three posts on using automated analysis with SAS Visual Analytics, we used the automated analysis object to get a better understanding of our variable of interest, X-Sell and Up-sell Flag, and how it is influenced by other variables in our dataset.

In this third and final post, you'll see how to filter the data even more to set up your customer care workers for success.

Remember how, on the left-hand side of the analysis, we had a list of subgroups with their probabilities? We can use those to filter our data or create additional subsets of data. Let’s create a calculated category from one of the subgroups and then use that to filter a list table of customers. If I right-click on the 87% subgroup and select Derive subgroup item, a new calculated category will appear in my Data pane.

Here is the new data item located in our data pane:

To see the filter for this data object, we can right-click on it and select Edit.

We can now use this category as a filter. Here we have a basic customer table that does not have a filter applied:

If we apply the filter for customers who fall in the 87% subgroup and a filter for those customers who have not yet upgraded, we have a list of customers that are highly likely to upgrade.

We could give this list to our customer care centers and have them call these customers to see if they want to upgrade. Alternatively, the customer care center could use this filter to target customers for upgrades when they call in. So, if a customer calls into the center, the employee could see if that customer meets the criteria set out in the filter. If they do, they are highly likely to upgrade, and the employee should provide an offer to them.

How to match callers with sales channels

Let’s go back to our automated analysis and perform one more action. We’ll create a new object from the subgroup and assess the group by acquisition channel. This will help us determine which acquisition channel(s) the customers who are in our 87% subgroup purchased their plans from. Then we’ll know which sales teams we need to communicate to about our sales strategy.

To do this we’ll select our 87% group, right click and select New object from subgroup on new page, then Acquisition Channel.

Here we see the customers who are in or out of our subgroup by acquisition channel.

Because it is difficult to see the "in" group, we’ll remove the customers who are out of our subgroup by selecting Out in the legend, then right-clicking and selecting New filter from selection, then Exclude selection.

Now we can see which acquisition channel the 87% subgroup purchased their current plan from and how many have already upgraded.

In less than a minute using SAS Visual Analytics' automated analysis, we’ve gained business insights based on machine learning that would have taken hours to produce manually. Not only that, we’ve got easy-to-understand results that are built with natural language processing. We can now analyze all variables and remove any bias, ensuring we don’t miss key findings. Business users gain access to analytics without needing the expert skills to build models and interpret results. Automated analysis is a start, and SAS is committed to investing time and resources into this new wave of BI. Look for more enhancements in future releases!

Miss the previous posts?

This is the third of a three-part series demonstrating automated analysis using SAS Visual Analytics on Viya. Part 1 describes a common visualization approach to handling customer data that leaves room for error and missed opportunities. Part 2 shows improvements through automated analysis.

Want to see automated analysis in action? Watch this video!

How SAS Visual Analytics' automated analysis takes customer care to the next level - Part 3 was published on SAS Users.

1月 102019
 

Everyone's excited about artificial intelligence. But most people, in most jobs, struggle to see how AI can be used in their day-to-day work. This post, and others to come, is all about practical AI. We'll dial the coolness factor down a notch, but we'll explore some real gains to be made with AI technology in solving business problems in different industries.

This post demonstrates a practical use of AI in banking. We’ll use machine learning, specifically neural networks, to enable on-demand portfolio valuation, stress testing, and risk metrics.

Background

I spend a lot of time talking with bankers about AI. It's fun, but the conversation inevitably turns to concerns about leveraging AI models, which can have transparency issues, in a highly regulated and highly scrutinized industry. It's a valid concern. However, there are plenty of ways the technology can help banks, even in regulated areas like risk, without disrupting production models and processes.

Banks often need to compute the value of their portfolios. This could be a trading portfolio or a loan portfolio. They compute the value of the portfolio under current market conditions, but also under stressed conditions or a range of simulated market conditions. These valuations give an indication of the portfolio's risk and can inform investment decisions. Bankers need to do these valuations quickly, on demand or in real time, so that the information is in hand when decisions are made.

However, this isn't always a fast process. Banks have a lot of instruments (trades, loans) in their portfolios, and the functions used to revalue those instruments under the various market conditions can be complex. To address this, many banks approximate the true value with a simpler function that runs very quickly. This is often done with a first- or second-order Taylor series approximation (also called quadratic or delta-gamma approximation) or via interpolation in a matrix of pre-computed values. Approximation is a great idea, but first- and second-order approximations can be terrible substitutes for the true function, especially under stressed conditions. Interpolation can suffer the same drawback under stress.
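For reference, here's what the delta-gamma (second-order Taylor) approximation looks like for a single market factor, where V is the instrument value, S is the market factor (for example, the underlying asset price), and S_0 is its base value:

    V(S) \approx V(S_0) + \Delta\,(S - S_0) + \tfrac{1}{2}\,\Gamma\,(S - S_0)^2,
    \qquad \Delta = \left.\frac{\partial V}{\partial S}\right|_{S_0}, \quad
    \Gamma = \left.\frac{\partial^2 V}{\partial S^2}\right|_{S_0}

The quadratic is an excellent local fit around S_0, but nothing forces it to track the true value far from S_0, which is exactly where stress scenarios live.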

An American put option is shown for simplicity. The put option value is non-linear with respect to the underlying asset price. Traditional approximation methods, including this common second-order approximation, can fail to fit well, particularly when we stress asset prices.

Improving approximation with machine learning

Machine learning is a technology commonly used in AI; it's what enables computers to find relationships and patterns in data. Technically, traditional first- and second-order approximation is a form of classical machine learning, much like linear regression. But in this post we'll leverage more modern machine learning, like neural networks, to get a better fit with ease.

Neural networks can fit functions with remarkable accuracy; you can read about the universal approximation theorem for more on this. We won't get into why this is true or how neural networks work, but the motivation for this exercise is to use that fitting power to improve our approximation.

Each instrument type in the portfolio gets its own neural network. For example, in a trading portfolio, our American options will have their own network, and our interest rate swaps will have theirs.

The fitted neural networks have a small computational footprint so they’ll run very quickly, much faster than computing the true value of the instruments. Also, we should see accuracy comparable to having run the actual valuation methods.

The data, and lots of it

Neural networks require a lot of data to train the models well. The good thing is we have a lot of data in this case, and we can generate any data we need. We’ll train the network with values of the instruments for many different combinations of the market factors. For example, if we just look at the American put option, we’ll need values of that put option for various levels of moneyness, volatility, interest rate, and time to maturity.

Most banks already have their own pricing libraries to generate this data, and they may already have much of it on hand from risk simulations. If you don't have a pricing library, you can work through this example using the QuantLib open source pricing library. That's what I've done here.
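To make that concrete, here's a minimal sketch of pricing one American put with QuantLib's Python bindings. The market inputs are illustrative values I've chosen for the example, not numbers from this post, and your own pricing library would take the place of this code.

    import QuantLib as ql

    # Illustrative market inputs (assumed values for the sketch)
    today = ql.Date(10, 1, 2019)
    ql.Settings.instance().evaluationDate = today

    spot, strike = 1.0, 1.0          # normalized so moneyness = spot / strike
    rate, vol = 0.03, 0.20
    maturity = today + ql.Period(1, ql.Years)

    payoff = ql.PlainVanillaPayoff(ql.Option.Put, strike)
    exercise = ql.AmericanExercise(today, maturity)
    option = ql.VanillaOption(payoff, exercise)

    process = ql.BlackScholesProcess(
        ql.QuoteHandle(ql.SimpleQuote(spot)),
        ql.YieldTermStructureHandle(ql.FlatForward(today, rate, ql.Actual365Fixed())),
        ql.BlackVolTermStructureHandle(
            ql.BlackConstantVol(today, ql.NullCalendar(), vol, ql.Actual365Fixed())))

    # A binomial tree handles the early-exercise feature of the American option
    option.setPricingEngine(ql.BinomialVanillaEngine(process, "crr", 500))
    print(option.NPV())   # the "true" price the network will learn to reproduce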

Now, start small so you don't waste time generating tons of data up front. Use relatively sparse data points for each market factor, but be sure to cover the full range of values so that the model holds up under stress testing. If the model were only trained with interest rates of 3-5 percent, it's not going to do well if you stress interest rates to 10 percent. Value the instruments under each combination of values.
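As a sketch of that sweep, here's one way to build the training table, reusing the QuantLib pricing above behind a hypothetical price_american_put() helper. The factor ranges below are assumptions, chosen to be sparse but deliberately wide enough to include stressed values:

    import itertools
    import numpy as np
    import pandas as pd

    # Sparse but wide grids over each market factor (illustrative ranges)
    moneyness = np.linspace(0.5, 2.0, 16)            # spot / strike
    volatility = np.linspace(0.05, 0.80, 16)
    rate = np.linspace(0.0, 0.10, 11)                # include stressed rates, not just 3-5%
    time_to_maturity = np.linspace(0.05, 2.0, 12)    # in years

    rows = []
    for m, v, r, t in itertools.product(moneyness, volatility, rate, time_to_maturity):
        # price_american_put() is a hypothetical wrapper around the QuantLib call above
        rows.append({"moneyness": m, "volatility": v, "rate": r, "ttm": t,
                     "price": price_american_put(m, v, r, t)})

    train = pd.DataFrame(rows)   # one row per factor combination, priced by the true model

From there you can densify the grid where the pricing function is most non-linear, which is how a table grows toward the size described next.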

Here is my input table for an American put option. It’s about 800k rows. I’ve normalized my strike price, so I can use the same model on options of varying strike prices. I’ve added moneyness in addition to underlying.

This is the input table to the model. It contains the true option prices as well as the pricing inputs. I used around 800K observations to get coverage across a wide range of values for the various pricing inputs. I did this so that my model will hold up well to stress testing.

The model

I use SAS Visual Data Mining and Machine Learning to fit the neural network to my pricing data. I can use either the visual interface or a programmatic interface. I’ll use SAS Studio and its programmatic interface to fit the model. The pre-defined neural network task in SAS Studio is a great place to start.

Before training the model, I standardize my inputs further; neural networks do best when the inputs have been adjusted to a similar range. I enable hyperparameter auto-tuning so that SAS will select the best model parameters for me, and I ask SAS to output the SAS score code for the fitted model so that I can test and use it later.

The SAS Studio Neural Network task provides a wizard to specify the data and model hyperparameters. The task wizard generates the SAS code on the right. I've allowed auto-tuning so that SAS will find the best model configuration for me.
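If you'd like to prototype the same idea outside SAS first, here's roughly what the fit looks like with scikit-learn's MLPRegressor, reusing the training table sketched earlier. This is a stand-in, not the SAS score code the task generates, and the hidden-layer sizes are assumptions rather than anything auto-tuning chose:

    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import train_test_split

    features = ["moneyness", "volatility", "rate", "ttm"]
    X_train, X_test, y_train, y_test = train_test_split(
        train[features], train["price"], test_size=0.2, random_state=0)

    # Standardizing the inputs mirrors the scaling step described above;
    # the layer sizes and activation are illustrative, not tuned
    approx = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64, 64), activation="tanh",
                     max_iter=2000, random_state=0))
    approx.fit(X_train, y_train)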

I train the model. It only takes a few seconds. I try the model on some new test data and it looks really good. The picture below compares the neural network approximation with the true value.

The neural network (the solid red line) fits very well to the actual option prices (solid blue line). This holds up even when asset prices are far from their base values. The base value for the underlying asset price is 1.
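Continuing the scikit-learn sketch, a quick way to check the same thing numerically on the held-out test set looks like this (the moneyness cutoff for the stress slice is arbitrary):

    import numpy as np
    from sklearn.metrics import mean_absolute_error

    pred = approx.predict(X_test)
    print("mean abs error:", mean_absolute_error(y_test, pred))
    print("max abs error: ", np.max(np.abs(y_test.values - pred)))

    # Worth checking the stressed corners specifically, e.g. asset prices far from base
    stressed = X_test["moneyness"] < 0.7
    print("max abs error under stress:",
          np.max(np.abs(y_test[stressed].values - approx.predict(X_test[stressed]))))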

If your model's doing well at this point, you can stop. If it's not, you may need to try a deeper model, a different model, or more data. SAS offers model interpretability tools like partial dependence to help you gauge how the model fits across different variables.
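If you're prototyping with scikit-learn as in the sketches above, recent versions offer an analogous partial dependence check; the SAS interpretability tools produce similar plots without the code:

    import matplotlib.pyplot as plt
    from sklearn.inspection import PartialDependenceDisplay

    # How the fitted approximation responds to each pricing input, one at a time
    PartialDependenceDisplay.from_estimator(approx, X_train, features=features)
    plt.show()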

Deploying the model

If you like the way this model approximates your trade or other financial instrument values, you can deploy it to run on-demand stress tests or to speed up intra-day risk estimations. There are many ways to do this in SAS. The neural network can be published to run in SAS, in-database, in Hadoop, or in-stream with a single click. I can also access my model via a REST API, which gives me lots of deployment options. What I'll do, though, is use these models in SAS High-Performance Risk (HPRisk) so that I can leverage the risk environment for stress testing and simulation and use its nice GUI.
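As a rough illustration of the REST route, a client application could score a scenario with a plain HTTP call. The endpoint URL, payload fields, and authentication below are hypothetical placeholders, not actual SAS endpoints, so substitute the details of your own deployment:

    import requests

    # Hypothetical scoring endpoint and payload; replace with your deployment's
    # actual URL, authentication, and input schema
    scenario = {"moneyness": 0.7, "volatility": 0.45, "rate": 0.08, "ttm": 0.5}
    resp = requests.post("https://your-sas-server.example.com/score/american_put",
                         json=scenario,
                         headers={"Authorization": "Bearer <access token>"})
    print(resp.json())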

HPRisk lets you specify any function, or method, to value an instrument. Given the mapping of the functions to the instruments, it coordinates a massively parallel run of the portfolio valuation for stress testing or simulation.

Remember the SAS code file we generated when we trained the neural network? I can drop that code into an HPRisk method, and HPRisk will run the neural network I just trained.

I can specify a scenario through the HPRisk UI and instantly get the results of my approximation.

Considerations

I introduced this as a practical example of AI, specifically machine learning in banking, so let's keep it practical by considering the following:
 
    • Only approximate instruments that need it. For example, if it's a European option, don't approximate. The function that calculates its true price, the Black-Scholes equation, already runs really fast (see the short sketch after this list). The whole point is that you're trying to speed up the estimation.
 
    • Keep in mind that this is still an approximation, so only use this when you’re willing to accept some inaccuracy.
 
    • In practice, you could be training hundreds of networks, depending on the types of instruments you have. You'll want to reduce total training time by training multiple networks at once, which you can do with SAS.
 
    • The good news is that if you train the networks on a wide range of data, you probably won't have to retrain often. They should be pretty resilient. This is a nice perk of neural networks over second-order approximations, whose parameters need to be recomputed often.
 
    • I've chosen neural networks for this example, but be open to other algorithms. Different instruments may benefit from different algorithms; gradient boosting and others may offer simpler, more intuitive models that achieve similar accuracy.
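For reference, here's the closed-form European put mentioned in the first bullet. It's the standard Black-Scholes formula (no dividends) and evaluates essentially instantly, so there is nothing to gain by approximating it:

    from math import exp, log, sqrt
    from scipy.stats import norm

    def black_scholes_put(spot, strike, rate, vol, ttm):
        """Closed-form Black-Scholes price of a European put (no dividends)."""
        d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * ttm) / (vol * sqrt(ttm))
        d2 = d1 - vol * sqrt(ttm)
        return strike * exp(-rate * ttm) * norm.cdf(-d2) - spot * norm.cdf(-d1)

    print(black_scholes_put(spot=1.0, strike=1.0, rate=0.03, vol=0.20, ttm=1.0))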

When it comes to AI in business, you're most likely to succeed when you have a well-defined problem, like our stress testing that takes too long or isn't accurate enough. You also need good data to work with. This example had both, which made it a good candidate for demonstrating practical AI.

More resources

Interested in other machine learning algorithms or AI technologies in general? Here are a few resources to keep learning.

Article: A guide to machine learning algorithms and their applications
Blog post: Which machine learning algorithm should I use?
Video: Supervised vs. Unsupervised Learning
Article: Five AI technologies that you need to know

Practical AI in banking was published on SAS Users.