CAS

11月 222019
 

sxlion

大家有一个普通的印象:SAS的更新很慢,很老很落后。可能跟它的版本命名有关,SAS9.0是2004年出来的,到现在都快20年了,版本号 还停留在9字头,并且还没有继续更新的迹象。当然这个与SAS公司的明面

原创文章: ”难道是SAS10?云分析服务时代的到来“,转载请注明: 转自SAS资源资讯列表

本文链接地址: http://saslist.net/archives/454

11月 222019
 

sxlion

大家有一个普通的印象:SAS的更新很慢,很老很落后。可能跟它的版本命名有关,SAS9.0是2004年出来的,到现在都快20年了,版本号 还停留在9字头,并且还没有继续更新的迹象。当然这个与SAS公司的明面

原创文章: ”难道是SAS10?云分析服务时代的到来“,转载请注明: 转自SAS资源资讯列表

本文链接地址: http://saslist.net/archives/454

9月 262019
 

In part 1 of this post, we looked at setting up Spark jobs from Cloud Analytics Services (CAS) to load and save data to and from Hadoop. Now we are moving on to the next step in the analytic cycle, scoring data in Hadoop and executing SAS code as a Spark job. The Spark scoring jobs execute using SAS In-Database Technologies for Hadoop.

The integration of the SAS Embedded Process and Hadoop allows scoring code to run directly on Hadoop. As a result, publishing and scoring of both DS2 and Data step models occur inside Hadoop. Furthermore, access to Spark data exists through the SAS Workspace Server or the SAS Compute Server using SAS/ACCESS to Hadoop, or from the CAS server using SAS Data Connectors.

Scoring Data from CAS using Spark

SAS PROC SCOREACCEL provides an interface to the CAS server for DS2 and DATA step model publishing and scoring. Model code is published from CAS to Spark and then executed via the SAS Embedded Process.

PROC SCOREACCEL supports a file interface for passing the model components (model program, format XML, and analytic stores). The procedure reads the specified files and passes their contents on to the model-publishing CAS action. In this case, the files must be visible from the SAS client.

CAS publishModel and runModel actions publish and execute score data in Spark:

 
%let CLUSTER="/opt/sas/viya/config/data/hadoop/lib:
  /opt/sas/viya/config/data/hadoop/lib/spark:/opt/sas/viya/config/data/hadoop/conf";
proc scoreaccel sessref=mysess1;
publishmodel
   target=hadoop
   modelname="simple01"
   modeltype=DS2
/* 
   filelocation=local */
programfile="/demo/code/simple.ds2"
username="cas"
modeldir="/user/cas"
classpath=&CLUSTER.
; runmodel 
    target=hadoop
    modelname="simple01"
    username="cas"
    modeldir="/user/cas"
    server=hadoop.server.com'
    intable="simple01_scoredata"
    outtable="simple01_outdata"
    forceoverwrite=yes
    classpath=&CLUSTER.
    platform=SPARK
; 
quit;

In the PROC SCOREACCEL example above, a DS2 model is published to Hadoop and executed with the Spark processing engine. The CLASSPATH statement specifies a link to the Hadoop cluster. The input and output tables, simple01_scoredata and simple01_outdata, already exist on the Hadoop cluster.

Score data in Spark from CAS using SAS Scoring Accelerator

SAS Scoring Accelerator execution status in YARN as a Spark job from CAS

As you can see in the image above, the model is scored in Spark using the SAS Scoring Accelerator. The Spark job name reflects the input and output tables.

Scoring Data from MVA SAS using Spark

Steps to run a scoring model in Hadoop:

  1. Create a traditional scoring model using SAS Enterprise Miner or an analytic store scoring model, generated using SAS Factory Miner HPFOREST or HPSVM components.
  2. Specify the Hadoop connection attributes: %let indconn= user=myuserid;
  3. Use the INDCONN macro variable to provide credentials to connect to the Hadoop HDFS and Spark. Assign the INDCONN macro before running the %INDHD_PUBLISH_MODEL and the %INDHD_RUN_MODEL macros.
  4. Run the %INDHD_PUBLISH_MODEL macro.
  5. With traditional model scoring, the %INDHD_PUBLISH_MODEL performs multiple tasks using some of the files created by the SAS Enterprise Miner Score Code Export node. Using the scoring model program (score.sas file), the properties file (score.xml file), and (if the training data includes SAS user-defined formats) a format catalog, this model performs the following tasks:

    • translates the scoring model into the sasscore_modelname.ds2 file, runs scoring inside the SAS Embedded Process
    • takes the format catalog, if available, and produces the sasscore_modelname.xml file. This file contains user-defined formats for the published scoring model.
    • uses SAS/ACCESS Interface to Hadoop to copy the sasscore_modelname.ds2 and sasscore_modelname.xml scoring files to HDFS
  6. Run the %INDHD_RUN_MODEL macro.

The %INDHD_RUN_MODEL macro initiates a Spark job that uses the files generated by the %INDHD_PUBLISH_MODEL to execute the DS2 program. The Spark job stores the DS2 program output in the HDFS location specified by either the OUTPUTDATADIR= argument or by the element in the HDMD file.

Here is an example:

 
option set=SAS_HADOOP_CONFIG_PATH="/opt/sas9.4/Config/Lev1/HadoopServer/conf";
option set=SAS_HADOOP_JAR_PATH="/opt/sas9.4/Config/Lev1/HadoopServer/lib:/opt/sas9.4/Config/Lev1/HadoopServer/lib/spark";
 
%let scorename=m6sccode;
%let scoredir=/opt/code/score;
option sastrace=',,,d' sastraceloc=saslog;
option set=HADOOPPLATFORM=SPARK;
 
%let indconn = %str(USER=hive HIVE_SERVER=’hadoop.server.com');
%put &indconn;
%INDHD_PUBLISH_MODEL( dir=&scoredir., 
        datastep=&scorename..sas,
        xml=&scorename..xml,
        modeldir=/sasmodels,
        modelname=m6score,
        action=replace);
 
%INDHD_RUN_MODEL(inputtable=sampledata, 
outputtable=sampledata9score, 
scorepgm=/sasmodels/m6score/m6score.ds2, 
trace=yes, 
    platform=spark);
Score data in Spark from MVA SAS using Spark

SAS Scoring Accelerator execution status in YARN as a Spark job from MVA SAS

To execute the job in Spark, either set the HADOOPPLATFORM= option to SPARK or set PLATFORM= to SPARK inside the INDHD_RUN_MODEL macro. The SAS Scoring Accelerator uses SAS Embedded Process to execute the Spark job with the job name containing the input table and output table.

Executing user-written DS2 code using Spark

User-written DS2 programs can be complex. When running inside a database, a code accelerator execution plan might require multiple phases. Because of Scala program generation that integrates with the SAS Embedded Process program interface to Spark, the many phases of a Code Accelerator job reduces to one single Spark job.

In-Database Code Accelerator

The SAS In-Database Code Accelerator on Spark is a combination of generated Scala programs, Spark SQL statements, HDFS files access, and DS2 programs. The SAS In-Database Code Accelerator for Hadoop enables the publishing of user-written DS2 thread or data programs to Spark, executes in parallel, and exploits Spark’s massively parallel processing. Examples of DS2 thread programs include large transpositions, computationally complex programs, scoring models, and BY-group processing.

Below is a table of required DS2 options to execute as a Spark job.

DS2ACCEL Set to YES
HADOOPPLATFORM Set to SPARK

 

There are six different ways to run the code accelerator inside Spark, called CASES. The generation of the Scala program by the SAS Embedded Process Client Interface depends on how the DS2 program is written. In the following example, we are looking at Case 2, which is a thread and a data program, neither of them with a BY statement:

 
proc ds2 ds2accel=yes;
thread work.workthread / overwrite=yes; 
        method run();
          set hdplib.cars;
          output;
end; endthread; run; 
  data hdplib.carsout (overwrite=yes); dcl thread work.workthread m;
  dcl double count;
  keep count make model;
method run(); set from m; count+1; output; 
end; enddata; run; quit;

The entire DS2 program runs in two phases. The DS2 thread program runs during Phase One, and its tasks execute in parallel. The DS2 data program runs during Phase Two using a single task.

Finally

With SAS Scoring Accelerator and Spark integration, users have the power and flexibility to process and score modeled data in Spark. SAS Code Accelerator and Spark integration takes the flexibility further to process any Spark data using DS2 code. Furthermore it is now possible for business to respond to use cases immediately and with higher reliability in the big data space.

Data and Analytics Innovation using SAS & Spark - part 2 was published on SAS Users.

8月 282019
 

This article is not a tutorial on Hadoop, Spark, or big data. At the same time, no prerequisite knowledge of these technologies is required for understanding. We’ll give you enough background prior to diving into the details. In simplest terms, the Hadoop framework maintains the data and Spark controls and directs data processing. As an analogy, think of Hadoop as a train, big data as the payload, and Spark as the crew driving the train and organizing and distributing the goods.

Big data

I recently read that data volumes are doubling each year. Not that long ago we talked in terms of gigabytes. This quickly turned into terabytes and we’re now in the age of petabytes. The type of data is also changing. Data used to fit neatly into rows and columns. Now, nearly eighty percent of data is unstructured. All these trends and facts have led us to deal with massive amounts of data, aka big data. Maintaining and processing big data required creating technical frameworks. Next, we’ll investigate a couple of these tools.

Hadoop

Hadoop is a technology stack utilizing parallel processing on a distributed filesystem. Hadoop is useful to companies when data sets become so large or complex that their current solutions cannot effectively process the information in a reasonable amount of time. As the data science field has matured over the past few years, so has the need for a different approach to processing data.

Apache Spark

Apache Spark is a cluster-computing framework utilizing both iterative algorithms and interactive/exploratory data analysis. The goal of Spark is to keep the benefits of Hadoop’s scalable, distributed, fault-tolerant processing framework, while making it more efficient and easier to use. Using in-memory distributed computing, Spark provides capabilities over and above the batch model of Hadoop MapReduce. As a result, this brings to the big data world new applications of data science that were previously too expensive or slow on massive data sets.

Now let’s explore how SAS integrates with these technologies to maximize capturing, managing, and analyzing big data.

SAS capabilities to leverage Spark

SAS provides Hadoop data processing and data scoring capabilities using SAS/ACCESS Interface to Hadoop and In-Database Technologies to Hadoop with MapReduce or Spark as the processing framework. This addresses some of the traditional data management batch processing, huge volumes of extract, transform, load (ETL) data as well as faster, interactive and in-memory processing for quicker response with Spark.

In SAS Viya, SAS/ACCESS Interface to Hadoop includes SAS Data Connector to Hadoop. All users with SAS/ACCESS Interface to Hadoop can use the serial. Likewise, SAS Data Connect Accelerator to Hadoop can load or save data in parallel between Hadoop and SAS using SAS Embedded Process, as a Hive/MapReduce or Spark job.

Connecting to Spark in a Hadoop Cluster

There are two ways to connect to a Hadoop cluster using SAS/ACCESS Interface to Hadoop, based on the SAS environment: LIBNAME and CASLIB statements.

LIBNAME statement to connect to Spark from MVA SAS

options set=SAS_HADOOP_JAR_PATH="/third_party/Hadoop/jars/lib:/third_party/Hadoop/jars/lib/spark"; 
options set=SAS_HADOOP_CONFIG_PATH="/third_party/Hadoop/conf"; 
 
libname hdplib hadoop server="hadoop.server.com" port=10000 user="hive"
schema='default' properties="hive.execution.engine=SPARK";

Parameters

SAS_HADOOP_JAR_PATH Directory path for the Hadoop and Spark JAR files
SAS_HADOOP_CONFIG_PATH Directory path for the Hadoop cluster configuration files
Libref The hdplib libref specifies the location where SAS will find the data
SAS/ACCESS Engine Name HADOOP option to connect Hadoop engine
SERVER Hadoop Hive server to connect
PORT Listening Hive server Port. 10000 is the default, so it is not required. It is included just in case
USER and PASSWORD Are not always required
SCHEMA Hive schema to access. It is optional; by default, it connects to the “default” schema
PROPERTIES Hadoop properties. Choosing SPARK for the property hive.execution.engine enables SAS Viya to use Spark as the execution platform

 
CASLIB statement to connect from CAS

caslib splib sessref=mysession datasource=(srctype="hadoop", dataTransferMode="auto",username="hive", server="hadoop.server.com", 
hadoopjarpath="/opt/sas/viya/config/data/hadoop/lib:/opt/sas/viya/conf ig/data/hadoop/lib/spark", 
hadoopconfigdir="/opt/sas/viya/config/data/hadoop/conf", schema="default"
platform="spark"
dfdebug="EPALL" 
properties="hive.execution.engine=SPARK");

Parameters

CASLIB Space holder for the specified data access. The splib CAS library specifies the Hadoop data source
sessref Holds the CAS library in a specific CAS session. mysession is the current active CAS session
SRCTYPE Type of data source
DATATRANSFERMODE Type of data movement between CAS and Hadoop. Accepts one of three values – serial, parallel, auto. When AUTO is specified, CAS choose the type of data transfer based on available license in the system. If Data Connect Accelerator to Hadoop has been licensed, parallel data transfer will be used, otherwise serial mode of transfer is used
HADOOPJARPATH Hadoop and Spark JAR files location path on the CAS cluster
HADOOPCONFIGDIR Hadoop configuration files location path on the CAS cluster
PLATFORM Type of Hadoop platform to execute the job or transfer data using SAS Embedded Process. Default value is “mapred” for Hive MapReduce. When using “Spark”, data transfer and job executes as a Spark job
DFDEBUG Used to receive additional information back from SAS Embedded Process transfers data in the SAS log
PROPERTIES Hadoop properties. Choosing SPARK for the property hive.execution.engine enables SAS Viya to use Spark as the execution platform

 

Data Access using Spark

SAS Data Connect Accelerator for Hadoop with the Spark platform option uses Hive as the query engine to access Spark data. Data movement happens between Spark and CAS through SAS generated Scala code. This approach is useful when data already exists in Spark and either needs to be used for SAS analytics processing or moved to CAS for massively parallel data and analytics processing.

Loading Data from Hadoop to CAS using Spark

Processing data in CAS offers advanced data preparation, visualization, modeling and model pipelines, and finally model deployment. Model deployment can be performed using available CAS modules or pushed back to Spark if the data is already in Hadoop.

Load data from Hadoop to CAS using Spark

proc casutil 
      incaslib=splib
      outcaslib=casuser;
      load casdata="gas"
      casout="gas"
      replace;
run;

Parameters

PROC CASUTIL Used to process CAS action routines to process data
INCASLIB Input CAS library to read data
OUTCASLIB Output CAS library to write data
CASDATA Table to load to the CAS in-memory server
CASOUT Output CAS table name

 

We can look at the status of the data load job using Hadoop' resource management and job scheduling application, YARN. YARN is responsible for allocating system resources to the various applications running in a Hadoop cluster and scheduling tasks to be executed on different cluster nodes.

Loading Data from Hadoop to Viya CAS using Spark

In the figure above, the YARN application executed the data load as a Spark job. This was possible because the CASLIB statement had Platform= Spark option specified. The data movement direction, in this case Hadoop to CAS uses the Spark job name, “SAS CAS/DC Input,” where “Input” is data loaded into CAS.

Saving Data from CAS to Hadoop using Spark

You can save data back to Hadoop from CAS at many stages of the analytic life cycle. For example, use data in CAS to prepare, blend, visualize, and model. Once the data meets the business use case, data can be saved in parallel to Hadoop using Spark jobs to share with other parts of the organization.

Using the SAVE CAS action to move data to Hadoop using Spark

proc cas;
session mysession; 
      table.save /
      caslib="splib"
      table={caslib="casuser", name="gas"},
      name="gas.sashdat"
      replace=True;
quit;

Parameters

PROC CAS Used to execute CAS actionsets and actions to process data.
“table” is the actionset and “save” is the action
TABLE Location and name of the source table
NAME Name of the target table saved to the Hadoop library using Spark

 

We can verify the status of saving data from CAS to Hadoop using YARN application. Data from CAS saves as a Hadoop table using, Spark as the execution platform. Furthermore, as SAS Data Connect Accelerator for Hadoop transfers data in parallel, individual Spark executors in each of the Spark executor nodes handles data execution for that specific Hadoop cluster node.

Saving Data from Viya CAS to Hadoop using Spark

Finally, the SAVE data executed as a Spark job. As we can see from YARN, the Spark job named “SAS CAS/DC Output” specifies that the data moves from CAS to Hadoop.

Where we are; where we're going

We have so far traveled across the Spark pond to setup SAS libraries for Spark, Load and Save data from and to Hadoop using Spark. In the next section we’ll look at ways to Score data and execute SAS code inside Hadoop using Spark.

Data and Analytics Innovation using SAS & Spark - part 1 was published on SAS Users.

6月 262019
 

In his article How to use CASL to develop and work with user-defined CAS actions, Brian Kinnebrew defines CASL as "a language specification used by the SAS client to interact with and provide easy access to Cloud Analytic Services (CAS). CASL is a statement-based scripting language with many uses and strengths." I can't come up with a better definition, so if you'd like to learn more about the basics of CASL, I encourage you to read Brian's post.

In a SAS Stored Process (or any traditional SAS program) the code has multiple DATA steps and procedures with a good dose of macros.

Over the last couple of years, the focus of my projects is the use of CAS actions with no traditional procedures in the mix. Most of these involved web applications calling multiple CAS actions. My initial approach was to make multiple http calls - one per action. This could get tedious.

Then I met my action hero: sccasl.runCasl.

My favorite SAS CASL action

This action executes a CASL "script" on the CAS server analogous to executing a SAS Stored Process. Running a CASL program with a mix of CAS actions and CASL statements on the CAS server has these benefits:

  1. Reduced the number of http calls to the server
  2. The client-side code is much easier to reason
  3. The returned values can be a dictionary that is suited for further consumption by the client, simplifying the client code

My personal name for these scripts is "CAS stored process".

Where art thou Macro?

In many applications, user input is passed to the code running on the server. In a SAS Stored Process, macros pass the parameters. CASL has no macros. My initial approaches to passing parameters to CASL programs were:

  1. Generate the final CASL program in JavaScript with the user input values inserted into the code. Sample code is available here.
    • Drawback: Debugging the code in SAS Studio requires a cut-and-paste of the generated code into SAS Studio.
  2. Load the data into a CAS table using table.upload action for programs needing an input table. Sample code is available here.
    • Drawback: This requires an additional http call to the server.

In developing the GraphQL approach to writing applications - GraphQL and SAS Viya applications - a good match - I addressed the two drawbacks listed above by creating two functions and put them in my utility belt.

    Superfriend functions

  • jsonToDict.js - generate a string having the CASL dictionary version of a JavaScript object
  • argsToTable - create a CAS table from a dictionary

The remainder of this article discusses these two functions. To demonstrate the functions' usage, I use code listings from the scoring example, covered in the GraphQL example.

The jsonToDict function

The result of this function produces a string with CASL dictionary suitable for inclusion in CASL code.

Function definition

jsonToDict ⇒ string

Returnsstring - returns the string containing the CASL dictionary

Param Type Description
obj object the JavScript object of interest
name string the name to assign to the dictionary

 

Example function code

The code below outlines the jsonToDict function usage.

obj = {x:1, b:2, c:['a', 'b']};
let r = jsonToDict(obj, '_appEnv_');
The result is:
r = `_appEnv_ = {x=1, b=2, c={"a", "b"}}'`;

The following lists the input parameters passed to the CASL code for scoring.

let input = {
        JOB    : 'J1',
        CLAGE  : 100, 
        CLNO   : 20, 
        DEBTINC: 20, 
        DELINQ : 2, 
        DEROG  : 0, 
        MORTDUE: 4000, 
        NINQ   : 1,
        YOJ    : 10,
        LOAN: 1000,
        ASSET: 100000
    };

Below is the Javascript code and the result.

let _args_ = jsonToDict(input, '_args_');
results in
 
args_ = `_args_ = {  JOB= "J1" ,CLAGE=100  ,CLNO=20  ,DEBTINC=20  ,DELINQ=2  ,DEROG=0  ,
MORTDUE=4000  ,NINQ=1  ,YOJ=10  ,LOAN=1000  ,ASSET=100000  };`;

To allow the use of different versions of the model, the name of the scoring model is passed in as a parameter. The Javascript for the model follows.

let env = {
     astore: {
            caslib: 'Public',
            name  : 'GRADIENT_BOOSTING___BAD_2'
        }
};
let _appEnv_= jsonToDict(env, '_appEnv_');
 
resulting in:
let _appEnv_ = `_appEnv_ = { astore = { caslib="Public", name="GRADIENT_BOOSTING___BAD_2"}};`;

Next, I prepend the strings to the CASL program in the client code with the following code snips.

let code = _args_ + _appEnv_ + `the CASL code shown below';
loadactionset "astore";
 
/* convert arguments to a cas table */
argsToTable(_args_, 'casuser', 'INPUTDATA' );
 
/* score */
action astore.score r=rc/
    table  = { caslib= 'casuser', name = 'INPUTDATA' } 
    rstore = { caslib= _appEnv_.astore.caslib,  name=_appEnv_.astore.name }
    casout  = { caslib = 'casuser', name = 'OUTPUTDATA' replace= TRUE};
 
/* fetch results */
action table.fetch r = result /
    table = {  caslib = 'casuser' name = 'OUTPUTDATA' } ;
 
/* extract the score and send it as a dictionary */
score = result.Fetch[1].P_BAD;
send_response({score = score});

Now the CASL program can access the incoming information using the two dictionaries _args_ and _appEnv_. Note: As a personal choice, I use a convention of _args_ for user input and _appEnv_ for application specific information. I use the restaf application framework to make the http calls as shown below.

let payload = {
        action: 'sccasl.runCasl',
        data  : { code: code}
    }
 let result = await store.runAction(session, payload); 
let score = result.items('results', 'score');

CASL coding - easier than saying Kilp-ill-skim

Notice the absence of string substitutions that look strange. Just simple, straight forward coding. Easier than saying your name backwards, forcing you back to the fifth dimension.

The argsToTable.casl function

The sample code above used the function argsToTable. As you may guess it converts the _args_ dictionary into a CAS table used in the scoring action. The argsToTable is the function in CASL handling this task.

Function definition

argsToTable ⇒ Load dictionary into a CAS Table

Param Type Description
input dictionary the data to load
caslib string caslib of output table
name string name of output table

 

The relevant CASL code from Step 1 is reproduced here:

argsToTable(_args_, 'casuser', 'INPUTDATA' );

The argsToTable function is either stored on a server or prepended to the CASL code sent to the runCasl action. This function removes the need to run a http call to load the data in the CAS table.

Returning data from CAS - the send_response function

The ultimate sidekick function

Any good super hero has a sidekick. The function send_response in CASL is very versatile - it allows one to return data in the form the application needs and allows more than one result. In many programs I return data in a form easily consumed by the client code.

For example, if you wanted to return just the rows of table you can do the following:

function resultsToDict(r);
    casResults = {};
    i = 1;
  do row over r;
     casResults[i] = row;
     i = i + 1;
   end;
  return casResults;
end;
 
/* and use it as follows: */
 
action table.fetch r = result /
    table = {  caslib = 'casuser',  name = 'mydata' };
/* extract the data and return it as a dictionary */
casResults = resultsToDict(result.Fetch);
send_response({casResults: casResults});

Finally

Using a combination of CASL, a couple of utility functions and the runCasl action you can develop some very efficient programs with minimal traffic between your client and the server. If you run multiple actions in sequence you should consider grouping them into a CASL program and executing them on the CAS server using the runCasl action.

Next

In my next article, which I hope to finish in a flash, I will discuss using the runCasl action to create a browser for CAS tables with support for pagination.

All comments are welcome. Please feel free to clone the code, make it better.

Cheers...
Deva

Let runCasl be your BFF and favorite action hero

"CAS Stored Process" with my Favorite Action Hero runCasl was published on SAS Users.

6月 212019
 

For every project in SAS®, the first step is almost always making your data available. This blog shows you how to load three of the most common input data types—a data set, a text file, and a Microsoft Excel file—into SAS® Cloud Analytic Services (CAS) tables.

The three methods that I show here are the three easiest ways to load each data type into CAS. Multiple tools can load data into CAS, but I am showing the tools that I consider the easiest to use and that are probably the most familiar to SAS programmers.

You need to place your data in a location that can be accessed by the programming environment that is used to access CAS. The most common programming environment that accesses CAS is SAS® Studio. Input data files that are used in CAS are going to be very large. You will need to use an SFTP tool to move your data from your PC to a directory that can be accessed by your SAS Studio session. (Check with your system administrator to see what the preferred tool is at your site.)

After your data is in a location that can be accessed by the programming environment, you need to start a CAS session. This is done with the CAS statement; here is the syntax:

cas session-name <option(s)>;

The options that you specify depend on how your system administrator configured your environment. For example, I asked my system administrator to set it up so that the only thing I need to do is issue the following statement:

cas;

That statement then creates a CAS session with the default name of CASAUTO, with an active caslib of CASUSER.

After you establish your CAS session, you can start loading data.

Load a SAS data set

The easiest way to load SAS data into CAS is to use a DATA step. The basic syntax is the same as it is when you are creating a SAS data set. The key difference is that the libref that is listed in the DATA step must point to a caslib.

The following example accesses SASHELP.CARS and then creates the table CARS in the CASUSER caslib.

cas;                   /* log on to the CAS session */
   caslib _all_ assign;   /* create a libref for each active 
                             caslib */
 
   data casuser.cars;     /* create the table in the caslib */
      set sashelp.cars;
   run;

The one thing to note about this code is that this DATA step is running in SAS and not CAS. A DATA step runs in CAS only if the input and output librefs are both using the CAS engine and only if it uses language elements that CAS supports. In this example, the SASHELP libref was not created with the CAS engine, so the step must run in SAS.

There are no calculations in this step, so there is no effect on performance.

When you load a data set into a CAS table, one task that you might want to perform is to promote the table. The following DATA step shows how to promote a table as you create it:

data casuser.cars(promote=yes);
      set sashelp.cars;
   run;

Promoting a table gives it a global scope. You can then access the table in multiple sessions, and the table is also available in SAS® Visual Analytics. The PROMOTE= data set option can be used with all three of the examples in this blog.

Load a delimited text file

The easiest way to load a delimited text file and store it as a CAS table is to use another familiar SAS step, the IMPORT procedure. The syntax is going to be basically the same as it is in SAS. Again, the key difference is that you need to point to a caslib in the OUT= option of the PROC IMPORT statement.

If you are running SAS® Studio 5.1, a common location for a text file that needs to be loaded into CAS is the SAS Content folder. This folder is a predefined repository for text files that you need to access from SAS Studio.

In order to access the files from SAS Content, you need to use the FILESRVC access method with the FILENAME statement. This method enables you to store and retrieve content using the SAS® Viya® Files service. Here is the basic syntax:

filename fileref filesrvc folderpath='path'
            filename='name';

For more information about this access method, see SAS 9.4 Global Statements Reference.

In the following example, PROC IMPORT and the FILESRVC access method are used to load the class.csv file from the SAS Content folder. The resulting ALLCLASS table is written to the CASUSER caslib.

filename myfile filesrvc folderpath='/Users/saskir'
            filename='class.csv';
   proc import datafile=myfile out=casuser.allclass(promote=yes)
        dbms=csv;
   run;

Load an Excel file

This section shows you how to load an Excel file into a CAS table by using PROC IMPORT. Loading an Excel file using PROC IMPORT requires that you have SAS/ACCESS® Interface to PC Files. If you are unsure whether you have this product, you can write all of your licensed products to the log by using the SETINIT procedure:

proc setinit;
   run;

You should see the following in the log when the product is licensed:

---SAS/ACCESS Interface to PC Files

After you confirm that you have this product, you can adapt the following PROC IMPORT code. This example loads the ReportTest.xlsx file and stores it in a table named ReportTest in the CASUSER caslib.

cas;
   caslib _all_ assign;
 
   proc import datafile='/viyashare/ReportTest.xlsx'
        out=casuser.ReportTest dbms=xlsx;
   run;

There are other methods

The purpose of this blog was to show you the easiest ways to load a SAS data set, text file, and Excel file into CAS. Although there are multiple ways to accomplish these tasks, both programmatically and interactively, these methods are the easiest and most straightforward ways to get data into CAS.

Learn the three easiest ways to load data into CAS tables was published on SAS Users.

5月 172019
 
Did you know that you can run Lua code within Base SAS? This functionality has been available since the SAS® 9.4M3 (TS1M3) release. With the LUA procedure, you can submit Lua statements from an external Lua script or just submit the Lua statements using SAS code. In this blog, I will discuss what PROC LUA can do as well as show some examples. I will also talk about a package that provides a Lua interface to SAS® Cloud Analytic Services (CAS).

What Is Lua?

Lua is a lightweight, embeddable scripting language. You can use it in many different applications from gaming to web applications. You might already have written Lua code that you would like to run within SAS, and PROC LUA enables you to do so.
With PROC LUA, you can perform these tasks:

  • run Lua code within a SAS session
  • call most SAS functions within Lua statements
  • call functions that are created using the FCMP procedure within Lua statements
  • submit SAS code from Lua
  • Call CAS actions

PROC LUA Examples

Here is a look at the basic syntax for PROC LUA:

proc lua <infile='file-name'> <restart> <terminate>;

Suppose you have a file called my_lua.lua or my_lua.luc that contains Lua statements, and it is in a directory called /local/lua_scripts. You would like to run those Lua statements within a SAS session. You can use PROC LUA along with the INFILE= option and specify the file name that identifies the Lua source file (in this case, it is my_lua). The Lua file name within your directory must contain the .lua or. luc extension, but do not include the extension within the file name for the INFILE= option. A FILENAME statement must be specified with a LUAPATH fileref that points to the location of the Lua file. Then include the Lua file name for the INFILE= option, as shown here:

filename luapath '/local/lua_scripts';
proc lua infile='my_lua';

This example executes the Lua statements contained within the file my_lua.lua or my_lua.luc from the /local/lua_scripts directory.

If there are multiple directories that contain Lua scripts, you can list them all in one FILENAME statement:

filename luapath ('directory1', 'directory2', 'directory3');

The RESTART option resets the state of Lua code submissions for a SAS session. The TERMINATE option stops maintaining the Lua code state in memory and terminates the Lua state when PROC LUA completes.

The syntax above discusses how to run an external Lua script, but you can also run Lua statements directly in SAS code.

Here are a couple of examples that show how to use Lua statements directly inside PROC LUA:

Example 1

   proc lua; 
   submit; 
      local names= {'Mickey', 'Donald', 'Goofy', 'Minnie'} 
      for i,v in ipairs(names) do 
         print(v) 
   end 
   endsubmit; 
   run;

Here is the log output from Example 1:

NOTE: Lua initialized.
Mickey
Donald
Goofy
Minnie
NOTE: PROCEDURE LUA used (Total process time):
      real time           0.38 seconds
      cpu time            0.10 seconds

Example 2

   proc lua;
   submit;
      dirpath=sas.io.assign("c:\\test")
      dir=dirpath:opendir()
      if dir:has("script.txt") then print ("exists")
      else print("doesn't exist")
      end
   endsubmit;
   run;

Example 2 checks to see whether an external file called script.txt exists in the c:\test directory. Notice that two slashes are needed to specify the backslash in the directory path. One backslash would represent an escape character.

All Lua code must be contained between the SUBMIT and ENDSUBMIT statements.

You can also submit SAS code within PROC LUA by calling the SAS.SUBMIT function. The SAS code must be contained within [[ and ]] brackets. Here is an example:

   proc lua; 
   submit;
      sas.submit [[proc print data=sashelp.cars; run; ]]
   endsubmit;
   run;

Using a Lua Interface with CAS

Available to download is a package called SWAT, which stands for SAS Scripting Wrapper for Analytics Transfer. This is a Lua interface for CAS. After you download this package, you can load data into memory and apply CAS actions to transform, summarize, model, and score your data.

The package can be downloaded from this Downloads page: SAS Lua Client Interface for Viya. After you download the SWAT package, there are some requirements for the client machine to use Lua with CAS:

  1. You must use a 64-bit version of either Lua 5.2 or Lua 5.3 on Linux.

    Note: If your deployment requires newer Lua binaries, visit http://luabinaries.sourceforge.net/.
    Note: Some Linux distributions do not include the required shared library libnuma.so.1. It can be installed with the numactl package supplied by your distribution's package manager.

  2. You must install the third-party package dependencies middleclass (4.0+), csv, and ee5_base64, which are all included with a SAS® Viya® installation.

For more information about configuration, see the Readme file that is included with the SWAT download.

I hope this blog post has helped you understand the possible ways of using Lua with SAS. If you have other SAS issues that you would like me to cover in future blog posts, please comment below.

To learn more about PROC LUA, check out these resources:

Using the Lua programming language within Base SAS® was published on SAS Users.

5月 172019
 

In the article Serverless functions and SAS Viya - a good match I discussed using serverless functions to deliver SAS Viya applications. Ignoring all the buzz words, a serverless function boils down to a set of REST APIs. So, if you tried the example you are now a REST API developer 🙂 .

The serverless function allowed the application developer to do the following:

  1. Define what the end user must supply to the function. A good application developer will try to make the request simple and easy to understand.
  2. Return to the end user a response easily consumed by the client's program. Again, a good application developer would make sure the response satisfies most common usage scenarios.
  3. Hide all the details of what it took to satisfy the users request.

This blog discusses using GraphQL to achieve the same goals. First, I will briefly discuss GraphQL, where it fits in with SAS Viya application integration, and how to create GraphQL-based applications. I also provide a series of examples based on real-world scenarios.

The images below display a high level comparison of the approaches between serverless and GraphQL.

serverless and GraphQL process flow

serverless and GraphQL process flow

Steps in the GraphQL flow

  1. A GraphQL server replaces the AWS API Gateway.
  2. The code that runs in the GraphQL server is referred to as "resolvers" - as the name implies, resolvers are used by the GraphQL server to execute user requests.
  3. The resolvers make the necessary REST API calls to the SAS Viya Server.

All of the code in this article resides in the restaf-graphql-demo GitHub repository. If you are not familiar with GraphQL please review the links at the end of this article before proceeding.

Why GraphQL?

Some smart folks at Facebook created GraphQL to solve problems they encountered using standard REST APIs. Companies like Github, Netflix, PayPal, The New York Times and many others are adopting GraphQL.

Some of the key motivators are:

  1. Users define and request what they need, following exact specifications
  2. A convenient way to front existing systems (REST-based or not) and databases with a Developer Experience friendly API
  3. Returning only the requested information reduces the data transferred - important for reducing network traffic
  4. GraphQL is less "chatty" - where REST API will requires multiple trips to the server, GraphQL can accomplish the same task in one round trip

Why GraphQL for SAS Viya application developers?

While the general GraphQL characteristics listed above are important, GraphQL is also a useful technology for developers creating applications integrated with SAS Viya.

  1. GraphQL is a ready-made vehicle for SAS users to deliver their applications as the next generation "stored process" developed with the data step+procedures, CAS Language (CASL) statements, custom CASL actions and SAS REST APIs.
  2. GraphQL is a great way for front-end and back-end developers to communicate.
  3. Developers can code to an agreed contract as specified by the GraphQL schema.
  4. Front-end developers can be confident what they get is exactly what they asked for.

Writing the GraphQL-based applications

The GraphQL queries used in this article are examples for demonstration purposes only and not "standards or strict guidelines" to follow. The code in the GitHub repository and the examples outlined below will help you jump-start your excellent adventure in GraphQL and SAS Viya applications.

The high-level steps for writing an application using GraphQL query are:

SAS Viya Side

  1. SAS programmers, data analysts and data scientists develop their intellectual protocol with SAS programs written with SAS procedures, CAS Actions, data step and CASL language.

Server Side

  1. Build the GraphQL schema and define the queries (see this for examples). In relation to SAS Viya, the schema describes the input and output of the SAS programs.
    • Make sure you have discussed this with the UI developers and the SAS programmers
  2. Write the resolvers - GraphQL server will call this code to resolve the requests by the user (see this for examples).
  3. Register both of these with the GraphQL server.

Client Side

  1. You can build the web apps in the normal way with these characteristics:
    • These apps will call a single end point (/graphql) with a POST method.
    • The payload is the GraphQL query
    • The response will match the query and are easily accessible

The image below shows the flow of a GraphQL-based application. User's queries are sent to the GraphQL server. The server parses the queries and calls the appropriate resolver (your code) to obtain the values for the requested fields. In this project the resolvers use restaf to make REST API calls to SAS Viya.

GraphQL-based application process flow

GraphQL-based application process flow

The rest of the blog discusses a few examples. All these examples are available in the repository. I chose to write the examples using JavaScript since it is one of the languages I am familiar with and can write reasonably decent code in. You can develop GraphQL-based SAS Viya applications in all the popular languages of today.

Example 1: Scoring a loan from client app
In this example, a data scientist working for a bank, has created a model to score a loan applicant's eligibility. The scientist outlines the following requirements:

  1. The user can only enter the desired loan amount and their current assets. All the other parameters needed for scoring have set values. All the values must be passed to the SAS code as a dictionary named _args_.
  2. Since the scientist wants to run A/B experiments the location and name of the scoring model's astore must be passed in as dictionary named _appEnv_.
  3. The code developed by the data scientist is below. The score returns as a dictionary.

    {score= <value>}

SAS Code

I wrote the SAS program in this example in CASL.

loadactionset "astore";

  /* convert arguments to a cas table */
/* _args_  and _appEnv_ are  generated by caslBase - see caslBase for details */

/* CASL function to convert a dictionary to a cas table  see lib/argsToTable.js for details*/
argsToTable(_args_, 'casuser', 'INPUTDATA' );

/* score */
action astore.score /
    table  = { caslib= 'casuser', name = 'INPUTDATA' } 
    rstore = { caslib= _appEnv_.astore.caslib,  name=_appEnv_.astore.name }
    casout  = { caslib = 'casuser', name = 'OUTPUTDATA' replace= TRUE};

/* fetch results */
action table.fetch r = result /
    table = {  caslib = 'casuser' name = 'INPUTDATA' } ;

/* extract the score and send it as a dictionary */
score = result.Fetch[1].P_BAD;
scoreo= {score= score};
send_response(scoreo);

Key points to note:

  1. The resolver creates and prepends two CASL dictionaries _args_ and _appEnv_.
  2. The CASL program returns the result using the send_response function.
    • One of the cool things is that CASL allows the programmer to customize the returned value. In this example the score extracts into a dictionary.

Schema

Based on the requirement the schema is as shown below:

type Query {
   scoreLoan(amount: Int assets: Int) : Float

Key Point:

  1. The two values the user specifies are defined as the filter parameters to the query.

Application

scoreLoan

Key point:

  1. The user enters the two values the data scientist requires.

Client code

async function runScore(amount, assets){
    let payload = {
        query: `query {
            scoreLoan(amount: ${amount} assets: ${assets} )
        }`
    }

    let config = {
        url            : host + '/graphql',
        withCredentials: true,
        method         : 'POST',
        data           : payload
    }

    let r = await axios(config);
    return r.data.data.scoreLoan;
}

Key points:

  1. The payload is the GraphQL query.
  2. I use the POST method.
  3. The end point is /graphql - this is the only endpoint the application will use.
  4. The response is available as r.data.data.scoreLoan
  5. Note the simplicity of the client code to access the GraphQL server and obtain the results.

Resolver

let caslBase = require('../lib/caslBase');

module.exports = async function scoreLoan (_, args, context) {
    let { store } = context;
    let input = {
        JOB    : 'J1',
        CLAGE  : 100, 
        CLNO   : 20, 
        DEBTINC: 20, 
        DELINQ : 2, 
        DEROG  : 0, 
        MORTDUE: 4000, 
        NINQ   : 1,
        YOJ    : 10
    };

    input.LOAN  = args.amount;
    input.VALUE = args.assets;

    let env = {
        astore: {
            caslib: 'Public',
            name  : 'GRADIENT_BOOSTING___BAD_2'
        }
    }
    let result = await caslBase(store,['argsToTable.casl', 'score.casl'], input, env);
    let score = result.items('results', 'score');
    
    return score;

}

Key points:

  1. As required, the default values for the other parameters are added to the user input.
  2. The resolver contains the location and name of the model.
  3. The names of the SAS code are passed to caslBase - this allows the code to read the SAS code from a repository.
  4. The caslBase function calls the jsonToDict to convert the json parameters to CASL dictionary and passes it on to CAS along with the code.
  5. The user receives the resulting score.
Example 2: Reporting wine production to management
The TwoBit winery management wants a simple report to view the production of different wines by year. They want to be able to pick the year range and the wines in which they are interested. The data shown below is for the TwoBit Winery. The goal is to query for selected wines and filter on years.

The data for the winery is listed below.

 
Obs year cabernet merlot pinot chardonnay twobit
1 2000 10 20 30 40 50
2 2001 5 10 15 5 0
3 2002 6 7 11 12 13
4 2003 5 8 0 0 50
5 2004 11 5 7 8 100
6 2005 1 1 0 0 1000
7 2006 0 0 0 0 3000

 

SAS Code

The SAS experts at the company created the following SAS code to meet management's request. Note that for demo purposes the wine data is created inline.

data wineList;  
 input year cabernet merlot pinot chardonnay twobit ;  
 cards;  
 2000 10 20 30 40 50   
 2001 5 10 15 5 0  
 2002 6 7 11 12 13  
 2003 5 8 0 0 50 
 2004 11 5 7 8 100  
 2005 1  1 0 0 1000  
 2006 0 0 0 0 3000  
;;;; 
run;  
/* _selections_ macro was generated in src/lib/getSelections function.
data wine ;  
    set winelist( where= (year GE &amp;from &amp;&amp; year LE &amp;to)); 
    keep &amp;_selections_; 
    run;  
ods html style=barrettsblue;  
    proc print data=wine;run;  
ods html close;run ;

Key points to note:

  1. The code requires macro variables &from, &to and &_selections_ be set before this code executes.
  2. The name of the returned table is wine.

Schema

type Query{
wineProduction(from: Int, to: Int): WineProduction
}

type WineProduction {
"""
An array containing wine production
"""
wines : [WineList]

"""
ODS output and Log output
"""
report: SASResults
}

type WineList {
year : Int
cabernet : Int
merlot : Int
pinot : Int
chardonnay: Int
twobit : Int
}

type WineProductionCas {
wines : [WineList]
}

type SASResults {
        """
        ODS output from the server
        """
        ods: String
        """
        Log output from the server
        """
        log: String
    
    }

Key points:

  1. As required, the year range is specified as filters for the query.
  2. As required, the user can pick the wines in which they are interested.

Application

The application is shown below.

Client code

The relevant client code is shown below (see this in the repository for the full program).

 let gqString = `query userQuery($from: Int, $to: Int) {
                           results: wineProduction(from: $from to: $to) {
                              wines { 
                                  ${wineList} 
                                } 
                                ${reportList}
                             } 
                            }`;
        let payload = {
            url   : host + '/graphql',
            method: 'POST',
            data: { 
                query: gqString,
                variables: {
                    from: fromYear.value,
                    to  : toYear.value
                }
            }
        }
        setReportValues(null);
        setResultValues(null);
        axios(payload)
         .then ( r => {
            let res = r.data.data.results;
           // Simple to extract the results
            setResultValues(res.wines);
            if (res.report != null ) {
                setReportValues(res.report);
            }
        
         })
         .catch( e => alert(e))
    }
})

Key points:

  1. The GraphQL query string is sent as the payload (wineList and reportList are strings computed earlier in the program based on user selection).
  2. The endpoint is again /graphql with a POST method.
  3. This snippet also shows the preferred way to send the filter values.

Resolver

The root resolver is shown below.

let getProgram    = require('../lib/getProgram');
let getSelections = require('../lib/getSelections');
let spBase        = require('../lib/spBase');

module.exports = async function wineProduction (_, args, context, info){
    let {store} = context;

<span style="font-size: 14px;">   // read source - reads in the sas program</span>
    let src = await getProgram(store, ['wines.sas']); 

    // update args with the wine list specified by the user
    let selections = getSelections(info, 'wines', args);

   // execute the sas code with compute server and get results
    let resultSummary = await spBase(store, selections.args, src);
    
    // resultSummary is now passed to the resolvers for wines and results fields.
    return resultSummary;
}

Key points:

  1. Code from the GitHub repo uses winelist.js to resolve the list of wines.
  2. Code from sasresults.js, sasOds.js and sasLog.js returns ODS output and the SAS log.
  3. The SAS code reads in from a repository using the getPrograms function.
Example 3: List SAS Visual Analytics reports
Another common use case is retrieving information about reports developed with SAS Visual Analytics. The GraphQL query to get the list of reports, who edited it last and when is shown below. This example uses the reports REST API.

Schema

{
    reports {
        name
        modifiedBy
        modifiedOn
   }
}

Creating a UI for this is a challenge exercise for the reader (meaning I did not get around to writing it 🙂 ). The returned results look something like this:

{
    "data": {
    "reports": [
        {
            "name": "Application Activity",
            "modifiedBy": "SAS Supplied",
            "modifiedOn": "2018-04-20T14:24:05.258Z"
       },
      {
           "name": "CAS Activity",
           "modifiedBy": "SAS Supplied",
          "modifiedOn": "2018-06-08T20:21:14.727Z"
        }
...

Resolver

module.exports = async function reports (_, args, context) {
    let {store} = context;
    let reports = store.getService ('reports');
    let list =await getList(store, reports);
    return list;
}

async function getList(store, reports) {
    let reportsList =await store.apiCall (reports.links ('reports'));
    if (reportsList.itemsList().size ===0) {
       return [];
     }
    let r = reportsList.itemsList().map (name => {
         let t = {
             name : name,
             modifiedBy: reportsList.items(name, 'data', 'modifiedBy'),
             modifiedOn: reportsList.items(name, 'data', 'modifiedTimeStamp')
         };
        return t;
     });
   return r;
}

Example 4: Getting the URL and image of a specific report
The query below can be used to obtain the URL to display the interactive report and svg image of a specific report.

Schema

{
      report(name:"Application Activity"){
           url
          image
      }
}

The returned value will be along these lines:

{
  "data": {
    "report": {
      "url": "http://superuser.com/?reportUri=/reports/reports/ecec39ad-994f-4055-8e40-4360f410bc6e...",
      "image: {the svg of the image}
    }
}

Resolver

There are 3 resolvers associated with this query, the root resolver and resolvers for image and url. For the sake of brevity, I will not review those here. please visit the code in the repository.

In conclusion

The examples above cover some basic scenarios for SAS Viya applications.

  1. Using CAS actions
  2. Using traditional data step and procs
  3. Obtaining ODS output
  4. Working with SAS Visual Analytics

The simplicity of the client code and the resolvers are what makes GraphQL attractive for writing SAS Viya applications. You can also exploit other features in SAS Viya using the same pattern. Further, you can use the examples in this repository to easily customize your own use cases. The resolvers and helper functions are written to be reusable with minimal effort. The instructions are in the README file in the repository. If you create interesting schema and resolvers for SAS Viya, please share them with the SAS user community.

Opinion

Like all new technologies GraphQL has its proponents and detractors. Also, many people get caught in the low-value arguments about GraphQL being better or worse than REST. I personally do not follow these discussions since you should use the best tool for the job.

I find GraphQL most attractive when developing a back-end for SAS Viya applications. Both front and back-end developers will benefit from the clear definition of the schema. Having well supported GraphQL servers by Apollo and Facebook makes it easier to adopt GraphQL.

Useful links

There are a growing number of resources from which to learn and model. Below is small starter list.

  1. graphql.org
  2. Apollo
  3. Relay
  4. GraphQL Concepts Visualized by Dhaivat Pandya
  5. GraphQL tutorial from TutorialsPoint
  6. How to GraphQL

GraphQL and SAS Viya applications - a good match was published on SAS Users.

4月 052019
 

Recently, you may have heard about the release of the new SAS Analytics Cloud. The platform allows fast access to data-science applications in the cloud! Running on the SAS Cloud and using the latest container technology, Analytics Cloud eliminates the need to install, update, or maintain software or related infrastructure.

SAS Machine Learning on SAS Analytics Cloud is designed for SAS and open source data scientists to gain on-demand programmatic access to SAS Viya. All the algorithms provided by SAS Visual Data Mining and Machine Learning (VDMML), SAS Visual Statistics and SAS Visual Analytics are available through the offering. Developers and data scientists access SAS through a programming interface using either the SAS or Python programming languages.

A free trial for Analytics Cloud is available, and registration is simple. The trial environment allows users to manage and collaborate with others, share data, and create runtime models to analyze their data. The system is pre-loaded with sample data for learning, and allows users to upload their own data. My colleague Joe Furbee explains how to register for the trial and takes you on a tour of the system in his article, Zero to SAS in 60 Seconds- SAS Machine Learning on SAS Analytics Cloud.

Luckily, I had the privilege of being the technical writer for the documentation for SAS Analytics Cloud, and through this met two of my now close friends at SAS.

Alyssa Andrews (pictured left) and Mariah Bragg (pictured right) are both Software Developers at SAS, but worked on the UI for SAS Analytics Cloud. Mariah works in the Research and Development (R&D) division of SAS while Alyssa works in the Information Technology (IT) division. As you can see this project ended up being an interesting mix of SAS teams!

As Mariah told me the history, I learned that SAS Analytics Cloud “was a collaborative project between IT and R&D. The IT team presented the container technology idea to Dr. Goodnight but went to R&D because they wanted this idea run like an R&D project.”

As we prepared for the release of SAS Analytics Cloud to the public, I asked Mariah and Alyssa about their experience working on the UI for SAS Analytics Cloud, and about all the work that they had completed to bring this powerful platform to life!


What is SAS Analytics Cloud for you? How do you believe it will help SAS users?

Alyssa: For me, it is SAS getting to do Software as a Service. So now you can click on our SAS Software and it can magically run without having to add the complexity of shipping a technical support agent to the customers site to install a bunch of complex software.

Mariah: I agree. This will be a great opportunity for SAS to unify and have all our SAS products on cloud.

Alyssa: Now, you can trial and then pay for SAS products on the fly without having to go through any complexities.

What did you do on the project as UI Developers?

Alyssa: I was lent out to the SAS Analytics Cloud team from another team and given a tour-of-duty because I had a background in Django (a high-level Python Web design tool) which is another type of API framework you can build a UI on top of. Then I met Mariah, who came from an Angular background, and we decided to build the project on Angular. So, I would say Mariah was the lead developer and I was learning from her. She did more of the connecting to the API backend and building the store part out, and I did more of the tweaks and the overlays.

What is something you are proud of creating for SAS Analytics Cloud?

Mariah: I’m really proud to be a part of something that uses Angular. I think I was one of the first people to start using Angular at SAS and I am so excited that we have something out there that is using this new technology. I am also really proud of how our team works together, and I’m really proud of how we architectured the application. We went through multiple redesigns, but they were very manageable, and we really built and designed such that we could pull out components and modify parts without much stress.

Alyssa: That we implemented good design practices. It is a lot more work on the front-end, but it helps so much not to have just snowflake code (a term used by developers to describe code that isn’t reusable or extremely unique to where it becomes a problem later on and adds weight to the program) floating. Each piece of code is there for a reason, it’s very modular.

What are your hopes for the future of SAS Analytics Cloud?

Alyssa: I hope that it continues to grow and that we add even more applications to this new container technology, so that SAS can move even more into the cloud arena. I hope it brings success. It is a really cool platform, so I can’t wait to hear about users and their success with it.

Mariah:
I agree with Alyssa. I also hope it is successful so that we keep moving into the Cloud with SAS.

Learning more

As a Developmental Editor with SAS Press, it was a new and engaging experience to get to work with such an innovative technology like SAS Analytics Cloud. I was happy I got to work with such an exciting team and I also look forward to what is next for SAS Analytics Cloud.

And as a SAS Press team member, I hope you check out the new way to trial SAS Machine Learning with SAS Analytics Cloud. And while you are learning SAS, check out some of our great books that can help you get started with SAS Studio, like Ron Cody’s Biostatistics by Example Using SAS® Studio and also explore Geoff Der and Brian Everitt’s Essential Statistics Using SAS® University Edition.

Already experienced but want to know more about how to integrate R and Python into SAS? Check out Kevin D. Smith’s blogs on R and Python with SAS Viya. Also take a moment to investigate our new books on using open source R and Python with SAS Viya: SAS Viya: The R Perspective by Yue Qi, Kevin D. Smith, and XingXing Meng and SAS Viya: The Phyton Perspective by Kevin D. Smith and XingXing Meng.

These great books can set you on the right path to learning SAS before you begin your jump into SAS Analytics Cloud, the new way to experience SAS.

SAS® Analytics Cloud—an interview with the women involved was published on SAS Users.

3月 272019
 

PAYG financial services: coming to a bank near you

You walk into your neighborhood bank to see about a mortgage. You and your spouse have your eye on the perfect 3BR, 2BA brick ranch near your child's school, and it won't be on the market long. An hour later, you burst through the front door with a bottle of champagne: "We're qualified!"

Also celebrating is your bank's branch manager. She was skeptical when headquarters analysts equipped branches for "Cloud-based application using SAS" , saying it would speed up loan applications. But your quick, frictionless transaction proved them right.The bank's accountants are happy too. The new pay-as-you-go mode of using SAS software in the cloud means big savings.

The above scenario is possible now through serverless functions, which enable your SAS Viya applications to take input from end users, score the loan application, and return results.

The rest of this post gets into the nitty gritty of serverless functions and SAS Viya, detailing what happens in a bank's computers after a customer applies for a loan. The qualification process starts by running a previously built scoring model to generate a score. You will see how the combination of REST APIs in SAS Viya, analytic models and the restaf library make the task of building the serverless function relatively simple.

The blog titled "SAS REST APIs: a sample application" demonstrated building a SAS Viya application using REST APIs, SAS Visual Analytics and SAS Operational Research. This is typical web applications with application server and SAS Viya running on premise.

If you are one of many users using(or considering) a cloud provider, serverless functions is an useful alternate way to deliver your applications to your users. This eliminates the need to manage the application server associated with your application. Additionally you get zero administration and auto-scaling among other benefits. Many SAS applications that respond quickly to user requests are ideal candidates to be deployed as serverless functions.

The example in this article is available on SAS software’s GitHub site in the viya-apps-serverless-score repository.  If you want to see the end application for frame of reference, see the Using the serverless functions section at the bottom of this article.

Let’s begin with a bit of background on serverless computing and then dig into the details of the application and functions.

Serverless computing explained

The benefits of serverless functions as touted by AWS serverless, Azure and serverless.com:

AWS Lambda

AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume– there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service – all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.

What is serverless computing?

According to Azure serverless computing is the abstraction of servers, infrastructure, and operating systems. When you build serverless apps you don’t need to provision and manage any servers, so you can take your mind off infrastructure concerns. Serverless computing is driven by the reaction to events and triggers happening in near-real-time—in the cloud. As a fully managed service, server management and capacity planning are invisible to the developer and billing is based just on resources consumed or the actual time your code is running.

Four core benefits of serverless computing from serverless.com:

  1. Zero administration – Deploy code without provisioning anything beforehand or managing anything afterward. There is no concept of a fleet, an instance, or even an operating system. No more bothering the Ops department.
  2. Auto-scaling – Let your service providers manage the scaling challenges. No need to fire alerts or write scripts to scale up and down. Handle quick bursts of traffic and weekend lulls the same way — with peace of mind.
  3. Pay-per-use – Function-as-a-service compute and managed services charged based on usage rather than pre-provisioned capacity. You can have complete resource utilization without paying a cent for idle time. The results? 90% cost-savings over a cloud VM, and the satisfaction of knowing that you never pay for resources you don’t use.
  4. Increased velocity – Shorten the loop between having an idea and deploying to production. Because there’s less to provision up front and less to manage after deployment, smaller teams can ship more features. It’s easier than ever to make your idea live.

OK, so there is a server involved in serverless computing. The beauty in this technology is that once you deploy your code, you don't have to worry about the underlying infrastructure. You just know that the app should work and you only incur costs when the app is running.

Basic flow

Serverless functions are loaded and executed based on the occurrence of one of the triggers/events supported by the cloud vendor. In this example the API Gateway triggers the serverless functions when an http call invokes the function. The API Gateway calls the handler for the function and passes in the user data. On return from the handler the response is sent to the client. This article focuses on the code inside the Serverless Function box in the picture below.

Figure 1: Request Workflow

This example utilizes two key functions:

  1. app – This function serves up an html application for user to enter the data. This is an example of a web application as a serverless function.
  2. score – This function takes user input from the web app, executes scoring on a Viya Server and returns the results.

Serverless.yml

The serverless.yml defines the serverless functions and the handlers, used to execute and other system related information. We will focus only on the application specific information.

The code snippet below shows the definition of the path and handler for the two functions in the serverles.yml file.

functions:
  app: 
    handler: src/app.app
    events:
      - http:
          path: app
          method: get
          cors: 
            origin: '*'
          request:
            parameters:
              paths:
                id: true  
 
  score:
    handler: src/score.score
    events:
      - http:
          path: score
          method: post
          cors: 
            origin: '*'

The functions(app & score) in the yaml define:

  1. event - http event will trigger this function
  2. path - this is path to the function - similar to what you define in Express or hapijs
  3. method - http standard GET, PUT etc...
  4. others - refer to the cloud vendor's documentation for other available options.

The serverless.yml file also sets application related information using environment variables. In this particular use case we define how to access SAS Viya and which scoring model to use.

environment:
#
# Information for logging into SAS Viya
#
  VIYA_SERVER: http://example.viya.server.com
  CLIENTID: raf
  CLIENTSECRET: raf
  USER:rafuser
  PASSWORD: rafpass
 
#
# astore to be used for scoring
#
  caslib: casuser
  name: GRADIENT_BOOSTING___BAD_2

A note on securing your password

In this example we store the userid and password in environment variables. This is to the keep the focus on the internals of serverless functions for SAS Viya. Locally you can use "serverless variables" to secure the information during development. However, for production deployment, refer to your provider's recommendations and the user community for best practices.

Sounds like a followup blog in the future 🙂

Anatomy of the serverless function

Figure 2 shows the flow inside the serverless function for this example. This pattern will repeat itself in your serverless functions.

Figure 2: Serverless Function Flow

Serverless function score

The code below is the handler for the score function. The rest of this section will discuss each of the key features of the handler.

//
// See src/score.js for the full code
//
module.exports.score = async function (event, context ) {
 
   let store      =  restaf.initStore(); /* initialize restaf     */
   let inParms = parseEvent(event);  /* get user input        */
   let payload = getLogonPayload(); /* get logon information */
 
   return store.logon(payload)               /* logon to SAS Viya */
        .then (()    > scoreMain( store, inParms )) /* score     */
        .then(result > setPayload(result)) /* return results     */
        .catch(err   > setError(err))	      /* else return errors */
}

Step 1: Parse the input

The event parameter contains the input from the caller (web application, another serverless function, etc).
The content of the event parameter is whatever the designer of the serverless function desires. In this particular case, a sample event data is shown below.

{
    "input": {
        "JOB"    : "J1",
        "CLAGE"  : 100,
        "CLNO"   : 20,
        "DEBTINC": 20,
        "DELINQ" : 2,
        "DEROG"  : 0,
        "MORTDUE": 4000,
        "NINQ"   : 1,
        "YOJ"    : 10,
        "LOAN"   : 10000,
        "VALUE"  : 1000000
    }
}

The parseEvent function validates the incoming information.

module.exports = function parseEvent(event)
    let input = null;
    let body = {};
    let rstore = {
        caslib:  process.env.ASTORE_CASLIB,
        name  : process.env.ASTORE_NAME
    }
    if ( event.body !=  null ) {
        body = ( typeof event.body === 'string') ? JSON.parse(event.body) : Object.assign({}, event.body);
       if ( body.hasOwnProperty('input') === true ) {
          input = body.input;
    }
    return { rstore: rstore, input: input }
}

Step 2: Logon to SAS Viya

The server.yml defines the SAS Viya logon information. Note there are other secure ways to manage sensitive information like passwords. You should refer to your provider’s documentation.

module.exports = function getLogonPayload() {
    let p = {
        authType    : 'password',
        host        : `${process.env.VIYA_SERVER}`,
        user        : process.env['USER'],
        password    : process.env['PASSWORD'],
        clientID    : process.env['CLIENTID'],
        clientSecret: (process.env.hasOwnProperty('CLIENTSECRET')) ? process.env[ 'CLIENTSECRET' ] : ''
        };
    return p;
 }

The line restaf.logon(payload) in function in the handler code logs on to the SAS Viya Server using this information.

Step 3 and Step 4: Create Payload and make REST API calls

On successful logon the server is called to do the scoring. This particular example uses the sccasl.runcasl method to run CAS Language (CASL) statements and return the scores. Creating the score has two steps:

  1. upload user input: The user input is converted to a csv and uploaded to a CAS table
  2. Submit CASL statements to SAS Viya (CAS) to do the scoring

The code in src/scoreMain in the repository accomplishes both these steps.

Each of these steps use a CAS action:

    • table.upload – to upload the user data into a CAS Table. The input data is converted into a comma-delimited file(csv) and then uploaded. The REST call using restaf looks like this:
    let csv = makecsv(input); /* create a csv */
    let JSON_Parameters = {
        casout: {
            caslib : 'casuser', /* a valid caslib */
            name   : 'INPUTDATA', /* name of output file on cas server */
            replace: true
        },
 
        importOptions: {
            fileType: 'csv' /* type of the file being uploaded */
        }
    };
 
    let payload = {
        headers: { 'JSON-Parameters': JSON_Parameters },
        data   : csv,
        action : 'table.upload'
    };
 
    let result = await store.runAction(session, payload);
    • sccasl.runcasl – execute CASL statements to do the scoring
 // Setup casl statements 	 	 
 let caslStatements = `	 	 
 loadactionset "astore";	 	 
 action table.loadTable /	 	 
 caslib = "${rstore.caslib}" 	 	 
 path = "${rstore.name}.sashdat"	 	 
 casout = { caslib = "${rstore.caslib}" name = "${rstore.name}" replace=TRUE};	 	 
 
 action astore.score /	 	 
 table = { caslib= 'casuser' name = 'INPUTDATA' } 	 	 
 rstore = { caslib= "${rstore.caslib}" name = '${rstore.name}' }	 	 
 out = { caslib = 'casuser' name = 'OUTPUTDATA' replace= TRUE};	 	 
 action table.fetch r = result/	 	 
 format = TRUE	 	 
 table = { caslib = 'casuser' name = 'OUTPUTDATA' } ;	 	 
 send_response(result);	 	 
 `;	 	 
 // execute cas actions	 	 
 payload = {	 	 
 action: 'sccasl.runcasl',	 	 
 data : { code: caslStatements}	 	 
 }	 	 
 result = await store.runAction(session, payload);

Step 5: Create response

AWS serverless functions must return data and error(s) in a certain form. The two functions setPayload.js and setError.js accomplish this.

module.exports = function setPayload (body) {
    return {
        "statusCode": 200,
        "headers"   : {
            'Access-Control-Allow-Origin'     : '*',
            'Access-Control-Allow-Credentials': true
          },
        "isBase64Encoded": false,
        "body"           : JSON.stringify(body)
    }
  }

Using the serverless functions

When the serverless function is deployed you will get a link for each of the functions. In our case we receive the request shown below (with xxxx replaced with appropriate information).

GET - https://xxxx.amazonaws.com/demo/app

The first link serves up the web application. The user enters some values and the app calls the score serverless function to get the results.
Alternatively, you can write your own application and make an http POST call to the score function using a link such as:

POST - https://xxxx.amazonaws.com/demo/score

To invoke the web application, you will visit the link

https://xxxx.amazonaws.com/demo/app

with your browser. You should see a display shown in Figure 3:

Figure 3: Application Input Screen

Entering values into the two fields and pressing Submit calls the second serverless function, score, and results in a pie chart as seen in Figure 4:

Figure 4: Score Report Screen

Please see the loan.html file in the GitHub repository for details on the application. Displayed below is the relevant part of the Javascript in the loan.html. The score-function-url is the url for the score function. The payload was described earlier in this article. The http call is made using axios.

async function runScore(inputValues ){
 
    let payload = {
        astore: {
            caslib: 'Public',
            name: 'GRADIENT_BOOSTING___BAD_2'
        },
        input: inputValues
    }
    let config = {
        url: {score-function-url}
        method: 'POST',
        data: payload
    }
    let r = await axios(config);
    return r.data.score;
 
}

Porting to other cloud providers

The cloud provider dependent information is handled in the following functions: score.js, parseEvent.js, setPayload.js and setError.js. The rest of the code is host agnostic. In writing your own functions the recommendation is to follow the same pattern as much as possible. The generic code is then available in its own repository for reuse with other providers and applications.

Go try it yourself

I have shown you how to deliver your SAS Viya applications as serverless functions. To access more examples please see the GitHub restaf-demos repository.

Supporting Resources

Serverless functions and SAS Viya - a good match was published on SAS Users.