CASL

August 8, 2019
 

Some key components of CASL are the action statements. These statements perform tasks that range from configuring the connection to the server, to summarizing large amounts of data, to processing image files. Each action has its own purpose. However, there is some overlapping functionality between actions. For example, more than one action can summarize numeric variables.
This blog looks at three actions: SIMPLE.SUMMARY, AGGREGATION.AGGREGATE, and DATAPREPROCESS.RUSTATS. Each of these actions generates summary statistics. Though there might be more actions that generate the same statistics, these three are a good place to start as you learn CASL.

Create a CAS table for these examples

The following step generates a table called mydata, stored in the casuser caslib, that will be used for the examples in this blog.

cas;
libname myuser cas caslib='casuser';
data myuser.mydata;
   length color $8;
   array X{100};
   do k=1 to 9000;
      do i=1 to 50;
         X{i} = rand('Normal', 0, 4000);
      end;
      do i=51 to 100;
         X{i} = rand('Normal', 100000, 1000000);
      end;
      if x1 < 0 then color='red';
      else if x1 < 3000 then color='blue';
      else color='green';
      output;
   end;
run;

SIMPLE.SUMMARY

The purpose of the Simple Analytics action set is to perform basic analytical functions. One of the actions in the action set is the SUMMARY action, used for generating descriptive statistics like the minimum, maximum, mean, and sum.
This example demonstrates obtaining the sum, mean, and n statistics for five variables (x1–x5) and grouping the results by color. The numeric input variables are specified in the INPUTS parameter. The desired statistics are specified in the SUBSET parameter.

proc cas;
   simple.summary / 
      inputs={"x1","x2","x3","x4","x5"},
      subset={"sum","mean","n"},
      table={caslib="casuser",name="mydata",groupBy={"color"}},
      casout={caslib="casuser", name="mydata_summary", replace=true};
run;
   table.fetch /
      table={caslib="casuser",name="mydata_summary"};
run;
quit;

The SUMMARY action creates a table that is named mydata_summary. The TABLE.FETCH action is included to show the contents of the table.

The mydata_summary table can be used as input for other actions, its variable names can be changed, or it can be transposed. Now that you have the summary statistics, you can use them however you need to.

AGGREGATION.AGGREGATE

Many SAS® procedures have been CAS-enabled, which means you can use a CAS table as input. However, specifying a CAS table does not mean all of the processing takes place on the CAS server. Not every statement, option, or statistic is supported on the CAS server for every procedure. You need to be aware of what is not supported so that you do not run into issues if you choose to use a CAS-enabled procedure. In the documentation, refer to the CAS processing section to find the relevant details.
When a procedure is CAS-enabled, it means that, behind the scenes, it is submitting an action. The MEANS and SUMMARY procedure steps submit the AGGREGATION.AGGREGATE action.
With PROC MEANS it is common to use a BY or CLASS statement and ask for multiple statistics for each analysis variable, even different statistics for different variables. Here is an example:

proc means sum data=myuser.mydata noprint;
  by color;
   var x1 x2 x3;
   output out=test(drop=_type_ _freq_) sum(x1 x3)=x1_sum x3_sum
   max(x2)=x2_max std(x3)=x3_std;
run;

The AGGREGATE action produces the same statistics as PROC MEANS and an output table with the same structure.

proc cas;
   aggregation.aggregate /
      table={name="mydata", caslib="casuser", groupby={"color"}}
      casout={name="mydata_aggregate", caslib='casuser', replace=true}
      varspecs={{name='x1', summarysubset='sum', columnnames={'x1_sum'}},
                {name='x2', agg='max', columnnames={'x2_max'}},
                {name='x3', summarysubset={'sum','std'},
                 columnnames={'x3_sum','x3_std'}}}
      savegroupbyraw=true, savegroupbyformat=false, raw=true;
run;
quit;

The VARSPECS parameter might be confusing. It is where you specify the variables that you want to generate statistics for, which statistics to generate, and what the resulting column should be called. Check the documentation: depending on the desired statistic, you need to use either SUMMARYSUBSET or AGG arguments.

If you are using the GROUPBY parameter, you most likely want to specify SAVEGROUPBYRAW=TRUE. Otherwise, you must list every GROUPBY variable in the VARSPECS parameter. Also, specifying SAVEGROUPBYFORMAT=FALSE prevents the output from containing _f versions (formatted versions) of all of the GROUPBY variables.

DATAPREPROCESS.RUSTATS

The RUSTATS action, in the Data Preprocess action set, computes univariate statistics, centralized moments, quantiles, and frequency distribution statistics. This action is extremely useful when you need to calculate percentiles. If you ask for percentiles from a procedure, all of the data will be moved to the compute server and processed there, not on the CAS server.
This example has an extra step. Actions require a list of variables, which can be cumbersome when you want to generate summary statistics for more than a handful of variables. Macro variables are a handy way to insert a list of strings, variable names in this case, without having to enter all of the names yourself. The SQL procedure step generates a macro variable containing the names of all of the numeric variables. The macro variable is referenced in the INPUTS parameter.
The RUSTATS action has TABLE and INPUTS parameters like the previous actions. The REQUESTPACKAGES parameter is the parameter that allows for a request for percentiles.
The example also contains a bonus action, TRANSPOSE.TRANSPOSE. The goal is to have a final table, mydata_rustats2, with a structure like PROC MEANS would generate. The tricky part is the COMPUTEDVARSPROGRAM parameter.
The table generated by the RUSTATS action has a column called _Statistic_ that contains the name of the statistic. However, it contains “Percentile” multiple times. A different variable, _Arg1_, contains the value of the percentiles (1, 10, 20, and so on). The values of _Statistic_ and _Arg1_ need to be combined, and that new combined value generates the new variable names in the final table.
The COMPUTEDVARS parameter specifies the name of the new variable, NEWID, which holds the concatenation of _Statistic_ and _Arg1_. The COMPUTEDVARSPROGRAM parameter tells CAS how to create the values for NEWID. The NEWID value is then used in the ID parameter to make the new variable names. Pretty cool!

proc sql noprint;
   select quote(strip(name)) into :numvars separated by ','
   from dictionary.columns
   where libname='MYUSER' and memname='MYDATA' and type='num';
quit;

proc cas;
   dataPreprocess.rustats /
      table={name="mydata",caslib="casuser"}
      inputs={&numvars}
      requestpackages={{percentiles={1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99},scales={"std"}}}
      casoutstats={name="mydata_rustats",caslib="casuser"};

   transpose.transpose /
      table={caslib='casuser', name="mydata_rustats", groupby={"_variable_"},
             computedvars={{name="newid",format="$20."}},
             computedvarsprogram="newid=strip(_statistic_)||compress(strip(_arg1_),'.-');"}
      transpose={"_value_"}
      id={"newid"}
      casOut={caslib='casuser', name="mydata_rustats2", replace=true};
run;
quit;
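
To make the NEWID logic easier to follow, here is a small JavaScript sketch that mimics what the COMPUTEDVARSPROGRAM expression does to each row. It is an illustration only, not CAS code. The function name makeNewId is hypothetical, and the sample values (including a missing _Arg1_ shown as a single period) are assumptions about what the RUSTATS output might contain.

```javascript
// Mimics: newid = strip(_statistic_) || compress(strip(_arg1_), '.-');
// strip()    -> remove leading and trailing blanks
// compress() -> remove every '.' and '-' character from the value
function makeNewId(statistic, arg1) {
  const stat = String(statistic).trim();
  const arg = String(arg1).trim().replace(/[.-]/g, '');
  return stat + arg;
}

// Assumed sample rows: a percentile row and a row with a missing _Arg1_
const rows = [
  { statistic: 'Percentile', arg1: '10' },
  { statistic: 'Percentile', arg1: '95' },
  { statistic: ' Mean ', arg1: '.' }
];

for (const row of rows) {
  console.log(makeNewId(row.statistic, row.arg1));
}
```

Because every period is removed, a fractional percentile such as 2.5 would lose its decimal point under this expression and become 25.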

Here is a small portion of the final table. Remember, you can use the TABLE.FETCH action to view the table contents as well.

Summary

Summarizing numeric data is an important step in analyzing your data. CASL provides multiple actions that generate summary statistics. This blog provided a quick overview of three of those actions: SIMPLE.SUMMARY, AGGREGATION.AGGREGATE, and DATAPREPROCESS.RUSTATS.
The wonderful part of so many choices is that you can decide which one best fits your needs. Summarizing your data with actions also ensures that all of the processing occurs on the CAS server and that you are taking full advantage of its capabilities.
Be sure to use the DROPTABLE action to delete any tables that you do not want taking up space in memory:

proc cas;
	table.droptable / caslib='casuser' name='mydata' quiet=true;
	table.droptable / caslib='casuser' name='mydata_summary' quiet=true;
	table.droptable / caslib='casuser' name='mydata_aggregate' quiet=true;
	table.droptable / caslib='casuser' name='mydata_rustats' quiet=true;
	table.droptable / caslib='casuser' name='mydata_rustats2' quiet=true;
quit;
cas casauto terminate;


Summarization in CASL was published on SAS Users.

June 26, 2019
 

In his article How to use CASL to develop and work with user-defined CAS actions, Brian Kinnebrew defines CASL as "a language specification used by the SAS client to interact with and provide easy access to Cloud Analytic Services (CAS). CASL is a statement-based scripting language with many uses and strengths." I can't come up with a better definition, so if you'd like to learn more about the basics of CASL, I encourage you to read Brian's post.

In a SAS Stored Process (or any traditional SAS program) the code has multiple DATA steps and procedures with a good dose of macros.

Over the last couple of years, the focus of my projects has been the use of CAS actions with no traditional procedures in the mix. Most of these projects involved web applications calling multiple CAS actions. My initial approach was to make multiple HTTP calls, one per action. This could get tedious.

Then I met my action hero: sccasl.runCasl.

My favorite SAS CASL action

This action executes a CASL "script" on the CAS server analogous to executing a SAS Stored Process. Running a CASL program with a mix of CAS actions and CASL statements on the CAS server has these benefits:

  1. Fewer HTTP calls to the server
  2. Client-side code that is much easier to reason about
  3. Returned values in the form of a dictionary suited for further consumption by the client, simplifying the client code

My personal name for these scripts is "CAS stored process".

Where art thou Macro?

In many applications, user input is passed to the code running on the server. In a SAS Stored Process, macros pass the parameters. CASL has no macros. My initial approaches to passing parameters to CASL programs were:

  1. Generate the final CASL program in JavaScript with the user input values inserted into the code. Sample code is available here.
    • Drawback: Debugging the code in SAS Studio requires a cut-and-paste of the generated code into SAS Studio.
  2. Load the data into a CAS table using table.upload action for programs needing an input table. Sample code is available here.
    • Drawback: This requires an additional http call to the server.

In developing the GraphQL approach to writing applications (see GraphQL and SAS Viya applications - a good match), I addressed the two drawbacks listed above by creating two functions and putting them in my utility belt.

    Superfriend functions

  • jsonToDict.js - generate a string having the CASL dictionary version of a JavaScript object
  • argsToTable - create a CAS table from a dictionary

The remainder of this article discusses these two functions. To demonstrate the functions' usage, I use code listings from the scoring example, covered in the GraphQL example.

The jsonToDict function

This function produces a string containing a CASL dictionary, suitable for inclusion in CASL code.

Function definition

jsonToDict ⇒ string

Returns: string - the string containing the CASL dictionary

Param   Type     Description
obj     object   the JavaScript object of interest
name    string   the name to assign to the dictionary

 

Example function code

The code below outlines the jsonToDict function usage.

let obj = {x:1, b:2, c:['a', 'b']};
let r = jsonToDict(obj, '_appEnv_');
The result is:
r = `_appEnv_ = {x=1, b=2, c={"a", "b"}};`;
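
A minimal sketch of how a function like jsonToDict could be written is shown below. It is not the author's implementation: the helper name toCaslValue, the quote-doubling rule for strings, and the exact spacing of the output are assumptions made for illustration.

```javascript
// Hypothetical helper: converts one JavaScript value to CASL literal text
function toCaslValue(value) {
  if (value === null || value === undefined) {
    throw new Error('Unsupported value: ' + value);
  }
  if (Array.isArray(value)) {
    // JavaScript arrays become CASL lists: {"a", "b"}
    return '{' + value.map(toCaslValue).join(', ') + '}';
  }
  if (typeof value === 'object') {
    // JavaScript objects become CASL dictionaries: {key=value, ...}
    const entries = Object.entries(value).map(
      ([key, v]) => `${key}=${toCaslValue(v)}`
    );
    return '{' + entries.join(', ') + '}';
  }
  if (typeof value === 'string') {
    // Assumption: embedded double quotes are escaped by doubling them
    return '"' + value.replace(/"/g, '""') + '"';
  }
  if (typeof value === 'number' || typeof value === 'boolean') {
    return String(value);
  }
  throw new Error('Unsupported type: ' + typeof value);
}

// Sketch of jsonToDict: returns a CASL assignment statement as a string
function jsonToDict(obj, name) {
  return `${name} = ${toCaslValue(obj)};`;
}

console.log(jsonToDict({ x: 1, b: 2, c: ['a', 'b'] }, '_appEnv_'));
```

The recursion handles nested objects such as the astore settings used later, so one call is enough to serialize the whole input.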

The following lists the input parameters passed to the CASL code for scoring.

let input = {
        JOB    : 'J1',
        CLAGE  : 100, 
        CLNO   : 20, 
        DEBTINC: 20, 
        DELINQ : 2, 
        DEROG  : 0, 
        MORTDUE: 4000, 
        NINQ   : 1,
        YOJ    : 10,
        LOAN: 1000,
        ASSET: 100000
    };

Below is the JavaScript code and the result.

let _args_ = jsonToDict(input, '_args_');
results in
 
_args_ = `_args_ = {  JOB= "J1" ,CLAGE=100  ,CLNO=20  ,DEBTINC=20  ,DELINQ=2  ,DEROG=0  ,
MORTDUE=4000  ,NINQ=1  ,YOJ=10  ,LOAN=1000  ,ASSET=100000  };`;

To allow the use of different versions of the model, the name of the scoring model is passed in as a parameter. The JavaScript for the model follows.

let env = {
     astore: {
            caslib: 'Public',
            name  : 'GRADIENT_BOOSTING___BAD_2'
        }
};
let _appEnv_= jsonToDict(env, '_appEnv_');
 
resulting in:
let _appEnv_ = `_appEnv_ = { astore = { caslib="Public", name="GRADIENT_BOOSTING___BAD_2"}};`;

Next, I prepend the strings to the CASL program in the client code with the following code snippets.

let code = _args_ + _appEnv_ + `the CASL code shown below`;
loadactionset "astore";
 
/* convert arguments to a cas table */
argsToTable(_args_, 'casuser', 'INPUTDATA' );
 
/* score */
action astore.score r=rc/
    table  = { caslib= 'casuser', name = 'INPUTDATA' } 
    rstore = { caslib= _appEnv_.astore.caslib,  name=_appEnv_.astore.name }
    casout  = { caslib = 'casuser', name = 'OUTPUTDATA' replace= TRUE};
 
/* fetch results */
action table.fetch r = result /
    table = {  caslib = 'casuser' name = 'OUTPUTDATA' } ;
 
/* extract the score and send it as a dictionary */
score = result.Fetch[1].P_BAD;
send_response({score = score});

Now the CASL program can access the incoming information using the two dictionaries _args_ and _appEnv_. Note: As a personal choice, I use a convention of _args_ for user input and _appEnv_ for application-specific information. I use the restaf application framework to make the HTTP calls as shown below.

let payload = {
    action: 'sccasl.runCasl',
    data  : { code: code }
};
let result = await store.runAction(session, payload);
let score = result.items('results', 'score');
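
The steps above (prepend the dictionaries, then wrap the program in a runCasl payload) can be sketched as one small function. The name buildRunCaslPayload is hypothetical, the dictionary strings are shortened stand-ins for the jsonToDict output, and the CASL body is abbreviated. Only the payload shape (action and data.code) comes from the listing above, and no server call is made here.

```javascript
// Hypothetical helper: joins the dictionary assignments and the CASL body
// into one program and wraps it in the payload shape used for runCasl.
function buildRunCaslPayload(dictStatements, caslBody) {
  // A newline between parts keeps the generated program readable when debugging
  const code = [...dictStatements, caslBody].join('\n');
  return { action: 'sccasl.runCasl', data: { code } };
}

// Shortened stand-ins for the strings that jsonToDict would produce
const argsStatement = '_args_ = {JOB="J1", LOAN=1000};';
const appEnvStatement =
  '_appEnv_ = {astore={caslib="Public", name="GRADIENT_BOOSTING___BAD_2"}};';

// Abbreviated CASL body; the full scoring program is in the listing above
const caslBody = [
  'loadactionset "astore";',
  '/* scoring and fetch steps go here */',
  'send_response({score = score});'
].join('\n');

const payload = buildRunCaslPayload([argsStatement, appEnvStatement], caslBody);
console.log(payload.data.code);
```

Logging the generated program before sending it gives you text that can be pasted into SAS Studio if the CASL needs debugging.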

CASL coding - easier than saying Kilp-ill-skim

Notice the absence of string substitutions that look strange. Just simple, straightforward coding. Easier than saying your name backwards, forcing you back to the fifth dimension.

The argsToTable.casl function

The sample code above used the function argsToTable. As you may guess, this CASL function converts the _args_ dictionary into the CAS table used by the scoring action.

Function definition

argsToTable ⇒ Load dictionary into a CAS Table

Param    Type         Description
input    dictionary   the data to load
caslib   string       caslib of output table
name     string       name of output table

 

The relevant CASL code from the scoring program above is reproduced here:

argsToTable(_args_, 'casuser', 'INPUTDATA' );

The argsToTable function is either stored on a server or prepended to the CASL code sent to the runCasl action. This function removes the need for a separate HTTP call to load the data into a CAS table.

Returning data from CAS - the send_response function

The ultimate sidekick function

Any good superhero has a sidekick. The function send_response in CASL is very versatile: it lets you return data in the form the application needs, and it can return more than one result. In many programs I return data in a form easily consumed by the client code.

For example, if you wanted to return just the rows of a table, you can do the following:

function resultsToDict(r);
    casResults = {};
    i = 1;
    do row over r;
        casResults[i] = row;
        i = i + 1;
    end;
    return casResults;
end;
 
/* and use it as follows: */
 
action table.fetch r = result /
    table = {  caslib = 'casuser',  name = 'mydata' };
/* extract the data and return it as a dictionary */
casResults = resultsToDict(result.Fetch);
send_response({casResults = casResults});

Finally

Using a combination of CASL, a couple of utility functions, and the runCasl action, you can develop some very efficient programs with minimal traffic between your client and the server. If you run multiple actions in sequence, you should consider grouping them into a CASL program and executing them on the CAS server using the runCasl action.

Next

In my next article, which I hope to finish in a flash, I will discuss using the runCasl action to create a browser for CAS tables with support for pagination.

All comments are welcome. Please feel free to clone the code and make it better.

Cheers...
Deva

Let runCasl be your BFF and favorite action hero

"CAS Stored Process" with my Favorite Action Hero runCasl was published on SAS Users.

November 13, 2018
 

In my previous blog post I demonstrated how to create your own CAS actions and action sets.  In this post, we will explore how to create your own CAS functions using the CAS Language (CASL).  A function is a component of the CASL programming language that can accept arguments, perform a computation or other operation, and return a value.  The value that is returned can be used in an assignment statement or elsewhere in expressions.

About SAS functions

SAS provides two types of supplied functions: built-in functions and common functions.  Built-in functions contain functionality that is unique to CASL.  These allow you to perform operations on your result tables, arrays, and dictionaries, and provide run-time support for your CASL programs.  Built-in functions cannot be replaced with user-defined functions.

Conversely, common functions provide functionality that is common to other SAS functions.  When used in a CASL program, these functions take CASL values as arguments and return a CASL value.  Unlike built-in functions, you can replace these functions with user-defined functions.

Since the capabilities of built-in functions are unique to CASL, let’s look at these in-depth and demonstrate with an example.  Save the following FedSQL code in an external file called hmeqsql.sas.  This code will be read into CAS and stored as a variable.

The execDirect action executes FedSQL code in CAS.  The READPATH built-in function reads the FedSQL code saved in hmeqsql.sas and stores it in the CASL variable sqlcode which is used as input to the query parameter.

The fetch action displays the first 20 rows from the output table hmeq.out.

If you don’t feel like looking through the documentation for a built-in or common function, a list of each can be generated programmatically.  Run the following code to see a list of built-in functions.

Partial list of CASL built-in functions

Run the following code to see a list of common functions.

Partial list of common functions

User-defined CASL functions

In addition to the customizable capabilities of built-in functions supplied by SAS, you can also create your own functions using the FUNCTION statement.  User-defined functions can be called in expressions using CASL and they provide a large amount of flexibility.  The following example creates four different functions for temperature conversion.

After you create these functions, you can call them immediately, or you can store them in an external file and bring them in with a %include statement.  In this example, the user-defined functions have been stored in an external file called FunctionStore.sas.  You can call one, all, or any number of your user-defined functions.

The output from each function call is displayed in the log.

Lastly, if you want to see all user-defined functions, run the FUNCTIONLIST statement.  A list will be printed to the log.

More about CASL programming and using functions in CASL

Check out these resources for further information on programming in the CASL language and using functions in CASL.

Customize your CASL code with built-in and user-defined functions was published on SAS Users.