CASL

3月 162020
 

As a long-time SAS 9 programmer, I typically accomplish my data preparation tasks through some combination of the DATA Step, Proc SQL, Proc Transpose and some housekeeping procs like Proc Contents and Proc Datasets. With the introduction of SAS Viya, SAS released a new scripting language called CASL – a language that interacts with SAS Cloud Analytics Services (CAS).

CASL statements include actions, logically organized into action sets based on common functionality. For example, the Table action set allows you to load a table in CAS, view table metadata, change table metadata such as drop or rename a column, fetch (print) sample rows, save or drop a table from CAS, among other things. Steven Sober provides a great overview of CASL in his 2019 SAS Global Forum paper.

Learning CASL is a good idea assuming you want to leverage the power of CAS, because CASL is the language of CAS. While you can continue to use Viya-enabled procs for many of your data processing needs, certain new functionality is only available through CASL. CAS actions also provide a more granular access to options which otherwise may not be available as procedure options. But old habits die hard, and for a while I found myself bouncing between SAS 9.4 and CASL. I'd pull the data down from CAS just to get it to process in the SAS Programming Runtime Environment (SPRE) because it took less effort than figuring out how to get it done properly in CAS.

Then I started a project with a seriously large data set and quickly hit the limit on how much data I could pull down to process in SPRE. And although I could adjust the DATALIMIT option to retrieve more data than the default limit, I was wasting time and server resources unnecessarily moving the data between CAS and SPRE. All this, just so I could process the data “old school.”

I decided to challenge myself to do ALL my data preparation in CASL. I love a good challenge! I started collecting various useful CASL code snippets. In this post, I am sharing the tidbits I’ve accumulated, along with some commentary. Note, you can execute CAS actions from multiple clients, including SAS, Python, R, Lua and Java. Since my objective was to transition from traditional SAS code to CASL, I’ll focus solely on CAS actions from the SAS client perspective. While I used SAS Viya 3.5 for this work, most of the code snippets should work on prior versions as well.

The sections below cover: how to submit CASL code; loading, saving, dropping and deleting data; exploring data; table metadata management; and data transformation. Feel free to jump ahead to any section of interest.

How do you submit CASL code?

You use PROC CAS to submit CASL code from a SAS client. For example:

proc cas;
   <cas action 1>;
   <cas action 2>;
   …;
quit;

Similarly to other interactive procs that use run-group processing, separate CAS actions by run; statements. For example:

proc cas;
   <cas action 1>;
   run;
   <cas action 2>;
   run;
quit;

In fact, you can have the entire data preparation and analysis pipeline wrapped inside a single PROC CAS, passing data and results in the form of CASL variables from one action to the next. It can really be quite elegant.

Moving Data Using PROC CAS

Loading SASHDAT data in CAS

Your data must be in the SASHDAT format for CAS to process it. To load a SASHDAT table into CAS, use the table.loadtable CAS action. The code below assumes your SASHDAT table is saved to a directory on disk associated with your current active caslib, and you are loading it into the same caslib. (This usually occurs when you already performed the conversion to SASHDAT format, but the data has been unloaded. If you are just starting out and are wondering how to get your data into the SASHDAT format in the first place, the next session covers it, so keep reading.)

proc cas; 
     table.loadtable / path="TABLE_NAME.sashdat" casOut="TABLE_NAME"; 
     table.promote /name="TABLE_NAME" drop=true; 
quit;

The table.promote action elevates your newly loaded CAS table to global scope, making it available to other CAS sessions, including any additional sessions you start, or to other users assuming they have the right privileges. I can’t tell you how many times I forgot to promote my data, only to find that my hard-earned output table disappeared because I took a longer coffee break than expected! Don’t forget to promote or save off your data (or both, to be safe).

If you are loading from a directory other than the one associated with your active caslib, modify the path= statement to include the relative path to the source directory – relative to your active caslib. If you are looking to load to a different caslib, modify the casOut= statement by placing the output table name and library in curly brackets. For example:

proc cas;
    table.loadtable / path="TABLE_NAME.sashdat" casOut={name="TABLE_NAME" caslib="CASLIB2"};
    table.promote /name="TABLE_NAME" drop=true;
quit;

You can also place a promote=true option inside the casOut= curly brackets instead of calling the table.promote action, like so:

proc cas;
    table.loadtable / path="TABLE_NAME.sashdat" 
                      casOut={name="TABLE_NAME" caslib="CASLIB2" promote=true};
quit;

Curly brackets are ubiquitous in CASL (and quite unusual for SAS 9.4). If you take away one thing from this post, make it “watch your curly brackets.”

Loading SAS7BDAT, delimited data, and other file formats in CAS

If you have a SAS7BDAT file already on disk, load it in CAS with this code:

proc cas;
    table.loadtable /path="TABLE_NAME.sas7bdat" casout="TABLE_NAME" 
                     importoptions={filetype="basesas"};
quit;

Other file formats load similarly – just use the corresponding filetype= option to indicate the type of data you are loading, such as CSV, Excel, Document (.docx, .pdf, etc.), Image, Video, etc. The impressive list of supported file types is available here.

proc cas;
    table.loadtable / path="TABLE_NAME.csv" casout="TABLE_NAME" 
                      importoptions={filetype="csv"};
    run;
quit;

You can include additional parameters inside the importOptions= curly brackets, which differ by the file type. If you don’t need any additional parameters, use the filetype=”auto” and let CAS determine the best way to load the file.

When loading a table in SAS7BDAT, delimited or some other format, the table.loadtable action automatically converts your data to SASHDAT format.

Loading data in CAS conditionally

Imagine you are building a script to load data conditionally – only if it’s not already loaded. This is handy if you have a reason to believe the data might already be in CAS. To check if the data exists in CAS and load conditionally, you can leverage the table.tableExists action in combination with if-then-else logic. For example:

proc cas;
    table.tableExists result =r / name="TABLE_NAME";
    if r=0  then do;
        table.loadtable / path="TABLE_NAME.sashdat" casOut={name="TABLE_NAME"};
        table.promote /name="YOUR_TABLE_NAME" drop=true;
    end;
    else print("Table already loaded");
quit;

Notice that the result=r syntax captures the result code from the tableExists action, which is evaluated before the loadtable and promote actions are executed. If the table is already loaded in CAS, “Table already loaded” is printed to the log. Otherwise, the loadtable and promote actions are executed.

The ability to output CAS action results to a CASL variable (such as result=r in this example) is an extremely powerful feature of CASL. I include another example of this further down, but you can learn more about this functionality from documentation or this handy blog post.

Saving your CAS data

Let’s pretend you’ve loaded your data, transformed it, and promoted it to global scope. You or your colleagues can access it from other CAS sessions. You finished your data preparation, right? Wrong. As the header of this section suggests, you also need to save your prepared CAS data. Why? Because up to this point, your processed and promoted data exists only in memory. You will lose your work if your SAS administrator reboots the server or restarts the CAS controller. If you need to quickly reload prepared data, you must back it up to a caslib’s data source. See the CAS data lifecycle for more details.

To save off CAS data, naturally, you use the table.save action. For example:

proc cas;
    table.save / table="TABLE_NAME" name="TABLE_NAME.sashdat" replace=true;
quit;

In this example, you save off the CAS table to disk as a SASHDAT file, defaulting to the location associated with your active caslib. You can modify the table.save parameters to save or export the data to an alternative data storage solution with full control over the file format (including but not limited to such popular options as HDFS, Oracle, SQL Server, Salesforce, Snowflake and Teradata), compression, partitioning and other options.

Dropping and deleting data

To drop a table from CAS, execute a table.droptable action. For example:

proc cas;
    table.droptable / name="TABLE_NAME" quiet=true;
quit;

The quiet=true option prevents CAS from generating an error if the table does not exist in CAS. Dropping a table deletes it from memory. It’s a good practice to drop tables you no longer need, particularly the one you have promoted. Local-scope tables disappear on their own when the session expires, whereas global tables will stay in memory until they are unloaded.

Dropping a table does not delete the underlying source data. To delete the source of a CAS table, use the table.deleteSource action. For example:

proc cas;
    table.deletesource / source="TABLE_NAME.sashdat" quiet=true;
quit;

Exploring Data Using PROC CAS

After taking a close look at moving the data using PROC CAS, let’s look at some useful ways to start exploring and manipulating CAS data.

Fetching sample data

When preparing data, I find it useful to look at sample data. The table.fetch action is conceptually similar to PROC PRINT and, by default, outputs the first 20 rows of a CAS table:

proc cas;
table.fetch / table="Table_Name";
quit;

You can modify the table.fetch options to control which observations and variables to display and how to display them. For example:

proc cas;
table.fetch / table={name="TABLE_NAME" where="VAR1 in ('value1','value2')"},              /*1*/
	      orderby={{name="VAR1"},                                                     /*2*/
                             {name="VAR2", order="descending"}
                             },	
	     fetchvars={{name="VAR1", label="Variable 1"},                                /*3*/
	                     {name="VAR2", label="Variable 2"}, 
 		             {name="VAR3", label="Variable 3", format=comma12.1}
                             },
	     to=50,                                                                       /*4*/
	     index=false;							          /*5*/
quit;

In the code snippet above:

  • #1 – where= statement limits the records to those meeting the where criteria.
  • #2 – orderby= option defines the sort order. Ascending is the default and is not required. If sorting by more than one variable, put them in a list inside curly brackets, as shown in this example. If a list item has a subparameter (such as order= here), encase each item in curly brackets.
  • #3 – fetchvars= option defines the variables to print as well as their display labels and formats. If you select more than one variable, put them in a list inside curly brackets, as shown here. And again, if a list item includes a subparmeter, then enclose each list item in curly brackets.
  • #4 – to= option defines the number of rows to print.
  • #5 – index= false option deactivates the index column in the output (the default is index=true). This is similar to the noobs option in PROC PRINT.

As mentioned earlier, make sure to watch your curly brackets!

Descriptive statistics and variable distributions

The next step in data exploration is looking at descriptive statistics and variable distributions. I would need a separate blog post to cover this in detail, so I only touch upon a few of the many useful CAS actions.

To look at statistics for numeric variables, use the simple.summary action, which computes standard descriptive statistics, such as minimum, maximum, mean, standard deviation, number missing, and so on. For example:

proc cas;
    simple.summary / table="TABLE_NAME";
quit;

Among its other features, the simple.summary action allows analysis by one or more group-by variables, as well as define the list of desired descriptive statistics. For example:

proc cas;
simple.summary / table={name="TABLE_NAME", groupBy="VAR1", vars={"NUMVAR1","NUMVAR2”}},
                 subSet={"MAX", "MIN", "MEAN", "NMISS"};
quit;

Another useful action is simple.topK, which selects the top K and bottom K values for variables in a data set, based on a user-specified ranking order. The example below returns the top 5 and bottom 5 values for two variables based on their frequency:

proc cas;
simple.topk / table="TABLE_NAME" 
              aggregator="N",                        
              inputs={"VAR1","VAR2"},
              topk=5,
              bottomk=5;
quit;

Simple is a rich action set with heaps of useful options covered in the documentation.

You may be wondering – what about crosstabs and frequency tables? The simple action set includes freq and crosstab actions. In addition, the action closely imitating the functionality of the beloved PROC FREQ is freqTab.freqTab. For example, the code snippet below creates frequency tables for VAR1, VAR2 and a crosstab of the two.

proc cas;
freqtab.freqtab / table="TABLE_NAME"
                  tabulate={"VAR1","VAR2",
                  {vars={"VAR1","VAR2"}}
                  };
quit;

Managing CAS Table Variables

Changing table metadata

One of the basic tasks after exploring your data is changing table metadata, such as dropping unnecessary variables, renaming tables and columns, and changing variable formats and labels. The table.altertable action helps you with these housekeeping tasks. For example, the code snippet below renames the table, drops two variables and renames and changes labels for two variables:

proc cas;
    table.altertable / table="TABLE_NAME" rename="ANALYTIC_TABLE"
                       drop={"VAR1",”VAR2”}
                       columns={{name="VAR3" rename="ROW_ID" label="Row ID"},
                                {name="VAR4" rename="TARGET" label="Outcome Variable"}
                                }
                       ;
quit;

Outputting variable list to a data set

Another useful trick I frequently use is extracting table columns as a SAS data set. Having a list of variables as values in a data set makes it easy to build data-driven scripts leveraging macro programming. The code snippet below provides an example. Here we encounter another example of capturing action result as a CASL variable and using it in further processing – I can’t stress enough how helpful this is!

proc cas;
    table.columninfo r=collinfo / table={name="TABLE_NAME"};       /*1*/
    collist=collinfo["ColumnInfo"];                                /*2*/
    saveresult collist casout="collist";                           /*3*/
quit;

In the snippet above:

  • #1 - the columninfo action collects column information. The action result is passed to a CASL variable collinfo. Notice, instead of writing out result=, I am using an alias r =.
  • #2 - the portion of the a CASL variable collinfo containing column data is extracted into another CASL variable collist.
  • #3 - the saveresult statement sends the data to a CAS table collist. If you want to send the results to a SAS7BDAT data set, replace casout= with dataout=, and provide the library.table_name information.

Transforming the Data

Lastly, let’s look at some ways to use CAS actions to transform your data. Proc SQL and DATA step are the two swiss-army knives in SAS 9 developers’ toolkit that take care of 90% of the data prep. The good news is you can execute both DATA Step and SQL directly from PROC CAS. In addition, call the transpose action to transpose your data.

Executing DATA Step code

The dataStep.runCode action enables you to run DATA step code directly inside PROC CAS. You must enclose your DATA step code in quotation marks after the code= statement. For example:

proc cas;
    dataStep.runCode /
    code="
        data table_name;
        set table_name;
        run;
        ";
quit;

Running DATA step code in CAS allows access to sophisticated group-by processing and the use of such popular programming techniques as first- and last-dot. Refer to the documentation for important nuances related to processing in a distributed, multi-threaded environment of CAS.

Executing FedSQL

To run SQL in CASL, use the fedSQL.execDirect action. Enclose the SQL query in quotation marks following the query= statement. Optionally, you can use the casout= statement to save the results to a CAS table. For example:

proc cas;
    fedsql.execDirect/
    query=
         "
          select  *
          from TABLE1 a  inner join TABLE2 b
          on a.VAR1 = b.VAR1
         "
    casout={name="TABLE3", replace=True};
quit;

Similarly to DATA step, be aware of the many nuances when executing SQL in CAS via FedSQL. Brian Kinnebrew provides an excellent overview of FedSQL in his SAS Communities article, and the documentation has up-to-date details on the supported functionality.

Transposing data

Transposing data in PROC CAS is a breeze. The example below uses transpose.transpose action to restructure rows into columns.

proc cas;
    transpose.transpose /
          table={name="TABLE_NAME", groupby={"VAR1"}}
          transpose={"VAR2"}
          id={"VAR3"}
          prefix="Prefix"
    casout={name="TRANSPOSED" replace=true};
run;

You can transpose multiple variables in the same transpose action. Simply place additional variables inside the curly brackets following transpose=, in quotes, separated by a comma.

Conclusion

PROC CAS is a wrapper procedure enabling you to leverage SAS’ new programming language - CASL. CASL enables you to submit CAS actions directly to SAS Cloud Analytic Services engine from a SAS client. This post provided examples of loading, managing, exploring and transforming your data through CAS actions. Certain new functionality in CAS is only available through CAS actions, so getting comfortable with CASL makes sense. Fear not, and let the curly brackets guide the way 😊.

Acknowledgement

I would like to thank Brian Kinnebrew for his thoughtful review and generous help on my journey learning CASL.

Challenge accepted: Learning data prep in CASL was published on SAS Users.

8月 082019
 

Some key components of CASL are the action statements. These statements perform tasks that range from configuring the connection to the server, to summarizing large amounts of data, to processing image files. Each action has its own purpose. However, there is some overlapping functionality between actions. For example, more than one action can summarize numeric variables.
This blog looks at three actions: SIMPLE.SUMMARY, AGGREGATION.AGGREGATE, and DATAPREPROCESS.RUSTATS. Each of these actions generates summary statistics. Though there might be more actions that generate the same statistics, these three are a good place to start as you learn CASL.

Create a CAS table for these examples

The following step generates a table called mydata, stored in the casuser caslib, that will be used for the examples in this blog.

cas;
libname myuser cas caslib='casuser';
data myuser.mydata;
	length color $8;
	array X{100};
	do k=1 to 9000;
   	do i=1 to 50;
      	X{i} = rand('Normal',0, 4000);
      end;
      do i=51 to 100;
      	X{i} = rand('Normal', 100000, 1000000);
      end;
      if x1 < 0 then color='red';
      	else if x1 < 3000 then color='blue';
         else color='green';
		output;
	end;
run;

SIMPLE.SUMMARY

The purpose of the Simple Analytics action set is to perform basic analytical functions. One of the actions in the action set is the SUMMARY action, used for generating descriptive statistics like the minimum, maximum, mean, and sum.
This example demonstrates obtaining the sum, mean, and n statistics for five variables (x1–x5) and grouping the results by color. The numeric input variables are specified in the INPUTS parameter. The desired statistics are specified in the SUBSET parameter.

proc cas;
   simple.summary / 
      inputs={"x1","x2","x3","x4","x5"},
      subset={"sum","mean","n"},
      table={caslib="casuser",name="mydata",groupBy={"color"}},
      casout={caslib="casuser", name="mydata_summary", replace=true};
run;
	table.fetch /
		table={caslib="casuser",name="mydata_summary" };
run;
quit;

The SUMMARY action creates a table that is named mydata_summary. The TABLE.FETCH action is included to show the contents of the table.

The mydata_summary table can be used as input for other actions, its variable names can be changed, or it can be transposed. Now that you have the summary statistics, you can use them however you need to.

AGGREGATION.AGGREGATE

Many SAS® procedures have been CAS-enabled, which means you can use a CAS table as input. However, specifying a CAS table does not mean all of the processing takes place on the CAS server. Not every statement, option, or statistic is supported on the CAS server for every procedure. You need to be aware of what is not supported so that you do not run into issues if you choose to use a CAS-enabled procedure. In the documentation, refer to the CAS processing section to find the relevant details.
When a procedure is CAS-enabled, it means that, behind the scenes, it is submitting an action. The MEANS and SUMMARY procedure steps submit the AGGREGATION.AGGREGATE action.
With PROC MEANS it is common to use a BY or CLASS statement and ask for multiple statistics for each analysis variable, even different statistics for different variables. Here is an example:

proc means sum data=myuser.mydata noprint;
  by color;
   var x1 x2 x3;
   output out=test(drop=_type_ _freq_) sum(x1 x3)=x1_sum x3_sum
   max(x2)=x2_max std(x3)=x3_std;
run;

The AGGREGATE action produces the same statistics and the same structured output table as PROC MEANS.

proc cas;
	aggregation.aggregate / 
		table={name="mydata",caslib="casuser",groupby={"color"}}
      casout={name="mydata_aggregate", caslib='casuser', replace=true}
      varspecs={{name='x1', summarysubset='sum', columnnames={'x1_sum'}}, 
                {name='x2', agg='max', columnnames={'x2_max'}},
                {name='x3', summarysubset={'sum','std'},
                columnnames={'x3_sum','x3_std'}}}
      savegroupbyraw=true, savegroupbyformat=false, raw=true;
run;
quit;

The VARSPECS parameter might be confusing. It is where you specify the variables that you want to generate statistics for, which statistics to generate, and what the resulting column should be called. Check the documentation: depending on the desired statistic, you need to use either SUMMARYSUBSET or AGG arguments.

If you are using the GROUPBY action, you most likely want to use the SAVEGROUPBYRAW=TRUE parameter. Otherwise, you must list every GROUPBY variable in the VARSPECS parameter. Also, the SAVEGROUPBYFORMAT=FALSE parameter prevents the output from containing _f versions (formatted versions) of all of the GROUPBY variables.

DATAPREPROCESS.RUSTATS

The RUSTATS action, in the Data Preprocess action set, computes univariate statistics, centralized moments, quantiles, and frequency distribution statistics. This action is extremely useful when you need to calculate percentiles. If you ask for percentiles from a procedure, all of the data will be moved to the compute server and processed there, not on the CAS server.
This example has an extra step. Actions require a list of variables, which can be cumbersome when you want to generate summary statistics for more than a handful of variables. Macro variables are a handy way to insert a list of strings, variable names in this case, without having to enter all of the names yourself. The SQL procedure step generates a macro variable containing the names of all of the numeric variables. The macro variable is referenced in the INPUTS parameter.
The RUSTATS action has TABLE and INPUTS parameters like the previous actions. The REQUESTPACKAGES parameter is the parameter that allows for a request for percentiles.
The example also contains a bonus action, TRANSPOSE.TRANSPOSE. The goal is to have a final table, mydata_rustats2, with a structure like PROC MEANS would generate. The tricky part is the COMPUTEDVARSPROGRAM parameter.
The table generated by the RUSTATS action has a column called _Statistic_ that contains the name of the statistic. However, it contains “Percentile” multiple times. A different variable, _Arg1_, contains the value of the percentiles (1, 10, 20, and so on). The values of _Statistic_ and _Arg1_ need to be combined, and that new combined value generates the new variable names in the final table.
The COMPUTEDVARS parameter specifies that the name of the new variable will hold the concatenation of _Statistic_ and _Arg1_. The COMPUTEDVARSPROGRAM parameter tells CAS how to create the values for NEWID. The NEWID value is then used in the ID parameter to make the new variable names—pretty cool!

proc sql noprint;
	select quote(strip(name)) into: numvars separated by ','
	from dictionary.columns 
 	where libname='MYUSER' and memname='MYDATA' and type='num';
quit;
 
proc cas;
	dataPreprocess.rustats / 
   	table={name="mydata",caslib="casuser"} 
   	inputs={&numvars}
   	requestpackages={{percentiles={1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95 99},scales={"std"}}}
   	casoutstats={name="mydata_rustats",caslib="casuser"} ;
 
	transpose.transpose / 
   	table={caslib='casuser', name="mydata_rustats", groupby={"_variable_"},
			computedvars={{name="newid",format="$20."}},computedvarsprogram="newid=strip(_statistic_)||compress(strip(_arg1_),'.-');"}
   	transpose={"_value_"} 
   	id={"newid"}
   	casOut={caslib='casuser', name="mydata_rustats2", replace=true};
   run;
quit;

Here is a small portion of the final table. Remember, you can use the TABLE.FETCH action to view the table contents as well.

Summary

Summarizing numeric data is an important step in analyzing your data. CASL provides multiple actions that generate summary statistics. This blog provided a quick overview of three of those actions: SIMPLE.SUMMARY, AGGREGATION.AGGREGATE, and DATAPREPROCESS.RUSTATS.
The wonderful part of so many choices is that you can decide which one best fits your needs. Summarizing your data with actions also ensures that all of the processing occurs on the CAS server and that you are taking full advantage of its capabilities.
Be sure to use the DROPTABLE action to delete any tables that you do not want taking up space in memory:

proc cas;
	table.droptable / caslib='casuser' name='mydata' quiet=true;
	table.droptable / caslib='casuser' name='mydata_summary' quiet=true;
	table.droptable / caslib='casuser' name='mydata_aggregate' quiet=true;
	table.droptable / caslib='casuser' name='mydata_rustats' quiet=true;
	table.droptable / caslib='casuser' name='mydata_rustats2' quiet=true;
quit;
cas casauto terminate;

Learn More

Summarization in CASL was published on SAS Users.

6月 262019
 

In his article How to use CASL to develop and work with user-defined CAS actions, Brian Kinnebrew defines CASL as "a language specification used by the SAS client to interact with and provide easy access to Cloud Analytic Services (CAS). CASL is a statement-based scripting language with many uses and strengths." I can't come up with a better definition, so if you'd like to learn more about the basics of CASL, I encourage you to read Brian's post.

In a SAS Stored Process (or any traditional SAS program) the code has multiple DATA steps and procedures with a good dose of macros.

Over the last couple of years, the focus of my projects is the use of CAS actions with no traditional procedures in the mix. Most of these involved web applications calling multiple CAS actions. My initial approach was to make multiple http calls - one per action. This could get tedious.

Then I met my action hero: sccasl.runCasl.

My favorite SAS CASL action

This action executes a CASL "script" on the CAS server analogous to executing a SAS Stored Process. Running a CASL program with a mix of CAS actions and CASL statements on the CAS server has these benefits:

  1. Reduced the number of http calls to the server
  2. The client-side code is much easier to reason
  3. The returned values can be a dictionary that is suited for further consumption by the client, simplifying the client code

My personal name for these scripts is "CAS stored process".

Where art thou Macro?

In many applications, user input is passed to the code running on the server. In a SAS Stored Process, macros pass the parameters. CASL has no macros. My initial approaches to passing parameters to CASL programs were:

  1. Generate the final CASL program in JavaScript with the user input values inserted into the code. Sample code is available here.
    • Drawback: Debugging the code in SAS Studio requires a cut-and-paste of the generated code into SAS Studio.
  2. Load the data into a CAS table using table.upload action for programs needing an input table. Sample code is available here.
    • Drawback: This requires an additional http call to the server.

In developing the GraphQL approach to writing applications - GraphQL and SAS Viya applications - a good match - I addressed the two drawbacks listed above by creating two functions and put them in my utility belt.

    Superfriend functions

  • jsonToDict.js - generate a string having the CASL dictionary version of a JavaScript object
  • argsToTable - create a CAS table from a dictionary

The remainder of this article discusses these two functions. To demonstrate the functions' usage, I use code listings from the scoring example, covered in the GraphQL example.

The jsonToDict function

The result of this function produces a string with CASL dictionary suitable for inclusion in CASL code.

Function definition

jsonToDict ⇒ string

Returnsstring - returns the string containing the CASL dictionary

Param Type Description
obj object the JavScript object of interest
name string the name to assign to the dictionary

 

Example function code

The code below outlines the jsonToDict function usage.

obj = {x:1, b:2, c:['a', 'b']};
let r = jsonToDict(obj, '_appEnv_');
The result is:
r = `_appEnv_ = {x=1, b=2, c={"a", "b"}}'`;

The following lists the input parameters passed to the CASL code for scoring.

let input = {
        JOB    : 'J1',
        CLAGE  : 100, 
        CLNO   : 20, 
        DEBTINC: 20, 
        DELINQ : 2, 
        DEROG  : 0, 
        MORTDUE: 4000, 
        NINQ   : 1,
        YOJ    : 10,
        LOAN: 1000,
        ASSET: 100000
    };

Below is the Javascript code and the result.

let _args_ = jsonToDict(input, '_args_');
results in
 
args_ = `_args_ = {  JOB= "J1" ,CLAGE=100  ,CLNO=20  ,DEBTINC=20  ,DELINQ=2  ,DEROG=0  ,
MORTDUE=4000  ,NINQ=1  ,YOJ=10  ,LOAN=1000  ,ASSET=100000  };`;

To allow the use of different versions of the model, the name of the scoring model is passed in as a parameter. The Javascript for the model follows.

let env = {
     astore: {
            caslib: 'Public',
            name  : 'GRADIENT_BOOSTING___BAD_2'
        }
};
let _appEnv_= jsonToDict(env, '_appEnv_');
 
resulting in:
let _appEnv_ = `_appEnv_ = { astore = { caslib="Public", name="GRADIENT_BOOSTING___BAD_2"}};`;

Next, I prepend the strings to the CASL program in the client code with the following code snips.

let code = _args_ + _appEnv_ + `the CASL code shown below';
loadactionset "astore";
 
/* convert arguments to a cas table */
argsToTable(_args_, 'casuser', 'INPUTDATA' );
 
/* score */
action astore.score r=rc/
    table  = { caslib= 'casuser', name = 'INPUTDATA' } 
    rstore = { caslib= _appEnv_.astore.caslib,  name=_appEnv_.astore.name }
    casout  = { caslib = 'casuser', name = 'OUTPUTDATA' replace= TRUE};
 
/* fetch results */
action table.fetch r = result /
    table = {  caslib = 'casuser' name = 'OUTPUTDATA' } ;
 
/* extract the score and send it as a dictionary */
score = result.Fetch[1].P_BAD;
send_response({score = score});

Now the CASL program can access the incoming information using the two dictionaries _args_ and _appEnv_. Note: As a personal choice, I use a convention of _args_ for user input and _appEnv_ for application specific information. I use the restaf application framework to make the http calls as shown below.

let payload = {
        action: 'sccasl.runCasl',
        data  : { code: code}
    }
 let result = await store.runAction(session, payload); 
let score = result.items('results', 'score');

CASL coding - easier than saying Kilp-ill-skim

Notice the absence of string substitutions that look strange. Just simple, straight forward coding. Easier than saying your name backwards, forcing you back to the fifth dimension.

The argsToTable.casl function

The sample code above used the function argsToTable. As you may guess it converts the _args_ dictionary into a CAS table used in the scoring action. The argsToTable is the function in CASL handling this task.

Function definition

argsToTable ⇒ Load dictionary into a CAS Table

Param Type Description
input dictionary the data to load
caslib string caslib of output table
name string name of output table

 

The relevant CASL code from Step 1 is reproduced here:

argsToTable(_args_, 'casuser', 'INPUTDATA' );

The argsToTable function is either stored on a server or prepended to the CASL code sent to the runCasl action. This function removes the need to run a http call to load the data in the CAS table.

Returning data from CAS - the send_response function

The ultimate sidekick function

Any good super hero has a sidekick. The function send_response in CASL is very versatile - it allows one to return data in the form the application needs and allows more than one result. In many programs I return data in a form easily consumed by the client code.

For example, if you wanted to return just the rows of table you can do the following:

function resultsToDict(r);
    casResults = {};
    i = 1;
  do row over r;
     casResults[i] = row;
     i = i + 1;
   end;
  return casResults;
end;
 
/* and use it as follows: */
 
action table.fetch r = result /
    table = {  caslib = 'casuser',  name = 'mydata' };
/* extract the data and return it as a dictionary */
casResults = resultsToDict(result.Fetch);
send_response({casResults: casResults});

Finally

Using a combination of CASL, a couple of utility functions and the runCasl action you can develop some very efficient programs with minimal traffic between your client and the server. If you run multiple actions in sequence you should consider grouping them into a CASL program and executing them on the CAS server using the runCasl action.

Next

In my next article, which I hope to finish in a flash, I will discuss using the runCasl action to create a browser for CAS tables with support for pagination.

All comments are welcome. Please feel free to clone the code, make it better.

Cheers...
Deva

Let runCasl be your BFF and favorite action hero

"CAS Stored Process" with my Favorite Action Hero runCasl was published on SAS Users.

11月 132018
 

In my previous blog post I demonstrated how to create your own CAS actions and action sets.  In this post, we will explore how to create your own CAS functions using the CAS Language (CASL).  A function is a component of the CASL programming language that can accept arguments, perform a computation or other operation, and return a value.  The value that is returned can be used in an assignment statement or elsewhere in expressions.

About SAS functions

SAS provides two types of supplied functions: built-in functions and common functions.  Built-in functions contain functionality that is unique to CASL.  These allow you to perform operations on your result tables, arrays, and dictionaries, and provide run-time support for your CASL programs.  Built-in functions cannot be replaced with user-defined functions.

Conversely, common functions provide functionality that is common to other SAS functions.  When used in a CASL program, SAS functions take a CASL value and a CASL value is returned.  Unlike built-in functions, you can replace these functions with user-defined functions.

Since the capabilities of built-in functions are unique to CASL, let’s look at these in-depth and demonstrate with an example.  Save the following FedSQL code in an external file called hmeqsql.sas.  This code will be read into CAS and stored as a variable.

The execDirect action executes FedSQL code in CAS.  The READPATH built-in function reads the FedSQL code saved in hmeqsql.sas and stores it in the CASL variable sqlcode which is used as input to the query parameter.

The fetch action displays the first 20 rows from the output table hmeq.out.

If you don’t feel like looking through the documentation for a built-in or common function, a list of each can be generated programmatically.  Run the following code to see a list of built-in functions.

Partial list of CASL built-in functions

Run the following code to see a list of common functions.

Partial list of common functions

User-defined CASL functions

In addition to the customizable capabilities of built-in functions supplied by SAS, you can also create your own functions using the FUNCTION statement.  User-defined functions can be called in expressions using CASL and they provide a large amount of flexibility.  The following example creates four different functions for temperature conversion.

After creating these functions, they can be called immediately, or you can store them in an external file and call them via a %include statement.  In this example, the user-defined functions have been stored in an external file called FunctionStore.sas.  You can call one, all, or any number of your user-defined functions.

The output from each function call is displayed in the log.

Lastly, if you want to see all user-defined functions, run the FUNCTIONLIST statement.  A list will be printed to the log.

More about CASL programming and using functions in CASL

Check out these resources for further information on programming in the CASL language and using functions in CASL.

Customize your CASL code with built-in and user-defined functions was published on SAS Users.