CASL

10月 092021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware, instead of using the CAS procedure, I could execute the action with Python, R, or even a REST API with some slight changes to the syntax for the specific language.

Why use the DATA Step?

It's pretty simple, the DATA step is a powerful way to process your data. It gives you full control of each row and column, ability to easily create multiple output tables, and provides a variety of statements to pretty much do anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq / 
          table={name='cars', caslib='casuser'},
           input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables based off these unique values using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as variable. The DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.

DATA Step in CAS has Limitations

Now, one thing to be aware of is not all functionality of the DATA step is available in CAS. If you are using the runCode action with an unsupported statement or function in CAS, you will receive an error. Let's look at an example using the first function, which gets the first letter of a string, and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The results of the runCode action return an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this you will need to find a CAS supported method to complete the task. (HINT: Here instead of the first function you can use the substr function).

For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

10月 092021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware, instead of using the CAS procedure, I could execute the action with Python, R, or even a REST API with some slight changes to the syntax for the specific language.

Why use the DATA Step?

It's pretty simple, the DATA step is a powerful way to process your data. It gives you full control of each row and column, ability to easily create multiple output tables, and provides a variety of statements to pretty much do anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq / 
          table={name='cars', caslib='casuser'},
           input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables based off these unique values using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as variable. The DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.

DATA Step in CAS has Limitations

Now, one thing to be aware of is not all functionality of the DATA step is available in CAS. If you are using the runCode action with an unsupported statement or function in CAS, you will receive an error. Let's look at an example using the first function, which gets the first letter of a string, and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The results of the runCode action return an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this you will need to find a CAS supported method to complete the task. (HINT: Here instead of the first function you can use the substr function).

For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

9月 282021
 

SQL is an important language for any programmer working with data. In SAS Cloud Analytic Services (CAS) you can execute SQL queries using the fedSQL.execDirect CAS action!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the

proc cas;
    fedSQL.execDirect / 
        query="select Make,
                      Model,
                      MSRP,
                      mean(MPG_City,MPG_Highway) as MPG_Avg
               from casuser.cars
               where Make='Toyota'
               order by MPG_Avg desc";
quit;

And the results:

The execDirect action executes the query in the distributed CAS environment and returns the expected results.

Using a SOURCE block

While the query we executed was simple, this is not always the case. Adding a complicated query as a string can make writing the query difficult.

Instead of specifying the query as a string in the CAS procedure, use the SOURCE statement to embed text in a variable. In the following example, I'll execute the same query as before. However, this time I'll nest the query inside a SOURCE block by specifying the SOURCE statement and name the variable MPG_toyota. Then I'll add the query inside the SOURCE block and use the ENDSOURCE statement to end the block.

proc cas;
    source MPG_toyota;
        select Make,
               Model,
               MSRP,
               mean(MPG_City,MPG_Highway) as MPG_Avg
        from casuser.cars
        where Make='Toyota'
        order by MPG_Avg desc;
    endsource;
 
    fedSQL.execDirect / query=MPG_toyota;
quit;

After the SOURCE block is complete, you can reference the variable as the value to the query parameter in the execDirect action.

And the results:

The results returned are the same, but using a SOURCE block makes the code easier to write and maintain.

Summary

In SAS Viya, FedSQL provides a scalable, threaded, high-performance way to query data and create new CAS tables from existing tables in the CAS server. In this example we saw two distinct ways to run SQL code on SAS Viya. This is only the beginning. See the resources below for more details on PROC FedSQL.

Resources

SAS® Viya®: FedSQL Programming for SAS® Cloud Analytic Services
FEDSQL Procedure
execDirect CAS Action
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing SQL in SAS Viya was published on SAS Users.

9月 232021
 

In my previous post CAS-Action! Simply Distinct - Part 1 I reviewed using the simple.distinct CAS action to explore distinct and missing values in a distributed CAS table.

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

And now, back to the distinct action. What if we want to do more? Maybe you want to create a CSV file that documents the percentage of distinct values in each column? Let's explore some possibilities.

To complete the task I'll break it down into four steps.

Step 1 - Find the number of rows in the CAS table

To find the number of rows in a CAS table use the simple.numRows CAS action. Let's execute the numRows action and store the results in a variable. I'll also PRINT and DESCRIBE the results to take a closer look at the output.

proc cas;
    simple.numRows result=n / table={name="cars",caslib="casuser"};
    describe n;
    print n;
...

And the results:
Output of the simple.numRows CAS action

The results of the DESCRIBE statement show the output of the action is a dictionary with a key named numRows and an integer as the value. The PRINT statement shows the value of the dictionary, numRows=428.

Now that we have the total number of rows, we can use that number in our calculation.

Step 2 - Find the number of distinct values in each column

Next, let's execute the distinct action and store the results in a variable named d. Then execute the PRINT statement to confirm the results.

...
    simple.distinct result=d /
            table={name="cars",caslib="casuser"};
    print d;
...

And the resluts:
Results of the simple.distinct CAS action
The results of the distinct action are as expected. Each column with the number of distinct values.

Step 3 - Create a calculated column that computes the percentage of distinct values

Now that we have the number of distinct values in each column, and the total number of rows in the CARS CAS table, we can calculate the total percent of distinct values in each column.

Consider the code:

...
    pctDistinct=d.Distinct.compute({"PctDistinct","Percent Distinct",percent7.2}, nDistinct/n.numRows)
                          [ , {"Column","NDistinct","PctDistinct"} ];
    print pctDistinct;
...

To add a calculated column to a result table use the compute operator. In the first argument, specify column metadata inside an array (column name, label and format). In the second argument, specify the expression. My expression nDistinct/n.numRows divides the distinct values in each column by the total number of rows in the CARS table.

After the compute operator, select specific rows and columns from the result table using bracket notation. Here I'll select all rows, and only the columns Column, nDistinct and PctDistinct.

Lastly, I used the PRINT statement to confirm the results.

New calculated result table

In the output we can see the new result table with the computed column.

Step 4 - Save the results table as a CSV file

Lastly, let's put it all together!

I'll add the code from the pervious steps, then save the table as a CSV file using the SAVERESULTS statement with the CSV= option.

%let outpath=/*specify output file location*/;
 
proc cas;
* Specify the CAS table *;
    casTbl={name="cars", caslib="casuser"};
 
* Store the number of rows in the CAS table *;
    simple.numRows result=n / table=casTbl;
 
* Store the number of distinct values in each column *;
    simple.distinct result=d / table=casTbl;
 
* Calculate the percentage of distinct values in each column *;
    pctDistinct=d.Distinct.compute({"PctDistinct","Percent Distinct",percent7.2}, nDistinct/n.numRows)
                          [ , {"Column","NDistinct","PctDistinct"} ];
 
* Save the result table as a CSV file *;
    saveresult pctDistinct csv="&outpath/pctDistinctCars.csv";
quit;

In the CSV= option I specified the outpath macro variable that contains the location of output folder, and add the name of the CSV file.

After executing the code the log indicates everything ran successfully, and a CSV file was created in the specified location. Next I'll find and open the CSV file.

My CSV file is located in my outfiles folder:

Then double click on the file to open it in SAS Studio:

Summary

The distinct action is a flexible and easy way to explore your data. It allows you to quickly explore your distributed CAS tables, then process and save the results in a variety of formats to fit your needs.

Additional Resources

distinct CAS action
CAS-Action! Simply Distinct - Part 1
CASL Result Tables
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Simply Distinct - Part 2 was published on SAS Users.

9月 152021
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dive into exploring your data by viewing the number of distinct and missing values that exist in each column using the simple.distinct CAS action.

In this example, I will use the CAS procedure to execute the distinct action. Be aware, instead of using the CAS procedure, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax from other languages.

Determine the Number of Distinct and Missing Values in a CAS Table

To begin, let's use the simple.distinct CAS action on the CARS in-memory table to view the action's default behavior.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"};
quit;

In the preceeding code, I specify the CAS procedure, the action, then reference the in-memory table. The results of the call are displayed below.

The results allow us to quickly explore the CAS table and see the number of distinct and missing values. That's great, but what if you only want to see specific columns?

Specify the Columns in the Distinct Action

Sometimes your CAS tables contain hundreds of columns, but you are only interested in a select few. With the distinct action, you can specify a subset of columns using the inputs parameter. Here I'll specify the Make, Origin and Type columns.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        inputs={"Make","Origin","Type"};
quit;

After executing the code the results return the information for only the Make, Origin and Type columns.

Next, let's explore what we can do with the results.

Create a CAS Table with the Results

Some actions allow you to create a CAS table with the results. You might want to do this for a variety of reasons like use the new CAS table in a SAS Visual Analytics dashboard or in a data visualization procedure like SGPLOT.

To create a CAS table with the distinct action result, add the casOut parameter and specify new CAS table information, like name and caslib.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        casOut={name="distinctCars", caslib="casuser"};
quit;

After executing the code, the action returns information about the name and caslib of the new CAS table, and the number of rows and columns.

Visualize the Number of Distinct Values in Every Column

Lastly, what if you want to create a data visualization to better explore the table? Maybe you want to visualize the number of distinct values for each column? This task can be accomplished with variety of methods. However, since I know my newly created distinctCars CAS table has only 15 rows, I'll reference the CAS table directly using SGPLOT procedure.

This method works as long as the LIBNAME statement references your caslib correctly. I recommend this method when you know the CAS table is a manageable size. This is important because the CAS server does not execute the SGPLOT procedure on a distributed CAS table. The CAS server instead transfers the entire CAS table back to the client for processing.

To begin, the following LIBNAME statement will reference the casuser caslib.

libname casuser cas caslib="casuser";

Once the LIBNAME statement is correct, all you need to do is specify the CAS table in the DATA option of the SGPLOT procedure.

title justify=left height=14pt "Number of Distinct Values for Each Column in the CARS Table";
proc sgplot data=casuser.distinctCars
            noborder nowall;
    vbar Column / 
        response=NDistinct
        categoryorder=respdesc
        nooutline
        fillattrs=(color=cx0379cd);
    yaxis display=(NOLABEL);
    xaxis display=(NOLABEL);
quit;

The results show a bar chart with the number of distinct values for each column.

Summary

The simple.distinct CAS action is an easy way to explore a distributed CAS table. With one simple action, you can easily see how many distinct values are in each column, and the number of missing rows!

In Part 2 of this post, I'll further explore the simple.distinct CAS action and offer more ideas on how to interpret and use the results.

Additional Resources

distinct CAS action
SAS® Cloud Analytic Services: Fundamentals
Plotting a Cloud Analytic Services (CAS) In-Memory Table
Getting started with SGPLOT - Index
Code

CAS-Action! Simply Distinct - Part 1 was published on SAS Users.

8月 282021
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dig in with the table.columnInfo action.

In this post we'll l look at exploring the column attributes of a CAS table. Knowing the column names, data types, and any additional formats or labels associated with the columns makes it easier to work with the data. One way to see this type of information on a CAS table is to use the table.columnInfo CAS action!

In this example, I will use the CAS procedure to execute the columnInfo action. Be aware, instead of using the CAS procedure, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language.

Return CAS table Column Information

To view the column information of your CAS table, use the columnInfo CAS action with the table parameter. That's it! Refer to the code below.

proc cas;
    table.columnInfo / table={name="cars", caslib="casuser"};
quit;

The results would appear similar to the following:

table.columnInfo CAS action results

and return a variety of information about each column in the CAS table:

  • the column name
  • a label if one has been applied
  • the Id, which indicates the position of the column in the table
  • the data type of the column
  • the column length
  • the format, formatted length, width and decimal

Create a CSV Data Dictionary Using the ColumnInfo Action

What if instead, you want to create a data dictionary documenting the CAS table? With the columnInfo action you can export the results to a CSV file!

I'll use the columnInfo action again, but this time I'll store the results in a variable. The variable is a dictionary, so I need to reference the dictionary ci, then the columnInfo key to access the table. Next, I'll create two computed columns using the compute operator. The first column contains the name of the CAS table, and the second, the caslib of the table. I'll print the new table to confirm the results.

proc cas;
    table.columnInfo result=ci / table={name="cars", caslib="casuser"};
 
    ciTbl = ci.ColumnInfo.compute({"TableName","Table Name"}, "cars")
                         .compute({"Caslib"}, "casuser");
 
    print ciTbl;
    ...

The code produces the folloiwng result:

The new result table documents the CAS table columns, table name and caslib.

Lastly, I'll export the result table to a CSV file. First, I'll specify the folder location using the outpath variable. Then use the SAVERESULT statement to save the result table as a CSV file named carsDataDictionary.csv.

    ...
    outpath="specify a folder location";
    saveresult ciTbl csv=outpath || "carsDataDictionary.csv";
quit;

 

After I execute the CAS procedure I can find and open the CSV file to view the documented CAS table!

Summary

The table.columnInfo CAS action is a simple and easy way to show column information about your distributed CAS table. Using the results of the action allow you to create a data dictionary in a variety of formats.

Additional resources

table.columnInfo CAS action
CAS Action! - a series on fundamentals
SAS® Cloud Analytic Services: Fundamentals
CASL Result Tables
SAVERESULT Statement
Code

CAS-Action! Show me the ColumnInfo! was published on SAS Users.

8月 192021
 

In Part 1 of my series fetch CAS, fetch!, I executed the fetch CAS action to return rows from a CAS table. That was great, but what can you do with the results? Maybe you want to create a visualization that includes the top five cars by MSRP for all Toyota vehicles? How can we accomplish this task? We'll cover this question and provide several other examples in this post.

Save the results of a CAS action as a SAS data set

First, execute the table.fetch CAS action on the CARS in-memory table to filter for Toyota cars, return the Make, Model and MSRP columns, and sort the results by MSRP. Then save the results of the action in a variable using the results option. The results of an action return a dictionary to the client. The fetch action returns a dictionary with a single key, and the result table as the value. In this example, I'll name the variable toyota.

proc cas;
    table.fetch result=toyota / 
          table={name="cars", caslib="casuser",
                 where="Make='Toyota'",
                 vars={"Make","Model","MSRP"}
          },
          sortBy={
                 {name="MSRP", order="DESCENDING"}
          },
          index=FALSE,
          to=5;
...

After executing the code, the results of the action are stored in the variable toyota and not shown in the output.

Next, use the SAVERESULT statement to save the result table stored in the toyota variable. Since the variable is a dictionary, specify the variable name toyota, a dot, then the fetch key. This will access the result table from the dictionary. Finally, specify the DATAOUT= option with the name of the SAS data set to create.

proc cas;
    table.fetch result=toyota / 
          table={name="cars", caslib="casuser",
                 where="Make='Toyota'",
                 vars={"Make","Model","MSRP"}
          },
          sortBy={
                 {name="MSRP", order="DESCENDING"}
          },
          index=FALSE,
          to=5;
 
     saveresult toyota.fetch dataout=work.top5;
quit;

After executing the code, the result table is saved as a SAS data set. The SAS data set is named top5 and saved to the WORK library.

 

 

Wondering what else can we do? Let's take a look.

Visualize the SAS data set

Now that the result table is saved as a SAS data set, you can use the SGPLOT procedure to create a bar chart! Consider the code below.

title justify=left height=14pt "Top 5 Toyota Cars by MSRP";
proc sgplot data=work.top5
         noborder nowall;
    vbar Model / 
          response=MSRP 
          categoryorder=respdesc
          nooutline
          fillattrs=(color=cx0379cd);
    label MSRP="MSRP";
quit;
title;

There it is! We processed our data in CAS using the fetch action, returned a smaller subset of results back to the client, then used traditional SAS programming techniques on the smaller table. This method will work similarly in other languages like Python and R. Then you can then use the native visualization packages of the language!

You can now use your imagination on what else to do with the raw data from the CARS table or from the top5 results table we produced with the table.fetch action. Feel free to get creative.

Summary

CAS actions are optimized to run in a distributed environment on extremely large tables. Your CAS table can contain millions or even billions of rows. Since the data in CAS can be extremely large, the goal is to process and subset the table on the CAS server, then return a smaller amount of data to the client for additional processing, visualization or modeling.

Additional resources

fetch Action
SAVERESULT Statement
SAS® Cloud Analytic Services: Fundamentals
Plotting a Cloud Analytic Services (CAS) In-Memory Table
Getting started with SGPLOT - Index
Code used in this post

CAS-Action! fetch CAS, fetch! - Part 2 was published on SAS Users.

8月 122021
 

Welcome to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dig in with our first topic - the fetch action.

The table.fetch CAS action retrieves the first 20 rows of a CAS distributed in-memory table. It's similar to using the head method in R or Python. However, the fetch action can do more than fetch rows from a table. In this example, I will use the CAS procedure to execute the fetch action. Be aware, instead of using the CAS procedure, I could execute the same action with Python, R and other languages with some slight changes to the syntax. Pretty cool!

Retrieve the first 20 rows of a CAS table using the fetch action

To begin, let's use the table.fetch CAS action on the cars in-memory table to view the action's default behavior. Consider the following code:

proc cas; 
    table.fetch / table={name="cars", caslib="casuser"}; 
quit;

To use the fetch action with the CAS language, you specify the CAS procedure, the action, a forward slash, then the table parameter to specify the CAS table. In the table parameter I'll use the name sub-parameter to specify the cars in-memory table, and the caslib sub-parameter to specify the casuser caslib. The result of the call is listed below.

We retrieved 20 rows from our distributed cars table. What else can we fetch?

Retrieve the first n rows of a CAS table

What if you don't want the first 20 rows? Maybe you only want the first 5. To modify the amount of rows returned use the to parameter, then specify the number of rows. In the results, I also saw an _Index_ column. That column appears by default. To remove the _Index_ column, add the index=FALSE parameter.

proc cas;
    table.fetch /
         table={name="cars", caslib="casuser"},
         to=5,
         index=FALSE;
quit;

The results return 5 rows from the cars table, and the _Index_ column has been removed.

Now, what if I want to sort the returned rows?

Sort the table

The results of a CAS table are not guaranteed since the table is distributed among the CAS workers. To see the results in a precise order, use the sortBy parameter. The sortBy parameter requires an array of key-value pairs (dictionaries), so it's a bit tricky the first time you use it.

In this example, let's sort the table by Make and MSRP in descending order.

proc cas;
    table.fetch /
          table={name="cars", caslib="casuser"},
          sortBy={
                  {name="Make", order="DESCENDING"},
                  {name="MSRP", order="DESCENDING"}
          },
          index=FALSE;
quit;

The results show 20 rows of the cars table sorted by Make and MSRP. Great!

Subset the table

What if I only want to see the cars where Make is Toyota, and return the columns Make, Model, MSRP and Invoice? You can add the where and vars sub-parameters in the table parameter to subset the table.

proc cas;
    table.fetch /
          table={name="cars", caslib="casuser",
                 where="Make='Toyota'",
                 vars={"Make","Model","MSRP","Invoice"}
           },
           to=5,
           index=FALSE;
quit;

The results return 5 rows from the cars table where Make is Toyota and only the four specified columns.

Quick detour

Instead of using the vars sub-parameter as we did in the previous example, you can use the fetchVars parameter. The code would change to:

proc cas;
    table.fetch /
          table={name="cars", caslib="casuser",
                 where="Make='Toyota'"},
          fetchVars={"Make","Model","MSRP","Invoice"},
          to=5,
          index=FALSE;
quit;

Either method works and it's totally up to you.

Create a calculated column

Lastly, let's create a calculated column named MPG_Avg that calculates the average city and highway miles per gallon for each car, then subset the results for all cars with an MPG_Avg greater than 40. To create a calculated column, use the computedVarsProgram sub-parameter in the table parameter. Then you can subset on the calculated column with the where sub-parameter.

proc cas;
    table.fetch /
          table={name="cars", caslib="casuser",
                 vars={"Make","Model","MPG_Avg"},
                 where="MPG_Avg > 40",
                 computedVarsProgram="MPG_Avg=mean(MPG_City,MPG_Highway)"
          },
    sortBy={
             {name="MPG_Avg", order="DESCENDING"}
    },
    index=FALSE;
quit;

Summary

CAS actions are optimized to run on the CAS server on large data. They are flexible and offer many parameters to control the output and can be executed in a variety of languages! In the next post, I'll cover more on the fetch action.

Additional Resources

fetch Action documentation
SAS® Cloud Analytic Services: Fundamentals documentation
Code used in this post

CAS-Action! fetch CAS, fetch! - Part 1 was published on SAS Users.

3月 162020
 

As a long-time SAS 9 programmer, I typically accomplish my data preparation tasks through some combination of the DATA Step, Proc SQL, Proc Transpose and some housekeeping procs like Proc Contents and Proc Datasets. With the introduction of SAS Viya, SAS released a new scripting language called CASL – a language that interacts with SAS Cloud Analytics Services (CAS).

CASL statements include actions, logically organized into action sets based on common functionality. For example, the Table action set allows you to load a table in CAS, view table metadata, change table metadata such as drop or rename a column, fetch (print) sample rows, save or drop a table from CAS, among other things. Steven Sober provides a great overview of CASL in his 2019 SAS Global Forum paper.

Learning CASL is a good idea assuming you want to leverage the power of CAS, because CASL is the language of CAS. While you can continue to use Viya-enabled procs for many of your data processing needs, certain new functionality is only available through CASL. CAS actions also provide a more granular access to options which otherwise may not be available as procedure options. But old habits die hard, and for a while I found myself bouncing between SAS 9.4 and CASL. I'd pull the data down from CAS just to get it to process in the SAS Programming Runtime Environment (SPRE) because it took less effort than figuring out how to get it done properly in CAS.

Then I started a project with a seriously large data set and quickly hit the limit on how much data I could pull down to process in SPRE. And although I could adjust the DATALIMIT option to retrieve more data than the default limit, I was wasting time and server resources unnecessarily moving the data between CAS and SPRE. All this, just so I could process the data “old school.”

I decided to challenge myself to do ALL my data preparation in CASL. I love a good challenge! I started collecting various useful CASL code snippets. In this post, I am sharing the tidbits I’ve accumulated, along with some commentary. Note, you can execute CAS actions from multiple clients, including SAS, Python, R, Lua and Java. Since my objective was to transition from traditional SAS code to CASL, I’ll focus solely on CAS actions from the SAS client perspective. While I used SAS Viya 3.5 for this work, most of the code snippets should work on prior versions as well.

The sections below cover: how to submit CASL code; loading, saving, dropping and deleting data; exploring data; table metadata management; and data transformation. Feel free to jump ahead to any section of interest.

How do you submit CASL code?

You use PROC CAS to submit CASL code from a SAS client. For example:

proc cas;
   <cas action 1>;
   <cas action 2>;
   …;
quit;

Similarly to other interactive procs that use run-group processing, separate CAS actions by run; statements. For example:

proc cas;
   <cas action 1>;
   run;
   <cas action 2>;
   run;
quit;

In fact, you can have the entire data preparation and analysis pipeline wrapped inside a single PROC CAS, passing data and results in the form of CASL variables from one action to the next. It can really be quite elegant.

Moving Data Using PROC CAS

Loading SASHDAT data in CAS

Your data must be in the SASHDAT format for CAS to process it. To load a SASHDAT table into CAS, use the table.loadtable CAS action. The code below assumes your SASHDAT table is saved to a directory on disk associated with your current active caslib, and you are loading it into the same caslib. (This usually occurs when you already performed the conversion to SASHDAT format, but the data has been unloaded. If you are just starting out and are wondering how to get your data into the SASHDAT format in the first place, the next session covers it, so keep reading.)

proc cas; 
     table.loadtable / path="TABLE_NAME.sashdat" casOut="TABLE_NAME"; 
     table.promote /name="TABLE_NAME" drop=true; 
quit;

The table.promote action elevates your newly loaded CAS table to global scope, making it available to other CAS sessions, including any additional sessions you start, or to other users assuming they have the right privileges. I can’t tell you how many times I forgot to promote my data, only to find that my hard-earned output table disappeared because I took a longer coffee break than expected! Don’t forget to promote or save off your data (or both, to be safe).

If you are loading from a directory other than the one associated with your active caslib, modify the path= statement to include the relative path to the source directory – relative to your active caslib. If you are looking to load to a different caslib, modify the casOut= statement by placing the output table name and library in curly brackets. For example:

proc cas;
    table.loadtable / path="TABLE_NAME.sashdat" casOut={name="TABLE_NAME" caslib="CASLIB2"};
    table.promote /name="TABLE_NAME" drop=true;
quit;

You can also place a promote=true option inside the casOut= curly brackets instead of calling the table.promote action, like so:

proc cas;
    table.loadtable / path="TABLE_NAME.sashdat" 
                      casOut={name="TABLE_NAME" caslib="CASLIB2" promote=true};
quit;

Curly brackets are ubiquitous in CASL (and quite unusual for SAS 9.4). If you take away one thing from this post, make it “watch your curly brackets.”

Loading SAS7BDAT, delimited data, and other file formats in CAS

If you have a SAS7BDAT file already on disk, load it in CAS with this code:

proc cas;
    table.loadtable /path="TABLE_NAME.sas7bdat" casout="TABLE_NAME" 
                     importoptions={filetype="basesas"};
quit;

Other file formats load similarly – just use the corresponding filetype= option to indicate the type of data you are loading, such as CSV, Excel, Document (.docx, .pdf, etc.), Image, Video, etc. The impressive list of supported file types is available here.

proc cas;
    table.loadtable / path="TABLE_NAME.csv" casout="TABLE_NAME" 
                      importoptions={filetype="csv"};
    run;
quit;

You can include additional parameters inside the importOptions= curly brackets, which differ by the file type. If you don’t need any additional parameters, use the filetype=”auto” and let CAS determine the best way to load the file.

When loading a table in SAS7BDAT, delimited or some other format, the table.loadtable action automatically converts your data to SASHDAT format.

Loading data in CAS conditionally

Imagine you are building a script to load data conditionally – only if it’s not already loaded. This is handy if you have a reason to believe the data might already be in CAS. To check if the data exists in CAS and load conditionally, you can leverage the table.tableExists action in combination with if-then-else logic. For example:

proc cas;
    table.tableExists result =r / name="TABLE_NAME";
    if r=0  then do;
        table.loadtable / path="TABLE_NAME.sashdat" casOut={name="TABLE_NAME"};
        table.promote /name="YOUR_TABLE_NAME" drop=true;
    end;
    else print("Table already loaded");
quit;

Notice that the result=r syntax captures the result code from the tableExists action, which is evaluated before the loadtable and promote actions are executed. If the table is already loaded in CAS, “Table already loaded” is printed to the log. Otherwise, the loadtable and promote actions are executed.

The ability to output CAS action results to a CASL variable (such as result=r in this example) is an extremely powerful feature of CASL. I include another example of this further down, but you can learn more about this functionality from documentation or this handy blog post.

Saving your CAS data

Let’s pretend you’ve loaded your data, transformed it, and promoted it to global scope. You or your colleagues can access it from other CAS sessions. You finished your data preparation, right? Wrong. As the header of this section suggests, you also need to save your prepared CAS data. Why? Because up to this point, your processed and promoted data exists only in memory. You will lose your work if your SAS administrator reboots the server or restarts the CAS controller. If you need to quickly reload prepared data, you must back it up to a caslib’s data source. See the CAS data lifecycle for more details.

To save off CAS data, naturally, you use the table.save action. For example:

proc cas;
    table.save / table="TABLE_NAME" name="TABLE_NAME.sashdat" replace=true;
quit;

In this example, you save off the CAS table to disk as a SASHDAT file, defaulting to the location associated with your active caslib. You can modify the table.save parameters to save or export the data to an alternative data storage solution with full control over the file format (including but not limited to such popular options as HDFS, Oracle, SQL Server, Salesforce, Snowflake and Teradata), compression, partitioning and other options.

Dropping and deleting data

To drop a table from CAS, execute a table.droptable action. For example:

proc cas;
    table.droptable / name="TABLE_NAME" quiet=true;
quit;

The quiet=true option prevents CAS from generating an error if the table does not exist in CAS. Dropping a table deletes it from memory. It’s a good practice to drop tables you no longer need, particularly the one you have promoted. Local-scope tables disappear on their own when the session expires, whereas global tables will stay in memory until they are unloaded.

Dropping a table does not delete the underlying source data. To delete the source of a CAS table, use the table.deleteSource action. For example:

proc cas;
    table.deletesource / source="TABLE_NAME.sashdat" quiet=true;
quit;

Exploring Data Using PROC CAS

After taking a close look at moving the data using PROC CAS, let’s look at some useful ways to start exploring and manipulating CAS data.

Fetching sample data

When preparing data, I find it useful to look at sample data. The table.fetch action is conceptually similar to PROC PRINT and, by default, outputs the first 20 rows of a CAS table:

proc cas;
table.fetch / table="Table_Name";
quit;

You can modify the table.fetch options to control which observations and variables to display and how to display them. For example:

proc cas;
table.fetch / table={name="TABLE_NAME" where="VAR1 in ('value1','value2')"},              /*1*/
	      orderby={{name="VAR1"},                                                     /*2*/
                             {name="VAR2", order="descending"}
                             },	
	     fetchvars={{name="VAR1", label="Variable 1"},                                /*3*/
	                     {name="VAR2", label="Variable 2"}, 
 		             {name="VAR3", label="Variable 3", format=comma12.1}
                             },
	     to=50,                                                                       /*4*/
	     index=false;							          /*5*/
quit;

In the code snippet above:

  • #1 – where= statement limits the records to those meeting the where criteria.
  • #2 – orderby= option defines the sort order. Ascending is the default and is not required. If sorting by more than one variable, put them in a list inside curly brackets, as shown in this example. If a list item has a subparameter (such as order= here), encase each item in curly brackets.
  • #3 – fetchvars= option defines the variables to print as well as their display labels and formats. If you select more than one variable, put them in a list inside curly brackets, as shown here. And again, if a list item includes a subparmeter, then enclose each list item in curly brackets.
  • #4 – to= option defines the number of rows to print.
  • #5 – index= false option deactivates the index column in the output (the default is index=true). This is similar to the noobs option in PROC PRINT.

As mentioned earlier, make sure to watch your curly brackets!

Descriptive statistics and variable distributions

The next step in data exploration is looking at descriptive statistics and variable distributions. I would need a separate blog post to cover this in detail, so I only touch upon a few of the many useful CAS actions.

To look at statistics for numeric variables, use the simple.summary action, which computes standard descriptive statistics, such as minimum, maximum, mean, standard deviation, number missing, and so on. For example:

proc cas;
    simple.summary / table="TABLE_NAME";
quit;

Among its other features, the simple.summary action allows analysis by one or more group-by variables, as well as define the list of desired descriptive statistics. For example:

proc cas;
simple.summary / table={name="TABLE_NAME", groupBy="VAR1", vars={"NUMVAR1","NUMVAR2”}},
                 subSet={"MAX", "MIN", "MEAN", "NMISS"};
quit;

Another useful action is simple.topK, which selects the top K and bottom K values for variables in a data set, based on a user-specified ranking order. The example below returns the top 5 and bottom 5 values for two variables based on their frequency:

proc cas;
simple.topk / table="TABLE_NAME" 
              aggregator="N",                        
              inputs={"VAR1","VAR2"},
              topk=5,
              bottomk=5;
quit;

Simple is a rich action set with heaps of useful options covered in the documentation.

You may be wondering – what about crosstabs and frequency tables? The simple action set includes freq and crosstab actions. In addition, the action closely imitating the functionality of the beloved PROC FREQ is freqTab.freqTab. For example, the code snippet below creates frequency tables for VAR1, VAR2 and a crosstab of the two.

proc cas;
freqtab.freqtab / table="TABLE_NAME"
                  tabulate={"VAR1","VAR2",
                  {vars={"VAR1","VAR2"}}
                  };
quit;

Managing CAS Table Variables

Changing table metadata

One of the basic tasks after exploring your data is changing table metadata, such as dropping unnecessary variables, renaming tables and columns, and changing variable formats and labels. The table.altertable action helps you with these housekeeping tasks. For example, the code snippet below renames the table, drops two variables and renames and changes labels for two variables:

proc cas;
    table.altertable / table="TABLE_NAME" rename="ANALYTIC_TABLE"
                       drop={"VAR1",”VAR2”}
                       columns={{name="VAR3" rename="ROW_ID" label="Row ID"},
                                {name="VAR4" rename="TARGET" label="Outcome Variable"}
                                }
                       ;
quit;

Outputting variable list to a data set

Another useful trick I frequently use is extracting table columns as a SAS data set. Having a list of variables as values in a data set makes it easy to build data-driven scripts leveraging macro programming. The code snippet below provides an example. Here we encounter another example of capturing action result as a CASL variable and using it in further processing – I can’t stress enough how helpful this is!

proc cas;
    table.columninfo r=collinfo / table={name="TABLE_NAME"};       /*1*/
    collist=collinfo["ColumnInfo"];                                /*2*/
    saveresult collist casout="collist";                           /*3*/
quit;

In the snippet above:

  • #1 - the columninfo action collects column information. The action result is passed to a CASL variable collinfo. Notice, instead of writing out result=, I am using an alias r =.
  • #2 - the portion of the a CASL variable collinfo containing column data is extracted into another CASL variable collist.
  • #3 - the saveresult statement sends the data to a CAS table collist. If you want to send the results to a SAS7BDAT data set, replace casout= with dataout=, and provide the library.table_name information.

Transforming the Data

Lastly, let’s look at some ways to use CAS actions to transform your data. Proc SQL and DATA step are the two swiss-army knives in SAS 9 developers’ toolkit that take care of 90% of the data prep. The good news is you can execute both DATA Step and SQL directly from PROC CAS. In addition, call the transpose action to transpose your data.

Executing DATA Step code

The dataStep.runCode action enables you to run DATA step code directly inside PROC CAS. You must enclose your DATA step code in quotation marks after the code= statement. For example:

proc cas;
    dataStep.runCode /
    code="
        data table_name;
        set table_name;
        run;
        ";
quit;

Running DATA step code in CAS allows access to sophisticated group-by processing and the use of such popular programming techniques as first- and last-dot. Refer to the documentation for important nuances related to processing in a distributed, multi-threaded environment of CAS.

Executing FedSQL

To run SQL in CASL, use the fedSQL.execDirect action. Enclose the SQL query in quotation marks following the query= statement. Optionally, you can use the casout= statement to save the results to a CAS table. For example:

proc cas;
    fedsql.execDirect/
    query=
         "
          select  *
          from TABLE1 a  inner join TABLE2 b
          on a.VAR1 = b.VAR1
         "
    casout={name="TABLE3", replace=True};
quit;

Similarly to DATA step, be aware of the many nuances when executing SQL in CAS via FedSQL. Brian Kinnebrew provides an excellent overview of FedSQL in his SAS Communities article, and the documentation has up-to-date details on the supported functionality.

Transposing data

Transposing data in PROC CAS is a breeze. The example below uses transpose.transpose action to restructure rows into columns.

proc cas;
    transpose.transpose /
          table={name="TABLE_NAME", groupby={"VAR1"}}
          transpose={"VAR2"}
          id={"VAR3"}
          prefix="Prefix"
    casout={name="TRANSPOSED" replace=true};
run;

You can transpose multiple variables in the same transpose action. Simply place additional variables inside the curly brackets following transpose=, in quotes, separated by a comma.

Conclusion

PROC CAS is a wrapper procedure enabling you to leverage SAS’ new programming language - CASL. CASL enables you to submit CAS actions directly to SAS Cloud Analytic Services engine from a SAS client. This post provided examples of loading, managing, exploring and transforming your data through CAS actions. Certain new functionality in CAS is only available through CAS actions, so getting comfortable with CASL makes sense. Fear not, and let the curly brackets guide the way 😊.

Acknowledgement

I would like to thank Brian Kinnebrew for his thoughtful review and generous help on my journey learning CASL.

Challenge accepted: Learning data prep in CASL was published on SAS Users.

8月 082019
 

Some key components of CASL are the action statements. These statements perform tasks that range from configuring the connection to the server, to summarizing large amounts of data, to processing image files. Each action has its own purpose. However, there is some overlapping functionality between actions. For example, more than one action can summarize numeric variables.
This blog looks at three actions: SIMPLE.SUMMARY, AGGREGATION.AGGREGATE, and DATAPREPROCESS.RUSTATS. Each of these actions generates summary statistics. Though there might be more actions that generate the same statistics, these three are a good place to start as you learn CASL.

Create a CAS table for these examples

The following step generates a table called mydata, stored in the casuser caslib, that will be used for the examples in this blog.

cas;
libname myuser cas caslib='casuser';
data myuser.mydata;
	length color $8;
	array X{100};
	do k=1 to 9000;
   	do i=1 to 50;
      	X{i} = rand('Normal',0, 4000);
      end;
      do i=51 to 100;
      	X{i} = rand('Normal', 100000, 1000000);
      end;
      if x1 < 0 then color='red';
      	else if x1 < 3000 then color='blue';
         else color='green';
		output;
	end;
run;

SIMPLE.SUMMARY

The purpose of the Simple Analytics action set is to perform basic analytical functions. One of the actions in the action set is the SUMMARY action, used for generating descriptive statistics like the minimum, maximum, mean, and sum.
This example demonstrates obtaining the sum, mean, and n statistics for five variables (x1–x5) and grouping the results by color. The numeric input variables are specified in the INPUTS parameter. The desired statistics are specified in the SUBSET parameter.

proc cas;
   simple.summary / 
      inputs={"x1","x2","x3","x4","x5"},
      subset={"sum","mean","n"},
      table={caslib="casuser",name="mydata",groupBy={"color"}},
      casout={caslib="casuser", name="mydata_summary", replace=true};
run;
	table.fetch /
		table={caslib="casuser",name="mydata_summary" };
run;
quit;

The SUMMARY action creates a table that is named mydata_summary. The TABLE.FETCH action is included to show the contents of the table.

The mydata_summary table can be used as input for other actions, its variable names can be changed, or it can be transposed. Now that you have the summary statistics, you can use them however you need to.

AGGREGATION.AGGREGATE

Many SAS® procedures have been CAS-enabled, which means you can use a CAS table as input. However, specifying a CAS table does not mean all of the processing takes place on the CAS server. Not every statement, option, or statistic is supported on the CAS server for every procedure. You need to be aware of what is not supported so that you do not run into issues if you choose to use a CAS-enabled procedure. In the documentation, refer to the CAS processing section to find the relevant details.
When a procedure is CAS-enabled, it means that, behind the scenes, it is submitting an action. The MEANS and SUMMARY procedure steps submit the AGGREGATION.AGGREGATE action.
With PROC MEANS it is common to use a BY or CLASS statement and ask for multiple statistics for each analysis variable, even different statistics for different variables. Here is an example:

proc means sum data=myuser.mydata noprint;
  by color;
   var x1 x2 x3;
   output out=test(drop=_type_ _freq_) sum(x1 x3)=x1_sum x3_sum
   max(x2)=x2_max std(x3)=x3_std;
run;

The AGGREGATE action produces the same statistics and the same structured output table as PROC MEANS.

proc cas;
	aggregation.aggregate / 
		table={name="mydata",caslib="casuser",groupby={"color"}}
      casout={name="mydata_aggregate", caslib='casuser', replace=true}
      varspecs={{name='x1', summarysubset='sum', columnnames={'x1_sum'}}, 
                {name='x2', agg='max', columnnames={'x2_max'}},
                {name='x3', summarysubset={'sum','std'},
                columnnames={'x3_sum','x3_std'}}}
      savegroupbyraw=true, savegroupbyformat=false, raw=true;
run;
quit;

The VARSPECS parameter might be confusing. It is where you specify the variables that you want to generate statistics for, which statistics to generate, and what the resulting column should be called. Check the documentation: depending on the desired statistic, you need to use either SUMMARYSUBSET or AGG arguments.

If you are using the GROUPBY action, you most likely want to use the SAVEGROUPBYRAW=TRUE parameter. Otherwise, you must list every GROUPBY variable in the VARSPECS parameter. Also, the SAVEGROUPBYFORMAT=FALSE parameter prevents the output from containing _f versions (formatted versions) of all of the GROUPBY variables.

DATAPREPROCESS.RUSTATS

The RUSTATS action, in the Data Preprocess action set, computes univariate statistics, centralized moments, quantiles, and frequency distribution statistics. This action is extremely useful when you need to calculate percentiles. If you ask for percentiles from a procedure, all of the data will be moved to the compute server and processed there, not on the CAS server.
This example has an extra step. Actions require a list of variables, which can be cumbersome when you want to generate summary statistics for more than a handful of variables. Macro variables are a handy way to insert a list of strings, variable names in this case, without having to enter all of the names yourself. The SQL procedure step generates a macro variable containing the names of all of the numeric variables. The macro variable is referenced in the INPUTS parameter.
The RUSTATS action has TABLE and INPUTS parameters like the previous actions. The REQUESTPACKAGES parameter is the parameter that allows for a request for percentiles.
The example also contains a bonus action, TRANSPOSE.TRANSPOSE. The goal is to have a final table, mydata_rustats2, with a structure like PROC MEANS would generate. The tricky part is the COMPUTEDVARSPROGRAM parameter.
The table generated by the RUSTATS action has a column called _Statistic_ that contains the name of the statistic. However, it contains “Percentile” multiple times. A different variable, _Arg1_, contains the value of the percentiles (1, 10, 20, and so on). The values of _Statistic_ and _Arg1_ need to be combined, and that new combined value generates the new variable names in the final table.
The COMPUTEDVARS parameter specifies that the name of the new variable will hold the concatenation of _Statistic_ and _Arg1_. The COMPUTEDVARSPROGRAM parameter tells CAS how to create the values for NEWID. The NEWID value is then used in the ID parameter to make the new variable names—pretty cool!

proc sql noprint;
	select quote(strip(name)) into: numvars separated by ','
	from dictionary.columns 
 	where libname='MYUSER' and memname='MYDATA' and type='num';
quit;
 
proc cas;
	dataPreprocess.rustats / 
   	table={name="mydata",caslib="casuser"} 
   	inputs={&numvars}
   	requestpackages={{percentiles={1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95 99},scales={"std"}}}
   	casoutstats={name="mydata_rustats",caslib="casuser"} ;
 
	transpose.transpose / 
   	table={caslib='casuser', name="mydata_rustats", groupby={"_variable_"},
			computedvars={{name="newid",format="$20."}},computedvarsprogram="newid=strip(_statistic_)||compress(strip(_arg1_),'.-');"}
   	transpose={"_value_"} 
   	id={"newid"}
   	casOut={caslib='casuser', name="mydata_rustats2", replace=true};
   run;
quit;

Here is a small portion of the final table. Remember, you can use the TABLE.FETCH action to view the table contents as well.

Summary

Summarizing numeric data is an important step in analyzing your data. CASL provides multiple actions that generate summary statistics. This blog provided a quick overview of three of those actions: SIMPLE.SUMMARY, AGGREGATION.AGGREGATE, and DATAPREPROCESS.RUSTATS.
The wonderful part of so many choices is that you can decide which one best fits your needs. Summarizing your data with actions also ensures that all of the processing occurs on the CAS server and that you are taking full advantage of its capabilities.
Be sure to use the DROPTABLE action to delete any tables that you do not want taking up space in memory:

proc cas;
	table.droptable / caslib='casuser' name='mydata' quiet=true;
	table.droptable / caslib='casuser' name='mydata_summary' quiet=true;
	table.droptable / caslib='casuser' name='mydata_aggregate' quiet=true;
	table.droptable / caslib='casuser' name='mydata_rustats' quiet=true;
	table.droptable / caslib='casuser' name='mydata_rustats2' quiet=true;
quit;
cas casauto terminate;

Learn More

Summarization in CASL was published on SAS Users.