CASL

12月 202022
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. The previous posts show how to use the simple.freq CAS action to generate, save and group simple frequency tables. In this post I will show you how to use the freqTab.freqTab CAS action to generate more advanced one-way frequency and crosstabulation tables.

In this example, I will use the CAS language (CASL) to execute the freqTab CAS action. Instead of using CASL, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax in other languages.

Load the demonstration data into memory

I'll start by executing the loadTable action to load the WARRANTY_CLAIMS_0117.sashdat file from the Samples caslib into memory. By default the Samples caslib should be available in your SAS Viya environment. I'll load the table to the Casuser caslib and then I'll clean up the CAS table by renaming and dropping columns to make the table easier to use. For more information how to rename columns check out my previous post. Lastly I'll execute the fetch action to preview 5 rows.

proc cas;
   * Specify the input/output CAS table *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Load the CAS table into memory *;
    table.loadtable / 
        path = "WARRANTY_CLAIMS_0117.sashdat", caslib = "samples",
        casOut = casTbl + {replace=TRUE};
 
* Rename columns with the labels. Spaces replaced with underscores *;
 
   *Store the results of the columnInfo action in a dictionary *;
   table.columnInfo result=cr / table = casTbl;
 
   * Loop over the columnInfo result table and create a list of dictionaries *;
   listElementCounter = 0;
   do columnMetadata over cr.ColumnInfo;
	listElementCounter = listElementCounter + 1;
	convertColLabel = tranwrd(columnMetadata['Label'],' ','_');
	renameColumns[listElementCounter] = {name = columnMetadata['Column'], rename = convertColLabel, label=""};
   end;
 
   * Rename columns *;
   keepColumns = {'Campaign_Type', 'Platform','Trim_Level','Make','Model_Year','Engine_Model',
                  'Vehicle_Assembly_Plant','Claim_Repair_Start_Date', 'Claim_Repair_End_Date'};
   table.alterTable / 
	name = casTbl['Name'], caslib = casTbl['caslib'], 
	columns=renameColumns,
	keep = keepColumns;
 
   * Preview CAS table *;
   table.fetch / table = casTbl, to = 5;
quit;

The results above show a preview of the warranty_claims CAS table.

One-way frequency tables

To create more advanced one-way frequency tables, use the freqTab.freqTab CAS action. In the freqTab action, use the table parameter to specify the CAS table and the tabulate parameter to specify the column, or columns, to analyze. The tabulate parameter is extremely flexible and provides a variety of ways to analyze your data. In this example, I'll specify the warranty_claims CAS table and the Campaign_Type and Make columns as a list in the tabulate parameter.

proc cas;
   * CAS table reference *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * One-way frequency tables *;
   freqTab.freqTab / 
       table = casTbl,
       tabulate = {'Campaign_Type','Make'};
quit;

Results

The results above show the freqTab action generates a variety additional information compared to the freq action. Information including the number of observations used, variable level and timing. The freqTab action frequency tables contain the expected total frequency of each value; moreover, it also includes the total percentage, cumulative frequency and cumulative percent.

Two-way crosstabulation tables

Instead of producing one-way frequency tables, you can also create two-way crosstabulation tables. One way is to continue to add elements in the list in the tabulate parameter. Here a one-way frequency table will be created for Campaign_Type and Make as seen before. Then, I'll add a dictionary in the list. The vars key specifies the Make column, and the cross key specifies the Campaign_Type and Model_Year columns. The columns in the cross key are paired with those specified in vars. This example will produce a two-way crosstabulation between Make by Campaign_Type and Make by Model_Year.

proc cas;
   * CAS table reference *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * One-way frequency and two-way crosstabulation tables *;
   freqTab.freqTab / 
      table = casTbl,
      tabulate = {
 		'Campaign_Type',
		'Make',
              	{vars = 'Make', cross = {'Campaign_Type', 'Model_Year'}}
      };
quit;

Partial results

The results above show the freqTab action returns the one-way frequency tables for Campaign_Type and Make as shown earlier and displays the two-way crosstabulation between Make by Campaign_Type and Make by Model_Year.

While this is great, what if I want to avoid the total row and display the Make for each row in the two-way crosstabulation as a single table?

Two-way crosstabulation as a single table

To present the results as a single table and remove the total row, add the tabDisplay parameter with the value list. In the code, I'll remove the one-way frequencies and add the tabDisplay parameter.

proc cas;
   * CAS table reference *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Two-way crosstabulation as a single table *;
   freqTab.freqTab / 
       table = casTbl,
       tabulate = {
              	{vars = 'Make', cross = {'Campaign_Type', 'Model_Year'}}
       }, 
       tabDisplay='list';
quit;

Partial results

The results show each two-way crosstabulation as a single table and the total row is removed.

Next, what if you want to create a three-way crosstabulation table? Well this is a bit tricky.

Three-way crosstabulation

To produce a three-way crosstabulation, specify the columns as a list within the vars parameter, with the tabDisplay parameter equal to list. If you do not specify the tabDisplay parameter, the freqTab action will return three two-way crosstabulation tables. Each table will be created for each distinct group of the first column specified in the list. This example will create a crosstabulation of Model_Year by Campaign_Type by Make.

proc cas;
   * CAS table reference *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Three-way crosstabulation table *;
   freqTab.freqTab / 
       table = casTbl,
       tabulate = {
		{vars={'Model_Year','Campaign_Type','Make'}}
       }, 
       tabDisplay='list';
quit;

Partial results

Notice in the results above a three-way crosstabulation table was created. The order of the columns is created by the order of the columns specified in the list.

Summary

The freqTab.freqTab CAS action provides a variety of ways to create frequency and crosstabulation tables in the distributed CAS server. There are a variety of parameters you can add to modify it to meet your objectives. We've only scratched the surface!

Additional resources

freqTab CAS action
SAS® Cloud Analytic Services: CASL Programmer’s Guide 
CAS Action! - a series on fundamentals
Getting Started with Python Integration to SAS® Viya® - Index
SAS® Cloud Analytic Services: Fundamentals

CAS-Action! Advanced Frequency Tables - Part 4 was published on SAS Users.

12月 162022
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. In my previous part 1 and part 2 posts I reviewed how to use the simple.freq CAS action to generate frequency distributions for one or more columns and how to save the results. In this post I will show you how to group the results of the freq action.

In this example, I will use the CAS language (CASL) to execute the freq CAS action. Be aware, instead of using CASL, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax in other languages.

Load the demonstration data into memory

I'll start by executing the loadTable action to load the WARRANTY_CLAIMS_0117.sashdat file from the Samples caslib into memory. By default the Samples caslib should be available in your SAS Viya environment. I'll load the table to the Casuser caslib and then I'll clean up the CAS table by renaming and dropping columns to make the table easier to use. For more information how to rename columns check out my previous post. Lastly I'll execute the fetch action to preview 5 rows.

proc cas;
   * Specify the input/output CAS table *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Load the CAS table into memory *;
    table.loadtable / 
        path = "WARRANTY_CLAIMS_0117.sashdat", caslib = "samples",
        casOut = casTbl + {replace=TRUE};
 
* Rename columns with the labels. Spaces replaced with underscores *;
 
   *Store the results of the columnInfo action in a dictionary *;
   table.columnInfo result=cr / table = casTbl;
 
   * Loop over the columnInfo result table and create a list of dictionaries *;
   listElementCounter = 0;
   do columnMetadata over cr.ColumnInfo;
	listElementCounter = listElementCounter + 1;
	convertColLabel = tranwrd(columnMetadata['Label'],' ','_');
	renameColumns[listElementCounter] = {name = columnMetadata['Column'], rename = convertColLabel, label=""};
   end;
 
   * Rename columns *;
   keepColumns = {'Campaign_Type', 'Platform','Trim_Level','Make','Model_Year','Engine_Model',
                  'Vehicle_Assembly_Plant','Claim_Repair_Start_Date', 'Claim_Repair_End_Date'};
   table.alterTable / 
	name = casTbl['Name'], caslib = casTbl['caslib'], 
	columns=renameColumns,
	keep = keepColumns;
 
   * Preview CAS table *;
   table.fetch / table = casTbl, to = 5;
quit;

The results above show a preview of the warranty_claims CAS table.

Add a grouping column

What if you want a frequency distribution of each Model_Year by Make? You can easily do that with the freq action. The key is adding the groupBy sub parameter when referencing the CAS table. Then use the grouped CAS table in the freq action and specify the column to analyze. In this example, the CAS table is grouped by Model_Year and the freq action specifies the Make column.

proc cas;
   * Reference the CAS table and group by Model_Year *;
   casTbl = {name = "WARRANTY_CLAIMS", 
             caslib = "casuser",
	     groupby = "Model_Year"};
 
   * Model_Year by Make frequency *;
   simple.freq / table = casTbl, input = 'Make';
quit;

 

Partial results

The above results show that the freq action returns a separate table for each distinct Model_Year. While this is great information, what if I want a single table with the Model_Year by Make?

Saving the results as a CAS table

One option is saving the results as a CAS table. This works similarly to my previous post CAS-Action! Saving Frequency Tables - Part 2. Simply add the casOut parameter to the freq action. I'll add a label to the new CAS table to give it a description and then preview the new CAS table with the fetch action.

proc cas;
   * Reference the CAS table and group by Model_Year *;
   casTbl = {name = "WARRANTY_CLAIMS", 
             caslib = "casuser",
	     groupby = "Model_Year"};
 
   * Specify the output CAS table information *;
   outputTbl = {name = "yearByMake", caslib = "casuser"};
 
   * Get a frequency of Model_Year by Make and create a CAS table *;
   simple.freq / 
	table = casTbl, 
	input = 'Make',
	casOut = outputTbl || {label = "Year by Make frequency table"};
 
   * Preview the CAS table *;
   table.fetch / table = outputTbl;
quit;

The results above show the freq action with the casOut parameter returns information about the newly created CAS table, and the fetch action returns a preview of the new CAS table. Notice the analysis is grouped by Model_Year and is consolidated into a single table.

Saving the results as a SAS data set

Instead of saving the results back to the CAS server, you can save them as a SAS data set. I did an example in my previous post CAS-Action! Saving Frequency Tables - Part 2. However, when you save the results of an action summarized by groups, you need to combine each individual group into a single result table. To do that you need to use the COMBINE_TABLES function on the dictionary returned from the CAS server. The COMBINE_TABLES function will combine each individual table in a dictionary and return a single result table. Then you can save the new table by using the SAVERESULT statement. Lastly, I'll view the SAS data set using the PRINT procedure.

proc cas;
   * Reference the CAS table and group by Model_Year *;
   casTbl = {name = "WARRANTY_CLAIMS", 
             caslib = "casuser",
	     groupby = "Model_Year"};
 
   * Specify the output CAS table information *;
   outputTbl = {name = "yearByMake", caslib = "casuser"};
 
   * Get a frequency of Model_Year by Make and store the results in a dictionary *;
   simple.freq result=freq_cr / 
	table = casTbl, 
	input = 'Make';
 
   * Combine all the tables in the dictionary and create a result table *;
   freqTbl = combine_tables(freq_cr);
 
   * Save the result table as a SAS data set *;
   saveresult freqTbl dataout=work.yearByMake;
quit;
 
* Preview the SAS data set *;
proc print data=work.yearByMake;
run;

The results above show the new SAS data set. Once you save the results of the CAS server as a SAS data set, you can use familiar SAS knowledge to continue processing the data on the compute server.

Plot the results of the freq action

Now, let's explore and visualize this data. You can use the SGPLOT procedure to visualize the summarized results from the CAS server that you saved as a SAS data set.

title height=14pt justify=left color=charcoal "Total Number of Warranty Claims by Model Year and Car Make";
title2 "";
proc sgplot data=work.yearByMake
			noborder;
	vline Model_Year / 
			group = CharVar 
			Response=Frequency
			markers;
	format Frequency comma16.;
	keylegend / position=topleft title='Car Makes';
	label Frequency='Warranty Claims';
	xaxis display=(nolabel);
run;

The visualization above shows the Zeus car Make is the primary cause of warranty claims in all years, and in the years 2016, 2017 and 2018 had a huge increase in warranty claims.

Summary

Using the groupBy sub parameter when referencing a CAS table enables you to easily group the results of a CAS action. When using the groupBy parameter:

  • the action will return separate result tables for each distinct grouped value
  • the casOut parameter in the action to creates a single CAS table with all the groups
  • the COMBINE_TABLES function combines each distinct group by result table in the dictionary to create a single result table, and then saves that as a SAS data set

Additional resources

freq action
COMBINE_TABLES function
SAVERESULT statement
Plotting a Cloud Analytic Services (CAS) In-Memory Table
SAS® Cloud Analytic Services: CASL Programmer’s Guide 
SAS® Cloud Analytic Services: Fundamentals
CAS Action! - a series on fundamentals
Getting Started with Python Integration to SAS® Viya® - Index

CAS-Action! Grouping Frequency Tables - Part 3 was published on SAS Users.

12月 122022
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. In my previous post CAS-Action! Simple Frequency Tables - Part 1, I reviewed how to use the simple.freq CAS action to generate frequency distributions for one or more columns using the distributed CAS server. In this post I will show you how to save the results of the freq action as a SAS data set or a distributed CAS table.

In this example, I will use the CAS language (CASL) to execute the freq CAS action. Be aware, instead of using CASL, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax in other languages.

Load the demonstration data into memory

I'll start by executing the loadTable action to load the WARRANTY_CLAIMS_0117.sashdat file from the Samples caslib into memory. By default the Samples caslib should be available in your SAS Viya environment. I'll load the table to the Casuser caslib and then I'll clean up the CAS table by renaming and dropping columns to make the table easier to use. For more information how to rename columns check out my previous post. Lastly I'll execute the fetch action to preview 5 rows.

proc cas;
   * Specify the input/output CAS table *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Load the CAS table into memory *;
    table.loadtable / 
        path = "WARRANTY_CLAIMS_0117.sashdat", caslib = "samples",
        casOut = casTbl + {replace=TRUE};
 
* Rename columns with the labels. Spaces replaced with underscores *;
 
   *Store the results of the columnInfo action in a dictionary *;
   table.columnInfo result=cr / table = casTbl;
 
   * Loop over the columnInfo result table and create a list of dictionaries *;
   listElementCounter = 0;
   do columnMetadata over cr.ColumnInfo;
	listElementCounter = listElementCounter + 1;
	convertColLabel = tranwrd(columnMetadata['Label'],' ','_');
	renameColumns[listElementCounter] = {name = columnMetadata['Column'], rename = convertColLabel, label=""};
   end;
 
   * Rename columns *;
   keepColumns = {'Campaign_Type', 'Platform','Trim_Level','Make','Model_Year','Engine_Model',
                  'Vehicle_Assembly_Plant','Claim_Repair_Start_Date', 'Claim_Repair_End_Date'};
   table.alterTable / 
	name = casTbl['Name'], caslib = casTbl['caslib'], 
	columns=renameColumns,
	keep = keepColumns;
 
   * Preview CAS table *;
   table.fetch / table = casTbl, to = 5;
quit;

The results above show a preview of the warranty_claims CAS table.

One Way Frequency for Multiple Columns

Next, I'll execute the freq action to generate a frequency distribution for multiple columns.

proc cas;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
   colNames = {'Model_Year', 
               'Vehicle_Assembly_Plant', 
	       {name = 'Claim_Repair_Start_Date', format = 'yyq.'}
   };
   simple.freq / table= casTbl, inputs = colNames;
quit;

The freq CAS action returns the frequency distribution of each column in a single result. While this is great,  what if you want to create a visualization with the data? Or continue processing the summarized data? How do you save this as a table? Well, you have a few options.

Save the results as a SAS data set

First, you can save the results of a CAS action as a SAS data set. The idea here is the CAS action will process the data in the distributed CAS server, and then the CAS server returns smaller, summarized results to the client (SAS Studio). The summarized results can then be saved as a SAS data set.

To save the results of a CAS action simply add the result option after the action with a variable name. The results of an action return a dictionary to the client and store it in the specified variable. For example, to save the results of the freq action as a SAS data set complete the following steps:

  1. Execute the same CASL code from above, but this time specify the result option with a variable name to store the results of the freq action. Here i'll save the results in the variable freq_cr.
  2. Use the DESCRIBE statement to view the structure and data type of the CASL variable freq_cr in the log (not required).
  3. Use the SAVERESULT statement to save the CAS action result table from the dictionary freq_cr as a SAS data set named warranty_freq. To do this specify the key Frequency that is stored in the dictionary freq_cr to obtain the result table.
proc cas;
   * Reference the CAS table *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Specify the columns to analyze *;
   colNames = {'Model_Year', 
               'Vehicle_Assembly_Plant', 
               {name = 'Claim_Repair_Start_Date', format = 'yyq.'}
   };
   * 1. Analyze the CAS table and store the results *;
   simple.freq result = freq_cr / table= casTbl, inputs = colNames;
 
   * 2. View the dictionary in the log *;
   describe freq_cr;
 
  * 3. Save the result table as a SAS data set *;
   saveresult freq_cr['Frequency'] dataout=work.warranty_freq;
quit;

SAS Log

In the log, the results of the DESCRIBE statement shows the variable freq_cr is a dictionary with one entry. It contains the key Frequency and the value is a result table. The table contains 22 rows and 6 columns. The NOTE in the log shows the SAVERESULT statement saved the result table from the dictionary as a SAS data set named warranty_freq in the work library.

Once the summarized results are stored in a SAS library, use your traditional SAS programming knowledge to process the SAS table. For example, now I can visualize the summarized data using the SGPLOT procedure.

* Plot the SAS data set *;
title justify=left height=16pt "Total Warranty Claims by Year";
proc sgplot data=work.warranty_freq noborder;
	where Column = 'Model_Year';
	vbar Charvar / 
		response = Frequency
		nooutline;
	xaxis display=(nolabel);
	label Frequency = 'Total Claims';
	format Frequency comma16.;
quit;

Save the Results as a CAS Table

Instead of saving the summarized results as a SAS data set, you can create a new CAS table on the CAS server. To do that all you need is to add the casOut parameter in the action. Here I'll save the results of the freq CAS action to a CAS table named warranty_freq in the Casuser caslib, and I will give the table a descriptive label.

proc cas;
   * Reference the CAS table *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Specify the columns to analyze *;
   colNames = {'Model_Year', 
               'Vehicle_Assembly_Plant', 
               {name = 'Claim_Repair_Start_Date', format = 'yyq.'}
   };
 
   * Analyze the CAS table and create a new CAS table *;
   simple.freq / 
	table= casTbl, 
	inputs = colNames,
	casOut = {
		name = 'warranty_freq',
		caslib = 'casuser',
		label = 'Frequency analysis by year, assembly plant and repair date by quarter'
	};
quit;

The results above show the freq action returned information about the newly created CAS table. Once you have a CAS table in the distributed CAS server you can continue working with it using CAS, or you can visualize the data like we did before using SGPLOT. The key concept here is the SGPLOT procedure does not visualize data on the CAS server. The SGPLOT procedure returns the entire CAS table back to SAS (compute server) as a SAS data set, then the visualization occurs on the client. This means if the CAS table is large, an error or slow processing might occur. However, in our scenario we created a smaller summarized CAS table, so sending 22 rows back to the client (compute server) isn't going to be an issue.

* Make a library reference to a Caslib *;
libname casuser cas caslib='casuser';
 
 
* Plot the SAS data set *;
title justify=left height=16pt "Total Warranty Claims by Year";
proc sgplot data=casuser.warranty_freq noborder;
	where _Column_ = 'Model_Year';
	vbar _Charvar_ / 
		response = _Frequency_
		nooutline;
	xaxis display=(nolabel);
	label _Frequency_ = 'Total Claims';
	format _Frequency_ comma16.;
quit;

Summary

Using the freq CAS action enables you to generate a frequency distribution for one or more columns and enables you to save the results as a SAS data set or a CAS table. They keys to this process are:

  • CAS actions execute on the distributed CAS server and return summarized results back to the client as a dictionary. You can store the dictionary using the result option.
  • Using dictionary manipulation techniques and the SAVERESULT statement you can save the summarized result table from the dictionary as a SAS data set. Once you have the SAS data set you can use all of your familiar SAS programming knowledge on the traditional compute server.
  • Using the casOut parameter in a CAS action enables you to save the summarized results in the distributed CAS server.
  • The SGPLOT procedure does not execute in CAS. If you specify a CAS table in the SGPLOT procedure, the entire CAS table will be sent back to SAS compute server for processing. This can cause an error or slow processing on large tables.
  • Best practice is to summarize large data in the CAS server, and then work with the summarized results on the compute server.

Additional resources

freq action
DESCRIBE statement
SAVERESULT statement
Plotting a Cloud Analytic Services (CAS) In-Memory Table
SAS® Cloud Analytic Services: CASL Programmer’s Guide 
SAS® Cloud Analytic Services: Fundamentals
CAS Action! - a series on fundamentals
Getting Started with Python Integration to SAS® Viya® - Index

 

CAS-Action! Saving Frequency Tables - Part 2 was published on SAS Users.

12月 072022
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. If you'd like to start by learning more about the distributed CAS server and CAS actions, please see CAS Actions and Action Sets - a brief intro. Otherwise, let's learn how to generate frequency distributions for one or more columns using the simple.freq CAS action.

In this example, I will use the CAS language (CASL) to execute the freq CAS action. Be aware, instead of using CASL, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax in other languages.

Load the demonstration data into memory

I'll start by executing the loadTable action to load the WARRANTY_CLAIMS_0117.sashdat file from the Samples caslib into memory. By default the Samples caslib should be available in your SAS Viya environment. I'll load the table to the Casuser caslib and then I'll clean up the CAS table by renaming and dropping columns to make the table easier to use. For more information how to rename columns check out my previous post. Lastly I'll execute the fetch action to preview 5 rows.

proc cas;
   * Specify the input/output CAS table *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Load the CAS table into memory *;
    table.loadtable / 
        path = "WARRANTY_CLAIMS_0117.sashdat", caslib = "samples",
        casOut = casTbl + {replace=TRUE};
 
* Rename columns with the labels. Spaces replaced with underscores *;
 
   *Store the results of the columnInfo action in a dictionary *;
   table.columnInfo result=cr / table = casTbl;
 
   * Loop over the columnInfo result table and create a list of dictionaries *;
   listElementCounter = 0;
   do columnMetadata over cr.ColumnInfo;
	listElementCounter = listElementCounter + 1;
	convertColLabel = tranwrd(columnMetadata['Label'],' ','_');
	renameColumns[listElementCounter] = {name = columnMetadata['Column'], rename = convertColLabel, label=""};
   end;
 
   * Rename columns *;
   keepColumns = {'Campaign_Type', 'Platform','Trim_Level','Make','Model_Year','Engine_Model',
                  'Vehicle_Assembly_Plant','Claim_Repair_Start_Date', 'Claim_Repair_End_Date'};
   table.alterTable / 
	name = casTbl['Name'], caslib = casTbl['caslib'], 
	columns=renameColumns,
	keep = keepColumns;
 
   * Preview CAS table *;
   table.fetch / table = casTbl, to = 5;
quit;

The results above show a preview of the warranty_claims CAS table.

One-way frequency table for a single column

To create a simple one-way frequency for a single column use the simple.freq CAS action. In the freq action, use the table parameter to specify the CAS table and the inputs parameter to specify the column to analyze. Here I'm using the warranty_claims CAS table and analyzing the Make column.

proc cas;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
   simple.freq / table= casTbl, inputs = 'Make';
quit;

The freq action generates a simple one-way frequency table in the distributed CAS server and returns the results to the client. The results of the freq action include:

  • the column that was analyzed in the Column column
  • the distinct values for that column are shown in the Character Value column
  • if a format is associated with that column it appears in the Formatted Value column; if no format exists, you see the same values
  • the Frequency column represents the number of times that value occurs in the column

In the results above, we see the Zeus car make has the most warranty claims.

One-way frequency for multiple columns

To specify multiples columns in the freq action, add a list of columns to the inputs parameter. Here, I'll create a variable named colNames to store a list. In the list, I'll specify the Model_Year, Vehicle_Assembly_Plant and Engine_Model columns, and then use the variable in the inputs parameter.

proc cas;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
   colNames = {'Model_Year', 'Vehicle_Assembly_Plant', 'Engine_Model'};
   simple.freq / table= casTbl, inputs = colNames;
quit;

In the results above, we see the action returns a single result table with all three columns summarized. The column that was analyzed is shown in the Column column.

Apply a SAS format in the freq action

What if you want to apply a SAS date format to a column during analysis? For example, the Claim_Repair_Start_Date column contains a SAS date value with the DATE9 format. Instead of the detailed DATE9 format, what if I want to see the total number of repairs by year and quarter? Or by year? Or by year and month? You can easily apply a SAS format when using CAS actions.

Let's start by executing the freq CAS action on the Claim_Repair_Start_Date column.

proc cas;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
   simple.freq / table= casTbl, inputs = 'Claim_Repair_Start_Date';
quit;

The results above show the action created a one way frequency of Claim_Repair_Start_Date using the DATE9 format and stored it in the CAS table. The Numeric Value column shows the raw SAS date values and the Formatted Value column shows the formatted dates.

Now, this analysis is too detailed. I don't want to see repairs by start date. Instead, I'll apply the YYQ format to summarize the dates by year and quarter. You can apply the format within the inputs parameter. In the inputs parameter specify a list of dictionaries. Here I'll use a single dictionary in the list and apply the YYQ format to the Claim_Repair_Start_Date column.

proc cas;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
   simple.freq / 
	table= casTbl, 
	inputs = {
		{name = 'Claim_Repair_Start_Date', format = 'yyq.'}
	};
quit;

The results above display the frequency by year and quarter. The ability to apply a SAS format during execution enables us to quickly summarize data in a variety of ways.

Create a calculated column in the freq action

Lastly, you can also create calculated columns within an action for ad-hoc analysis. Here I'll create a new column named Make_Platform that concatenates the Make and Platform columns by specifying an expression in the variable calculateMakePlatform. Then I'll add the calculation to the computedVarsProgram parameter in my CAS table reference. Finally, I'll add the new column name to the inputs parameter in the freq action. For more information about creating calculated columns in a CAS table, check out my previous post.

proc cas;
   calculateMakePlatform = 'Make_Platform = catx("-",Make,Platform)';
   casTbl = {name = "WARRANTY_CLAIMS", 
             caslib = "casuser",
             computedVarsProgram = calculateMakePlatform};
   simple.freq / 
	table= casTbl,
	inputs = 'Make_Platform';
quit;

 

The results above show the Zeus-XE has the highest amount of warranty claims.

While viewing the results of the analysis is great, how can a work with these results? Maybe I want to create a visualization? What about creating another CAS table or SAS data set with these results? What about an Excel report? How can we do this? Well, stay tuned for part 2!

Summary

Using the freq CAS action enables you to generate a frequency distribution for one or more columns, apply SAS formats during analysis, and even create calculated columns. CAS actions are optimized to run in the distributed CAS server, are flexible, and can be executed in a variety of languages like Python and R!

Additional resources

freq action
SAS® Cloud Analytic Services: CASL Programmer’s Guide 
CAS Action! - a series on fundamentals
Getting Started with Python Integration to SAS® Viya® - Index
SAS® Cloud Analytic Services: Fundamentals

CAS-Action! Simple Frequency Tables - Part 1 was published on SAS Users.

10月 092021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware, instead of using the CAS procedure, I could execute the action with Python, R, or even a REST API with some slight changes to the syntax for the specific language.

Why use the DATA Step?

It's pretty simple, the DATA step is a powerful way to process your data. It gives you full control of each row and column, ability to easily create multiple output tables, and provides a variety of statements to pretty much do anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq / 
          table={name='cars', caslib='casuser'},
           input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables based off these unique values using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as variable. The DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.

DATA Step in CAS has Limitations

Now, one thing to be aware of is not all functionality of the DATA step is available in CAS. If you are using the runCode action with an unsupported statement or function in CAS, you will receive an error. Let's look at an example using the first function, which gets the first letter of a string, and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The results of the runCode action return an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this you will need to find a CAS supported method to complete the task. (HINT: Here instead of the first function you can use the substr function).

For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

10月 092021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware, instead of using the CAS procedure, I could execute the action with Python, R, or even a REST API with some slight changes to the syntax for the specific language.

Why use the DATA Step?

It's pretty simple, the DATA step is a powerful way to process your data. It gives you full control of each row and column, ability to easily create multiple output tables, and provides a variety of statements to pretty much do anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq / 
          table={name='cars', caslib='casuser'},
           input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables based off these unique values using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as variable. The DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.

DATA Step in CAS has Limitations

Now, one thing to be aware of is not all functionality of the DATA step is available in CAS. If you are using the runCode action with an unsupported statement or function in CAS, you will receive an error. Let's look at an example using the first function, which gets the first letter of a string, and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The results of the runCode action return an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this you will need to find a CAS supported method to complete the task. (HINT: Here instead of the first function you can use the substr function).

For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

9月 282021
 

SQL is an important language for any programmer working with data. In SAS Cloud Analytic Services (CAS) you can execute SQL queries using the fedSQL.execDirect CAS action!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the

proc cas;
    fedSQL.execDirect / 
        query="select Make,
                      Model,
                      MSRP,
                      mean(MPG_City,MPG_Highway) as MPG_Avg
               from casuser.cars
               where Make='Toyota'
               order by MPG_Avg desc";
quit;

And the results:

The execDirect action executes the query in the distributed CAS environment and returns the expected results.

Using a SOURCE block

While the query we executed was simple, this is not always the case. Adding a complicated query as a string can make writing the query difficult.

Instead of specifying the query as a string in the CAS procedure, use the SOURCE statement to embed text in a variable. In the following example, I'll execute the same query as before. However, this time I'll nest the query inside a SOURCE block by specifying the SOURCE statement and name the variable MPG_toyota. Then I'll add the query inside the SOURCE block and use the ENDSOURCE statement to end the block.

proc cas;
    source MPG_toyota;
        select Make,
               Model,
               MSRP,
               mean(MPG_City,MPG_Highway) as MPG_Avg
        from casuser.cars
        where Make='Toyota'
        order by MPG_Avg desc;
    endsource;
 
    fedSQL.execDirect / query=MPG_toyota;
quit;

After the SOURCE block is complete, you can reference the variable as the value to the query parameter in the execDirect action.

And the results:

The results returned are the same, but using a SOURCE block makes the code easier to write and maintain.

Summary

In SAS Viya, FedSQL provides a scalable, threaded, high-performance way to query data and create new CAS tables from existing tables in the CAS server. In this example we saw two distinct ways to run SQL code on SAS Viya. This is only the beginning. See the resources below for more details on PROC FedSQL.

Resources

SAS® Viya®: FedSQL Programming for SAS® Cloud Analytic Services
FEDSQL Procedure
execDirect CAS Action
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing SQL in SAS Viya was published on SAS Users.

9月 232021
 

In my previous post CAS-Action! Simply Distinct - Part 1 I reviewed using the simple.distinct CAS action to explore distinct and missing values in a distributed CAS table.

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

And now, back to the distinct action. What if we want to do more? Maybe you want to create a CSV file that documents the percentage of distinct values in each column? Let's explore some possibilities.

To complete the task I'll break it down into four steps.

Step 1 - Find the number of rows in the CAS table

To find the number of rows in a CAS table use the simple.numRows CAS action. Let's execute the numRows action and store the results in a variable. I'll also PRINT and DESCRIBE the results to take a closer look at the output.

proc cas;
    simple.numRows result=n / table={name="cars",caslib="casuser"};
    describe n;
    print n;
...

And the results:
Output of the simple.numRows CAS action

The results of the DESCRIBE statement show the output of the action is a dictionary with a key named numRows and an integer as the value. The PRINT statement shows the value of the dictionary, numRows=428.

Now that we have the total number of rows, we can use that number in our calculation.

Step 2 - Find the number of distinct values in each column

Next, let's execute the distinct action and store the results in a variable named d. Then execute the PRINT statement to confirm the results.

...
    simple.distinct result=d /
            table={name="cars",caslib="casuser"};
    print d;
...

And the resluts:
Results of the simple.distinct CAS action
The results of the distinct action are as expected. Each column with the number of distinct values.

Step 3 - Create a calculated column that computes the percentage of distinct values

Now that we have the number of distinct values in each column, and the total number of rows in the CARS CAS table, we can calculate the total percent of distinct values in each column.

Consider the code:

...
    pctDistinct=d.Distinct.compute({"PctDistinct","Percent Distinct",percent7.2}, nDistinct/n.numRows)
                          [ , {"Column","NDistinct","PctDistinct"} ];
    print pctDistinct;
...

To add a calculated column to a result table use the compute operator. In the first argument, specify column metadata inside an array (column name, label and format). In the second argument, specify the expression. My expression nDistinct/n.numRows divides the distinct values in each column by the total number of rows in the CARS table.

After the compute operator, select specific rows and columns from the result table using bracket notation. Here I'll select all rows, and only the columns Column, nDistinct and PctDistinct.

Lastly, I used the PRINT statement to confirm the results.

New calculated result table

In the output we can see the new result table with the computed column.

Step 4 - Save the results table as a CSV file

Lastly, let's put it all together!

I'll add the code from the pervious steps, then save the table as a CSV file using the SAVERESULTS statement with the CSV= option.

%let outpath=/*specify output file location*/;
 
proc cas;
* Specify the CAS table *;
    casTbl={name="cars", caslib="casuser"};
 
* Store the number of rows in the CAS table *;
    simple.numRows result=n / table=casTbl;
 
* Store the number of distinct values in each column *;
    simple.distinct result=d / table=casTbl;
 
* Calculate the percentage of distinct values in each column *;
    pctDistinct=d.Distinct.compute({"PctDistinct","Percent Distinct",percent7.2}, nDistinct/n.numRows)
                          [ , {"Column","NDistinct","PctDistinct"} ];
 
* Save the result table as a CSV file *;
    saveresult pctDistinct csv="&outpath/pctDistinctCars.csv";
quit;

In the CSV= option I specified the outpath macro variable that contains the location of output folder, and add the name of the CSV file.

After executing the code the log indicates everything ran successfully, and a CSV file was created in the specified location. Next I'll find and open the CSV file.

My CSV file is located in my outfiles folder:

Then double click on the file to open it in SAS Studio:

Summary

The distinct action is a flexible and easy way to explore your data. It allows you to quickly explore your distributed CAS tables, then process and save the results in a variety of formats to fit your needs.

Additional Resources

distinct CAS action
CAS-Action! Simply Distinct - Part 1
CASL Result Tables
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Simply Distinct - Part 2 was published on SAS Users.

9月 152021
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dive into exploring your data by viewing the number of distinct and missing values that exist in each column using the simple.distinct CAS action.

In this example, I will use the CAS procedure to execute the distinct action. Be aware, instead of using the CAS procedure, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax from other languages.

Determine the Number of Distinct and Missing Values in a CAS Table

To begin, let's use the simple.distinct CAS action on the CARS in-memory table to view the action's default behavior.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"};
quit;

In the preceeding code, I specify the CAS procedure, the action, then reference the in-memory table. The results of the call are displayed below.

The results allow us to quickly explore the CAS table and see the number of distinct and missing values. That's great, but what if you only want to see specific columns?

Specify the Columns in the Distinct Action

Sometimes your CAS tables contain hundreds of columns, but you are only interested in a select few. With the distinct action, you can specify a subset of columns using the inputs parameter. Here I'll specify the Make, Origin and Type columns.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        inputs={"Make","Origin","Type"};
quit;

After executing the code the results return the information for only the Make, Origin and Type columns.

Next, let's explore what we can do with the results.

Create a CAS Table with the Results

Some actions allow you to create a CAS table with the results. You might want to do this for a variety of reasons like use the new CAS table in a SAS Visual Analytics dashboard or in a data visualization procedure like SGPLOT.

To create a CAS table with the distinct action result, add the casOut parameter and specify new CAS table information, like name and caslib.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        casOut={name="distinctCars", caslib="casuser"};
quit;

After executing the code, the action returns information about the name and caslib of the new CAS table, and the number of rows and columns.

Visualize the Number of Distinct Values in Every Column

Lastly, what if you want to create a data visualization to better explore the table? Maybe you want to visualize the number of distinct values for each column? This task can be accomplished with variety of methods. However, since I know my newly created distinctCars CAS table has only 15 rows, I'll reference the CAS table directly using SGPLOT procedure.

This method works as long as the LIBNAME statement references your caslib correctly. I recommend this method when you know the CAS table is a manageable size. This is important because the CAS server does not execute the SGPLOT procedure on a distributed CAS table. The CAS server instead transfers the entire CAS table back to the client for processing.

To begin, the following LIBNAME statement will reference the casuser caslib.

libname casuser cas caslib="casuser";

Once the LIBNAME statement is correct, all you need to do is specify the CAS table in the DATA option of the SGPLOT procedure.

title justify=left height=14pt "Number of Distinct Values for Each Column in the CARS Table";
proc sgplot data=casuser.distinctCars
            noborder nowall;
    vbar Column / 
        response=NDistinct
        categoryorder=respdesc
        nooutline
        fillattrs=(color=cx0379cd);
    yaxis display=(NOLABEL);
    xaxis display=(NOLABEL);
quit;

The results show a bar chart with the number of distinct values for each column.

Summary

The simple.distinct CAS action is an easy way to explore a distributed CAS table. With one simple action, you can easily see how many distinct values are in each column, and the number of missing rows!

In Part 2 of this post, I'll further explore the simple.distinct CAS action and offer more ideas on how to interpret and use the results.

Additional Resources

distinct CAS action
SAS® Cloud Analytic Services: Fundamentals
Plotting a Cloud Analytic Services (CAS) In-Memory Table
Getting started with SGPLOT - Index
Code

CAS-Action! Simply Distinct - Part 1 was published on SAS Users.

8月 282021
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dig in with the table.columnInfo action.

In this post we'll l look at exploring the column attributes of a CAS table. Knowing the column names, data types, and any additional formats or labels associated with the columns makes it easier to work with the data. One way to see this type of information on a CAS table is to use the table.columnInfo CAS action!

In this example, I will use the CAS procedure to execute the columnInfo action. Be aware, instead of using the CAS procedure, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language.

Return CAS table Column Information

To view the column information of your CAS table, use the columnInfo CAS action with the table parameter. That's it! Refer to the code below.

proc cas;
    table.columnInfo / table={name="cars", caslib="casuser"};
quit;

The results would appear similar to the following:

table.columnInfo CAS action results

and return a variety of information about each column in the CAS table:

  • the column name
  • a label if one has been applied
  • the Id, which indicates the position of the column in the table
  • the data type of the column
  • the column length
  • the format, formatted length, width and decimal

Create a CSV Data Dictionary Using the ColumnInfo Action

What if instead, you want to create a data dictionary documenting the CAS table? With the columnInfo action you can export the results to a CSV file!

I'll use the columnInfo action again, but this time I'll store the results in a variable. The variable is a dictionary, so I need to reference the dictionary ci, then the columnInfo key to access the table. Next, I'll create two computed columns using the compute operator. The first column contains the name of the CAS table, and the second, the caslib of the table. I'll print the new table to confirm the results.

proc cas;
    table.columnInfo result=ci / table={name="cars", caslib="casuser"};
 
    ciTbl = ci.ColumnInfo.compute({"TableName","Table Name"}, "cars")
                         .compute({"Caslib"}, "casuser");
 
    print ciTbl;
    ...

The code produces the folloiwng result:

The new result table documents the CAS table columns, table name and caslib.

Lastly, I'll export the result table to a CSV file. First, I'll specify the folder location using the outpath variable. Then use the SAVERESULT statement to save the result table as a CSV file named carsDataDictionary.csv.

    ...
    outpath="specify a folder location";
    saveresult ciTbl csv=outpath || "carsDataDictionary.csv";
quit;

 

After I execute the CAS procedure I can find and open the CSV file to view the documented CAS table!

Summary

The table.columnInfo CAS action is a simple and easy way to show column information about your distributed CAS table. Using the results of the action allow you to create a data dictionary in a variety of formats.

Additional resources

table.columnInfo CAS action
CAS Action! - a series on fundamentals
SAS® Cloud Analytic Services: Fundamentals
CASL Result Tables
SAVERESULT Statement
Code

CAS-Action! Show me the ColumnInfo! was published on SAS Users.