Developer

9月 142022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on how to summarize columns.

Load and explore data

Let's first load some data. In SAS Viya we can use the Samples caslib for demonstration data. I'll load the data into the distributed CAS server using my CAS connection object conn, followed by the loadTable action. The loadTable action loads a server-side file into memory. Then I'll reference the CAS table in the tbl object and preview 5 rows using the SWAT head method. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

conn.loadTable(path = 'WATER_CLUSTER.sashdat', caslib = 'samples',
                            casOut = dict(caslib = 'casuser'))
 
tbl = conn.CASTable('water_cluster', caslib='casuser')
 
tbl.head()

The water_cluster CAS table contains daily water usage for a variety properties. The Daily_W_C_M3 column displays the water used per location in cubic meters (m3) .

Simple summarizations

Next, let's execute methods for simple summarizations on CAS table columns. If you've used Pandas before these techniques will be very familiar.

I'll start by getting the total water consumption for all the data using the sum method in the SWAT package on the CAS table column Daily_W_C_M3.

(tbl
 .Daily_W_C_M3
 .sum()
)
# and the results
401407.88800000015

Next, I'll view the maximum and minimum water consumption value in the data using the max and min methods in the SWAT package.

(tbl
 .Daily_W_C_M3
 .max()
)
# and the results
11910.0
 
(tbl
 .Daily_W_C_M3
 .min()
)
# and the results
0.0

Lastly, I'll find the average water consumption value in the CAS table using the mean method.

(tbl
 .Daily_W_C_M3
 .mean()
)
# and the results
8.591778424657537

Instead of manually specifying each method to view summary information, I could of used the summary CAS action or the describe method from the SWAT package. For more information check out my previous post Descriptive Statistics.

Find the property with the max water consumption

What if I want to find the property or properties with the max water consumption? One technique is to find the max water consumption value, then filter the CAS table for the row (or rows) with that value. Here I will store the max water consumption value in the variable maxWaterConsumption. Then I'll use the variable in the query method to filter the CAS table. For more information on filtering CAS tables, check out my previous blog post Filtering CAS Tables.

## Store the max water consumption value
maxWaterConsumption = (tbl
                      .Daily_W_C_M3
                      .max()
)
 
## Filter the CAS table for the property with the max water usage
(tbl
 .query(f"Daily_W_C_M3 = {maxWaterConsumption }")
 .head()
)

The results show that the property address 1660 S DAIRY ASHFORD ST had the largest daily water consumption rate.

Find the top 10 daily water consumption values and properties

Lastly, to find the top 10 daily water consumption properties and values I can use the nlargest method in the SWAT package. This method is similar to the Pandas nlargest method.  One difference is the SWAT nlargest method does not implement the keep parameter.

I'll specify the SWAT nlargest method and store the results in the variable df_top10. Then I'll display the results.

df_top10 = tbl.nlargest(10, 'Daily_W_C_M3')
display(df_top10)

The CAS server will execute the request and return a SASDataFrame to the client. The SASDataFrame shows that a single property contains the top 10 largest water consumption values.

Remember, a SASDataFrame is a subclass of a Pandas DataFrame. As a result, you can work with them as you normally would a Pandas DataFrame. To learn more about working with CAS objects check out my previous post Working with CAS Actions and CASResults Objects.

Summary

The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques to work with your distributed CAS tables. This enables you to utilize the massively parallel processing power of the CAS server in SAS Viya using Python. Here we learned about using familiar Pandas methods in the SWAT package like sum, min, mean, max and nlargest.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 9 - Summarize Columns was published on SAS Users.

7月 222022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns was published on SAS Users.

7月 222022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns was published on SAS Users.

6月 302022
 

Welcome to the seventh installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to work with the results. Now it's time to learn how to filter CAS tables.

Load and explore data

First, I'll load the cars.csv file into memory from the sassfotware GitHub page. I'll start by using my CAS connection object conn, followed by the upload_file method from the SWAT package. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. The data in this example is small, for training purposes. Processing data in the CAS server is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv',
      casout={'name':'cars', 'caslib':'casuser'})

The response from the command is minimal, but there's much going on. Specifically:

  • Cloud Analytic Services renders the uploaded file available as table CARS in caslib CASUSER(Peter).
  • The table CARS is created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.

Next, I'll view the type and value of the tbl variable to confirm it's a CASTable object that references the CARS table in the CAS server.

display(type(tbl), tbl)
# and the command results:
swat.cas.table.CASTable
CASTable('CARS', caslib='CASUSER(Peter)')

The results show tbl is a CASTable object and references the CARS table in the CAS server. Now, when I execute methods on the tbl object, the processing occurs in CAS.

Further, let's preview the CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame as expected.

Finally, I'll execute the SWAT shape attribute to view the number of rows and columns in the CAS table.

tbl.shape
# and the results
(428, 15)

Notice, the CARS CAS table contains 428 rows and 15 columns.

Filter a CAS Table

Now that we have created the table in the caslib and are familiar with the data, let's learn how to filter CAS tables!

Python Technique

First, I'll start with the traditional Pandas filtering technique. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. This will filter the CAS table for rows where the Make of the car is Acura or Toyota,

makeFilter = (tbl['Make']=='Acura') | (tbl['Make']=='Toyota')
tbl[makeFilter].shape
# and the results
(35, 15)

The results show that 35 rows in the CARS CAS table have the Make Toyota or Acura.

Query Method

You can also filter a CAS table using the query method in the SWAT package. Wait a minute, does this look familiar? I mentioned SWAT blends the world of Pandas as SAS. I'll again filter for a Make of Toyota or Acura .

tbl.query("Make = 'Acura' or Make = 'Toyota'").shape
# and the results
(35, 15)

Notice, the results are the same.

isin Method

All right. Sorry, but again I'll show you another familiar method from the SWAT package. Since I am filtering for specific categories, I can also use the SWAT isin method, which works similarly to the Pandas isin method!

tbl[tbl.Make.isin(['Acura', 'Toyota'])].shape
# and the results
(35, 15)

Notice, again the results are the same.

CAS Table WHERE Parameter

Lastly, I'll show you a specific CAS technique, which is very efficient if you need to filter data for multiple methods or actions. To begin, let's display the type and value of the tbl object again.

display(type(tbl), tbl)
# and the results
swat.cas.table.CASTable
CASTable('CARS', caslib='CASUSER(Peter)')

Notice the tbl variable is a CASTable object, which is a reference to a CAS table in the CAS server, not an actual table. In this example, it references the CARS table in the Casuser caslib.

The CASTable object enables you to add a variety of parameters to the object to filter, create columns, group and more. In this example, I'll add the where parameter to the CASTable object to filter for car makes of Toyota and Acura. Then I'll display the value of the CASTable object again.

tbl.where = 'Make = "Acura" or Make = "Toyota"'
display(tbl)

CASTable('CARS', caslib='CASUSER(Peter)', where='Make = "Acura" or Make = "Toyota"')

Notice the CASTable still references the CARS table in the Casuser caslib, but the where parameter has been added to the object. The where parameter is applied when the object is used with an action or method.

Let's check it out. Here I'll specify the tbl variable (the CASTable object) then the shape attribute.

tbl.shape
# and the results
(35, 15)

Notice the results again show 35 cars are either Toyota or Acura.

Now, what is nice about using this method is if I want to continue to use this filter with other methods or actions, all I need to do is use the tbl object. Perhaps you want the value counts of each Make? The SWAT package has the value_counts method available.

tbl.Make.value_counts()
# and the results
Toyota    28
Acura      7

The response displays the counts only for rows where Make is Acura or Toyota. This occurs because the where parameter is applied to the CAS table when the value_counts method is executed.

Delete the WHERE parameter

Lastly, if you want to delete the where parameter from the CASTable object, use the del_params method.

tbl.del_params('where')
display(tbl)
# and the results
CASTable('CARS', caslib='CASUSER(Peter)')

Once the parameter is deleted you can revert back to analyzing the entire table. Here I'll use the shape attribute again to view how many rows are in the original CAS table.

tbl.shape
# and the results
(428, 15)

Summary

SAS Viya offers various filtering options for CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 7 - Filtering CAS Tables was published on SAS Users.

1月 072022
 

Welcome to the sixth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to work with the results. Now it's time to generate simple descriptive statistics of a CAS table.

Let's begin by confirming the cars table is loaded into memory. With a connection to CAS established, execute the tableInfo action to view available in-memory tables. If necessary, you can execute the following code in SAS Studio to load the sashelp.cars table into memory.

conn.tableinfo(caslib="casuser")

The results show the cars table is loaded into memory and available for processing. Next, reference the cars table in the variable tbl. Then use the print function to show the value of the variable.

tbl = conn.CASTable('cars', caslib='casuser')
print(tbl)
CASTable('cars', caslib='casuser')

The results show that the tbl variable references the cars table in the CAS server.

Preview the CAS Table

First things first. Remember, the SWAT package blends the world of Pandas and CAS into one. So you can begin with the traditional head method to preview the CAS table.

tbl.head()

The SWAT head method returns five rows from the CAS server to the client as expected.

The Describe Method

Next, let's retrieve descriptive statistics of all numeric columns by using the familiar describe method on the CAS table.

tbl.describe()

The SWAT describe method returns the same descriptive statistics as the Pandas describe method. The only difference is that the SWAT version uses the CAS API to convert the describe method into CAS actions behind the scenes to process the data on the distributed CAS server. CAS processes the data and returns summarized results back to the client as a SASDataFrame, which is a subclass of the Pandas DataFrame. You can now work with the results as you would a Pandas DataFrame.

Summary CAS Action

Instead of using the familiar describe method, let's use a CAS action to do something similar. Here I'll use the summary CAS action.

tbl.summary()

Summary CAS Action

The results of the summary action return a CASResults object (Python dictionary) to the client. The CASResults object contains a single key named Summary with a SASDataFrame as the value. The SASDataFrame shows a variety of descriptive statistics.  While the summary action does not return exactly the same statistics as the describe method, it can provide additional insights into your data.

What if we don't want all the statistics for all of the data?

Selecting Columns and Summary Statistics with the Summary Action

Let's add additional parameters to the summary action. I'll add the inputs parameter to specify the columns to analyze in the CAS server.

tbl.summary(inputs = ['MPG_City','MPG_Highway'])

The results show only the MPG_City and MPG_Highway columns were analyzed.

Next, I'll use the subSet parameter to specify the summary statistics to produce. Here I'll obtain the MEAN, MIN and MAX.

tbl.summary(inputs = ['MPG_City','MPG_Highway'],
                       subSet = ['mean','min','max'])

The results processed only the MPG_City and MPG_Highway columns, and returned only the specified summary statistics to the client.

Creating a Calculated Column

Lastly, let's create a calculated column within the summary action. There are a variety of ways to do this. I like to add it as a parameter to the CASTable object. You can do that by specifying the tbl object, then computedVarsProgram parameter. Within computedVarsProgram you can use SAS assignment statements with most SAS functions. Here we will create a new column name MPG_Avg that takes the mean of MPG_City and MPG_Highway. Lastly, add the new column to the inputs parameter.

tbl.computedVarsProgram = 'MPG_Avg = mean(MPG_City, MPG_Highway);'
tbl.summary(inputs = ['MPG_City','MPG_Highway', 'MPG_Avg'],
                       subSet = ['mean','min','max'])

In the results I see the calculated column and requested summary statistics.

Summary

The SWAT package blends the world of Pandas and CAS. You can use many of the familiar Pandas methods within the SWAT package, or the flexible, highly optimized CAS actions like summary to easily obtain summary statistics of your data in the massively parallel processing CAS engine.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Index
SWAT API Reference
CAS Action Documentation
SAS® Cloud Analytic Services: Fundamentals
SAS Scripting Wrapper for Analytics Transfer (SWAT)
CAS Action! - a series on fundamentals
Execute the following code in SAS Studio to load the sashelp.cars table into memory

Getting Started with Python Integration to SAS® Viya® - Part 6 - Descriptive Statistics was published on SAS Users.

10月 092021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware, instead of using the CAS procedure, I could execute the action with Python, R, or even a REST API with some slight changes to the syntax for the specific language.

Why use the DATA Step?

It's pretty simple, the DATA step is a powerful way to process your data. It gives you full control of each row and column, ability to easily create multiple output tables, and provides a variety of statements to pretty much do anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq / 
          table={name='cars', caslib='casuser'},
           input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables based off these unique values using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as variable. The DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.

DATA Step in CAS has Limitations

Now, one thing to be aware of is not all functionality of the DATA step is available in CAS. If you are using the runCode action with an unsupported statement or function in CAS, you will receive an error. Let's look at an example using the first function, which gets the first letter of a string, and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The results of the runCode action return an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this you will need to find a CAS supported method to complete the task. (HINT: Here instead of the first function you can use the substr function).

For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

10月 092021
 

Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.

In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware, instead of using the CAS procedure, I could execute the action with Python, R, or even a REST API with some slight changes to the syntax for the specific language.

Why use the DATA Step?

It's pretty simple, the DATA step is a powerful way to process your data. It gives you full control of each row and column, ability to easily create multiple output tables, and provides a variety of statements to pretty much do anything you need.

In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column.  Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.

proc cas;
    simple.freq / 
          table={name='cars', caslib='casuser'},
           input='Origin';
quit;

The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables based off these unique values using the SAS DATA step.

Execute DATA Step in SAS Viya's CAS Server

One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!

In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as variable. The DATA step code is stored in the variable originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.

proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
 
    dataStep.runCode / code=originTables;
quit;

The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.

DATA Step in CAS has Limitations

Now, one thing to be aware of is not all functionality of the DATA step is available in CAS. If you are using the runCode action with an unsupported statement or function in CAS, you will receive an error. Let's look at an example using the first function, which gets the first letter of a string, and is not supported in CAS.

proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;

 

The results of the runCode action return an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this you will need to find a CAS supported method to complete the task. (HINT: Here instead of the first function you can use the substr function).

For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.

Summary

In SAS Viya, the runCode action provides an easy way to execute most of the traditional DATA step in CAS in any language, from the CAS Language (CASL), to Python, R, Lua, Java and more.

Additional Resources

runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Executing the SAS DATA Step in SAS Viya was published on SAS Users.

3月 272019
 

PAYG financial services: coming to a bank near you

You walk into your neighborhood bank to see about a mortgage. You and your spouse have your eye on the perfect 3BR, 2BA brick ranch near your child's school, and it won't be on the market long. An hour later, you burst through the front door with a bottle of champagne: "We're qualified!"

Also celebrating is your bank's branch manager. She was skeptical when headquarters analysts equipped branches for "Cloud-based application using SAS" , saying it would speed up loan applications. But your quick, frictionless transaction proved them right.The bank's accountants are happy too. The new pay-as-you-go mode of using SAS software in the cloud means big savings.

The above scenario is possible now through serverless functions, which enable your SAS Viya applications to take input from end users, score the loan application, and return results.

The rest of this post gets into the nitty gritty of serverless functions and SAS Viya, detailing what happens in a bank's computers after a customer applies for a loan. The qualification process starts by running a previously built scoring model to generate a score. You will see how the combination of REST APIs in SAS Viya, analytic models and the restaf library make the task of building the serverless function relatively simple.

The blog titled "SAS REST APIs: a sample application" demonstrated building a SAS Viya application using REST APIs, SAS Visual Analytics and SAS Operational Research. This is typical web applications with application server and SAS Viya running on premise.

If you are one of many users using(or considering) a cloud provider, serverless functions is an useful alternate way to deliver your applications to your users. This eliminates the need to manage the application server associated with your application. Additionally you get zero administration and auto-scaling among other benefits. Many SAS applications that respond quickly to user requests are ideal candidates to be deployed as serverless functions.

The example in this article is available on SAS software’s GitHub site in the viya-apps-serverless-score repository.  If you want to see the end application for frame of reference, see the Using the serverless functions section at the bottom of this article.

Let’s begin with a bit of background on serverless computing and then dig into the details of the application and functions.

Serverless computing explained

The benefits of serverless functions as touted by AWS serverless, Azure and serverless.com:

AWS Lambda

AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume– there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service – all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.

What is serverless computing?

According to Azure serverless computing is the abstraction of servers, infrastructure, and operating systems. When you build serverless apps you don’t need to provision and manage any servers, so you can take your mind off infrastructure concerns. Serverless computing is driven by the reaction to events and triggers happening in near-real-time—in the cloud. As a fully managed service, server management and capacity planning are invisible to the developer and billing is based just on resources consumed or the actual time your code is running.

Four core benefits of serverless computing from serverless.com:

  1. Zero administration – Deploy code without provisioning anything beforehand or managing anything afterward. There is no concept of a fleet, an instance, or even an operating system. No more bothering the Ops department.
  2. Auto-scaling – Let your service providers manage the scaling challenges. No need to fire alerts or write scripts to scale up and down. Handle quick bursts of traffic and weekend lulls the same way — with peace of mind.
  3. Pay-per-use – Function-as-a-service compute and managed services charged based on usage rather than pre-provisioned capacity. You can have complete resource utilization without paying a cent for idle time. The results? 90% cost-savings over a cloud VM, and the satisfaction of knowing that you never pay for resources you don’t use.
  4. Increased velocity – Shorten the loop between having an idea and deploying to production. Because there’s less to provision up front and less to manage after deployment, smaller teams can ship more features. It’s easier than ever to make your idea live.

OK, so there is a server involved in serverless computing. The beauty in this technology is that once you deploy your code, you don't have to worry about the underlying infrastructure. You just know that the app should work and you only incur costs when the app is running.

Basic flow

Serverless functions are loaded and executed based on the occurrence of one of the triggers/events supported by the cloud vendor. In this example the API Gateway triggers the serverless functions when an http call invokes the function. The API Gateway calls the handler for the function and passes in the user data. On return from the handler the response is sent to the client. This article focuses on the code inside the Serverless Function box in the picture below.

Figure 1: Request Workflow

This example utilizes two key functions:

  1. app – This function serves up an html application for user to enter the data. This is an example of a web application as a serverless function.
  2. score – This function takes user input from the web app, executes scoring on a Viya Server and returns the results.

Serverless.yml

The serverless.yml defines the serverless functions and the handlers, used to execute and other system related information. We will focus only on the application specific information.

The code snippet below shows the definition of the path and handler for the two functions in the serverles.yml file.

functions:
  app: 
    handler: src/app.app
    events:
      - http:
          path: app
          method: get
          cors: 
            origin: '*'
          request:
            parameters:
              paths:
                id: true  
 
  score:
    handler: src/score.score
    events:
      - http:
          path: score
          method: post
          cors: 
            origin: '*'

The functions(app & score) in the yaml define:

  1. event - http event will trigger this function
  2. path - this is path to the function - similar to what you define in Express or hapijs
  3. method - http standard GET, PUT etc...
  4. others - refer to the cloud vendor's documentation for other available options.

The serverless.yml file also sets application related information using environment variables. In this particular use case we define how to access SAS Viya and which scoring model to use.

environment:
#
# Information for logging into SAS Viya
#
  VIYA_SERVER: http://example.viya.server.com
  CLIENTID: raf
  CLIENTSECRET: raf
  USER:rafuser
  PASSWORD: rafpass
 
#
# astore to be used for scoring
#
  caslib: casuser
  name: GRADIENT_BOOSTING___BAD_2

A note on securing your password

In this example we store the userid and password in environment variables. This is to the keep the focus on the internals of serverless functions for SAS Viya. Locally you can use "serverless variables" to secure the information during development. However, for production deployment, refer to your provider's recommendations and the user community for best practices.

Sounds like a followup blog in the future 🙂

Anatomy of the serverless function

Figure 2 shows the flow inside the serverless function for this example. This pattern will repeat itself in your serverless functions.

Figure 2: Serverless Function Flow

Serverless function score

The code below is the handler for the score function. The rest of this section will discuss each of the key features of the handler.

//
// See src/score.js for the full code
//
module.exports.score = async function (event, context ) {
 
   let store      =  restaf.initStore(); /* initialize restaf     */
   let inParms = parseEvent(event);  /* get user input        */
   let payload = getLogonPayload(); /* get logon information */
 
   return store.logon(payload)               /* logon to SAS Viya */
        .then (()    > scoreMain( store, inParms )) /* score     */
        .then(result > setPayload(result)) /* return results     */
        .catch(err   > setError(err))	      /* else return errors */
}

Step 1: Parse the input

The event parameter contains the input from the caller (web application, another serverless function, etc).
The content of the event parameter is whatever the designer of the serverless function desires. In this particular case, a sample event data is shown below.

{
    "input": {
        "JOB"    : "J1",
        "CLAGE"  : 100,
        "CLNO"   : 20,
        "DEBTINC": 20,
        "DELINQ" : 2,
        "DEROG"  : 0,
        "MORTDUE": 4000,
        "NINQ"   : 1,
        "YOJ"    : 10,
        "LOAN"   : 10000,
        "VALUE"  : 1000000
    }
}

The parseEvent function validates the incoming information.

module.exports = function parseEvent(event)
    let input = null;
    let body = {};
    let rstore = {
        caslib:  process.env.ASTORE_CASLIB,
        name  : process.env.ASTORE_NAME
    }
    if ( event.body !=  null ) {
        body = ( typeof event.body === 'string') ? JSON.parse(event.body) : Object.assign({}, event.body);
       if ( body.hasOwnProperty('input') === true ) {
          input = body.input;
    }
    return { rstore: rstore, input: input }
}

Step 2: Logon to SAS Viya

The server.yml defines the SAS Viya logon information. Note there are other secure ways to manage sensitive information like passwords. You should refer to your provider’s documentation.

module.exports = function getLogonPayload() {
    let p = {
        authType    : 'password',
        host        : `${process.env.VIYA_SERVER}`,
        user        : process.env['USER'],
        password    : process.env['PASSWORD'],
        clientID    : process.env['CLIENTID'],
        clientSecret: (process.env.hasOwnProperty('CLIENTSECRET')) ? process.env[ 'CLIENTSECRET' ] : ''
        };
    return p;
 }

The line restaf.logon(payload) in function in the handler code logs on to the SAS Viya Server using this information.

Step 3 and Step 4: Create Payload and make REST API calls

On successful logon the server is called to do the scoring. This particular example uses the sccasl.runcasl method to run CAS Language (CASL) statements and return the scores. Creating the score has two steps:

  1. upload user input: The user input is converted to a csv and uploaded to a CAS table
  2. Submit CASL statements to SAS Viya (CAS) to do the scoring

The code in src/scoreMain in the repository accomplishes both these steps.

Each of these steps use a CAS action:

    • table.upload – to upload the user data into a CAS Table. The input data is converted into a comma-delimited file(csv) and then uploaded. The REST call using restaf looks like this:
    let csv = makecsv(input); /* create a csv */
    let JSON_Parameters = {
        casout: {
            caslib : 'casuser', /* a valid caslib */
            name   : 'INPUTDATA', /* name of output file on cas server */
            replace: true
        },
 
        importOptions: {
            fileType: 'csv' /* type of the file being uploaded */
        }
    };
 
    let payload = {
        headers: { 'JSON-Parameters': JSON_Parameters },
        data   : csv,
        action : 'table.upload'
    };
 
    let result = await store.runAction(session, payload);
    • sccasl.runcasl – execute CASL statements to do the scoring
 // Setup casl statements 	 	 
 let caslStatements = `	 	 
 loadactionset "astore";	 	 
 action table.loadTable /	 	 
 caslib = "${rstore.caslib}" 	 	 
 path = "${rstore.name}.sashdat"	 	 
 casout = { caslib = "${rstore.caslib}" name = "${rstore.name}" replace=TRUE};	 	 
 
 action astore.score /	 	 
 table = { caslib= 'casuser' name = 'INPUTDATA' } 	 	 
 rstore = { caslib= "${rstore.caslib}" name = '${rstore.name}' }	 	 
 out = { caslib = 'casuser' name = 'OUTPUTDATA' replace= TRUE};	 	 
 action table.fetch r = result/	 	 
 format = TRUE	 	 
 table = { caslib = 'casuser' name = 'OUTPUTDATA' } ;	 	 
 send_response(result);	 	 
 `;	 	 
 // execute cas actions	 	 
 payload = {	 	 
 action: 'sccasl.runcasl',	 	 
 data : { code: caslStatements}	 	 
 }	 	 
 result = await store.runAction(session, payload);

Step 5: Create response

AWS serverless functions must return data and error(s) in a certain form. The two functions setPayload.js and setError.js accomplish this.

module.exports = function setPayload (body) {
    return {
        "statusCode": 200,
        "headers"   : {
            'Access-Control-Allow-Origin'     : '*',
            'Access-Control-Allow-Credentials': true
          },
        "isBase64Encoded": false,
        "body"           : JSON.stringify(body)
    }
  }

Using the serverless functions

When the serverless function is deployed you will get a link for each of the functions. In our case we receive the request shown below (with xxxx replaced with appropriate information).

GET - https://xxxx.amazonaws.com/demo/app

The first link serves up the web application. The user enters some values and the app calls the score serverless function to get the results.
Alternatively, you can write your own application and make an http POST call to the score function using a link such as:

POST - https://xxxx.amazonaws.com/demo/score

To invoke the web application, you will visit the link

https://xxxx.amazonaws.com/demo/app

with your browser. You should see a display shown in Figure 3:

Figure 3: Application Input Screen

Entering values into the two fields and pressing Submit calls the second serverless function, score, and results in a pie chart as seen in Figure 4:

Figure 4: Score Report Screen

Please see the loan.html file in the GitHub repository for details on the application. Displayed below is the relevant part of the Javascript in the loan.html. The score-function-url is the url for the score function. The payload was described earlier in this article. The http call is made using axios.

async function runScore(inputValues ){
 
    let payload = {
        astore: {
            caslib: 'Public',
            name: 'GRADIENT_BOOSTING___BAD_2'
        },
        input: inputValues
    }
    let config = {
        url: {score-function-url}
        method: 'POST',
        data: payload
    }
    let r = await axios(config);
    return r.data.score;
 
}

Porting to other cloud providers

The cloud provider dependent information is handled in the following functions: score.js, parseEvent.js, setPayload.js and setError.js. The rest of the code is host agnostic. In writing your own functions the recommendation is to follow the same pattern as much as possible. The generic code is then available in its own repository for reuse with other providers and applications.

Go try it yourself

I have shown you how to deliver your SAS Viya applications as serverless functions. To access more examples please see the GitHub restaf-demos repository.

Supporting Resources

Serverless functions and SAS Viya - a good match was published on SAS Users.