Python Integration to SAS Viya

9月 142022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on how to summarize columns.

Load and explore data

Let's first load some data. In SAS Viya we can use the Samples caslib for demonstration data. I'll load the data into the distributed CAS server using my CAS connection object conn, followed by the loadTable action. The loadTable action loads a server-side file into memory. Then I'll reference the CAS table in the tbl object and preview 5 rows using the SWAT head method. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

conn.loadTable(path = 'WATER_CLUSTER.sashdat', caslib = 'samples',
                            casOut = dict(caslib = 'casuser'))
 
tbl = conn.CASTable('water_cluster', caslib='casuser')
 
tbl.head()

The water_cluster CAS table contains daily water usage for a variety properties. The Daily_W_C_M3 column displays the water used per location in cubic meters (m3) .

Simple summarizations

Next, let's execute methods for simple summarizations on CAS table columns. If you've used Pandas before these techniques will be very familiar.

I'll start by getting the total water consumption for all the data using the sum method in the SWAT package on the CAS table column Daily_W_C_M3.

(tbl
 .Daily_W_C_M3
 .sum()
)
# and the results
401407.88800000015

Next, I'll view the maximum and minimum water consumption value in the data using the max and min methods in the SWAT package.

(tbl
 .Daily_W_C_M3
 .max()
)
# and the results
11910.0
 
(tbl
 .Daily_W_C_M3
 .min()
)
# and the results
0.0

Lastly, I'll find the average water consumption value in the CAS table using the mean method.

(tbl
 .Daily_W_C_M3
 .mean()
)
# and the results
8.591778424657537

Instead of manually specifying each method to view summary information, I could of used the summary CAS action or the describe method from the SWAT package. For more information check out my previous post Descriptive Statistics.

Find the property with the max water consumption

What if I want to find the property or properties with the max water consumption? One technique is to find the max water consumption value, then filter the CAS table for the row (or rows) with that value. Here I will store the max water consumption value in the variable maxWaterConsumption. Then I'll use the variable in the query method to filter the CAS table. For more information on filtering CAS tables, check out my previous blog post Filtering CAS Tables.

## Store the max water consumption value
maxWaterConsumption = (tbl
                      .Daily_W_C_M3
                      .max()
)
 
## Filter the CAS table for the property with the max water usage
(tbl
 .query(f"Daily_W_C_M3 = {maxWaterConsumption }")
 .head()
)

The results show that the property address 1660 S DAIRY ASHFORD ST had the largest daily water consumption rate.

Find the top 10 daily water consumption values and properties

Lastly, to find the top 10 daily water consumption properties and values I can use the nlargest method in the SWAT package. This method is similar to the Pandas nlargest method.  One difference is the SWAT nlargest method does not implement the keep parameter.

I'll specify the SWAT nlargest method and store the results in the variable df_top10. Then I'll display the results.

df_top10 = tbl.nlargest(10, 'Daily_W_C_M3')
display(df_top10)

The CAS server will execute the request and return a SASDataFrame to the client. The SASDataFrame shows that a single property contains the top 10 largest water consumption values.

Remember, a SASDataFrame is a subclass of a Pandas DataFrame. As a result, you can work with them as you normally would a Pandas DataFrame. To learn more about working with CAS objects check out my previous post Working with CAS Actions and CASResults Objects.

Summary

The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques to work with your distributed CAS tables. This enables you to utilize the massively parallel processing power of the CAS server in SAS Viya using Python. Here we learned about using familiar Pandas methods in the SWAT package like sum, min, mean, max and nlargest.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 9 - Summarize Columns was published on SAS Users.

7月 222022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns was published on SAS Users.

7月 222022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns was published on SAS Users.

6月 302022
 

Welcome to the seventh installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to work with the results. Now it's time to learn how to filter CAS tables.

Load and explore data

First, I'll load the cars.csv file into memory from the sassfotware GitHub page. I'll start by using my CAS connection object conn, followed by the upload_file method from the SWAT package. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. The data in this example is small, for training purposes. Processing data in the CAS server is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv',
      casout={'name':'cars', 'caslib':'casuser'})

The response from the command is minimal, but there's much going on. Specifically:

  • Cloud Analytic Services renders the uploaded file available as table CARS in caslib CASUSER(Peter).
  • The table CARS is created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.

Next, I'll view the type and value of the tbl variable to confirm it's a CASTable object that references the CARS table in the CAS server.

display(type(tbl), tbl)
# and the command results:
swat.cas.table.CASTable
CASTable('CARS', caslib='CASUSER(Peter)')

The results show tbl is a CASTable object and references the CARS table in the CAS server. Now, when I execute methods on the tbl object, the processing occurs in CAS.

Further, let's preview the CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame as expected.

Finally, I'll execute the SWAT shape attribute to view the number of rows and columns in the CAS table.

tbl.shape
# and the results
(428, 15)

Notice, the CARS CAS table contains 428 rows and 15 columns.

Filter a CAS Table

Now that we have created the table in the caslib and are familiar with the data, let's learn how to filter CAS tables!

Python Technique

First, I'll start with the traditional Pandas filtering technique. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. This will filter the CAS table for rows where the Make of the car is Acura or Toyota,

makeFilter = (tbl['Make']=='Acura') | (tbl['Make']=='Toyota')
tbl[makeFilter].shape
# and the results
(35, 15)

The results show that 35 rows in the CARS CAS table have the Make Toyota or Acura.

Query Method

You can also filter a CAS table using the query method in the SWAT package. Wait a minute, does this look familiar? I mentioned SWAT blends the world of Pandas as SAS. I'll again filter for a Make of Toyota or Acura .

tbl.query("Make = 'Acura' or Make = 'Toyota'").shape
# and the results
(35, 15)

Notice, the results are the same.

isin Method

All right. Sorry, but again I'll show you another familiar method from the SWAT package. Since I am filtering for specific categories, I can also use the SWAT isin method, which works similarly to the Pandas isin method!

tbl[tbl.Make.isin(['Acura', 'Toyota'])].shape
# and the results
(35, 15)

Notice, again the results are the same.

CAS Table WHERE Parameter

Lastly, I'll show you a specific CAS technique, which is very efficient if you need to filter data for multiple methods or actions. To begin, let's display the type and value of the tbl object again.

display(type(tbl), tbl)
# and the results
swat.cas.table.CASTable
CASTable('CARS', caslib='CASUSER(Peter)')

Notice the tbl variable is a CASTable object, which is a reference to a CAS table in the CAS server, not an actual table. In this example, it references the CARS table in the Casuser caslib.

The CASTable object enables you to add a variety of parameters to the object to filter, create columns, group and more. In this example, I'll add the where parameter to the CASTable object to filter for car makes of Toyota and Acura. Then I'll display the value of the CASTable object again.

tbl.where = 'Make = "Acura" or Make = "Toyota"'
display(tbl)

CASTable('CARS', caslib='CASUSER(Peter)', where='Make = "Acura" or Make = "Toyota"')

Notice the CASTable still references the CARS table in the Casuser caslib, but the where parameter has been added to the object. The where parameter is applied when the object is used with an action or method.

Let's check it out. Here I'll specify the tbl variable (the CASTable object) then the shape attribute.

tbl.shape
# and the results
(35, 15)

Notice the results again show 35 cars are either Toyota or Acura.

Now, what is nice about using this method is if I want to continue to use this filter with other methods or actions, all I need to do is use the tbl object. Perhaps you want the value counts of each Make? The SWAT package has the value_counts method available.

tbl.Make.value_counts()
# and the results
Toyota    28
Acura      7

The response displays the counts only for rows where Make is Acura or Toyota. This occurs because the where parameter is applied to the CAS table when the value_counts method is executed.

Delete the WHERE parameter

Lastly, if you want to delete the where parameter from the CASTable object, use the del_params method.

tbl.del_params('where')
display(tbl)
# and the results
CASTable('CARS', caslib='CASUSER(Peter)')

Once the parameter is deleted you can revert back to analyzing the entire table. Here I'll use the shape attribute again to view how many rows are in the original CAS table.

tbl.shape
# and the results
(428, 15)

Summary

SAS Viya offers various filtering options for CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 7 - Filtering CAS Tables was published on SAS Users.

1月 072022
 

Welcome to the sixth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to work with the results. Now it's time to generate simple descriptive statistics of a CAS table.

Let's begin by confirming the cars table is loaded into memory. With a connection to CAS established, execute the tableInfo action to view available in-memory tables. If necessary, you can execute the following code in SAS Studio to load the sashelp.cars table into memory.

conn.tableinfo(caslib="casuser")

The results show the cars table is loaded into memory and available for processing. Next, reference the cars table in the variable tbl. Then use the print function to show the value of the variable.

tbl = conn.CASTable('cars', caslib='casuser')
print(tbl)
CASTable('cars', caslib='casuser')

The results show that the tbl variable references the cars table in the CAS server.

Preview the CAS Table

First things first. Remember, the SWAT package blends the world of Pandas and CAS into one. So you can begin with the traditional head method to preview the CAS table.

tbl.head()

The SWAT head method returns five rows from the CAS server to the client as expected.

The Describe Method

Next, let's retrieve descriptive statistics of all numeric columns by using the familiar describe method on the CAS table.

tbl.describe()

The SWAT describe method returns the same descriptive statistics as the Pandas describe method. The only difference is that the SWAT version uses the CAS API to convert the describe method into CAS actions behind the scenes to process the data on the distributed CAS server. CAS processes the data and returns summarized results back to the client as a SASDataFrame, which is a subclass of the Pandas DataFrame. You can now work with the results as you would a Pandas DataFrame.

Summary CAS Action

Instead of using the familiar describe method, let's use a CAS action to do something similar. Here I'll use the summary CAS action.

tbl.summary()

Summary CAS Action

The results of the summary action return a CASResults object (Python dictionary) to the client. The CASResults object contains a single key named Summary with a SASDataFrame as the value. The SASDataFrame shows a variety of descriptive statistics.  While the summary action does not return exactly the same statistics as the describe method, it can provide additional insights into your data.

What if we don't want all the statistics for all of the data?

Selecting Columns and Summary Statistics with the Summary Action

Let's add additional parameters to the summary action. I'll add the inputs parameter to specify the columns to analyze in the CAS server.

tbl.summary(inputs = ['MPG_City','MPG_Highway'])

The results show only the MPG_City and MPG_Highway columns were analyzed.

Next, I'll use the subSet parameter to specify the summary statistics to produce. Here I'll obtain the MEAN, MIN and MAX.

tbl.summary(inputs = ['MPG_City','MPG_Highway'],
                       subSet = ['mean','min','max'])

The results processed only the MPG_City and MPG_Highway columns, and returned only the specified summary statistics to the client.

Creating a Calculated Column

Lastly, let's create a calculated column within the summary action. There are a variety of ways to do this. I like to add it as a parameter to the CASTable object. You can do that by specifying the tbl object, then computedVarsProgram parameter. Within computedVarsProgram you can use SAS assignment statements with most SAS functions. Here we will create a new column name MPG_Avg that takes the mean of MPG_City and MPG_Highway. Lastly, add the new column to the inputs parameter.

tbl.computedVarsProgram = 'MPG_Avg = mean(MPG_City, MPG_Highway);'
tbl.summary(inputs = ['MPG_City','MPG_Highway', 'MPG_Avg'],
                       subSet = ['mean','min','max'])

In the results I see the calculated column and requested summary statistics.

Summary

The SWAT package blends the world of Pandas and CAS. You can use many of the familiar Pandas methods within the SWAT package, or the flexible, highly optimized CAS actions like summary to easily obtain summary statistics of your data in the massively parallel processing CAS engine.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Index
SWAT API Reference
CAS Action Documentation
SAS® Cloud Analytic Services: Fundamentals
SAS Scripting Wrapper for Analytics Transfer (SWAT)
CAS Action! - a series on fundamentals
Execute the following code in SAS Studio to load the sashelp.cars table into memory

Getting Started with Python Integration to SAS® Viya® - Part 6 - Descriptive Statistics was published on SAS Users.

6月 232021
 

Welcome to the fifth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS server, how to execute CAS actions, how to work with the results, and how your data is organized on the CAS server. In this post I'll discuss loading files into the CAS server. Before proceeding, I need to clear up some terminology. There are two sources for loading files into memory, server-side and client-side. It's important to understand the difference.

Server-Side vs Client-Side

Let's start off with an image depicting server- and client-side file locations.

Server-side files are mapped to a caslib. That is, files in the data source portion of a caslib are server-side. For more infomation on data sources, see Part 4 of this series. Client-side files are files that are not mapped to a caslib. These could be files stored in a SAS library or other local files. That's about it!

In this post, I'll focus on loading server-side files into memory (Part 6 of the series will cover client-side files). In Python, we have two data loading options, the loadTable CAS action, or the load_path method. The techniques are almost identical. The difference is the loadTable action returns a dictionary of information and the load_path method returns a CASTable. If you use the loadTable action you must reference a new table in a separate step. In general, I prefer the actions as they are available in any language such as CASL, R, and Python. Conversely, methods like load_path are specific to a language.

Let's look at a few examples of loading different types of datafiles into CAS.

Loading a SAS Data Set into Memory

In the first example, I want to load the cars.sas7bdat SAS data set from the casuser caslib into memory. Let's confirm the table exists in the caslib by executing the fileInfo action. I made my connection to CAS, naming it conn. For more information about making a connection to CAS, visit Part 1 of the series.

conn.fileInfo(caslib="casuser")

In the results I can see that the cars.sas7bdat file is available in the casuser caslib.

Next, I'll use the loadTable action. First I'll specify the path parameter with the full file name, then the caslib parameter specifying the input caslib, and lastly the casOut parameter to specify the new CAS table. In the casOut parameter you specify a dictionary of key value pairs. Here, I want to name the new table cars_sas and place the table in the casuser caslib.

conn.loadTable(path="cars.sas7bdat", caslib="casuser",
               casOut={
                       "name":"cars_sas",
                       "caslib":"casuser"
               })

After executing the loadTable action, the result returns a dictionary of information about the new table.

Now that the file is loaded into memory, I can process the data by making a reference to the in-memory table using the CASTable method. Then I'll use the head method to view the first 5 rows.

cars = conn.CASTable("cars_sas",caslib="casuser")
cars.head()

Loading a CSV file into Memory

In the next example, I'll load a CSV file into memory. Even though it's a different file type, we still use the loadTable action. The only change needed is the file name in the path parameter. That's it!

conn.loadTable(path="cars.csv", caslib="casuser",
           casOut={
                "name":"cars_csv",
                "caslib":"casuser"
           })
 
csv = conn.CASTable("cars_csv", caslib="casuser")
csv.head()

 

Loading a Text file into Memory

Lastly, data is not always stored in the correct format. Sometimes when loading files you need to modify the default parameters. In this final scenario, let's explore an example using the importOptions parameter. importOptions takes a variety of key value pairs as the values to dictate how to load the file. The options vary depending on the file type.

The file in this example contains no column names, and the delimiter is a vertical bar (or pipe).

I'll begin with specifying the column names in a variable named colNames. Then, in the loadTable action I'll add the importOptions parameter and specify the following:

  • file type I want to import, using the fileType subparameter,
  • the delimiter, using the delimiter subparameter,
  • the getNames subparameter with the value False, since no column names exist,
  • and lastly, the vars subparameter with the names of the columns.

And as in the previous examples, I'll print out the first five lines of the resulting table from memory.

colNames=['Make', 'Model', 'Type', 'Origin', 'DriveTrain', 'MSRP', 'Invoice',
          'EngineSize', 'Cylinders', 'Horsepower', 'MPG_City', 'MPG_Highway',
          'Weight', 'Wheelbase', 'Length']
 
conn.loadTable(path="cars_delim_bar.txt", caslib="casuser",
           casOut={
                "name":"cars_text",
                "caslib":"casuser"
           },
           importOptions={"fileType":"DELIMITED",
                          "delimiter":"|",
                          "getNames":False,
                          "vars":colNames}
          )
 
csv = conn.CASTable("cars_text", caslib="casuser")
csv.head()

Summary

The loadTable action is flexible and an easy way to load tables into memory. Benefits of the loadTable action include:

  • it's filetype agnostic
  • has many options available based on the file types being imported into memory
  • available in any CAS-compatible language like Python, R and CASL with slight changes to the syntax based on the language used

 

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 5 - Loading Server-Side Files into Memory was published on SAS Users.

6月 232021
 

Welcome to the fifth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS server, how to execute CAS actions, how to work with the results, and how your data is organized on the CAS server. In this post I'll discuss loading files into the CAS server. Before proceeding, I need to clear up some terminology. There are two sources for loading files into memory, server-side and client-side. It's important to understand the difference.

Server-Side vs Client-Side

Let's start off with an image depicting server- and client-side file locations.

Server-side files are mapped to a caslib. That is, files in the data source portion of a caslib are server-side. For more infomation on data sources, see Part 4 of this series. Client-side files are files that are not mapped to a caslib. These could be files stored in a SAS library or other local files. That's about it!

In this post, I'll focus on loading server-side files into memory (Part 6 of the series will cover client-side files). In Python, we have two data loading options, the loadTable CAS action, or the load_path method. The techniques are almost identical. The difference is the loadTable action returns a dictionary of information and the load_path method returns a CASTable. If you use the loadTable action you must reference a new table in a separate step. In general, I prefer the actions as they are available in any language such as CASL, R, and Python. Conversely, methods like load_path are specific to a language.

Let's look at a few examples of loading different types of datafiles into CAS.

Loading a SAS Data Set into Memory

In the first example, I want to load the cars.sas7bdat SAS data set from the casuser caslib into memory. Let's confirm the table exists in the caslib by executing the fileInfo action. I made my connection to CAS, naming it conn. For more information about making a connection to CAS, visit Part 1 of the series.

conn.fileInfo(caslib="casuser")

In the results I can see that the cars.sas7bdat file is available in the casuser caslib.

Next, I'll use the loadTable action. First I'll specify the path parameter with the full file name, then the caslib parameter specifying the input caslib, and lastly the casOut parameter to specify the new CAS table. In the casOut parameter you specify a dictionary of key value pairs. Here, I want to name the new table cars_sas and place the table in the casuser caslib.

conn.loadTable(path="cars.sas7bdat", caslib="casuser",
               casOut={
                       "name":"cars_sas",
                       "caslib":"casuser"
               })

After executing the loadTable action, the result returns a dictionary of information about the new table.

Now that the file is loaded into memory, I can process the data by making a reference to the in-memory table using the CASTable method. Then I'll use the head method to view the first 5 rows.

cars = conn.CASTable("cars_sas",caslib="casuser")
cars.head()

Loading a CSV file into Memory

In the next example, I'll load a CSV file into memory. Even though it's a different file type, we still use the loadTable action. The only change needed is the file name in the path parameter. That's it!

conn.loadTable(path="cars.csv", caslib="casuser",
           casOut={
                "name":"cars_csv",
                "caslib":"casuser"
           })
 
csv = conn.CASTable("cars_csv", caslib="casuser")
csv.head()

 

Loading a Text file into Memory

Lastly, data is not always stored in the correct format. Sometimes when loading files you need to modify the default parameters. In this final scenario, let's explore an example using the importOptions parameter. importOptions takes a variety of key value pairs as the values to dictate how to load the file. The options vary depending on the file type.

The file in this example contains no column names, and the delimiter is a vertical bar (or pipe).

I'll begin with specifying the column names in a variable named colNames. Then, in the loadTable action I'll add the importOptions parameter and specify the following:

  • file type I want to import, using the fileType subparameter,
  • the delimiter, using the delimiter subparameter,
  • the getNames subparameter with the value False, since no column names exist,
  • and lastly, the vars subparameter with the names of the columns.

And as in the previous examples, I'll print out the first five lines of the resulting table from memory.

colNames=['Make', 'Model', 'Type', 'Origin', 'DriveTrain', 'MSRP', 'Invoice',
          'EngineSize', 'Cylinders', 'Horsepower', 'MPG_City', 'MPG_Highway',
          'Weight', 'Wheelbase', 'Length']
 
conn.loadTable(path="cars_delim_bar.txt", caslib="casuser",
           casOut={
                "name":"cars_text",
                "caslib":"casuser"
           },
           importOptions={"fileType":"DELIMITED",
                          "delimiter":"|",
                          "getNames":False,
                          "vars":colNames}
          )
 
csv = conn.CASTable("cars_text", caslib="casuser")
csv.head()

Summary

The loadTable action is flexible and an easy way to load tables into memory. Benefits of the loadTable action include:

  • it's filetype agnostic
  • has many options available based on the file types being imported into memory
  • available in any CAS-compatible language like Python, R and CASL with slight changes to the syntax based on the language used

 

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 5 - Loading Server-Side Files into Memory was published on SAS Users.