Programming Tips

9月 142022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on how to summarize columns.

Load and explore data

Let's first load some data. In SAS Viya we can use the Samples caslib for demonstration data. I'll load the data into the distributed CAS server using my CAS connection object conn, followed by the loadTable action. The loadTable action loads a server-side file into memory. Then I'll reference the CAS table in the tbl object and preview 5 rows using the SWAT head method. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

conn.loadTable(path = 'WATER_CLUSTER.sashdat', caslib = 'samples',
                            casOut = dict(caslib = 'casuser'))
 
tbl = conn.CASTable('water_cluster', caslib='casuser')
 
tbl.head()

The water_cluster CAS table contains daily water usage for a variety properties. The Daily_W_C_M3 column displays the water used per location in cubic meters (m3) .

Simple summarizations

Next, let's execute methods for simple summarizations on CAS table columns. If you've used Pandas before these techniques will be very familiar.

I'll start by getting the total water consumption for all the data using the sum method in the SWAT package on the CAS table column Daily_W_C_M3.

(tbl
 .Daily_W_C_M3
 .sum()
)
# and the results
401407.88800000015

Next, I'll view the maximum and minimum water consumption value in the data using the max and min methods in the SWAT package.

(tbl
 .Daily_W_C_M3
 .max()
)
# and the results
11910.0
 
(tbl
 .Daily_W_C_M3
 .min()
)
# and the results
0.0

Lastly, I'll find the average water consumption value in the CAS table using the mean method.

(tbl
 .Daily_W_C_M3
 .mean()
)
# and the results
8.591778424657537

Instead of manually specifying each method to view summary information, I could of used the summary CAS action or the describe method from the SWAT package. For more information check out my previous post Descriptive Statistics.

Find the property with the max water consumption

What if I want to find the property or properties with the max water consumption? One technique is to find the max water consumption value, then filter the CAS table for the row (or rows) with that value. Here I will store the max water consumption value in the variable maxWaterConsumption. Then I'll use the variable in the query method to filter the CAS table. For more information on filtering CAS tables, check out my previous blog post Filtering CAS Tables.

## Store the max water consumption value
maxWaterConsumption = (tbl
                      .Daily_W_C_M3
                      .max()
)
 
## Filter the CAS table for the property with the max water usage
(tbl
 .query(f"Daily_W_C_M3 = {maxWaterConsumption }")
 .head()
)

The results show that the property address 1660 S DAIRY ASHFORD ST had the largest daily water consumption rate.

Find the top 10 daily water consumption values and properties

Lastly, to find the top 10 daily water consumption properties and values I can use the nlargest method in the SWAT package. This method is similar to the Pandas nlargest method.  One difference is the SWAT nlargest method does not implement the keep parameter.

I'll specify the SWAT nlargest method and store the results in the variable df_top10. Then I'll display the results.

df_top10 = tbl.nlargest(10, 'Daily_W_C_M3')
display(df_top10)

The CAS server will execute the request and return a SASDataFrame to the client. The SASDataFrame shows that a single property contains the top 10 largest water consumption values.

Remember, a SASDataFrame is a subclass of a Pandas DataFrame. As a result, you can work with them as you normally would a Pandas DataFrame. To learn more about working with CAS objects check out my previous post Working with CAS Actions and CASResults Objects.

Summary

The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques to work with your distributed CAS tables. This enables you to utilize the massively parallel processing power of the CAS server in SAS Viya using Python. Here we learned about using familiar Pandas methods in the SWAT package like sum, min, mean, max and nlargest.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 9 - Summarize Columns was published on SAS Users.

8月 302022
 

A key tool for SAS Viya admins and many users is the command line interface (CLI). You can think of it in terms of other CLIs, but with direct access to SAS Viya. The CLI facilitates automation of tasks as well as other administrative functions. You can find information about installing the CLI and its plugins SAS Users YouTube tutorial.

SAS Viya Workload Management Environment. SAS Viya Workload Management was developed by SAS, starting with the cloud-native version of SAS Viya and relies on Kubernetes. Kubernetes comes with capabilities to balance workload of jobs; however, SAS Viya Workload Orchestrator is designed to enhance Kubernetes’ workload capabilities by adding prioritization for jobs, queues management, etc.

Now, I am predicting you know what a plug-in is but just in case you don’t, a plug-in manages and monitors the service for which you are seeking or providing information. Before you use you any SAS Viya plug-in via the CLI, you must create a profile and authenticate. Here is the documentation which details those steps. You may also refer to the YouTube tutorial linked earlier.

Workload orchestrator plugin

In the example in Figure 1, I am using a command prompt for the CLI to connect to my SAS Viya environment. Make sure you read the links above on how to create a profile and authenticate. In the shell, navigate to the directory with the CLI executable. SAS Viya commands take the form:

> sas-viya --profile <profile name> command [command options] [arguments]

Now let’s look at some basic commands. We’ll increase the complexity of the command as we go along.

List command options

The most generic command for workload-orchestrator is listed in Figure 1.

Figure 1: List of command Line Options for workload orchestrator

The command returns the name of the CLI, the usage syntax and commands at my disposal.

Config command

Next, in Figure 2, let’s run the config command.

Figure 2: The config command displays granular details for the grid or tenant

The response displays details for additional commands for the grid or tenant configuration.

Extending the config code from above, we’ll use the config grid (and tenant) commands. With these commands you have the options get and set. Get returns the current grid information and set applies a new configuration to the grid.

Figure 3: The config command followed by grid, tenant and get

As you can see in Figure 3, I used the get command for both grid and tenant to display the configurations.

Hosts command

Now let’s look at the hosts command. In Figure 4, we use the hosts command to display the additional commands close, list and open hosts.

Figure 4: The hosts command

Figure 5 uses the list command and displays information about the current SAS Viya host.

Figure 5: The hosts command with the list command

Info command

Moving on to the info command in Figure 6, we see the available commands are grid, manager and tenant.

Figure 6: The info command with the list commands

In Figure 7, we use the three commands.

Figure 7: The info command with the grid, manager and tenant commands

The grid command lists information about the product. Manager lists information about the current usage of the manager. Finally, the tenant command lists information about the tenant.

Jobs command

Moving on to the jobs command in Figure 8.

Figure 8: The jobs command with the various commands

We see the results with the available commands including cancelling, listing, resume processing and suspending jobs.
We have a couple of commands remaining. Let’s move on.

Logs command

The logs command, as seen in Figure 9, allows you to set the log level or list specified logs.

Figure 9: The logs command with the various commands

The levels command also has get and set sub-commands. Get displays the logger levels for a specified host or all hosts. The set command sets a specified level for one more logger.

Figure 10 displays the logs command with the list command, and it shows info and error message collected.

Figure 10: The logs command with the list command

Queues command

The final command we’ll look at is the queues command. In Figure 11, you can see there are multiple commands available for queues.

Figure 11: The queues command with the various commands

Figure 12 is the list command of the queues command.

Figure 12: The queues command with the list command

Feel free to try out the other queues commands on your own.

Finally

Fairly straight forward, eh? I hope this helps you understand how you can use the workload-orchestration SAS Viya CLI plug to retrieve and configure various settings of the SAS Viya Workload Management Environment.

SAS Viya Workload Orchestrator CLI Plug-in was published on SAS Users.

7月 222022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns was published on SAS Users.

7月 222022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns was published on SAS Users.

6月 302022
 

Welcome to the seventh installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to work with the results. Now it's time to learn how to filter CAS tables.

Load and explore data

First, I'll load the cars.csv file into memory from the sassfotware GitHub page. I'll start by using my CAS connection object conn, followed by the upload_file method from the SWAT package. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. The data in this example is small, for training purposes. Processing data in the CAS server is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv',
      casout={'name':'cars', 'caslib':'casuser'})

The response from the command is minimal, but there's much going on. Specifically:

  • Cloud Analytic Services renders the uploaded file available as table CARS in caslib CASUSER(Peter).
  • The table CARS is created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.

Next, I'll view the type and value of the tbl variable to confirm it's a CASTable object that references the CARS table in the CAS server.

display(type(tbl), tbl)
# and the command results:
swat.cas.table.CASTable
CASTable('CARS', caslib='CASUSER(Peter)')

The results show tbl is a CASTable object and references the CARS table in the CAS server. Now, when I execute methods on the tbl object, the processing occurs in CAS.

Further, let's preview the CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame as expected.

Finally, I'll execute the SWAT shape attribute to view the number of rows and columns in the CAS table.

tbl.shape
# and the results
(428, 15)

Notice, the CARS CAS table contains 428 rows and 15 columns.

Filter a CAS Table

Now that we have created the table in the caslib and are familiar with the data, let's learn how to filter CAS tables!

Python Technique

First, I'll start with the traditional Pandas filtering technique. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. This will filter the CAS table for rows where the Make of the car is Acura or Toyota,

makeFilter = (tbl['Make']=='Acura') | (tbl['Make']=='Toyota')
tbl[makeFilter].shape
# and the results
(35, 15)

The results show that 35 rows in the CARS CAS table have the Make Toyota or Acura.

Query Method

You can also filter a CAS table using the query method in the SWAT package. Wait a minute, does this look familiar? I mentioned SWAT blends the world of Pandas as SAS. I'll again filter for a Make of Toyota or Acura .

tbl.query("Make = 'Acura' or Make = 'Toyota'").shape
# and the results
(35, 15)

Notice, the results are the same.

isin Method

All right. Sorry, but again I'll show you another familiar method from the SWAT package. Since I am filtering for specific categories, I can also use the SWAT isin method, which works similarly to the Pandas isin method!

tbl[tbl.Make.isin(['Acura', 'Toyota'])].shape
# and the results
(35, 15)

Notice, again the results are the same.

CAS Table WHERE Parameter

Lastly, I'll show you a specific CAS technique, which is very efficient if you need to filter data for multiple methods or actions. To begin, let's display the type and value of the tbl object again.

display(type(tbl), tbl)
# and the results
swat.cas.table.CASTable
CASTable('CARS', caslib='CASUSER(Peter)')

Notice the tbl variable is a CASTable object, which is a reference to a CAS table in the CAS server, not an actual table. In this example, it references the CARS table in the Casuser caslib.

The CASTable object enables you to add a variety of parameters to the object to filter, create columns, group and more. In this example, I'll add the where parameter to the CASTable object to filter for car makes of Toyota and Acura. Then I'll display the value of the CASTable object again.

tbl.where = 'Make = "Acura" or Make = "Toyota"'
display(tbl)

CASTable('CARS', caslib='CASUSER(Peter)', where='Make = "Acura" or Make = "Toyota"')

Notice the CASTable still references the CARS table in the Casuser caslib, but the where parameter has been added to the object. The where parameter is applied when the object is used with an action or method.

Let's check it out. Here I'll specify the tbl variable (the CASTable object) then the shape attribute.

tbl.shape
# and the results
(35, 15)

Notice the results again show 35 cars are either Toyota or Acura.

Now, what is nice about using this method is if I want to continue to use this filter with other methods or actions, all I need to do is use the tbl object. Perhaps you want the value counts of each Make? The SWAT package has the value_counts method available.

tbl.Make.value_counts()
# and the results
Toyota    28
Acura      7

The response displays the counts only for rows where Make is Acura or Toyota. This occurs because the where parameter is applied to the CAS table when the value_counts method is executed.

Delete the WHERE parameter

Lastly, if you want to delete the where parameter from the CASTable object, use the del_params method.

tbl.del_params('where')
display(tbl)
# and the results
CASTable('CARS', caslib='CASUSER(Peter)')

Once the parameter is deleted you can revert back to analyzing the entire table. Here I'll use the shape attribute again to view how many rows are in the original CAS table.

tbl.shape
# and the results
(428, 15)

Summary

SAS Viya offers various filtering options for CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 7 - Filtering CAS Tables was published on SAS Users.

6月 162022
 

MS Excel logo No matter which powerful analytical tools data professionals use for their data processing, MS Excel remains the output of choice for many users and whole industries.

In banking and finance, for example, I have seen many SAS users create quite sophisticated data queries and data analysis projects in SAS Enterprise Guide. Yet, at the end, when the final datasets have been produced and validated, comes the manual part when users export those tables into Excel, and then combine and rearrange them by copying-pasting into a desired workbook for storing and distributing.

However, this heavily manual process can be not just fully automated, but also enhanced compared with the point-and-click “export to Excel” and “copy-paste” interactive process. Here is how.

Creating a single simple Excel sheet

Suppose, we want to convert SASHELP.CLASS data table to Excel. Here is a bare-bone solution using SAS Output Delivery System:

ods excel file='C:\Projects\SAS_to_Excel\Single_sheet.xlsx';
 
proc print data=SASHELP.CLASS noobs;
run;
 
ods excel close;

This code is pretty much self-explanatory. It will produce Single_sheet.xlsx Excel workbook file in the folder C:\Projects\SAS_to_Excel. When opened in Excel, it will look as follows:

If you browse through the ODS EXCEL documentation you will find a variety of options that allow you to customize Excel output. Let’s get a little fancy by utilizing some of them.

Creating several customized sheets in Excel workbook

The following code example creates two sheets (tabs) in a single workbook. In addition, it demonstrates some other features to enhance data visualization.

/* -------------------------------------------- */
/* Two sheets workbook with enhanced appearance */
/* -------------------------------------------- */
 
/* Formats for background & foreground coloring */
proc format;
   value hbg 
      50 <- 60   = #66FF99
      60 <- 70   = #FFFF99
      70 <- high = #FF6666
      ;
   value hfg
      low -< 50, 70 <- high = white;
run;
 
/* Define custom font style for ODS TEXT */
proc template;
   define style styles.MyStyle;
   parent=styles.htmlblue;
   style usertext from usertext /
      foreground  = #FF33CC
      font_weight = bold
      ;
   end;
run;
 
/* ODS Excel output file destination */
ods excel file = 'C:\Projects\SAS_to_Excel\Two_sheets_fancy.xlsx';
 
   /* Excel options for 1st sheet (tab) */
   ods excel options
      ( sheet_name      = 'SASHELP.CLASS'
        frozen_headers  = 'on'
        embedded_titles = 'on' )
      style = styles.MyStyle;
 
   title justify=left color='#4D7EBF' 'This is TITLE for SASHELP.CLASS';
 
   ods text='This is TEXT for SASHELP.CLASS';
 
   proc print data=SASHELP.CLASS noobs;
      var NAME;
      var SEX AGE / style = {just=C};
      var HEIGHT  / style = {background=hbg. foreground=hfg.};
      var WEIGHT;
   run;
 
   /* Excel options for 2nd sheet (tab) */
   ods excel options
      ( sheet_name      = 'SASHELP.CARS'
        frozen_headers  = 'on'
        embedded_titles = 'on' );
 
   title 'This is TITLE for SASHELP.CARS';
 
   proc print data=SASHELP.CARS noobs;
   run;
 
ods excel close;

Here are the code highlights:

  • PROC FORMAT creates two formats, HBG and HFG, which are used in the first PROC PRINT to illustrate cell text and background coloring.

    NOTE: In SAS, colors are specified as hexadecimal RGB values (cxRRGGBB). However, I found (although it seems undocumented) that in PROC FORMAT and PROC TEMPLATE these colors can be written as quoted (double or single) or unquoted values, as well as prefixed with either ‘#’ or ‘cx’ (for hexadecimal). For example, the following are all valid values: #FFAADD, '#FFAADD', cxFFAADD, 'cxFFAADD'.
  • PROC TEMPLATE defines usertext custom font style for ODS TEXT, which used in ODS EXCEL along with and as alternative to the TITLE statement.
  • ODS EXCEL FILE=file-specification statement opens the EXCEL destination and specifies the output file for ODS EXCEL. The output file-specification can be either a physical file name (must be in quotes) or a fileref (without quotes) assigned with FILENAME statement. It can point to a location on the machine where SAS is run (SAS server), or a network drive accessible from the SAS server.

    This statement follows by two separate ODS EXCEL OPTIONS statements – one per corresponding sheet (tab).
  • ODS EXCEL OPTIONS statement specifies destination-specific suboptions with space-delimited name='value' pairs. In particular, we use the following options:

    • SHEET_NAME= specifies the name for the next worksheet (worksheet names can be up to 28 characters long).
    • FROZEN_HEADERS='ON' specifies that headers are not scrolled when table is vertically scrolled (default is OFF). It is very convenient feature that keeps the title(s) and column names in view while user scrolls through the table rows.
    • EMBEDDED_TITLES='ON' specifies whether titles should appear in the worksheet (default is OFF).

    There is a variety of other useful ODS EXCEL options allowing further customization of your Excel workbook appearance and functionality.

  • TITLE statement is highly customizable with Output Delivery System as shown in this custom TITLE with ODS example.
  • ODS TEXT= statement inserts text into your ODS output. Unlike TITLE for which ODS EXCEL merges several cells, ODS TEXT places its text in a single cell (see screenshot below). The UserText style element that we modified using PROC TEMPLATE controls the font style, font color, and other attributes of the text that the ODS TEXT= statement produces.
  • In PROC PRINT, we use multiple VAR statements to select variables, determine their order and apply styles (text and background colors) to the printed values. There are much more ODS styles with PROC PRINT available for further customizations.
  • The next section of the code ( /* Excel options for 2nd sheet (tab) */ ) creates the second sheet in the same Excel workbook. Similarly, you can create as many sheets/tabs as you wish.
  • The last statement ods excel close; closes the ODS Excel destination so nothing more is written to the output file.

The following are the two screenshots illustrating the two sheets in the produced Excel workbook:

SAS creates Excel workbook with several sheets/tabs

Questions

Do you find this post useful? Do you have questions, comments, suggestions, other tips or tricks about creating MS Excel workbooks in SAS? Please share with us below.

Additional Resources

TUNE IN NOW | LEARN HOW TO READ AND WRITE EXCEL FILES WITH SAS

Automating Excel workbooks creation using SAS was published on SAS Users.

5月 312022
 

If you use formatted variables in a CLASS statement in procedures such as MEANS or UNIVARIATE, SAS will use the formatted values of those variables and not the internal values. For example, consider you have a data set (Health) with variables Subj, Age, Weight, and Height. You want to see the mean Weight and Height for three age groups: 0 to 40, 41-70, and greater than 70. A program to create the Health data set and compute the means is shown below.

Data Health;
   input Subj $ Age Weight Height;
datalines;
001 23 120 72
002 35 200 80
003 77 98 63
004 51 230 75
005 15 98 54
006 44 160 68
007 83 111 65
008 10 . 45
009 60 200 .
;
proc format;
   value AgeGrp 0-40 = '0 to 40'
                41-70 = '41 to 70'
                71-high = '&gt;70';
run;
 
proc means data=Health n nmiss mean;
   class Age;
   format Age AgeGrp.;
   var Weight Height;
run;

Here is the output:

Notice that there was no need to run another DATA step and create a variable representing age ranges. Not only is this an easier way to get these results—it is also much more efficient.

The other day, I received an email from a person and he posed the following question. How can I see the mean of the variable Sales (in the SASHELP data set Shoes) for three ranges of Sales: 0 to 100,000; 100,000 to 200,000; 200,000 and above?

My first thought was to create a variable (called Range) and then use that variable in a CLASS statement with PROC MEANS. However, after some thought, I wondered if I could use the Sales variable in the CLASS statement (and supply a format) and use the same variable in the VAR statement. As is often the case, the best way to answer a SAS programming question is to write a short program and see what happens. Here is what I suggested:

proc format;
  value SalesRange 0 - &lt;100000 = 'Lower'
              100000 - &lt;200000 = 'Middle'
              200000 - high = 'Upper';
run;
 
proc means data=sashelp.Shoes n nmiss mean std;
   class Sales;  
   format Sales SalesRange.;
   var Sales;
run;

Here is the output:

As you can see from the output, it worked just fine.

Before we leave this topic, let's use the first example, referencing the Health data set, to demonstrate a Multi-Label Format. Suppose you want to see means of Weight and Height for two different ranges of Age. One range is the original 0–40, 41–70, and 70+; the other is 0–50 and 51+. Here is a way to accomplish this using a single format:

proc format;
   value AgeGrp (multilabel) 0-40 = '0 to 40'
                             41-70 = '41 to 70'
                             71-high = '&gt;70'
 
                             0-50 = '0 to 50'
                             51-high = '&gt;50';
run;
 
proc means data=Health n nmiss mean;
   class Age / MLF;
   format Age AgeGrp.;
   var Weight Height;
run;

You need to add the option "multilabel" following the format name on the VALUE statement.  Without this option, you will get an error message if you have overlapping ranges. Next, use the option MLF (multi-label format) on the CLASS statement. Here is the output:

You now have the mean Weight and Height for two separate ranges of Age.

Consider using a formatted variables in CLASS statements for those procedures that allow this statement. It will save you time and reduce CPU time as well.

If you liked this blog, then you might like my latest book: Cody’s Collection of SAS Blogs. It gathers all of my best tips and tricks from my blogs in one place for easy reference.

Using formatted CLASS variables was published on SAS Users.

4月 202022
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. This post builds upon CAS-Action! Create Columns in CAS Tables - Part 1 by showing how to add formats and modify the length of computed columns.

I'll start by building off the following code where I create three new columns in the PRODUCTS table:

proc cas;
    source createColumns;
        Total_Price = Price * Quantity;
        Product_fix = upcase(Product);
        if Return = "" then Return_fix = "No"; 
           else Return_fix = "Yes";
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the results:

Notice in the results the Total_Price column does not have a format, and the Return_fix column truncates the value Yes. Let's fix these issues.

Add Formats to a Computed Column

First, I'll add a format to the Total_Price computed column. To add formats to a computed column use the computedVars sub-parameter. ComputedVars enables you to specify column metadata (labels and formats) for computed columns. The parameter expects a list of dictionaries, even if one column is specified.

For example, in the computedVars sub-parameter I'll add list with a single dictionary that specifies the name of a column Total_Price and apply the SAS dollar format. Next, I'll execute the fetch action to preview the table.

proc cas;
    source createColumns;
        Total_Price = Price * Quantity;
        Product_fix = upcase(Product);
        if Return = "" then Return_fix = "No"; 
           else Return_fix = "Yes";
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVars = {
                            {name = 'Total_Price', format = 'dollar16.2'}
                   },
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the results:

Notice in the results above the format was applied to the Total_Price column. However, now the additional computed columns are not displayed in the output. To fix this issue, add all of the computed columns to the computedVars sub-parameter as a dictionary with the name key, even if you do not modify the column metadata.

In the code below, the Product_fix and Return_fix columns were added to the list in the computedVars sub-parameter.

proc cas;
    source createColumns;
        Total_Price = Price * Quantity;
        Product_fix = upcase(Product);
        if Return = "" then Return_fix = "No"; 
           else Return_fix = "Yes";
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVars = {
                            {name = 'Total_Price', format = 'dollar16.2'},
                            {name = 'Product_fix'},
                            {name = 'Return_fix'}
                   },
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the results:

Notice in the results that adding each computed column in the computedVars sub-parameter fixed the issue and the computed columns are shown.

However, we still have one more issue. Look at the truncated values in the Return_fix column. How can we fix that?

Modify Computed Column Lengths

When using an IF/THEN/ELSE statement to create a new column in SAS, it sets the column width by using the length of the first assignment statement. In this example, the first assignment statement sets the value of the Return_fix column to No, so the column length is set to 2.

To modify the length of the computed column add a SAS LENGTH statement in the computedVarsProgram parameter prior to the first assignment statement. Here I'll set the Return_fix column to length of 3.

proc cas;
    source createColumns;
        Total_Price = Price * Quantity;
        Product_fix = upcase(Product);
 
        length Return_fix varchar(3);
        if Return = "" then Return_fix = "No"; 
           else Return_fix = "Yes";
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVars = {
                            {name = 'Total_Price', format = 'dollar16.2'},
                            {name = 'Product_fix'},
                            {name = 'Return_fix'}
                   },
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the results:

In the results the truncation issues in the Return_fix column are corrected.

Summary

In summary, using the computedVars sub parameter allows you to add formats to the computed columns, and the computedVarsProgram parameter enables you to set the column lengths. A few key points:

  • When using the computedVars sub-parameter, all computed columns from the computedVarsProgram sub-parameter must be added. Otherwise, the computed columns are not returned.
  • The computedVars sub-parameter takes a list of dictionaries as it's value.

Additional resources

fetch Action
SAS® Cloud Analytic Services: Fundamentals
Code
Create Data Code

CAS-Action! Create Columns in CAS Tables - Part 2 was published on SAS Users.

4月 132022
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's learn how to create calculated columns in CAS tables.

In this example, I will use the CAS procedure to create columns in CAS tables for use in CAS actions. I could execute similar code in Python, R and more with some slight changes to the syntax for the specific language. The other languages also include specific methods to accomplish the same objective.

I'll start by executing the following code to create a CAS table named PRODUCTS in the Casuser caslib with 80 million rows. Then I'll create a variable named productsTbl to reference the PRODUCTS CAS table, and use the variable as the value to the table parameter in the fetch CAS action.

proc cas;
    productsTbl = {name = 'products', caslib = 'casuser'};
 
    table.fetch / table = productsTbl;
quit;

And the response:

Create a Calculated Column

To create a calculated column you add the CAS table reference. The computedVarsProgram parameter specifies an expression for each computed variable and takes a string as it's value. The string can be:

  • an assignment statement that can include most SAS functions.
  • a SAS IF/THEN/ELSE statement
  • a SAS LENGTH or LABEL statement

I'll start by creating a simple calculated column named Total_Price that multiplies Price by Quantity. This calculation triggers when referencing the productsTbl CAS table in the table parameter of a CAS action. Here I'll use the fetch action to view 20 rows of the table.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVarsProgram = 'Total_Price = Price * Quantity;'};
 
    table.fetch / table = productsTbl;
quit;

And the response:

Notice in the results above, the new column Total_Price was created during the execution of the fetch action. The original CAS table was not modified.

Create Multiple Calculated Columns

Next, I'll create a second column by adding a new assignment statement in the string. The second calculated column, Product_fix renders upper case values in the original Product column. In the string you must end each assignment statement with a semicolon. Above all, a new column is required since the computedVarsProgram sub parameter cannot overwrite existing columns in the table. To update values in a CAS table column, use the table.update action.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVarsProgram = 'Total_Price = Price * Quantity;
                                          Product_fix = upcase(Product);'};
 
    table.fetch / table = productsTbl;
quit;

And the response:

Notice in the results above, the fetch action created the Total_Price and Product_fix columns.

Use Conditional Logic

In addition, you can also use a traditional SAS IF/THEN/ELSE statement in the computedVarsProgram sub parameter. For example, I'll create a new column named Return_fix that replaces missing values in the Return column with the value No.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVarsProgram = 'Total_Price = Price * Quantity;
                                          Product_fix = upcase(Product);
                                          if Return = "" then Return_fix = "No"; 
                                              else Return_fix = "Yes";'
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the response:

In the results above there are three new calculated columns: Total_Price, Product_fix and Return_fix.

Storing the Calculated Columns in a Variable

Lastly, instead of adding the calculated columns as a string, I like to store them in a variable using the CAS language SOURCE block statement. Storing the string in the the SOURCE block enables you to embed the calculated columns as a text string and avoid quotation issues that can occur. I'll name my SOURCE block variable createColumns and use it as the value in the computedVarsProgram sub parameter.

proc cas;
    source createColumns;
        Total_Price = Price * Quantity;
        Product_fix = upcase(Product);
        if Return = "" then Return_fix = "No"; 
           else Return_fix = "Yes";
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl;
quit;

And the response:

The results show the same result as the previous section.

While this is great, how can I format the new Total_Price column using the traditional SAS dollar format? Look closely at the Return_fix column. Do you see an issue with the Yes values? Well, stay tuned for part 2!

Summary

In summary, using the computedVarsProgram sub parameter allows you to easily create calculated columns in a CAS tables during the execution of CAS actions. A few key points:

  • The computedVarsProgram sub parameter takes a string as it's value and uses traditional SAS assignment statements, IF/THEN/ELSE statements, and you can also use the LABEL and LENGTH statements.
  • Using the computedVarsProgram sub parameter does not modify the original CAS table. It is simply applied during the execution of a CAS action.
  • If you create a dictionary variable with the CAS table reference, you can then use that variable in a variety of actions within the table parameter.
  • Using a SOURCE block enables you to embed your text in a variable without using a string.

Additional resources

fetch Action
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Create Columns in CAS Tables - Part 1 was published on SAS Users.

4月 112022
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's learn how to filter rows in CAS tables.

In this example, I will use the CAS procedure to filter rows in CAS tables for use in CAS actions. I could execute similar code in Python, R and more with some slight changes to the syntax for the specific language. The other languages also include specific methods to accomplish the same objective.

I'll start by executing the following code to create a CAS table named PRODUCTS in the Casuser caslib with 80 million rows. Then, I'll create a variable named productsTbl to reference the PRODUCTS CAS table, and use that variable as the value to the table parameter in the numRows and fetch CAS actions.

proc cas;
    productsTbl = {name = 'products', caslib = 'casuser'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

The numRows action shows the PRODUCTS table has 80 million rows, and the fetch action shows the first 20 rows of the table.

Create a Simple Filter

To filter a CAS table you add the where sub parameter to the castable reference with the where expression as a string. The string is similar to using the WHERE statement in SAS.

Here, I'll add the where key to the productsTbl dictionary that references the CAS table, and filter for all rows where the Product column equals Hats. The where expression is only applied when an action is executed on the productsTbl variable (the CAS table reference). The underlying CAS table is not modified.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'Product = "Hats"'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

After executing the numRows and fetch actions, the results show that there are 8,001,336 rows that contain the product Hats.

Use a SAS Function to Filter a CAS Table

Within the where expression you can also use most SAS Functions. Here, I'll use the UPCASE function to upper case all values in the Product column to standardize the case, then filter for rows with the value HATS. The results will be the same as the previous section since there are no underlying case issues in the CAS table. Where you would see a difference would be if the table contained mixed case versions of Hats, such as hats, HAts, hATs, etc. Without the function, the query is case sensitive and would only pick up Hats. All iterations of Hats would be picked up with using the UPCASE function.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS"'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

Multiple WHERE Expressions

Just like the WHERE statement in SAS, you can specify two or more where expressions with a logical operator. Here I'll filter the CAS table for all rows where the Product column contains the value HATS, and a StoreID less than 15. The entire where expression is stored as a string.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS" and StoreID < 15'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

Notice that the results show that 1,789,106 rows have the product hats with a store ID less than 15.

Create a Calculated Column as a Filter

You can create a calculated expression to use as a filter. The calculated expression is not added as a column to the table output, instead it is used to filter the CAS table during the execution of the actions. Here I'll multiple the Price by Quantity, and filter for all rows greater than 40.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS" and 
                                    StoreID < 15 and 
                            Price * Quantity > 40'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

Notice in the results the calculated column is not output, it was only used to filter the rows. With all three where expressions, we now have a total of 436,728 rows.

Storing the Where Expressions in a Variable

Lastly, I want to clean up my code. Instead of adding the where expressions as a single string, I can use a SOURCE block in the CAS language. The SOURCE block enables embedding text in the program and assigning it to a given variable. It avoids any possible quoting issues and helps compartmentalize your code. Here, I'll name my SOURCE block filter and add the where expressions. Then, I'll use the filter variable as the value to the where key. The results are the same as the previous code.

proc cas;
    source filter;
        upcase(Product) = "HATS" and 
        StoreID < 15 and 
        Price * Quantity > 40;
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = filter};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

Summary

In summary, using the where sub-parameter allows you to easily filter your tables for use in CAS actions. A few key points:

  • The where sub parameter takes a string as its value and is similar to using the WHERE expression in SAS.
  • Using the where sub-parameter does not modify the original CAS table. It is simply applied when executing an action on the CAS table reference.
  • Within the where expression you can use most SAS functions.
  • If you create a dictionary variable with the CAS table reference, you can then use that variable in a variety of actions within the table parameter.
  • Using a SOURCE block enables you to embed your where expressions in a variable without using a string.

Additional resources

fetch Action
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Filtering Rows in CAS Tables was published on SAS Users.