tips & techniques

1月 172023
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS server, working with CAS actions and CASResults objects, and how to summarize columns. Now it's time to focus on how to get the count of unique values in a CAS table column.

Load and prepare data

First, I connected my Python client to the distributed CAS server and named my connection conn. Then I created a function to load and prepare my CAS table. The custom function loads the WARRANTY_CLAIMS_0117.sashdat file from the Samples caslib into memory, renames the columns using the column labels and drops unnecessary columns. This simplifies the table for the demonstration.

The Samples caslib should be available in your SAS Viya environment and contains sample tables. For more information on how to rename columns in a CAS table view Part 11 - Rename Columns.

## Packages
import swat
import pandas as pd
 
## Options
pd.set_option('display.max_columns', 50)
 
## Connect to CAS
conn = ## your connection information
 
def prep_data():
    ## Load the data into CAS
    conn.loadTable(path='WARRANTY_CLAIMS_0117.sashdat', caslib='samples',
                   casout={'name':'warranty_claims', 'caslib':'casuser'})
 
    ## Reference the CAS table in an object
    castbl = conn.CASTable('warranty_claims', caslib = 'casuser')
 
    ## Store the column names and labels in a dataframe
    df_col_names = castbl.columnInfo()['ColumnInfo'].loc[:,['Column','Label']]
 
    ## Create a list of dictionaries of how to rename each column using the column labels
    renameColumns = []
    for row in df_col_names.iterrows():
        colName = row[1].values[0]
        labelName = row[1].values[1].replace(' ','_')
        renameColumns.append(dict(name=colName, rename=labelName))
 
    ## List of columns to keep in the CAS table
    keepColumns = {'Campaign_Type', 'Platform','Trim_Level','Make','Model_Year','Engine_Model',
                   'Vehicle_Assembly_Plant','Claim_Repair_Start_Date', 'Claim_Repair_End_Date'}
 
    ## Rename and drop columns to make the table easier to use
    castbl.alterTable(columns = renameColumns, keep = keepColumns)
 
    return castbl

Next, I'll execute the user defined function and store the CAS table object in the variable tbl  and view it's type.

tbl = prep_data()
type(tbl)
 
# and the results
NOTE: Cloud Analytic Services made the file WARRANTY_CLAIMS_0117.sashdat available as table WARRANTY_CLAIMS in caslib CASUSER(Peter).
swat.cas.table.CASTable

The results show that the WARRANTY_CLAIMS_0117.sashdat is available in the CAS server, and tbl is a CASTable object.

Lastly, I'll preview the distributed CAS table using the SWAT package head method.

tbl.head()

The results show a preview of the WARRANTY_CLAIMS CAS table. The table provides data on warranty claims for car repairs. The data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

Using the Pandas API in the SWAT package - value_counts method

I'll begin by using the Pandas API in the SWAT package which provides the value_counts method. The value_counts method works like it's Pandas counterpart. For example, I'll obtain the count of unique values in the Engine_Model CAS table column. I'll store the results of in vc, then display the type and value of vc.

vc = (tbl               ## CAS table
      .Engine_Model     ## CAS table column
      .value_counts()   ## SWAT value_counts method
     )
 
## Display the type and value
display(type(vc),vc)

The SWAT value_counts method summarizes the data in the distributed CAS server and returns a Pandas Series to the Python client. Once you have the Pandas Series on the client, you can work with it as you normally would. For example, I'll plot the Series using the Pandas plot method.

vc.plot(kind = 'bar', figsize=(8,6));

In this example, I used the Pandas API in the SWAT package to summarize data on the CAS server's massively parallel processing environment to return smaller, summarized results to the Python client. Once the summarized results are on the client, I'll work with them using other Python packages like Pandas.

Using the freq CAS action

Instead of using the Pandas API in the SWAT package you can achieve similar results using native CAS actions. In SWAT, CAS actions are simply specified as a method. One action that provides the count of unique values is the simple.freq CAS action.

For example, I can find the count of unique values for multiple columns within the freq action. Here, I'll specify the Engine_Model, Model_Year and Campaign_Type columns in the the inputs parameter. Then, I'll call the Frequency key after the action to obtain the SASDataFrame stored in the dictionary returned to the Python client. Remember, CAS actions always return a dictionary, or CASResults object, to the Python client. You must use familiar dictionary manipulation techniques to work with the results of an action. For more information on working with results of CAS actions, check out Part 2 - Working with CAS Actions and CASResults Objects.

## Columns to analyze
colNames = ['Engine_Model', 'Model_Year', 'Campaign_Type']
 
## Execute the freq CAS action and store the SASDataFrame
freq_df = tbl.freq(inputs = colNames)['Frequency']
 
## Display the type and DataFrame
display(type(freq_df), freq_df)

Again, the action processes the data in the distributed CAS server and returns results to the Python client. The results show the freq action counts the unique values of each column and stores the results in a single SASDataFrame. Once you have the SASDataFrame on the client, you can work with it like you would a Pandas DataFrame.

For example, I'll loop over each analysis column, query the SASDataFrame for the specific column name, and then plot the count of unique values of each column using the familiar Pandas package.

for column in colNames:
    (freq_df
     .query('Column == @column')
     .loc[:,['CharVar','Frequency']]
     .set_index('CharVar')
     .sort_values(by = 'Frequency', ascending=False)
     .plot(kind='bar', figsize=(8,6), title = f'The {column} Column')
    )

The loop produces a visualization of the count of unique values for each analysis column. This was all done using familiar Pandas code on the client side. Remember, the distributed CAS server did all of the processing and summarization, then returned smaller summarized results to the Python client.

Using the freqTab CAS action

Lastly, you can use the freqTab.freqTab CAS action to construct frequency and crosstabulation tables. The freqTab action provides a variety of additional features and information. The action is not loaded by default, so I'll begin by loading the action set.

conn.loadActionSet('freqTab')

Then I'll use the freqTab action in the freqTab action set to count the unique values for the Model_Year and Engine_Model columns, and also count the unique values of Engine_Model by Model_Year.

tbl.freqTab(tabulate = [
                'Model_Year',
                'Engine_Model',
                {'vars':['Engine_Model','Model_Year']}
            ]
    )

The results above show the freqTab action returns a dictionary with a variety of information. The first SASDataFrame is level information, the second SASDataFrame shows the number of observations used, and the remaining SASDataFrames show the two one-way frequency tables for Model_Year, and Engine_Model, and the crosstabulation between Engine_Model by Model_Year (also includes the totals).

With the results on the Python client, you can begin accessing and manipulating the SASDataFrames as needed.

Summary

The SWAT package blends the world of Pandas and CAS. You can use many of the familiar Pandas methods within the SWAT package like value_counts, or the flexible, highly optimized CAS actions like simple.freq and freqTab.freqTab to obtain counts of unique values in the massively parallel processing CAS engine. For more examples on the freq or freqTab CAS actions, check out my CAS action four part series (part 1, part 2, part 3 and part 4). The four part series executes CAS actions using the native CAS language. However, with some small changes to the syntax you can execute the same actions using Python.

Additional and related resources

Getting started with Python integration to SAS® Viya® - Part 12 - Count of Unique Values was published on SAS Users.

12月 222022
 

The addition of the PYTHON procedure and Python editor in SAS Viya enables users to execute Python code in SAS Studio. This new capability in SAS Viya adds another tool to SAS's existing collection. With this addition I thought, how can I utilize this new found power?

In this example, I'll keep it simple. I want to create a Microsoft Excel report using a combination of SAS, Python and SQL. I'll use data that's stored in a SAS library; however, the library could be using data stored anywhere, like a path, database or in the cloud. I'll write a program that executes the following:

All code used in this post is located on GitHub, here.

Set folder path and file name

To begin, I'll create a macro variable to specify the output folder path and Microsoft Excel workbook name.

%let path=/*Enter your output folder path*/;
%let xlFileName = myExcelReport.xlsx;

Prepare data

Further, I'll prepare the data using the SAS DATA step. I'll use the available sashelp.cars table, create a new column named MPG_Avg, and drop unnecessary columns. Instead of using the DATA step you can use Python or SQL to prepare the data. Whatever tool works best for you.

data work.cars;
    set sashelp.cars;
    MPG_Avg=mean(MPG_City, MPG_Highway);
    drop Wheelbase Weight Length;
run;

Create the Microsoft Excel workbook

After the data is ready, I'll use the ODS EXCEL statement to create the Excel spreadsheet. The following ODS options are used:

  • FILE - specifies the file path and name.
  • STYLE - modifies the appearance of the SAS output
  • EMBEDDED_TITLES - specifies titles should appear in the worksheet
  • SHEET_INTERVAL - enables manual control when to create a new worksheet
ods excel file="&path./&xlFileName" 
		  style=ExcelMidnight   
		  options(embedded_titles="on");

Worksheet 1

Print the data using SAS

With the ODS EXCEL destination open I'll name the first worksheet Data, and manually specify when a new sheet is created. Next, I'll use the PRINT procedure to print the detailed data to Excel. The PRINT procedure will print the entire SAS data set with the associated formats and styles to Excel.

* Sheet 1 - Print the data using SAS *;
ods excel options(sheet_name='Data' sheet_interval='none');
title height=16pt color=white "Detailed Car Data";
proc print data=work.cars noobs;
run;

Worksheet 2

Create violin plots using Python

Next, I want to create violin plots on a new worksheet named Origin_MPG. Now, these can be created in SAS, but I personally found the matplotlib package in Python a bit easier to use . With the PYTHON procedure, I can include the Python code within the SAS program (or you can reference a .py file) to create the visualization. Then I'll use the SAS.pyplot method to save and render the visualization. Since the pyplot callback renders the image in the results tab, it exports the image to the Excel workbook by default.

First I'll use ODS EXCEL to create the new worksheet and the TITLE statement to add a title to the Excel worksheet.

ods excel options(sheet_name='Origin_MPG' sheet_interval='now');
title justify=left height=16pt color=white "Analyzing MPG by Each Car Origin";

Then I'll execute the PYTHON procedure to execute my Python code.

* Create violin plots using Python *;
proc python;
submit;
 
##
## Import packages and options
##
 
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
outpath = SAS.symget('path')
 
##
## Data prep for the visualization
##
 
## Load the SAS table as a DataFrame
df = (SAS
      .sd2df('work.cars')                 ## SAS callback method to load the SAS data set as a DataFrame
      .loc[:,['Origin','MPG_Avg']]        ## Keep the necessary columns
)
 
 
## Create a series of MPG_Avg for each distinct origin for the violin plots
listOfUniqueOrigins = df.Origin.unique().tolist()
 
mpg_by_origin = {}
for origin in listOfUniqueOrigins:
    mpg_by_origin[origin] = df.query(f'Origin == @origin ').MPG_Avg
 
 
##
## Create the violin plots
##
 
## Violin plot
fig, ax = plt.subplots(figsize = (8,6))
ax.violinplot(mpg_by_origin.values(), showmedians=True)
 
## Plot appearance
ax.set_title('Miles per Gallon (MPG) by Origin')
rename_x_axis = {'position': [1,2,3], 'labels':listOfUniqueOrigins}
ax.set_xticks(rename_x_axis['position'])
ax.set_xticklabels(rename_x_axis['labels']);
 
## Save and render the image file
SAS.pyplot(plt, filename='violinPlot',filepath=outpath)
 
endsubmit;
quit;
title;

SQL Aggregation

SQL is an extremely common and useful language for data analysts and scientists. I find using SQL for aggregation easy, so I will create a simple aggregation and add it below the visualization on the same worksheet in the the Excel report.

* SQL Aggregation *;
title justify=left "Average MPG by Car Makes";
proc sql;
select Origin, round(mean(MPG_Avg)) as AverageMPG
	from work.cars
	group by Origin
	order by AverageMPG desc;
quit;
title;

Add text

At the end of the same worksheet I'll add some simple text using the ODSTEXT procedure to give some information about the data.

proc odstext;
   heading 'NOTES';
   p 'Using the SASHELP.CARS data. The following car Origins were analyzed:';
   list ;
      item 'Asia';
      item 'Europe';
      item 'USA';
   end;    
   p 'Created by Peter S';
quit;

Close the Excel workbook

Lastly, I'll close the ODS EXCEL destination since I am done writing out to Excel.

ods excel close;

Results

That's it! Now I'll execute the entire program and view the Excel workbook.

Summary

With the capabilities of SAS and the new ability to execute Python code in SAS Studio, teams have a variety of tools in SAS Viya for their analytic needs.

Additional resources

PYTHON Procedure documentation
SAS opens its code editor interface to Python users
Using PROC PYTHON to augment your SAS programs
ODS Excel Statement

Creating a Microsoft Excel report using SAS, Python and SQL! was published on SAS Users.

10月 152022
 

In my previous blog Programmatically export a Visual Analytics report to PDF - SAS Users, I use the SAS Visual Analytics SDK to export a report to PDF, which is quite simple if we have basic knowledge with JavaScript programming. It works for both the latest version of SAS Viya and version 3.5. The new version of SAS Viya offers improvements and we have the option to export VA report to PDF -- using REST API, without need of JavaScript programming. This is what I’ll discuss in this post.

The API under Visual Analytics category in latest SAS Viya, provides the ability to export a report, or a report object, to a PDF file. It also provides the ability to create and run a job to do the exporting. Actually, we can export a report PDF, image, package, and data using the APIs. All are quite straight forward. In this article, I will show how to export a report or report object to PDF file directly, and how to create and run a job to export to a PDF.

Get all the API links of Visual Analytics

The API under Visual Analytics provides the ability to retrieve all the API links via the http ‘GET’ method. Be sure to set the "Accept" = "application/vnd.sas.api+json" in the HEADERS of PROC http. Below is my sample code snippet, I define a json library so we can view the output of PROC http visually.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME vaJason TEMP ENCODING='UTF-8';
FILENAME hdrout TEMP ENCODING='UTF-8';
PROC HTTP METHOD="GET" oauth_bearer=sas_services out=vaJason headerout=hdrout
    URL = "&BASE_URI/visualAnalytics/";
    HEADERS "Accept" = "application/vnd.sas.api+json";
RUN;
LIBNAME vaJason json;

If we see the message of ‘200 OK’ returned (something like below), we know the PROC runs successfully.

Now in SAS Studio, if I go to the ‘Libraries’ tab, double click the LINKS table in the VAJASON library, all the API links of Visual Analytics are listed in the ‘href’ columns as shown below. We see the support of exporting the report PDF, image, package, and data with corresponding method and href.

Export a report or report object to PDF

Now, let me export a report to PDF directly. Below is the code snippet I used. With the FILENAME statement, the exported PDF will be saved in a physical location (I save it as rpt.pdf file in the /tmp/ folder). Be sure to set the "Accept" = "application/pdf" in the HEADERS of PROC http. In my example, I export a report with two report objects: a bar chart and a forecasting object.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME rptFile "/tmp/rpt.pdf"; 
PROC HTTP METHOD="GET" oauth_bearer=sas_services OUT=rptFile headerout=hdrout 
    URL = "&BASE_URI/visualAnalytics/reports/d940126c-f917-4a13-8e1a-51b6729f50ec/pdf";
    HEADERS "Accept" = "application/pdf"
            "Accept-Language" = "*"
            "Accept-Locale" = "en-US";
RUN;

Run the code, and if we see the message of ‘200 OK’ returned, we know the export succeeded. We can go to the /tmp/ folder and check the rpt.pdf file there.

Next, let me export a report object to PDF. If you are not familiar with objects composition of a VA report, refer to my earlier post Discover Visual Analytics Report Paths with REST APIs. Different from exporting a report, I need to set the parameter ‘reportObjects’ for the exported object. With the ‘GET’ method in PROC http, I use the QUERY option to set all the parameters I want to use for the object. For example, I set some cover page text. Below is the code snippet for report object exporting.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME rptFile "/tmp/rpt.pdf"; 
PROC HTTP METHOD="GET" oauth_bearer=sas_services OUT=rptFile headerout=hdrout 
    URL = "&BASE_URI/visualAnalytics/reports/d940126c-f917-4a13-8e1a-51b6729f50ec/pdf"  
    QUERY = ("reportObjects"="ve58" "includeCoverPage"=true "coverPageText"="This is cover page for a report object.");
    HEADERS "Accept" = "application/pdf"
            "Accept-Language" = "*"
            "Accept-Locale" = "en-US";
RUN;

Similarly, if we see the message of ‘200 OK’ returned, we know the export runs successfully. The following screenshots show the exported report PDF and the exported report object PDF, respectively.

Create and run a job to export a PDF

Besides exporting a report or report object directly, the API under Visual Analytics provides the ability to asynchronously execute the export job. The differences between directly export and job export are:

  • The ‘POST’ method is used for the job export action.
  • To export a report or report object by running a job, we need to apply the rendering option values in the request object, as well as options for the creation of the PDF.
  • Job export will save the export pdf file to the SAS Viya Content server folder, not a physical disk location. The PDF file can be then downloaded to local disk from SAS Studio or SAS Drive.

Below is the code snippet of creating a job to export report pdf. Be sure to set the "Accept" = "application/vnd.sas.visual.analytics.report.export.pdf.job+json", and "Content-Type" = "application/vnd.sas.visual.analytics.report.export.pdf.request+json" in the HEADERS of PROC http.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME hdrout TEMP ENCODING='UTF-8';
 
PROC HTTP METHOD="POST" oauth_bearer=sas_services headerout=hdrout 
    URL = "&BASE_URI/visualAnalytics/reports/d940126c-f917-4a13-8e1a-51b6729f50ec/exportPdf"  
     IN = '{
            "resultFolder": "/folders/folders/9d78f045-e7d9-4e82-b4aa-c7220cb85558",
            "resultFilename": "Exported PDF File.pdf",
            "nameConflict": "replace",
            "wait": 30,
            "timeout": 60,
            "options": {
                "orientation": "landscape",
                "paperSize": "A4",
                "showPageNumbers": true,
                "includeCoverPage": true,
                "coverPageText": "This is cover page for export pdf job."
                },
            "version": 1
            }'
      ;
    HEADERS "Accept" = "application/vnd.sas.visual.analytics.report.export.pdf.job+json"
            "Content-Type" = "application/vnd.sas.visual.analytics.report.export.pdf.request+json" 
            "Accept-Language" = "*"
            "Accept-Locale" = "en-US";
RUN;

If we see the message of ‘201 Created’ returned as shown below, we know the export job runs successfully.

Below screenshot shows the exported report PDF.

Finally

In summary, for the latest version of SAS Viya, the REST API under Visual Analytics category provides an easy way to export a report or a report object to a PDF file, either directly, or by a job asynchronously. We can also easily export the report object to image, the report data to CSV, TSV, XLSX, and the report resources to a package. You are encouraged to find more at Visualization – Visualization API Reference (sas.com).

Export a Visual Analytics report using REST APIs was published on SAS Users.

10月 142022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to summarize columns. Now it's time to focus on how to rename columns in CAS tables.

Load and explore data

In SAS Viya we can use the Samples caslib for demonstration data. I'll load the data into the distributed CAS server using my CAS connection object conn, followed by the loadTable action. The loadTable action loads a server-side file into memory. Then I'll reference the CAS table in the tbl object and view the column information using the columnInfo CAS action. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

conn.loadTable(path = 'WATER_CLUSTER.sashdat', caslib = 'samples',
                            casOut = dict(caslib = 'casuser'))
 
tbl = conn.CASTable('water_cluster', caslib='casuser')
 
tbl.columnInfo()

Rename CAS table columns

To rename a column in a CAS table you will need to use the alterTable CAS action. The alterTable action can rename columns, but it can also rename tables, add labels to tables, drop or keep columns, and modify column attributes like column labels. This post will only focus on how use the rename feature of the action.

As of the SWAT package version 1.11.0 there is no Pandas equivalent to rename the column. Techniques like the rename method or by explicitly assigning a list of new column names to the CASTable object are not available.

Rename a single column

For example, what if you want to rename the Year column to Year_Value? To rename columns with the alterTable action you specify the columns parameter with a list of dictionaries. Each dictionary specifies the column to modify. In this example specify the columns parameter and add a list with a single dictionary. The dictionary requires the name key to specify the column name and the rename key specifies the new column name. Lastly, execute the columnInfo action to view the updated columns names.

tbl.alterTable(columns = [
                {'name':'Year', 'rename':'Year_Value'}
              ])
tbl.columnInfo()

 

Notice in the results the Year column has been renamed to Year_Value.

Rename multiple columns

To rename multiple columns you simply add additional dictionaries to the list. For example, if you want to rename the Month column to Month_Value, and the Day column to Day_Value, add two dictionaries to the list. Here I'll create the dictionary in the variable renameColumns then use that in the alterTable action.

## Columns to rename
renameColumns = [
    {'name': 'Month', 'rename':'Month_Value'},
    {'name': 'Day', 'rename':'Day_Value'}
]
 
## Rename the CAS table columns
tbl.alterTable(columns = renameColumns)
 
## View the column information of the CAS table
tbl.columnInfo()

In the results notice that the Month column has been renamed to Month_Value, and the Day column has been renamed to Day_Value.

Dynamically rename columns

What if you want to dynamically rename columns? For example, what if you want to remove all underscores in the column names and lowercase all characters? You could manually create a dictionary for each column to rename, but that's a lot of manual work. Instead, you can use a Python list comprehension to create a list of dictionaries with the columns to rename. I'll store the results of the list comprehension in the variable newColumnNames.

newColumnNames = [{'name':colName, 'rename':colName.replace("_","").lower()} for colName in tbl.columns]
display(newColumnNames)
# and the results
[{'name': 'Year_Value', 'rename': 'yearvalue'},
 {'name': 'Month', 'rename': 'month'},
 {'name': 'Day', 'rename': 'day'},
 {'name': 'Date', 'rename': 'date'},
 {'name': 'Serial', 'rename': 'serial'},
 {'name': 'Property', 'rename': 'property'},
 {'name': 'Address', 'rename': 'address'},
 {'name': 'City', 'rename': 'city'},
 {'name': 'Zip', 'rename': 'zip'},
 {'name': 'Lat', 'rename': 'lat'},
 {'name': 'Long', 'rename': 'long'},
 {'name': 'Property_type', 'rename': 'propertytype'},
 {'name': 'Meter_Location', 'rename': 'meterlocation'},
 {'name': 'Clli', 'rename': 'clli'},
 {'name': 'DMA', 'rename': 'dma'},
 {'name': 'Weekday', 'rename': 'weekday'},
 {'name': 'Weekend', 'rename': 'weekend'},
 {'name': 'Daily_W_C_M3', 'rename': 'dailywcm3'},
 {'name': 'Week', 'rename': 'week'},
 {'name': 'US Holiday', 'rename': 'us holiday'},
 {'name': 'CLUSTER', 'rename': 'cluster'}]

Notice it creates a list of dictionaries, one for each column to rename. The list comprehension removed all underscores if they exist in a column name, then lowercases all the characters in the column.

Now that the list of dictionaries is created, add it to the alterTable action's columns parameter.

"tbl.alterTable(columns = newColumnNames)
tbl.columnInfo()

Summary

The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques to work with your distributed CAS tables. However, in this example, familiar Pandas techniques to rename columns is currently not available in the SWAT package. Instead, you can use the alterTable action to easily rename columns.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 11 - Rename Columns was published on SAS Users.

10月 062022
 

Group and aggregate CAS tables

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to summarize columns. Now it's time to focus on how to group and aggregate CAS tables.

Load and explore data

First, let's load some data. In SAS Viya we can use the Samples caslib for demonstration data. I'll load the data into the distributed CAS server using my CAS connection object conn, followed by the loadTable action. The loadTable action loads a server-side file into memory. Then I'll reference the CAS table in the tbl object and preview 5 rows using the SWAT head method. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

conn.loadTable(path = 'WATER_CLUSTER.sashdat', caslib = 'samples',
                            casOut = dict(caslib = 'casuser'))
 
tbl = conn.CASTable('water_cluster', caslib='casuser')
 
tbl.head()

 

The water_cluster CAS table contains daily water usage for a variety properties. The Daily_W_C_M3 column displays the water used per location in cubic meters (m3), the Serial column identifies each property, and the Weekend column identifies if the reading occurred on the weekend or not.

Using the SWAT groupby method

The SWAT package contains the groupby method. It is defined to match the pandas.DataFrame.groupby() method. For example, I'll specify the CAS table object tbl, then the groupby method and specify the Serial column, and then enclose it with the type function to view the object.

type(tbl.groupby('Serial')) 
# and the results
swat.cas.table.CASTableGroupBy

Notice the results return a CASTableGroupBy object. This works similarly to the pandas DataFrameGroupBy object. This enables you to add aggregation methods to aggregate by groups. For example, what if you want to see the total water consumption for each Serial value?

First, specify the CASTable object tbl, followed by the groupby method to aggregate by each Serial group. Then specify the column to aggregate, Daily_W_C_M3, and then add the sum aggregation method. Here the results are stored in the variable df_serial and displayed.

df_serial = (tbl                    ## CAS table reference
             .groupby('Serial')     ## Group the CAS table
             .Daily_W_C_M3          ## Specify the CAS table column to aggregate
             .sum()                 ## Specify the aggregation
)
 
display(df_serial)

partial results

The code executes in the distributed CAS server and returns a Series object to the client. Once you have a Series on the client you can work with it as your normally would in Pandas.

Let's answer another question, is water consumption higher on the weekend or weekday? Using the same technique from above, let's view the mean water consumption on weekends and weekdays. I'll begin by specifying the CASTable object followed by the groupby method to group the Weekend column. The Weekend column indicates if it's a weekend, 1, or a weekday, 0. Then I'll specify the CAS column Daily_W_C_M3 and the mean method.  Lastly, after the CAS server processes the data in parallel it returns a Series to the client. On the client-side Series object I'll chain the rename method to rename the values 0 and 1 to Weekday and Weekend respectively.

(tbl                      ## CAS table reference          
 .groupby('Weekend')      ## Group the CAS table
 .Daily_W_C_M3            ## Specify the CAS table column to aggregate
 .mean()                  ## Specify the aggregation
 .rename({0:'Weekday',    ## Rename the values in the Series object returned from the CAS server on the client
          1:'Weekend'})
)

The results show that water consumption seems to be higher on weekends.

Using the CASTable groupby parameter

You can also achieve the same results using the CASTable object with the groupby parameter. For example, what if I wanted to answer the same question as the previous example about water consumption on weekends vs weekdays?

First, let's display the parameters of the tbl object using the params attribute.

tbl.params
# and the results
{'name': 'water_cluster', 'caslib': 'casuser'}

The results show the CASTable object has the name and caslib parameters. The name and caslib parameters simply reference the CAS table.

Next, add the groupby parameter to the tbl CASTable object with the column or columns to group by. Here I'll specify the Weekend column.

tbl.groupby = 'Weekend'
# and the results
CASTable('water_cluster', caslib='casuser', groupby='Weekend')

Notice that the CASTable object now contains the groupby parameter with the value Weekend. I'll confirm that groupby parameter was permanently added to the CASTable object by checking the parameters of the object again.

tbl.params
# and the results
{'name': 'water_cluster', 'caslib': 'casuser', 'groupby': 'Weekend'}

The results show the groupby parameter was added to the CASTable object. Now I can simply use the CASTable object to aggregate by the group. Specify the CASTable object followed by the CAS table column to aggregate. Then add the mean aggregation method to aggregate the data by groups on the CAS server. Lastly, the rename method will rename the Series object returned by the CAS server on the client.

(tbl                      ## CAS table reference with the groupby parameter
 .Daily_W_C_M3            ## Specify the CAS table column to aggregate
 .mean()                  ## Specify the aggregation
 .rename({0:'Weekday',    ## Rename the values in the Series object returned from the CAS server on the client
          1:'Weekend'})
)


Notice the results are the same.

Summary

The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques to work with your distributed CAS tables. This enables you to utilize the massively parallel processing power of the CAS server in SAS Viya using Python. Here we learned about using the familiar Pandas groupby method in the SWAT package and the CAS table groupby parameter to aggregate CAS tables by groups.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 10 - Group and Aggregate CAS Tables was published on SAS Users.

9月 142022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on how to summarize columns.

Load and explore data

Let's first load some data. In SAS Viya we can use the Samples caslib for demonstration data. I'll load the data into the distributed CAS server using my CAS connection object conn, followed by the loadTable action. The loadTable action loads a server-side file into memory. Then I'll reference the CAS table in the tbl object and preview 5 rows using the SWAT head method. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

conn.loadTable(path = 'WATER_CLUSTER.sashdat', caslib = 'samples',
                            casOut = dict(caslib = 'casuser'))
 
tbl = conn.CASTable('water_cluster', caslib='casuser')
 
tbl.head()

The water_cluster CAS table contains daily water usage for a variety properties. The Daily_W_C_M3 column displays the water used per location in cubic meters (m3) .

Simple summarizations

Next, let's execute methods for simple summarizations on CAS table columns. If you've used Pandas before these techniques will be very familiar.

I'll start by getting the total water consumption for all the data using the sum method in the SWAT package on the CAS table column Daily_W_C_M3.

(tbl
 .Daily_W_C_M3
 .sum()
)
# and the results
401407.88800000015

Next, I'll view the maximum and minimum water consumption value in the data using the max and min methods in the SWAT package.

(tbl
 .Daily_W_C_M3
 .max()
)
# and the results
11910.0
 
(tbl
 .Daily_W_C_M3
 .min()
)
# and the results
0.0

Lastly, I'll find the average water consumption value in the CAS table using the mean method.

(tbl
 .Daily_W_C_M3
 .mean()
)
# and the results
8.591778424657537

Instead of manually specifying each method to view summary information, I could of used the summary CAS action or the describe method from the SWAT package. For more information check out my previous post Descriptive Statistics.

Find the property with the max water consumption

What if I want to find the property or properties with the max water consumption? One technique is to find the max water consumption value, then filter the CAS table for the row (or rows) with that value. Here I will store the max water consumption value in the variable maxWaterConsumption. Then I'll use the variable in the query method to filter the CAS table. For more information on filtering CAS tables, check out my previous blog post Filtering CAS Tables.

## Store the max water consumption value
maxWaterConsumption = (tbl
                      .Daily_W_C_M3
                      .max()
)
 
## Filter the CAS table for the property with the max water usage
(tbl
 .query(f"Daily_W_C_M3 = {maxWaterConsumption }")
 .head()
)

The results show that the property address 1660 S DAIRY ASHFORD ST had the largest daily water consumption rate.

Find the top 10 daily water consumption values and properties

Lastly, to find the top 10 daily water consumption properties and values I can use the nlargest method in the SWAT package. This method is similar to the Pandas nlargest method.  One difference is the SWAT nlargest method does not implement the keep parameter.

I'll specify the SWAT nlargest method and store the results in the variable df_top10. Then I'll display the results.

df_top10 = tbl.nlargest(10, 'Daily_W_C_M3')
display(df_top10)

The CAS server will execute the request and return a SASDataFrame to the client. The SASDataFrame shows that a single property contains the top 10 largest water consumption values.

Remember, a SASDataFrame is a subclass of a Pandas DataFrame. As a result, you can work with them as you normally would a Pandas DataFrame. To learn more about working with CAS objects check out my previous post Working with CAS Actions and CASResults Objects.

Summary

The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques to work with your distributed CAS tables. This enables you to utilize the massively parallel processing power of the CAS server in SAS Viya using Python. Here we learned about using familiar Pandas methods in the SWAT package like sum, min, mean, max and nlargest.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 9 - Summarize Columns was published on SAS Users.

7月 222022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns was published on SAS Users.

7月 222022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns was published on SAS Users.

7月 072022
 

Multi-tab address bookIn this post, we expand on my previous post Automating Excel workbooks creation using SAS and demonstrate how to automatically (programmatically) split a data table into separate worksheets (tabs) of a single Microsoft Excel workbook.

While there are multiple ways of splitting data tables, for the purpose of this post we are going to split up a data table based on the value of one of its categorical variables (columns).

The approach described here is applicable to both SAS data tables and non-SAS data tables. In either case, you don’t even have to explicitly create interim data table subsets – each data table gets split directly into multiple Excel sheets.

In addition, there are no manual steps, the whole process is data-driven and is suitable for automatic execution either on its own schedule or as part of another automated process.

SAS macro solution for splitting a dataset into Excel worksheets

Let’s consider the following task. We have a data table SASHELP.FISH that has 159 observations and 7 variables. One of the variables, SPECIES, has few unique values (such as ‘Perch’, ‘Whitefish, ‘Pike’, ‘Smelt’, etc.) We want to split this dataset by the unique SPECIES values so all the observations pertaining to each species are presented in separate worksheets of a single Excel workbook. These worksheets will be accessible by clicking on the corresponding tabs of the Excel workbook.

The following code demonstrates how this can be done using SAS macro with a macro loop.

/* Split data table into multiple sheets of an Excel workbook */
 
/* Create a list of and count unique SPECIES  */
proc sql noprint;
   select distinct SPECIES, count(distinct SPECIES)  
      into :sp_list separated by ' ', :nsp   
   from SASHELP.FISH;
quit;
 
/* Create TABS color list */
%let color_list = #feb8b6 #c4feb6 #fefdb6 #feb6fa #b6d5fe #f0b6fe #fee6b6;
 
/* Initiate ODS EXCEL destination */
filename target 'C:\Projects\SAS_to_Excel\Fish.xlsx';
ods excel file=target options (frozen_headers='on' embedded_titles='on');
 
/* Macro to loop through the list of unique TABS values */
%macro split_data_to_excel (intable=, tabvar=);
   %do i=1 %to &nsp;
      %let sp = %scan(&sp_list, &i, %str( ));
      %let tc = %scan(&color_list, &i, %str( ));
      ods excel options (sheet_name="&sp" tab_color="&tc");
      title "&intable where &tabvar=&sp";
      proc print data=&intable noobs; 
         where &tabvar="&sp";
      run;
   %end;
%mend split_data_to_excel;
 
%split_data_to_excel (intable=SASHELP.FISH, tabvar=SPECIES)
ods excel close;

Here are the code highlights:

  • The first step, PROC SQL, creates two macro variables - sp_list containing a list of unique (distinct) values of variable SPECIES, and nsp containing the number (count) of the unique values for SPECIES.
  • %let color_list assigns a list of colors (hexadecimal values) that we use to color-code our tabs.
  • Then we specify the output Excel workbook file and open ODS EXCEL destination. Here we specify options (frozen_headers='on' embedded_titles='on') which are going to apply to all the tabs (sheets) in our Excel workbook.
  • Then we define macro split_data_to_excel that accepts two parameters:
    • intable – input table name;
    • tabvar – tab variable whose unique values are used for subsetting the input data.
  • Within this macro we have a macro loop with macro variable index i that iterates from 1 to &nsp (number of unique &tabvar values).
  • Within the macro loop, we scan &sp_list and &color_list for &i-th word and use that word in the following statement:
    ods excel options (sheet_name="&sp" tab_color="&tc");
    With each iteration of the macro %DO-loop, this statement creates a new sheet, gives it a name and assigns a tab color.
  • The following TITLE and PROC PRINT will be generated for each %DO iteration and produce a subset of the input data table on the corresponding sheet.
  • After the macro definition, we invoke this macro as
    %split_data_to_excel (intable=SASHELP.FISH, tabvar=SPECIES)
    and close ODS Excel.

Running this code will produce the following Excel workbook:

Multi-tab Excel workbook with colored tabs
As you can see, each SPECIES is presented in its own worksheet (tab) and each tab is color-coded and properly labeled.

Data step solution for splitting a dataset into Excel worksheets

Besides the above macro solution, there is an alternative solution using a single SAS data step with CALL EXECUTE to dynamically generate SAS code and push it outside of the data step boundaries for execution. The following data step solution replaces the macro definition and macro invocation in the above macro solution:

data _null_;
   intable = 'SASHELP.FISH';
   tabvar = 'SPECIES';
   do i=1 to &nsp;
      sp = scan("&sp_list",i,' ');
      tc = scan("&color_list",i,' ');
      call execute('ods excel options (sheet_name="'||trim(sp)||'" tab_color="'||trim(tc)||'");');
      call execute('title "'||intable||' where '||tabvar||'='||trim(sp)||'";');
      call execute('proc print data='||intable||' noobs;');
      call execute('   where '||tabvar||'="'||trim(sp)||'";');
      call execute('run;');
   end;
run;

For each unique SPECIES value, CALL EXECUTE will generate SAS code and push it outside of the data step in a queue where SAS compiles and executes it after the data step in the order it was generated. For each do-loop iteration, you will see the following corresponding snippets (marked with + sign) of the generated code in the SAS log:

NOTE: CALL EXECUTE generated line.
1         + ods excel options (sheet_name="Bream" tab_color="#feb8b6");
2         + title "SASHELP.FISH where SPECIES=Bream";
3         + proc print data=SASHELP.FISH noobs;
4         +    where SPECIES="Bream";
5         + run;
 
NOTE: There were 35 observations read from the data set SASHELP.FISH.
      WHERE SPECIES='Bream';

The produced output will be identical to the above macro solution.

Questions

Which solution is more appealing to you? Do you have questions, comments, suggestions, ideas, other solutions, tips or tricks about splitting a data table into multiple Excel worksheets? Please share with us in the Comments section below.

Additional Resources

Splitting a data table into multiple sheets of an Excel workbook was published on SAS Users.

6月 162022
 

MS Excel logo No matter which powerful analytical tools data professionals use for their data processing, MS Excel remains the output of choice for many users and whole industries.

In banking and finance, for example, I have seen many SAS users create quite sophisticated data queries and data analysis projects in SAS Enterprise Guide. Yet, at the end, when the final datasets have been produced and validated, comes the manual part when users export those tables into Excel, and then combine and rearrange them by copying-pasting into a desired workbook for storing and distributing.

However, this heavily manual process can be not just fully automated, but also enhanced compared with the point-and-click “export to Excel” and “copy-paste” interactive process. Here is how.

Creating a single simple Excel sheet

Suppose, we want to convert SASHELP.CLASS data table to Excel. Here is a bare-bone solution using SAS Output Delivery System:

ods excel file='C:\Projects\SAS_to_Excel\Single_sheet.xlsx';
 
proc print data=SASHELP.CLASS noobs;
run;
 
ods excel close;

This code is pretty much self-explanatory. It will produce Single_sheet.xlsx Excel workbook file in the folder C:\Projects\SAS_to_Excel. When opened in Excel, it will look as follows:

If you browse through the ODS EXCEL documentation you will find a variety of options that allow you to customize Excel output. Let’s get a little fancy by utilizing some of them.

Creating several customized sheets in Excel workbook

The following code example creates two sheets (tabs) in a single workbook. In addition, it demonstrates some other features to enhance data visualization.

/* -------------------------------------------- */
/* Two sheets workbook with enhanced appearance */
/* -------------------------------------------- */
 
/* Formats for background & foreground coloring */
proc format;
   value hbg 
      50 <- 60   = #66FF99
      60 <- 70   = #FFFF99
      70 <- high = #FF6666
      ;
   value hfg
      low -< 50, 70 <- high = white;
run;
 
/* Define custom font style for ODS TEXT */
proc template;
   define style styles.MyStyle;
   parent=styles.htmlblue;
   style usertext from usertext /
      foreground  = #FF33CC
      font_weight = bold
      ;
   end;
run;
 
/* ODS Excel output file destination */
ods excel file = 'C:\Projects\SAS_to_Excel\Two_sheets_fancy.xlsx';
 
   /* Excel options for 1st sheet (tab) */
   ods excel options
      ( sheet_name      = 'SASHELP.CLASS'
        frozen_headers  = 'on'
        embedded_titles = 'on' )
      style = styles.MyStyle;
 
   title justify=left color='#4D7EBF' 'This is TITLE for SASHELP.CLASS';
 
   ods text='This is TEXT for SASHELP.CLASS';
 
   proc print data=SASHELP.CLASS noobs;
      var NAME;
      var SEX AGE / style = {just=C};
      var HEIGHT  / style = {background=hbg. foreground=hfg.};
      var WEIGHT;
   run;
 
   /* Excel options for 2nd sheet (tab) */
   ods excel options
      ( sheet_name      = 'SASHELP.CARS'
        frozen_headers  = 'on'
        embedded_titles = 'on' );
 
   title 'This is TITLE for SASHELP.CARS';
 
   proc print data=SASHELP.CARS noobs;
   run;
 
ods excel close;

Here are the code highlights:

  • PROC FORMAT creates two formats, HBG and HFG, which are used in the first PROC PRINT to illustrate cell text and background coloring.

    NOTE: In SAS, colors are specified as hexadecimal RGB values (cxRRGGBB). However, I found (although it seems undocumented) that in PROC FORMAT and PROC TEMPLATE these colors can be written as quoted (double or single) or unquoted values, as well as prefixed with either ‘#’ or ‘cx’ (for hexadecimal). For example, the following are all valid values: #FFAADD, '#FFAADD', cxFFAADD, 'cxFFAADD'.
  • PROC TEMPLATE defines usertext custom font style for ODS TEXT, which used in ODS EXCEL along with and as alternative to the TITLE statement.
  • ODS EXCEL FILE=file-specification statement opens the EXCEL destination and specifies the output file for ODS EXCEL. The output file-specification can be either a physical file name (must be in quotes) or a fileref (without quotes) assigned with FILENAME statement. It can point to a location on the machine where SAS is run (SAS server), or a network drive accessible from the SAS server.

    This statement follows by two separate ODS EXCEL OPTIONS statements – one per corresponding sheet (tab).
  • ODS EXCEL OPTIONS statement specifies destination-specific suboptions with space-delimited name='value' pairs. In particular, we use the following options:

    • SHEET_NAME= specifies the name for the next worksheet (worksheet names can be up to 28 characters long).
    • FROZEN_HEADERS='ON' specifies that headers are not scrolled when table is vertically scrolled (default is OFF). It is very convenient feature that keeps the title(s) and column names in view while user scrolls through the table rows.
    • EMBEDDED_TITLES='ON' specifies whether titles should appear in the worksheet (default is OFF).

    There is a variety of other useful ODS EXCEL options allowing further customization of your Excel workbook appearance and functionality.

  • TITLE statement is highly customizable with Output Delivery System as shown in this custom TITLE with ODS example.
  • ODS TEXT= statement inserts text into your ODS output. Unlike TITLE for which ODS EXCEL merges several cells, ODS TEXT places its text in a single cell (see screenshot below). The UserText style element that we modified using PROC TEMPLATE controls the font style, font color, and other attributes of the text that the ODS TEXT= statement produces.
  • In PROC PRINT, we use multiple VAR statements to select variables, determine their order and apply styles (text and background colors) to the printed values. There are much more ODS styles with PROC PRINT available for further customizations.
  • The next section of the code ( /* Excel options for 2nd sheet (tab) */ ) creates the second sheet in the same Excel workbook. Similarly, you can create as many sheets/tabs as you wish.
  • The last statement ods excel close; closes the ODS Excel destination so nothing more is written to the output file.

The following are the two screenshots illustrating the two sheets in the produced Excel workbook:

SAS creates Excel workbook with several sheets/tabs

Questions

Do you find this post useful? Do you have questions, comments, suggestions, other tips or tricks about creating MS Excel workbooks in SAS? Please share with us below.

Additional Resources

TUNE IN NOW | LEARN HOW TO READ AND WRITE EXCEL FILES WITH SAS

Automating Excel workbooks creation using SAS was published on SAS Users.