tips & techniques

7月 222022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns was published on SAS Users.

7月 222022
 

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns was published on SAS Users.

7月 072022
 

Multi-tab address bookIn this post, we expand on my previous post Automating Excel workbooks creation using SAS and demonstrate how to automatically (programmatically) split a data table into separate worksheets (tabs) of a single Microsoft Excel workbook.

While there are multiple ways of splitting data tables, for the purpose of this post we are going to split up a data table based on the value of one of its categorical variables (columns).

The approach described here is applicable to both SAS data tables and non-SAS data tables. In either case, you don’t even have to explicitly create interim data table subsets – each data table gets split directly into multiple Excel sheets.

In addition, there are no manual steps, the whole process is data-driven and is suitable for automatic execution either on its own schedule or as part of another automated process.

SAS macro solution for splitting a dataset into Excel worksheets

Let’s consider the following task. We have a data table SASHELP.FISH that has 159 observations and 7 variables. One of the variables, SPECIES, has few unique values (such as ‘Perch’, ‘Whitefish, ‘Pike’, ‘Smelt’, etc.) We want to split this dataset by the unique SPECIES values so all the observations pertaining to each species are presented in separate worksheets of a single Excel workbook. These worksheets will be accessible by clicking on the corresponding tabs of the Excel workbook.

The following code demonstrates how this can be done using SAS macro with a macro loop.

/* Split data table into multiple sheets of an Excel workbook */
 
/* Create a list of and count unique SPECIES  */
proc sql noprint;
   select distinct SPECIES, count(distinct SPECIES)  
      into :sp_list separated by ' ', :nsp   
   from SASHELP.FISH;
quit;
 
/* Create TABS color list */
%let color_list = #feb8b6 #c4feb6 #fefdb6 #feb6fa #b6d5fe #f0b6fe #fee6b6;
 
/* Initiate ODS EXCEL destination */
filename target 'C:\Projects\SAS_to_Excel\Fish.xlsx';
ods excel file=target options (frozen_headers='on' embedded_titles='on');
 
/* Macro to loop through the list of unique TABS values */
%macro split_data_to_excel (intable=, tabvar=);
   %do i=1 %to &nsp;
      %let sp = %scan(&sp_list, &i, %str( ));
      %let tc = %scan(&color_list, &i, %str( ));
      ods excel options (sheet_name="&sp" tab_color="&tc");
      title "&intable where &tabvar=&sp";
      proc print data=&intable noobs; 
         where &tabvar="&sp";
      run;
   %end;
%mend split_data_to_excel;
 
%split_data_to_excel (intable=SASHELP.FISH, tabvar=SPECIES)
ods excel close;

Here are the code highlights:

  • The first step, PROC SQL, creates two macro variables - sp_list containing a list of unique (distinct) values of variable SPECIES, and nsp containing the number (count) of the unique values for SPECIES.
  • %let color_list assigns a list of colors (hexadecimal values) that we use to color-code our tabs.
  • Then we specify the output Excel workbook file and open ODS EXCEL destination. Here we specify options (frozen_headers='on' embedded_titles='on') which are going to apply to all the tabs (sheets) in our Excel workbook.
  • Then we define macro split_data_to_excel that accepts two parameters:
    • intable – input table name;
    • tabvar – tab variable whose unique values are used for subsetting the input data.
  • Within this macro we have a macro loop with macro variable index i that iterates from 1 to &nsp (number of unique &tabvar values).
  • Within the macro loop, we scan &sp_list and &color_list for &i-th word and use that word in the following statement:
    ods excel options (sheet_name="&sp" tab_color="&tc");
    With each iteration of the macro %DO-loop, this statement creates a new sheet, gives it a name and assigns a tab color.
  • The following TITLE and PROC PRINT will be generated for each %DO iteration and produce a subset of the input data table on the corresponding sheet.
  • After the macro definition, we invoke this macro as
    %split_data_to_excel (intable=SASHELP.FISH, tabvar=SPECIES)
    and close ODS Excel.

Running this code will produce the following Excel workbook:

Multi-tab Excel workbook with colored tabs
As you can see, each SPECIES is presented in its own worksheet (tab) and each tab is color-coded and properly labeled.

Data step solution for splitting a dataset into Excel worksheets

Besides the above macro solution, there is an alternative solution using a single SAS data step with CALL EXECUTE to dynamically generate SAS code and push it outside of the data step boundaries for execution. The following data step solution replaces the macro definition and macro invocation in the above macro solution:

data _null_;
   intable = 'SASHELP.FISH';
   tabvar = 'SPECIES';
   do i=1 to &nsp;
      sp = scan("&sp_list",i,' ');
      tc = scan("&color_list",i,' ');
      call execute('ods excel options (sheet_name="'||trim(sp)||'" tab_color="'||trim(tc)||'");');
      call execute('title "'||intable||' where '||tabvar||'='||trim(sp)||'";');
      call execute('proc print data='||intable||' noobs;');
      call execute('   where '||tabvar||'="'||trim(sp)||'";');
      call execute('run;');
   end;
run;

For each unique SPECIES value, CALL EXECUTE will generate SAS code and push it outside of the data step in a queue where SAS compiles and executes it after the data step in the order it was generated. For each do-loop iteration, you will see the following corresponding snippets (marked with + sign) of the generated code in the SAS log:

NOTE: CALL EXECUTE generated line.
1         + ods excel options (sheet_name="Bream" tab_color="#feb8b6");
2         + title "SASHELP.FISH where SPECIES=Bream";
3         + proc print data=SASHELP.FISH noobs;
4         +    where SPECIES="Bream";
5         + run;
 
NOTE: There were 35 observations read from the data set SASHELP.FISH.
      WHERE SPECIES='Bream';

The produced output will be identical to the above macro solution.

Questions

Which solution is more appealing to you? Do you have questions, comments, suggestions, ideas, other solutions, tips or tricks about splitting a data table into multiple Excel worksheets? Please share with us in the Comments section below.

Additional Resources

Splitting a data table into multiple sheets of an Excel workbook was published on SAS Users.

6月 162022
 

MS Excel logo No matter which powerful analytical tools data professionals use for their data processing, MS Excel remains the output of choice for many users and whole industries.

In banking and finance, for example, I have seen many SAS users create quite sophisticated data queries and data analysis projects in SAS Enterprise Guide. Yet, at the end, when the final datasets have been produced and validated, comes the manual part when users export those tables into Excel, and then combine and rearrange them by copying-pasting into a desired workbook for storing and distributing.

However, this heavily manual process can be not just fully automated, but also enhanced compared with the point-and-click “export to Excel” and “copy-paste” interactive process. Here is how.

Creating a single simple Excel sheet

Suppose, we want to convert SASHELP.CLASS data table to Excel. Here is a bare-bone solution using SAS Output Delivery System:

ods excel file='C:\Projects\SAS_to_Excel\Single_sheet.xlsx';
 
proc print data=SASHELP.CLASS noobs;
run;
 
ods excel close;

This code is pretty much self-explanatory. It will produce Single_sheet.xlsx Excel workbook file in the folder C:\Projects\SAS_to_Excel. When opened in Excel, it will look as follows:

If you browse through the ODS EXCEL documentation you will find a variety of options that allow you to customize Excel output. Let’s get a little fancy by utilizing some of them.

Creating several customized sheets in Excel workbook

The following code example creates two sheets (tabs) in a single workbook. In addition, it demonstrates some other features to enhance data visualization.

/* -------------------------------------------- */
/* Two sheets workbook with enhanced appearance */
/* -------------------------------------------- */
 
/* Formats for background & foreground coloring */
proc format;
   value hbg 
      50 <- 60   = #66FF99
      60 <- 70   = #FFFF99
      70 <- high = #FF6666
      ;
   value hfg
      low -< 50, 70 <- high = white;
run;
 
/* Define custom font style for ODS TEXT */
proc template;
   define style styles.MyStyle;
   parent=styles.htmlblue;
   style usertext from usertext /
      foreground  = #FF33CC
      font_weight = bold
      ;
   end;
run;
 
/* ODS Excel output file destination */
ods excel file = 'C:\Projects\SAS_to_Excel\Two_sheets_fancy.xlsx';
 
   /* Excel options for 1st sheet (tab) */
   ods excel options
      ( sheet_name      = 'SASHELP.CLASS'
        frozen_headers  = 'on'
        embedded_titles = 'on' )
      style = styles.MyStyle;
 
   title justify=left color='#4D7EBF' 'This is TITLE for SASHELP.CLASS';
 
   ods text='This is TEXT for SASHELP.CLASS';
 
   proc print data=SASHELP.CLASS noobs;
      var NAME;
      var SEX AGE / style = {just=C};
      var HEIGHT  / style = {background=hbg. foreground=hfg.};
      var WEIGHT;
   run;
 
   /* Excel options for 2nd sheet (tab) */
   ods excel options
      ( sheet_name      = 'SASHELP.CARS'
        frozen_headers  = 'on'
        embedded_titles = 'on' );
 
   title 'This is TITLE for SASHELP.CARS';
 
   proc print data=SASHELP.CARS noobs;
   run;
 
ods excel close;

Here are the code highlights:

  • PROC FORMAT creates two formats, HBG and HFG, which are used in the first PROC PRINT to illustrate cell text and background coloring.

    NOTE: In SAS, colors are specified as hexadecimal RGB values (cxRRGGBB). However, I found (although it seems undocumented) that in PROC FORMAT and PROC TEMPLATE these colors can be written as quoted (double or single) or unquoted values, as well as prefixed with either ‘#’ or ‘cx’ (for hexadecimal). For example, the following are all valid values: #FFAADD, '#FFAADD', cxFFAADD, 'cxFFAADD'.
  • PROC TEMPLATE defines usertext custom font style for ODS TEXT, which used in ODS EXCEL along with and as alternative to the TITLE statement.
  • ODS EXCEL FILE=file-specification statement opens the EXCEL destination and specifies the output file for ODS EXCEL. The output file-specification can be either a physical file name (must be in quotes) or a fileref (without quotes) assigned with FILENAME statement. It can point to a location on the machine where SAS is run (SAS server), or a network drive accessible from the SAS server.

    This statement follows by two separate ODS EXCEL OPTIONS statements – one per corresponding sheet (tab).
  • ODS EXCEL OPTIONS statement specifies destination-specific suboptions with space-delimited name='value' pairs. In particular, we use the following options:

    • SHEET_NAME= specifies the name for the next worksheet (worksheet names can be up to 28 characters long).
    • FROZEN_HEADERS='ON' specifies that headers are not scrolled when table is vertically scrolled (default is OFF). It is very convenient feature that keeps the title(s) and column names in view while user scrolls through the table rows.
    • EMBEDDED_TITLES='ON' specifies whether titles should appear in the worksheet (default is OFF).

    There is a variety of other useful ODS EXCEL options allowing further customization of your Excel workbook appearance and functionality.

  • TITLE statement is highly customizable with Output Delivery System as shown in this custom TITLE with ODS example.
  • ODS TEXT= statement inserts text into your ODS output. Unlike TITLE for which ODS EXCEL merges several cells, ODS TEXT places its text in a single cell (see screenshot below). The UserText style element that we modified using PROC TEMPLATE controls the font style, font color, and other attributes of the text that the ODS TEXT= statement produces.
  • In PROC PRINT, we use multiple VAR statements to select variables, determine their order and apply styles (text and background colors) to the printed values. There are much more ODS styles with PROC PRINT available for further customizations.
  • The next section of the code ( /* Excel options for 2nd sheet (tab) */ ) creates the second sheet in the same Excel workbook. Similarly, you can create as many sheets/tabs as you wish.
  • The last statement ods excel close; closes the ODS Excel destination so nothing more is written to the output file.

The following are the two screenshots illustrating the two sheets in the produced Excel workbook:

SAS creates Excel workbook with several sheets/tabs

Questions

Do you find this post useful? Do you have questions, comments, suggestions, other tips or tricks about creating MS Excel workbooks in SAS? Please share with us below.

Additional Resources

TUNE IN NOW | LEARN HOW TO READ AND WRITE EXCEL FILES WITH SAS

Automating Excel workbooks creation using SAS was published on SAS Users.

2月 112022
 

Ever had to work with data that had crazy column names, custom formats you didn’t have access to, or text columns sized way too big for the actual data they contained? Annoying, isn’t it? Well, you’re not alone and, as it turns out, it’s not that hard to fix. Let’s take a peek at renaming, reformatting, relabeling and resizing columns in your data using PROC SQL and PROC DATASETS. PROC SQL could handle most of these tasks, but you can’t rename a column in SQL without creating a new copy of the table. PROC DATASETS can do it all, but the syntax can be trickier. When we resize a column, we're going to have to re-write the data anyway, so I'll use PROC SQL for resizing, and PROC DATASETS for everything else. For example, we have this existing table:

Year Make Model MSRP Invoice Engine Size (L)
2004 Acura MDX $36,945 $33,337 3.5
2004 Audi A4 1.8T 4dr $25,940 $23,508 1.8
2004 Buick Rainier $37,895 $34,357 4.2

 

with this descriptor portion:

 

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 13
Model char 40
MSRP num 8 DOLLAR8.
Invoice num 8 DOLLAR8.
EngineSize Engine Size (L) num 8

 
I want to modify the table to use European formats for the numeric values, to rename the EngineSize column as Size leaving its label unchanged, and to add descriptive labels for MSRP and Invoice.
 
The general syntax for this in PROC DATASETS is:
 
PROC DATASETS LIBRARY=;
MODIFY sas-dataset <(options)>;
FORMAT variable-1 <format-1> <variable-2 <format-2> ...>;
LABEL variable-1=<'label-1' | ' '> <variable-2=<'label-2' | ' '> ...>;
RENAME variable-1=new-name-1 <variable-2=new-name-2 ...>;

After running this code:

proc datasets library=work nolist nodetails;
   modify cars;
      format MSRP eurox8. Invoice eurox8. EngineSize commax5.1;
      label MSRP="Sticker Price" Invoice="Wholesale Price" ;
      rename EngineSize=Size;
run; quit;

 
the data now looks like this:
 

Year Make Model Sticker Price Wholesale Price Engine Size
(L)
2004 Acura MDX €36.945 €33.337 3,5
2004 Audi A4 1.8T 4dr €25.940 €23.508 1,8
2004 Buick Rainier €37.895 €34.357 4,2

 

with this descriptor portion:

 

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 13
Model char 40
MSRP Sticker Price num 8 EUROX8.
Invoice Wholesale Price num 8 EUROX8.
Size Engine Size (L) num 8 COMMAX5.1

 
In the report, the longest value in the Make column is 5 characters long but the table metadata shows a column width of 13 characters. I can shorten the Make column to 5 characters without truncating the values, making my table size smaller on disk and in memory.

PROC SQL with an ALTER TABLE statement and a MODIFY clause will change a table’s descriptor portion (metadata) for one or more columns. Changing the length of a column will automatically rewrite the data portion of the table, too. The syntax looks like this:

ALTER TABLE table-name MODIFY column-definition-1 <, column-definition-2, ..., column-definition-n>;

After running this code:

proc sql;
alter table work.cars
   modify Make char(5)
;
quit;

The data is still correct:

Year Make Model Sticker Price Wholesale Price Engine Size
(L)
2004 Acura MDX €36.945 €33.337 3,5
2004 Audi A4 1.8T 4dr €25.940 €23.508 1,8
2004 Buick Rainier €37.895 €34.357 4,2

 
and the column size is now smaller:

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 5
Model char 40
MSRP Sticker Price num 8 EUROX8.
Invoice Wholesale Price num 8 EUROX8.
Size Engine Size (L) num 8 COMMAX5.1

 
Now, unless you want to write this type of code by hand every time you need it, you’ll want macros to do the work for you. And macros we’ve got! Check out my SAS tutorial on YouTube titled “Resize, Rename and Reformat Data with SAS Macro” for more details on how this works. The video is also embedded below. Links below the embedded video lead to the data and programs used in the video and to my personal macro trove on GitHub, including a macro that quickly strips labels, formats and informats from a data set.

Until next time, may the SAS be with you!
 
Mark

 

DOWNLOAD NOW | VIDEO DATA PACKAGE ON GITHUB
GET MACROS NOW | MY MACRO STASH ON GITHUB

Jedi SAS tricks: resizing, renaming and reformatting your data was published on SAS Users.

2月 112022
 

Ever had to work with data that had crazy column names, custom formats you didn’t have access to, or text columns sized way too big for the actual data they contained? Annoying, isn’t it? Well, you’re not alone and, as it turns out, it’s not that hard to fix. Let’s take a peek at renaming, reformatting, relabeling and resizing columns in your data using PROC SQL and PROC DATASETS. PROC SQL could handle most of these tasks, but you can’t rename a column in SQL without creating a new copy of the table. PROC DATASETS can do it all, but the syntax can be trickier. When we resize a column, we're going to have to re-write the data anyway, so I'll use PROC SQL for resizing, and PROC DATASETS for everything else. For example, we have this existing table:

Year Make Model MSRP Invoice Engine Size (L)
2004 Acura MDX $36,945 $33,337 3.5
2004 Audi A4 1.8T 4dr $25,940 $23,508 1.8
2004 Buick Rainier $37,895 $34,357 4.2

 

with this descriptor portion:

 

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 13
Model char 40
MSRP num 8 DOLLAR8.
Invoice num 8 DOLLAR8.
EngineSize Engine Size (L) num 8

 
I want to modify the table to use European formats for the numeric values, to rename the EngineSize column as Size leaving its label unchanged, and to add descriptive labels for MSRP and Invoice.
 
The general syntax for this in PROC DATASETS is:
 
PROC DATASETS LIBRARY=;
MODIFY sas-dataset <(options)>;
FORMAT variable-1 <format-1> <variable-2 <format-2> ...>;
LABEL variable-1=<'label-1' | ' '> <variable-2=<'label-2' | ' '> ...>;
RENAME variable-1=new-name-1 <variable-2=new-name-2 ...>;

After running this code:

proc datasets library=work nolist nodetails;
   modify cars;
      format MSRP eurox8. Invoice eurox8. EngineSize commax5.1;
      label MSRP="Sticker Price" Invoice="Wholesale Price" ;
      rename EngineSize=Size;
run; quit;

 
the data now looks like this:
 

Year Make Model Sticker Price Wholesale Price Engine Size
(L)
2004 Acura MDX €36.945 €33.337 3,5
2004 Audi A4 1.8T 4dr €25.940 €23.508 1,8
2004 Buick Rainier €37.895 €34.357 4,2

 

with this descriptor portion:

 

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 13
Model char 40
MSRP Sticker Price num 8 EUROX8.
Invoice Wholesale Price num 8 EUROX8.
Size Engine Size (L) num 8 COMMAX5.1

 
In the report, the longest value in the Make column is 5 characters long but the table metadata shows a column width of 13 characters. I can shorten the Make column to 5 characters without truncating the values, making my table size smaller on disk and in memory.

PROC SQL with an ALTER TABLE statement and a MODIFY clause will change a table’s descriptor portion (metadata) for one or more columns. Changing the length of a column will automatically rewrite the data portion of the table, too. The syntax looks like this:

ALTER TABLE table-name MODIFY column-definition-1 <, column-definition-2, ..., column-definition-n>;

After running this code:

proc sql;
alter table work.cars
   modify Make char(5)
;
quit;

The data is still correct:

Year Make Model Sticker Price Wholesale Price Engine Size
(L)
2004 Acura MDX €36.945 €33.337 3,5
2004 Audi A4 1.8T 4dr €25.940 €23.508 1,8
2004 Buick Rainier €37.895 €34.357 4,2

 
and the column size is now smaller:

Column Name Column Label Column Type Column Length Column Format
Year num 8
Make char 5
Model char 40
MSRP Sticker Price num 8 EUROX8.
Invoice Wholesale Price num 8 EUROX8.
Size Engine Size (L) num 8 COMMAX5.1

 
Now, unless you want to write this type of code by hand every time you need it, you’ll want macros to do the work for you. And macros we’ve got! Check out my SAS tutorial on YouTube titled “Resize, Rename and Reformat Data with SAS Macro” for more details on how this works. The video is also embedded below. Links below the embedded video lead to the data and programs used in the video and to my personal macro trove on GitHub, including a macro that quickly strips labels, formats and informats from a data set.

Until next time, may the SAS be with you!
 
Mark

 

DOWNLOAD NOW | VIDEO DATA PACKAGE ON GITHUB
GET MACROS NOW | MY MACRO STASH ON GITHUB

Jedi SAS tricks: resizing, renaming and reformatting your data was published on SAS Users.

1月 072022
 

Welcome to the sixth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to work with the results. Now it's time to generate simple descriptive statistics of a CAS table.

Let's begin by confirming the cars table is loaded into memory. With a connection to CAS established, execute the tableInfo action to view available in-memory tables. If necessary, you can execute the following code in SAS Studio to load the sashelp.cars table into memory.

conn.tableinfo(caslib="casuser")

The results show the cars table is loaded into memory and available for processing. Next, reference the cars table in the variable tbl. Then use the print function to show the value of the variable.

tbl = conn.CASTable('cars', caslib='casuser')
print(tbl)
CASTable('cars', caslib='casuser')

The results show that the tbl variable references the cars table in the CAS server.

Preview the CAS Table

First things first. Remember, the SWAT package blends the world of Pandas and CAS into one. So you can begin with the traditional head method to preview the CAS table.

tbl.head()

The SWAT head method returns five rows from the CAS server to the client as expected.

The Describe Method

Next, let's retrieve descriptive statistics of all numeric columns by using the familiar describe method on the CAS table.

tbl.describe()

The SWAT describe method returns the same descriptive statistics as the Pandas describe method. The only difference is that the SWAT version uses the CAS API to convert the describe method into CAS actions behind the scenes to process the data on the distributed CAS server. CAS processes the data and returns summarized results back to the client as a SASDataFrame, which is a subclass of the Pandas DataFrame. You can now work with the results as you would a Pandas DataFrame.

Summary CAS Action

Instead of using the familiar describe method, let's use a CAS action to do something similar. Here I'll use the summary CAS action.

tbl.summary()

Summary CAS Action

The results of the summary action return a CASResults object (Python dictionary) to the client. The CASResults object contains a single key named Summary with a SASDataFrame as the value. The SASDataFrame shows a variety of descriptive statistics.  While the summary action does not return exactly the same statistics as the describe method, it can provide additional insights into your data.

What if we don't want all the statistics for all of the data?

Selecting Columns and Summary Statistics with the Summary Action

Let's add additional parameters to the summary action. I'll add the inputs parameter to specify the columns to analyze in the CAS server.

tbl.summary(inputs = ['MPG_City','MPG_Highway'])

The results show only the MPG_City and MPG_Highway columns were analyzed.

Next, I'll use the subSet parameter to specify the summary statistics to produce. Here I'll obtain the MEAN, MIN and MAX.

tbl.summary(inputs = ['MPG_City','MPG_Highway'],
                       subSet = ['mean','min','max'])

The results processed only the MPG_City and MPG_Highway columns, and returned only the specified summary statistics to the client.

Creating a Calculated Column

Lastly, let's create a calculated column within the summary action. There are a variety of ways to do this. I like to add it as a parameter to the CASTable object. You can do that by specifying the tbl object, then computedVarsProgram parameter. Within computedVarsProgram you can use SAS assignment statements with most SAS functions. Here we will create a new column name MPG_Avg that takes the mean of MPG_City and MPG_Highway. Lastly, add the new column to the inputs parameter.

tbl.computedVarsProgram = 'MPG_Avg = mean(MPG_City, MPG_Highway);'
tbl.summary(inputs = ['MPG_City','MPG_Highway', 'MPG_Avg'],
                       subSet = ['mean','min','max'])

In the results I see the calculated column and requested summary statistics.

Summary

The SWAT package blends the world of Pandas and CAS. You can use many of the familiar Pandas methods within the SWAT package, or the flexible, highly optimized CAS actions like summary to easily obtain summary statistics of your data in the massively parallel processing CAS engine.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Index
SWAT API Reference
CAS Action Documentation
SAS® Cloud Analytic Services: Fundamentals
SAS Scripting Wrapper for Analytics Transfer (SWAT)
CAS Action! - a series on fundamentals
Execute the following code in SAS Studio to load the sashelp.cars table into memory

Getting Started with Python Integration to SAS® Viya® - Part 6 - Descriptive Statistics was published on SAS Users.

12月 072021
 

This post is written in the hopes of easing the SAS Viya deployment process for novices like me. Firstly, deploying SAS Viya, like most enterprise software packages, isn't a skill we're innately born with. We're going to need a little help, some good documentation, and time to absorb the intricoes of the task.

There are many parts and pieces to standing up SAS Viya, depending on what you’re trying to accomplish and how you’d like to go about doing it. Know that the documentation and process can seem colossal and overwhelming, so take your time and don’t rush things. You got this.

Scope of the post

What this blog is and is not

This post will not walk you through the entirety of a deployment. Instead, it’ll point you to the right resources, guide you away from pitfalls, and show you how to accomplish certain tasks the documentation may not entirely cover. Many of these nuances were hard-earned lessons either by me or by people who have been kind enough to show me the way.

Please note the following

  • my experience is limited, and mostly pertains to AWS and Azure
  • the information is current at the time of this writing (December, 2021)

Please feel free to reach out to me if you have any suggestions, comments, or spot any mistakes. Many thanks!

Santa’s Workshop

Deploying SAS Viya is akin to creating toy trains in Santa’s workshop.

At its core, each toy train requires an engine, several cars, and a track. Likewise, each SAS Viya deployment requires a CAS engine that lives on a Kubernetes cluster together with several other servers (e.g., Compute, Connect, Stateful/Stateless), and storage.

Each toy train can be modified in numerous ways depending on the person’s preferences, whether it’s a steam locomotive or a bullet train. Or maybe it’s something more trivial, like merely the color of the train. Regardless of the need, Santa’s workshop contains a plethora of tools, materials, and plenty of knowledgeable elves who have different expertise and insights to customize the pipelines and trains.

Once again, each instance of SAS Viya can also be modified greatly depending on the customer’s needs. There are many hosts, flavors of servers, storage options, and common customizations. A SAS Viya deployment has its own kitchen sink full of tools, pipelines, and methods. And just like in Santa’s workshop, there are plenty of people who are experienced with deploying SAS Viya (and have specialties in different aspects of the deployment) who will assist if you run into issues.

Links Galore

There’s never a shortage of links required to complete deployments. I find myself with multiple windows filled with tabs (for referencing info) while I’m deploying so here’s a list of some I have found helpful.

      1. Setting up SAS Viya Monitoring for K8S
      2. Azure
      3. AWS
      4. GCP
      5. SAS Viya 4 Resource Guide

      Setting Up Your Environment

      There are several required tools for deployment. These include but are not necessarily limited to:

      • kubectl v1.19.9
      • kustomize 3.7.0
      • Docker

      Ensure your environment is set up precisely the way the docs recommend. For example, if you’re going the Terraform route from Viya4-IaC-AWS, you’re going to need this:

      • Terraform v1.0.0
      • kubectl v1.19.9
      • jq v1.6
      • AWS CLI v2.1.29

      The documentation is rather specific in terms of the required version, so please read carefully.

      Starting Off

      To start off, there are a few required readings to get a better understanding of SAS Viya architecture and requirements. Please review these webpages as often as you’d like to ensure understanding and avoid missing any steps.

      1. Getting Started portion of the Viya Operations documentation (linked previously).
      2. In the System Requirements section, please pay special attention to the “Kubernetes Client Machine Requirements” (under Virtual Infrastructure Requirements) to ensure you have the right tools and versions installed.
      3. When you’re done reading the above, it’s time to set up the IaC.

      Choose the corresponding link for “Help with Cluster Setup” (under Virtual Infrastructure Requirements) based on your cloud host of choice.

      IaC

      IaC stands for Infrastructure as Code. These are essentially scripts allowing you to build your cloud infrastructure and provision them through code instead of through the GUI. Several things to note here:

      1. I prefer cloning the IaC repo alongside the other folders, not within them so they’re better organized. It looks something like this:
        Viya4 <– Parent directory
        |– IaC
        |– Deploy
      2. Grab a sample .tfvars file under /examples and paste it into the root IaC directory. I recommend the “sample-input-minimal.tfvars” file if you’re just practicing.
        • Rename this file to “terraform.tfvars” (or preferred name, just be aware that the doc’s instructions assume that you have named it “terraform.tfvars”)
        • This file has several important values to keep in mind / input.
          1. This file contains the cluster configuration and details what all will be created
          2. “prefix” is essentially the name given to all your resources
          3. “default_public_access_cidrs” are CIDRs that you’d like to allow access to your cluster.
          4. “tags” you should include are {“resourceowner”=”your_Email”} (this is to ensure that people will be able to tell who owns the resource. Also, note that the preferred syntax is dependent on the cloud provider, please check the docs to be sure)
          5. “postgres_servers” should only be uncommented if you require an external db server (more expensive), if you don’t and you’re just practicing, leave it commented and it should create an internal one
      3. I highly recommend going the Docker route instead of Terraform (I have personally run into fewer problems through Docker, especially the tearing down process as compared to Terraform).
      4. It takes a while to create the cloud resources so have patience (takes about 15 mins at most).
      5. Once the resources exist, ensure you copy the [prefix]-eks-kubeconfig.conf file into your $(pwd) as well as your ~/.kube/config file if you’d like to keep it. The command to copy the conf file to your ~/.kube location is cp &lt;.conf file&gt; ~/.kube/config
      6. After you’re done with the above, make sure you run export KUBECONFIG=&lt;.conf file&gt;
      7. Test that your deployment is actually up: kubectl get nodes

      Post-IaC

      The next section covers additional SAS Viya requirements for the cluster after standing it up. There are a few things I’d recommend building after ensuring the deployment is up.

      • Ingress Controller
        kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.44.0/deploy/static/provider/cloud/deploy.yaml
      • Cert Manager
        kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml
      • Helm/nfs-provisioner (this part is specifically for AWS)
        What’s happening here is that we’re getting the elastic load balancer URL from the ingress-nginx, the EFS ID, and installing the NFS server provisioner

        kubectl get service -n ingress-nginx
         
        ELBURL=$(kubectl get svc -n ingress-nginx ingress-nginx-controller --output jsonpath='{.status.loadBalancer.ingress[0].hostname}')
        echo $ELBURL
         
        EFSFSID=$(aws efs describe-file-systems --region $AWS_DEFAULT_REGION --query "FileSystems[*].FileSystemId" --region $AWS_DEFAULT_REGION --output text)
        echo $EFSFSID
         
        helm repo add stable  --force-update
        helm install stable/nfs-server-provisioner
         
        kubectl get storageclass # to check if the NFS server is up
      • Create a namespace where your SAS Viya deployment lives in the cluster – kubectl create ns . It is critical to go through the System Requirements entirely to ensure you don’t miss any steps (Just be sure that you’re following the portions meant for your cloud host). Examples in the Hardware and Resource Requirements page:
        • Azure – There’s an “Additional PVC Requirements for Microsoft Azure”: a link for “Specify PersistentVolumeClaims to Use ReadWriteMany StorageClass” where you’re required to add a file in the /site-config directory and an additional portion under “transformers” in the kustomization.yaml file
        • AWS – Under “File System and Shared Storage Requirements” refer to the notes on installing a provisioner for the EBS volumes. (The instructions are in the code block above)

      Installation

      This section sets up the parameters and additional customizations to included in the $deploy folder. It falls specifically under the Deployment tab of the SAS Viya Operations documentation.
      After retrieving the required files (under the desired version!), the certificates, license, and all assets and untarring them, take a good look at the section named "Directory Structure" so you have an understanding your desired file structure.

      Under “Installation -> Initial kustomization.yaml file”, once you’ve created your kustomization.yaml file, there are a few things of note here to change:

      1. {{ NAME-OF-NAMESPACE }}
        • If you haven’t already created the namespace where SAS Viya will live, do so now (instructions above #4)
        • Once you have a namespace, replace the entire thing including the {{}} with the name you have chosen.
        • You can always check what namespaces your cluster has by running kubectl get ns
      2. {{ NAME-OF-INGRESS-HOST }} and {{ PORT }} (note that there are multiple references in the kustomization.yaml file)
        • Use kubectl get service -n ingress-nginx and use the external-ip of the output
        • port is 80

      There are plenty of instructions beneath the kustomization.yaml file example, be sure to read through them and follow their instructions thoroughly.

      Additionally, Configure TLS

    1. kustomize build -o site.yaml
      kubectl apply --kubeconfig=kubeconfig-file --selector="sas.com/admin=cluster-wide" -f site.yaml
      kubectl wait --kubeconfig=kubeconfig-file --for condition=established --timeout=60s -l "sas.com/
      admin=cluster-wide" crd
      kubectl apply --kubeconfig=kubeconfig-file --selector="sas.com/admin=cluster-local" -f site.yaml --prune
      kubectl apply --kubeconfig=namespace-kubeconfig-file --selector="sas.com/admin=namespace" -f site.yaml --prune

OR
kustomize build . | kubectl apply -f -
(Note that this is the shortcut of building and piping the results to be applied in kubectl. It does not output a site.yaml file.)

There are a few false-positive errors that may appear during the process (the documentation outlines them pretty clearly).

Post-Deployment

You may run the readiness service to check for when your deployment is ready. Note that this process is lengthy and the fastest I’ve seen a deployment go up is about 15-20 mins. (Now’s a good time to go for a walk or get a cup of coffee).

I highly recommend using Lens to visualize the deployment process and to take a look at the pods and their logs (mini section below).

While all of these steps are possible in Lens, it’s good to know the commands required to inspect and manipulate pods.

kubectl get pods -n  # Take a look at all the pods, add a -W flag to watch them as they update
kubectl describe pod  -n  # To describe specific pods
kubectl logs  -n  # To see the logs of specific pods
kubectl delete pods  -grace-period=0 --force # To force deletion of pods, pods will automatically restart after being deleted.

Important pods to look at:

  • Logon
  • Consul
  • Cache

These pods are pre-requisites for many other pods to come up. If they’re stuck, go ahead and delete them to initiate a restart. This seems to work frequently.

If the pods look good, try going to this website: www.name-of-ingress-host:port/SASDrive. You should see a blue background and a SAS login screen.

Hooray! Now you just have to follow the Sign In as the sasboot User instructions and complete other post deployment tasks (Post-Installation Tasks, Validating the Deployment, etc.” that are pertinent to your use case.

Quick aside: Lens

K8s Lens is an incredibly useful IDE to visualize what is going on in your Kubernetes cluster.
Here are two quick screenshots to help you get situated when you’re looking at pods.

First, you need your .conf file to connect to your cluster. Upon entry, click on Workloads -> Pods to look at the pods. Also click on your namespace for all of the pods for the SAS Viya Deployment to show up.

There are times where you’ll see a yellow triangle with an exclamation mark. While this is technically a warning, it may be an indicator of an error your pod is suffering from. (If you see a HTTP 503 Readiness Probe error, it may just mean that the pod is starting up)

Click on the pod and the lines on the top right in order to see the logs for the chosen pod.

Conclusion

Hopefully this post was helpful for your start in deploying SAS Viya.

Please remember there’s a lot more to it than is covered here. Don’t be disheartened if this wasn’t particularly easy, it certainly wasn’t for me.
Know there are plenty of customizations as well as a constant stream of changes (updates, product related etc.), new methods, and places to deploy.
So there’s always plenty to learn.

Please feel free to reach out and let me know if you have any questions or suggestions for this post.

Acknowledgements

Many thanks to my colleagues Ali Aiello and Jacob Braswell for answering my incessant questions and helping me on this journey!

A Novice Perspective on SAS Viya Deployment was published on SAS Users.

12月 072021
 

This post is written in the hopes of easing the SAS Viya deployment process for novices like me. Firstly, deploying SAS Viya, like most enterprise software packages, isn't a skill we're innately born with. We're going to need a little help, some good documentation, and time to absorb the intricoes of the task.

There are many parts and pieces to standing up SAS Viya, depending on what you’re trying to accomplish and how you’d like to go about doing it. Know that the documentation and process can seem colossal and overwhelming, so take your time and don’t rush things. You got this.

Scope of the post

What this blog is and is not

This post will not walk you through the entirety of a deployment. Instead, it’ll point you to the right resources, guide you away from pitfalls, and show you how to accomplish certain tasks the documentation may not entirely cover. Many of these nuances were hard-earned lessons either by me or by people who have been kind enough to show me the way.

Please note the following

  • my experience is limited, and mostly pertains to AWS and Azure
  • the information is current at the time of this writing (December, 2021)

Please feel free to reach out to me if you have any suggestions, comments, or spot any mistakes. Many thanks!

Santa’s Workshop

Deploying SAS Viya is akin to creating toy trains in Santa’s workshop.

At its core, each toy train requires an engine, several cars, and a track. Likewise, each SAS Viya deployment requires a CAS engine that lives on a Kubernetes cluster together with several other servers (e.g., Compute, Connect, Stateful/Stateless), and storage.

Each toy train can be modified in numerous ways depending on the person’s preferences, whether it’s a steam locomotive or a bullet train. Or maybe it’s something more trivial, like merely the color of the train. Regardless of the need, Santa’s workshop contains a plethora of tools, materials, and plenty of knowledgeable elves who have different expertise and insights to customize the pipelines and trains.

Once again, each instance of SAS Viya can also be modified greatly depending on the customer’s needs. There are many hosts, flavors of servers, storage options, and common customizations. A SAS Viya deployment has its own kitchen sink full of tools, pipelines, and methods. And just like in Santa’s workshop, there are plenty of people who are experienced with deploying SAS Viya (and have specialties in different aspects of the deployment) who will assist if you run into issues.

Links Galore

There’s never a shortage of links required to complete deployments. I find myself with multiple windows filled with tabs (for referencing info) while I’m deploying so here’s a list of some I have found helpful.

      1. Setting up SAS Viya Monitoring for K8S
      2. Azure
      3. AWS
      4. GCP
      5. SAS Viya 4 Resource Guide

      Setting Up Your Environment

      There are several required tools for deployment. These include but are not necessarily limited to:

      • kubectl v1.19.9
      • kustomize 3.7.0
      • Docker

      Ensure your environment is set up precisely the way the docs recommend. For example, if you’re going the Terraform route from Viya4-IaC-AWS, you’re going to need this:

      • Terraform v1.0.0
      • kubectl v1.19.9
      • jq v1.6
      • AWS CLI v2.1.29

      The documentation is rather specific in terms of the required version, so please read carefully.

      Starting Off

      To start off, there are a few required readings to get a better understanding of SAS Viya architecture and requirements. Please review these webpages as often as you’d like to ensure understanding and avoid missing any steps.

      1. Getting Started portion of the Viya Operations documentation (linked previously).
      2. In the System Requirements section, please pay special attention to the “Kubernetes Client Machine Requirements” (under Virtual Infrastructure Requirements) to ensure you have the right tools and versions installed.
      3. When you’re done reading the above, it’s time to set up the IaC.

      Choose the corresponding link for “Help with Cluster Setup” (under Virtual Infrastructure Requirements) based on your cloud host of choice.

      IaC

      IaC stands for Infrastructure as Code. These are essentially scripts allowing you to build your cloud infrastructure and provision them through code instead of through the GUI. Several things to note here:

      1. I prefer cloning the IaC repo alongside the other folders, not within them so they’re better organized. It looks something like this:
        Viya4 <– Parent directory
        |– IaC
        |– Deploy
      2. Grab a sample .tfvars file under /examples and paste it into the root IaC directory. I recommend the “sample-input-minimal.tfvars” file if you’re just practicing.
        • Rename this file to “terraform.tfvars” (or preferred name, just be aware that the doc’s instructions assume that you have named it “terraform.tfvars”)
        • This file has several important values to keep in mind / input.
          1. This file contains the cluster configuration and details what all will be created
          2. “prefix” is essentially the name given to all your resources
          3. “default_public_access_cidrs” are CIDRs that you’d like to allow access to your cluster.
          4. “tags” you should include are {“resourceowner”=”your_Email”} (this is to ensure that people will be able to tell who owns the resource. Also, note that the preferred syntax is dependent on the cloud provider, please check the docs to be sure)
          5. “postgres_servers” should only be uncommented if you require an external db server (more expensive), if you don’t and you’re just practicing, leave it commented and it should create an internal one
      3. I highly recommend going the Docker route instead of Terraform (I have personally run into fewer problems through Docker, especially the tearing down process as compared to Terraform).
      4. It takes a while to create the cloud resources so have patience (takes about 15 mins at most).
      5. Once the resources exist, ensure you copy the [prefix]-eks-kubeconfig.conf file into your $(pwd) as well as your ~/.kube/config file if you’d like to keep it. The command to copy the conf file to your ~/.kube location is cp &lt;.conf file&gt; ~/.kube/config
      6. After you’re done with the above, make sure you run export KUBECONFIG=&lt;.conf file&gt;
      7. Test that your deployment is actually up: kubectl get nodes

      Post-IaC

      The next section covers additional SAS Viya requirements for the cluster after standing it up. There are a few things I’d recommend building after ensuring the deployment is up.

      • Ingress Controller
        kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.44.0/deploy/static/provider/cloud/deploy.yaml
      • Cert Manager
        kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml
      • Helm/nfs-provisioner (this part is specifically for AWS)
        What’s happening here is that we’re getting the elastic load balancer URL from the ingress-nginx, the EFS ID, and installing the NFS server provisioner

        kubectl get service -n ingress-nginx
         
        ELBURL=$(kubectl get svc -n ingress-nginx ingress-nginx-controller --output jsonpath='{.status.loadBalancer.ingress[0].hostname}')
        echo $ELBURL
         
        EFSFSID=$(aws efs describe-file-systems --region $AWS_DEFAULT_REGION --query "FileSystems[*].FileSystemId" --region $AWS_DEFAULT_REGION --output text)
        echo $EFSFSID
         
        helm repo add stable  --force-update
        helm install stable/nfs-server-provisioner
         
        kubectl get storageclass # to check if the NFS server is up
      • Create a namespace where your SAS Viya deployment lives in the cluster – kubectl create ns . It is critical to go through the System Requirements entirely to ensure you don’t miss any steps (Just be sure that you’re following the portions meant for your cloud host). Examples in the Hardware and Resource Requirements page:
        • Azure – There’s an “Additional PVC Requirements for Microsoft Azure”: a link for “Specify PersistentVolumeClaims to Use ReadWriteMany StorageClass” where you’re required to add a file in the /site-config directory and an additional portion under “transformers” in the kustomization.yaml file
        • AWS – Under “File System and Shared Storage Requirements” refer to the notes on installing a provisioner for the EBS volumes. (The instructions are in the code block above)

      Installation

      This section sets up the parameters and additional customizations to included in the $deploy folder. It falls specifically under the Deployment tab of the SAS Viya Operations documentation.
      After retrieving the required files (under the desired version!), the certificates, license, and all assets and untarring them, take a good look at the section named "Directory Structure" so you have an understanding your desired file structure.

      Under “Installation -> Initial kustomization.yaml file”, once you’ve created your kustomization.yaml file, there are a few things of note here to change:

      1. {{ NAME-OF-NAMESPACE }}
        • If you haven’t already created the namespace where SAS Viya will live, do so now (instructions above #4)
        • Once you have a namespace, replace the entire thing including the {{}} with the name you have chosen.
        • You can always check what namespaces your cluster has by running kubectl get ns
      2. {{ NAME-OF-INGRESS-HOST }} and {{ PORT }} (note that there are multiple references in the kustomization.yaml file)
        • Use kubectl get service -n ingress-nginx and use the external-ip of the output
        • port is 80

      There are plenty of instructions beneath the kustomization.yaml file example, be sure to read through them and follow their instructions thoroughly.

      Additionally, Configure TLS

    1. kustomize build -o site.yaml
      kubectl apply --kubeconfig=kubeconfig-file --selector="sas.com/admin=cluster-wide" -f site.yaml
      kubectl wait --kubeconfig=kubeconfig-file --for condition=established --timeout=60s -l "sas.com/
      admin=cluster-wide" crd
      kubectl apply --kubeconfig=kubeconfig-file --selector="sas.com/admin=cluster-local" -f site.yaml --prune
      kubectl apply --kubeconfig=namespace-kubeconfig-file --selector="sas.com/admin=namespace" -f site.yaml --prune

OR
kustomize build . | kubectl apply -f -
(Note that this is the shortcut of building and piping the results to be applied in kubectl. It does not output a site.yaml file.)

There are a few false-positive errors that may appear during the process (the documentation outlines them pretty clearly).

Post-Deployment

You may run the readiness service to check for when your deployment is ready. Note that this process is lengthy and the fastest I’ve seen a deployment go up is about 15-20 mins. (Now’s a good time to go for a walk or get a cup of coffee).

I highly recommend using Lens to visualize the deployment process and to take a look at the pods and their logs (mini section below).

While all of these steps are possible in Lens, it’s good to know the commands required to inspect and manipulate pods.

kubectl get pods -n  # Take a look at all the pods, add a -W flag to watch them as they update
kubectl describe pod  -n  # To describe specific pods
kubectl logs  -n  # To see the logs of specific pods
kubectl delete pods  -grace-period=0 --force # To force deletion of pods, pods will automatically restart after being deleted.

Important pods to look at:

  • Logon
  • Consul
  • Cache

These pods are pre-requisites for many other pods to come up. If they’re stuck, go ahead and delete them to initiate a restart. This seems to work frequently.

If the pods look good, try going to this website: www.name-of-ingress-host:port/SASDrive. You should see a blue background and a SAS login screen.

Hooray! Now you just have to follow the Sign In as the sasboot User instructions and complete other post deployment tasks (Post-Installation Tasks, Validating the Deployment, etc.” that are pertinent to your use case.

Quick aside: Lens

K8s Lens is an incredibly useful IDE to visualize what is going on in your Kubernetes cluster.
Here are two quick screenshots to help you get situated when you’re looking at pods.

First, you need your .conf file to connect to your cluster. Upon entry, click on Workloads -> Pods to look at the pods. Also click on your namespace for all of the pods for the SAS Viya Deployment to show up.

There are times where you’ll see a yellow triangle with an exclamation mark. While this is technically a warning, it may be an indicator of an error your pod is suffering from. (If you see a HTTP 503 Readiness Probe error, it may just mean that the pod is starting up)

Click on the pod and the lines on the top right in order to see the logs for the chosen pod.

Conclusion

Hopefully this post was helpful for your start in deploying SAS Viya.

Please remember there’s a lot more to it than is covered here. Don’t be disheartened if this wasn’t particularly easy, it certainly wasn’t for me.
Know there are plenty of customizations as well as a constant stream of changes (updates, product related etc.), new methods, and places to deploy.
So there’s always plenty to learn.

Please feel free to reach out and let me know if you have any questions or suggestions for this post.

Acknowledgements

Many thanks to my colleagues Ali Aiello and Jacob Braswell for answering my incessant questions and helping me on this journey!

A Novice Perspective on SAS Viya Deployment was published on SAS Users.

10月 142021
 

Trimming strings left and right

I am pretty sure you have never heard of the TRIMS function, and I would be genuinely surprised if you told me otherwise. This is because this function does not exist (at least at the time of this writing).

But don’t worry, the difference between "nonexistence" and "existence" is only a matter of time, and from now it is less than a blog away. Let me explain. Recently, I published two complementary blog posts:

[1] Removing leading characters from SAS strings

[2] Removing trailing characters from SAS strings

While working on these pieces and researching “prior art” I stumbled upon a multipurpose function in the SAS FedSQL Language that alone does either one or both of these things – remove leading or/and trailing characters from SAS strings.

FedSQL Language and Proc FedSQL

The FedSQL language is the SAS proprietary implementation of the ANSI SQL:1999 core standard. Expectedly, the FedSQL language is implemented in SAS by means of the FedSQL procedure (PROC FEDSQL). This procedure enables you to submit FedSQL language statements from a Base SAS session, and it is supported in both SAS 9.4 and SAS Viya.

Using the FEDSQL procedure, you can submit FedSQL language statements to SAS and third-party data sources that are accessed with SAS and SAS/ACCESS library engines. Or, if you have SAS Cloud Analytic Services (CAS) configured, you can submit FedSQL language statements to the CAS server.

FedSQL TRIM function

FedSQL language has its own vast FedSQL Functions library with hundreds of functions many of which replicate SAS 9.4 Functions. Many, but not all. Deep inside this FedSQL functions library, there is a unique treasure modestly called TRIM Function which is quite different from the BASE SAS Language TRIM() function.

While SAS 9.4 BASE TRIM() function capabilities are quite limited - it removes just trailing blanks from a character string, the FedSQL TRIM() function is way much more powerful. This triple-action function can remove not just trailing blanks, but also leading blanks, as well as both, leading and trailing blanks. On top of it, it can remove not just blanks, but any characters (although one character at a time). See for yourself, this function has the following pretty self-explanatory syntax:

TRIM( [BOTH | LEADING | TRAILING] [trim-character] FROM column)

Here trim-character specifies one character (in single quotations marks) to remove from column. If trim-character is not specified, the function removes blanks.

While being called a function, it does not look like a regular SAS function where arguments are separated by commas.  It looks more like an SQL statement (which it understandably is – it is part of the FedSQL language). However, this function is available only in PROC FEDSQL; it’s not available in SAS DATA steps or other PROC steps. Still, it gives us pretty good idea of what such a universal function may look like.

User-defined function TRIMS to remove leading or/and trailing characters in SAS strings

Let’s build such a function by means of the PROC FCMP for the outside the FedSQL usage (it is worth noticing that the FCMP procedure is not supported for FedSQL). To avoid confusion with the existing TRIM function we will call our new function TRIMS (with an ‘S’ at the end) which suits our purpose quite well denoting its plural purpose. First, we define what we are going to create.

User-defined TRIMS function

TRIMS Function

Removes leading characters, trailing characters, or both from a character string.

Syntax

TRIMS(function-modifier, string, trim-list, trim-list-modifier)

Required Arguments

  • function-modifier is a case-insensitive character constant, variable, or expression that specifies one of three possible operations:
    'L' or 'l' – removes leading characters.
    'T' or 't' – removes trailing characters.
    'B' or 'b' – removes both, leading and trailing characters.
  • string is a case-sensitive character constant, variable, or expression that specifies the character string to be trimmed.
  • trim-list is a case-sensitive character constant, variable, or expression that specifies character(s) to remove from the string.
  • trim-list-modifier is a case-insensitive character constant variable, or expression that supplements the trim-list.
    The valid values are those modifiers of the FINDC function that “add” groups of characters (e.g. 'a' or 'A', 'c' or 'C', 'd' or 'D', etc.) to the trim-list.

The following user-defined function implementation is based on the coding techniques described in the two previous posts, [1] and [2] that I mentioned above. Here goes.

 
libname funclib 'c:\projects\functions';
 
/* delete previous function definition during debugging */
options cmplib=funclib.userfuncs;
proc fcmp outlib=funclib.userfuncs.package1;
   deletefunc trims;
run;
 
/* new function defintion */
proc fcmp outlib=funclib.userfuncs.package1;
   function trims(f $, str $, clist $, mod $) $32767;
      from = 1;
      last = length(str);
      if upcase(f) in ('L', 'B') then from = findc(str, clist, 'K'||mod);
      if from=0 then return('');
      if upcase(f) in ('T', 'B') then last = findc(str, clist, 'K'||mod, -last); 
      if last=0 then return('');
      return(substr(str, from, last-from+1));      
   endfunc; 
run;

Code highlights

  • In the function definition, we first assign initial values of the target substring positions as from=1 and last=length(str).
  • Then for Leading or Both character removal, we calculate an adjusted value of from as a position of the first character in str that is not listed in clist and not defined by the mod
  • If from=0 then we return blank and stop further calculations as this means that ALL characters are to be removed.
  • Then for Trailing or Both character removal, we calculate an adjusted value of last as a position of the last character in str that is not listed in clist and not defined by the mod
  • If last=0 then we return blank and stop further calculations as this means that ALL characters are to be removed.
  • And finally, we return a substring of str starting at the from position and ending at the last position, that is with the length of last-from+1.

TRIMS function usage

Let’s define SAS data set SOURCE as follows:

data SOURCE;
   input X $ 1-30;
   datalines;
*00It's done*2*1**-
*--*1****9*55
94*Clean record-*00
;

In the following DATA step, we will create three new variables with removed leading (variable XL), trailing (variable XT) and both - leading and trailing (variable XB) characters '*' and '-' as well as any digits:

options cmplib=funclib.userfuncs;
data TARGET;
   set SOURCE;
   length XB XL XT $30;
   XB = trims('b', X, '*-', 'd');
   XL = trims('L', X, '*-', 'd');
   XT = trims('t', X, '*-', 'd');
run;

In this code we use the TRIM function three times, each time with a different first argument to illustrate how this affects the outcome.

Arguments usage highlights

  • The first argument of the TRIMS function specifies whether we remove characters from both leading and trailing positions ('b'), from leading positions only ('L'), or from trailing positions only ('t'). This argument is case-insensitive. (I prefer using capital 'L' for clarity since lowercase 'l' looks like digit '1').
  • The second argument specifies the name of the variable (X) that we are going to remove characters from (variable X is coming from the dataset SOURCE).
  • The third argument '*-' specifies which character (or characters) to remove. In our example we are removing '*' and '-'. If you do not need to explicitly specify any character here, you still must supply a null value ('') since it is a required argument. In this case, the fourth argument (trim-list-modifier) will determine the set of characters to be removed.
  • And finally, the fourth argument (case-insensitive) of the TRIMS function specifies the FINDC function modifier(s) to remove certain characters in bulk (in our example 'd' will remove all digits). If such modifier is not needed, you still must supply a null value ('') since all four arguments of the TRIMS function are positional and required.

Here is the output data table TARGRET showing the original string X and the resulting strings XB (Both leading and trailing characters removed), XL (Leading characters removed) and XT (Trailing characters removed) side by side:

Result of leading and trailing characters trimming

Conclusion

The new TRIMS function presented in this blog post goes ways further the ubiquitous LEFT and TRIM functions that remove the leading (LEFT) or trailing (TRIM) blanks. The TRIMS function handles ANY characters, not just blanks. It also expands the character deletion functionality of the powerful  FedSQL TRIM function beyond just removing any single leading and/or trailing character. The TRIMS function single-handedly removes any number of explicitly specified characters from either leading, trailing or both (leading and trailing) positions. Plus, it removes in bulk many implicitly specified characters. For example 'd' modifier removes all digits, 'du' modifier removes all digits ('d') and all uppercase letters ('u'), 'dup' modifier removes all digits ('d'), all uppercase letters ('u') and all punctuation marks ('p'); and so on as described by the FINDC function modifiers. The order in which modifier characters are listed does not matter.

Additional resources

Questions? Thoughts? Comments?

Do you find this post useful? Please share your thoughts with us below.

Introducing TRIMS function to remove any leading and/or trailing characters from SAS strings was published on SAS Users.