Steven Sober

10月 032020
 

If you're a SAS programmer who now uses SAS Viya and CAS, it's worth your time to optimize your existing programs to take advantage of the new environment. This post is a continuation of my SAS Global Forum 2020 paper Best Practices for Converting SAS® Code to Leverage SAS® Cloud Analytic Services and my SGF 2020 Super Demo.

The best approach for refactoring SAS code for SAS Viya has a few steps:

  • First, "lift and shift" your existing code to run successfully in the compute server for SAS Viya.
  • Next, create CASLIB statements to all of your data sources: i.e. sas7bdat, CSV files, parquet files, RDBMs, cloud data sources, etc.
  • Finally, identify the longest running steps so you know where you have the biggest opportunities. For example, look at steps where the "real time" is 30 minutes or longer, as well as steps that are CPU bound. CPU-bound steps are steps where the CPU time is equal to or greater than the real time for that step.

To help us identify those steps we can leverage a new utility to analyze SAS logs and create reports to help us understand the Real Time and CPU Time for each step. Read on the learn more about this final step in the code refactoring process.

Utility: SAS Log Parser

To generate these reports, I created a SAS program that will read all SAS log files in a directory and create one report per SAS log as well as a descending Real Time (Clock Time) and CPU Time reports. Figure 1 is an example of the report that is generated for each SAS log. In this report we see each step’s procedure or DATA Step’s Real Time and CPU Time. It's derived by picking up on SAS log entries like this:

NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           2.79 seconds
      cpu time            0.08 seconds

NOTE: The SAS System used:
      real time           1:08.86
      cpu time            1:18.18

Note, the Total Time and Total CPU Time are fields that are populated when the SAS log note “NOTE: The SAS System used:” is encountered. SAS programs that are ran in batch or using an RSUBMIT process via

Figure 1. sampleETS.log report

Descending Real Time Report

Figure 2 contains an example of the descending real time report. In this report we observe in the Step column that the longest running step is a PROC LOGISTIC that takes over 14 hours (Real Time column) from the SAS log called Sample3.log (File Name column). The best way to use this report is to focus on steps that take longer than 30 minutes. In our case we have 9 steps from 3 SAS logs. Now that we know that we can review the details of each step and then benchmark if that step would run faster by leveraging SAS® Cloud Analytic Services (CAS). Note, for CAS to process data all source and target tables, all data must be in CAS tables and the step must be coded using CAS-enabled steps.

Figure 2. Descending Real Time Report

Descending CPU Time Report

Figure 3 contains an example of the descending CPU times report where we observe that the most CPU intensive step takes over 13 hours (CPU Time column) and is from the Samples3.log (File Name column). Note, if you review the Real Time and CPU Time columns you should notice that only observation 11 (Obs column) has a CPU Time that exceeds the Real Time making it CPU bound.  However, we would not focus on this step since the Real Time is less than 30 minutes.

Figure 3. Descending CPU Time Report

Source code for the SAS Log Parser

I've bundled my SAS code for these steps in my GitHub repository for SAS Global Forum 2020. You'll find these programs along with the other code that supports my previous topics of adapting SAS 9 code for SAS Viya.

sasLogParserMacros.sas contains three macros that drive the process. The macro program %LIST_FILES lists all files with a given extension, %CHECK checks for the existence of a file and deletes it if found, and %SASLOG parses a SAS log and provides the values found in the reports. When you save this file ensure you name it sasLogParserMacros.sas and in the same directory that you save sasLogParser.sas.

sasLogParser.sas is the program we submit to produce the reports. This code includes the code sasLogParserMacros.sas and then generate the repots. The only two statements we need to modify are the first two %LET statements in this program. The first %LET statement points to the location of the two SAS programs sasLogParserMacros.sas and sasLogParser.sas. The second %LET statement points to the directory containing all the SAS logs we want to parse. Note, outside of the two %let statements in sasLogParser.sas do not change any other statement in either program.

Conclusion

In order to understand which steps are good candidates for leveraging the in-memory engine CAS, we must first understand the real time and CPU time of each step. Then we can benchmark which engine in SAS Viya is appropriate for that step -- the compute server or CAS.  The code that I've shared for benchmarking can run within SAS 9 or SAS Viya, on both Windows and Linux.

7月 212020
 

A client recently asked if there's a programmatic way to reduce memory requirements of a CAS table. In this post, you'll learn how to accomplish that when your SASWORK data set has lengthy character data.

What is SASWORK?

The SASWORK library is the temporary library automatically defined by SAS at the start of each SAS session or job. The SASWORK library stores temporary SAS files that you create, such as data sets, catalogs, formats, etc., as well as files created internally by SAS. To access files in the SASWORK library, specify a one-level name for the file or by using a two-level name, i.e. the WORK libref.

A programmatic CASLIB to SASWORK

It is a common practice for SAS programmers to read a source table and then store that source table in SASWORK. When the SASWORK data set has long character data, you can significantly reduce memory requirements by creating a path-based CASLIB to SASWORK and leveraging PROC CASUTIL with the CASDATA= option and use the IMPORTOPTIONS VARCHARCONVERSION=16 statement. The VARCHARCONVERSION=16 statement automatically converts all character data types with length of 16 or greater to a VARCHAR data type.

Code in Figure 1 below creates a macro variable called WORKPATH which we will use in our CASLIB statement:

/* The macro variable WORKPATH contains the path to SASWORK */
%let workpath = %sysfunc(quote(%sysfunc(pathname(work))));
%put &workpath;

(Figure 1: Creating the Macro Variable WORKPATH)

Figure 2 below is the resulting SAS log:


(Figure 2: SAS log of creating the Macro Variable WORKPATH)

Code in Figure 3 below creates our SASWORK data set that we will lift into CAS:

data saswork_table_with_char300;
   length a $ 300 b $ 15 c $ 16;
   a='a300'; b='b15' ; c='c16' ; 
   output;
   a='a300300'; b='b151515'; c='c161616'; 
   output;
   c='c161616161616161'; 
   b='b15151515151515';
   a="a300qzwsxedcrfvtgbyhnujmiklopqazwsxedcrfvtgbyhnujmikolp1234567890123456789012345678901234567890"; 
   output;
run;
 
proc contents data=saswork_table_with_char300;
title "Contents of WORK.SASWORK_TABLE_WITH_CHAR300";
run;

(Figure 3: Code to create a SASWORK data set using a one-level name, i.e. saswork_table_with_char300)

Notice in Figure 4 below the lengths of the character variables a, b, and c:


(Figure 4: Results from PROC CONTENTS)

Code in Figure 5 below creates a path-based CASLIB to SASWORK by using our macro variable WORKPATH:

proc cas;
   file log;
   table.dropCaslib /
   caslib='sas7bdat' quiet = true;
 run;
  addcaslib /
    datasource={srctype="path"}
    name="sas7bdat"
    path=&workpath ; 
 run;

(Figure 5: Path-based CASLIB to SASWORK)

Figure 6 below is resulting SAS log:


(Figure 6: SAS log of path-based CASLIB to SASWORK)

Code in Figure 7 below lifts the SASWORK data set into CAS and ensures the CAS table will contain VARCHAR data types for any character data with a length of 16 or greater:

proc casutil;
 load casdata="saswork_table_with_char300.sas7bdat" 
 casout="cas_table_with_varchar" 
 outcaslib="casuser"
 importoptions=(filetype="basesas", dtm="auto", debug="dmsglvli", 
                varcharconversion=16) ;
run;
quit;

(Figure 7: code that lifts the SASWORK data set into CAS)

Figure 8 below is the resulting SAS log:


(Figure 8: SAS log of code to lift the SASWORK data set into CAS)

Code in Figure 9 below displays the characteristics of our CAS table “cas_table_with_varchar”: Notice in Figure 10 that our variables a and c have a data type of VARCHAR.

proc cas;
   sessionProp.setSessOpt /
   caslib="casuser";
run;
   table.columninfo / table="cas_table_with_varchar";
quit;

(Figure 9: SAS code to display information of a CAS table)

Figure 10 below reveals CAS table characteristics:


(Figure 10: CAS table characteristics)

Conclusion: Conserve memory with smaller CAS tables

By leveraging the coding techniques in this blog, we can lift into CAS any data sets we previously stored in SASWORK. In addition, we convert all character data types with a length of 16 or greater into the VARCHAR data type, which can significantly reduce the size of our CAS tables.

LEARN MORE | SAS® Cloud Analytic Services 3.5: User’s Guide

How to create a path-based CASLIB to SASWORK for speedier analysis with SAS® Viya® was published on SAS Users.

10月 222019
 

PROC SQL

PROC SQL is a very powerful ANSI 92 compliant version of SQL that also allows us to leverage many unique SAS capabilities. Recently I was asked if the PROC SQL in Figure 1 could be refactored into PROC FedSQL so it could run faster by leveraging SAS Viya’s in-memory engine CAS (SAS® Cloud Analytic Services). I was struggling to find a way to refactor this into PROC FedSQL, so I reached out to the SAS Jedi (aka Mark Jordan) for help.

/* Original SQL Statements */
proc sql; create table BenchMark as
     select count(*) as ItemCount
     , sum( abs( nhits - nruns ) < 0.1*natbat )   as DIFF_10
     from sashelp.baseball;
run;

Figure 1. Original PROC SQL

In Figure 2, we can review the SAS Log of our PROC SQL code.

  • It is line 77 that we want to refactor into PROC FedSQL so we can benefit performance improvements by running that code distributed in CAS.
  • On line 77, we use the alias DIFF_10 to create the new column name that is calculated by the two SAS functions SUM and ABS.
  • The expression on line 77 will cause SQL to return a value of 1 if the condition is true and a value of 0 if the condition is false.
  • The alias DIFF_10 will contain the summation of the value returned by the condition (i.e. 0 or 1) for all rows in our data set SASHELP.BASESBALL.

In Figure 5, we can review the results of our PROC SQL statement.

Figure 2. SAS Log of PROC SQL

PROC FedSQL

PROC FedSQL is ANSI 99 compliant without any of the unique SAS capabilities that PROC SQL contains, but PROC FedSQL is CAS enabled, which allows us to leverage SAS Viya’s distributed computing capabilities to improve run-times. Figure 3 is the refactored PROC FedSQL code that the SAS Jedi came up with.

/* PROC FedSQL code */
proc fedsql sessref=casauto; 
   create table BenchMark as
     select count(*) as ItemCount
     , sum(case 
           when (abs (nhits - nruns ) < (0.1*natbat)
                ) is true then 1 end 
          ) as DIFF_10
     from baseball;
quit;

Figure 3. CAS-enabled PROC FedSQL

Figure 4 contains the SAS Log of our CAS enabled PROC FedSQL.

  • Notice on lines 77 we added a CASE statement to the SUM function for our alias DIFF_10.
  • On lines 78-79, the WHEN statement return a value of 1 when the condition is true and a value of 0 when it is false.
  • The alias DIFF_10 will contain the summation, of the value returned by the CASE statement (i.e. 0 or 1) for all rows in our CAS table BASESBALL.

In Figure 5 we can review the results of our PROC FedSQL statement.

Figure 4. SAS log of PROC FedSQL code

Figure 5. Validation that the values from the refactoring of PROC SQL into PROC FedSQL match

Conclusion

As we adopt SAS Viya, a goal is to leverage CAS to speed up the processing of routines written in PROC SQL. To accomplish this, refactor PROC SQL code into PROC FedSQL code. For PROC SQL that cannot be refactored, simply run that PROC SQL code as-is in SAS Viya’s SAS Programming Run-time Environment (SPRE).

SAS® Viya®: How to Emulate PROC SQL Using CAS-Enabled PROC FedSQL was published on SAS Users.

10月 172019
 

CAS Table

SAS Viya’s in-memory tables are referred to as a CAS table and are accessed using a CAS Engine. In this post, we will explore how one can parallel load and compress a CAS table in one pass of the CAS table.

Note: When not using this technique (i.e. PROC CASUTIL with a COMPRESS option) your loading of a CAS table will be a single-threaded process (slower). To understand the following code, which can act as a template for you, see Figure 1, we will review the SAS Log of this code in Figure 2.

proc cas;
  file log;
  table.dropCaslib /
   caslib='sas7bdat' quiet = true;
  addcaslib /
    datasource={srctype="path"}
    name="sas7bdat"
    path="/viyafiles/sasss1/data"
  ; run;
 
  index /
    table={caslib="sas7bdat" name="cars.sas7bdat" singlepass=true}
    casout={caslib="sas7bdat" name="cars" compress=true replication=0}
  ; run;
  print _status; run;
 
  tabledetails /
    caslib="sas7bdat"
    name="cars"
  ; run;
quit;

Figure 1. Template of SAS Code to Parallel Load and Compress a CAS Table in One Pass of the CAS Table

To accomplish the parallel load and compression of a CAS table, we will leverage PROC CAS. Let’s review the SAS log in Figure 2:

  • Line 85 utilizes the FILE LOG statement to redirect the contents we would normally see in the results window directly to the SAS Log i.e. the information between lines 92 and 93 as well as lines 103 and 104. Note: using the FILE LOG statement is optional.
  • On lines 86-92 we are dropping and creating our CASLIB to point to the file system path that contains the SAS7BDAT data set that we want to load and compress, i.e. CARS.
  • On line 87 we added the option QUIET = TRUE to our statement. This is a very handy trick to avoid the ERROR message we get in the SAS Log in Figure 3. If you omit this option an ERROR message will be produced if you have a brand-new CAS Session and our SAS7BDAT CASLIB has not be defined to that session.
  • Lines 88-91 created our CASLIB to our SAS7BDAT data sets.
  • Lines 94-96 are the statements that accomplish the parallel load and compression of our CAS table.
  • On line 94 we use the INDEX statement which is always executed in parallel. Notice we are not creating any indexes in our example but simply using the INDEX statement to activate the parallel load.
  • On line 95 we identify the CASLIB pointing to our source data set CARS.SAS7BDAT. We are also using the option SINGLEPASS = TRUE which means our CAS table will be loaded and compressed as each thread adds a row to our CAS table.
  • On line 96 we are saving our CAS table to our CASLIB SAS7BDAT and naming it CARS. The COMPRESS = TRUE options ensures our CAS table will be compressed and the REPLICATION = 1 ensures our CAS table is replicated i.e. 2 copies of the CAS table, to ensure high availability of the CAS table.
  • On line 98 will print to the SAS Log information telling use that the table was loaded and compressed successfully i.e. {severity=0,reason=0,,statusCode=0}.
  • Lines 101-103 provide information on our compressed CAS table. Reviewing this information, we can see our CAS table has a compression ratio of 5.

Figure 2. SAS Log to Parallel Load and Compress a CAS Table in One Pass of the CAS Table

 

Figure 3. To Avoid this ERROR Message, We Will Add to Line 76 the option QUIET = TRUE, See Line 87 in Figure 2.

Conclusion

When loading multiple CAS tables it is a common practice to compress CAS tables to help avoid the paging of our CAS tables to CAS DISK_CACHE. Paging to CAS_DISK_CACHE impacts performance. In addition, one can parallel load and compress source tables from various formats i.e. CSV files, Hadoop tables as well as many others relational databases, such as Oracle, Teradata and so on.

How to Parallel Load and Compress a SAS® Cloud Analytic Services (CAS) Table was published on SAS Users.

10月 102019
 

DATA Step BY Statements

DATA Step is a very powerful language that SAS and Open Source programmers leverage to stage data for the analytical life cycle. A popular technique is to use the DESCENDING option on BY variables to identify the largest value. Let’s review the example in Figure 1:

  • On line 74 we are using the descending option on the BY statement for the numeric variable MSRP. The reason we are doing this is so we can identify the most expensive car for each make of car in our data set.
  • On line 79 we group our data by MAKE of car.
  • On line 80 we leverage the FIRST. statement on the subsetting IF statement to output the first record for each MAKE. In Figure 2 we can review the results.


Figure 1. Descending BY Statement


Figure 2. Listing of Most Expensive Cars by MAKE

What is CAS?

CAS is SAS Viya’s in-memory engine that processes data and logic in a distributed computing paradigm. When working with CAS tables we can simulate the DESCENDING BY statement by creating a CAS View which will then become the source table to our DATA Step. Let’s review Figure 3:

  • On line 79 we will leverage the CASL (SAS® Cloud Analytic Services Language) action set TABLE with the action VIEW to create the CAS View that will be used as our source table in the DATA Step.
  • On lines 80 and 81 we will store our CAS View in the CASUSER CASLIB with the name of DESCENDING.
  • On line 82 and 83 we use the TABLES statement to specify the input CAS table to our CAS View.
  • On line 84 we use the VARLIST statement to identify the columns from the input table we want in our CAS View.
  • On lines 85 we create a new variable for our CAS View using the computedVars statement,
  • On line 86 we provide the math for our new variable N_MSRP. N_MSRP is the negated value of the input CAS table variable MSRP. Note: This simulation only works for numeric variables. For character data I suggest using LAST. processing which you can review in this blog post.


Figure 3. Simulating DESCENDING BY Statement for Numeric Variables

Now that we have our CAS View with its new variable N_MSRP, we can move on to the DATA Step code in Figure 3.

  • On line 92 the SET statement specifies the source to our DATA Step; CAS View CASUSER.DESCENDING
  • On line 83 we leverage the BY Statement to group our data in ascending order for the CAS View variables MAKE and N_MSRP. Because N_MSRP is in ascending order of our original variable MSRP is in DESCENDING order.
  • On line 94 we use a subsetting IF statement to output the first occurrence of each MAKE.

Figure 4 is a listing of our new CAS table CASUSER.DESCENDING2 and displays the most expensive car for each make of car.


Figure 4. Listing of Most Expensive Cars by MAKE

Template for Creating a CAS View

/* Create a CAS view */
/* For each DESCENDING numeric create a new variable(s) */
/* The value of the new variable(s) is the negated value */
/* of the original DESCENDING BY numeric variable(s) */
proc cas;
   table.view / replace = true
   caslib='casuser'
   name='descending'
   tables={{
      name='cars'
      varlist={'msrp' 'make'},
      computedVars={{name='n_msrp'}},
      computedVarsProgram='n_msrp = -(msrp)'
   }};
run;
quit;
 
data casuser.descending2;
   set casuser.descending;
   by make n_msrp ;
   if first.make ;
run;
 
proc print data=casuser.descending2;
title "Most Expensive Cars";
run;

Conclusion

It is a very common coding technique to process data with a DESCENDING BY statement using DATA Step. With Viya 3.5 the DESCENDING BY statements is supported, for numeric and character data in DATA Step, with this caveat: DESCENDING works on all but the first BY variable on the BY statement. For earlier versions of SAS Viya this simulation is the best practices for numeric data that you want in DESCENDING order.

How to Simulate DATA Step DESCENDING BY Statements in SAS® Cloud Analytic Services (CAS) was published on SAS Users.

11月 142018
 

Prior to SAS Viya

With the creation of SAS Viya, the ability to run DATA Step code in a distributed manner became a reality. Prior to distributed DATA Step, DATA Step programmers never had to think about achieving repeatable results when SAS7BDAT datasets were the sources to their DATA Step code that contains a BY statement. This is because prior to SAS Cloud Analytics Services (CAS), DATA Step ran single-threaded and the source SAS7BDAT dataset was stored on disk. Every time one would run the code we obtained repeatable results because the sequence of rows within the BY group were preserved between runs. To illustrate this, review figures 1, 2, and 3.

Figure 1 is the source SAS7BDAT dataset WORK.TEST1. Notice the sequence of VAR2, especially on row 1 and 4 (i.e., _N_ =1 and 4).

_n_ VAR1 VAR2
1 1 N
2 1 Y
3 1 Y
4 2 Y
5 2 Y
6 2 N


Figure 1. WORK.TEST1 the original SAS7BDAT dataset

In figure 2, we see a BY statement with variable VAR1. This will ensure VAR1 is in ascending order. We are also using FIRST. processing to identify the first occurrence of the BY group. Because this data is stored on disk and because the DATA Step is executed using a single thread, the result table will be repeatable no matter how many times we run the DATA Step code.

Figure 2. Focus on the IF statement, especially VAR2

In figure 3, we see the output SAS7BDAT dataset WORK.TEST2.

_n_ VAR1 VAR2
1 1 N

Figure 3. WORK.TEST2 result dataset from running the code in Figure 2

In figure 4, we are running the same DATA Step but this time our source and target tables are CAS tables. The source table CASLIB.TEST1 was created by lifting the original SAS7BDAT dataset WORK.TEST1 (review figure 1) into CAS.

Figure 4. DATA Step executing in CAS

In figure 5, we see that the DATA Step logic is being respected in runs 1, 2 and 3; but we are not achieving repeatable results. This is due to CAS running on multiple threads. Note that the BY statement – which will group the data correctly for each BY group – is done on the fly. Also, the BY statement will not preserve the sequence of rows within the BY group between runs.

For some processes, this is not a concern but for others it could be. If you need to obtain repeatable results in DATA Step code that runs distributed in CAS as well as match your SAS 9 single-threaded DATA Step results, I suggest the following workaround be used.

Figure 5. DATA Step logic is respected but yields different results with each run

With SAS Viya

The workaround is very simplistic to understand and implement. For each SAS7BDAT dataset being lifted into a CAS table, see figure 6, we need to add a new variable ROW_ID.

_n_ VAR1 VAR2
1 1 N
2 1 Y
3 1 Y
4 2 Y
5 2 Y
6 2 N

Figure 6. Original SAS7BDAT dataset source WORK.TEST1

To accomplish this, we will leverage the automatic variable _N_ that is available to all DATA Step programmers. _N_ is initially set to 1. Each time the DATA step loops past the DATA statement, the variable _N_ increments by 1. The value of _N_ represents the number of times the DATA step has iterated. In our case, the value for each row is the row sequence in the original SAS7BDAT dataset. Figure 7 contains the SAS code we ran on the SAS 9.4M5 workspace server or the SAS Viya compute server to add the new variable ROW_ID.

 

Figure 7. Creating the new variable ROW_ID

By reviewing figure 8 we can see the new variable ROW_ID in the SAS7BDAT dataset WORK.TEST1. Now that we have the new variable, we are ready to lift this dataset into CAS.

_N_ VAR1 VAR2 ROW_ID
1 1 N 1
2 1 Y 2
3 1 Y 3
4 2 Y 4
5 2 Y 5
6 2 N 6

Figure 8. WORK.TEST1 with the new variable ROW_ID

There are many ways to lift a SAS7BDAT dataset into CAS. One way is to use a DATA Step like we did in figure 9.

Figure 9. DATA Step code to create distributed CAS table CASLIB.TEST1 

To obtain the repeatable results, we need to control the sequence of rows within each BY group. We accomplish this by adding the new variable ROW_ID as the last variable to the BY statement in our DATA Step code, see figure 10.

Figure 10. Add ROW_ID as last variable of the BY group

Figure 11 shows us the output CAS table created by the code in figure 10. By adding the new variable ROW_ID and using that variable as the last variable of the BY statement, we are controlling the sequencing of rows within the BY groups for all 3 runs.

VAR1 VAR2 ROW_ID
1 N 1

Figure 11. Distrusted CAS table CASLIB.TEST2

Conclusion

With distributed DATA Step comes great opportunities to improve runtimes. It also means we need to understand differences between single-threaded processing of SAS7BDAT datasets that are stored on disk and distributed processing of CAS tables store in-memory. To help you with that journey I suggest you read the SAS Global Forum paper, Parallel Programming with the DATA Step: Next Steps.

How to achieve repeatable results with distributed DATA Step BY Groups was published on SAS Users.

6月 222018
 

reference CAS tables using a one-level nameReferencing tables

With the SAS language the way one references a table in code is by using a two-level name or a one-level name. With two-level names, one supplies the libref as well as the table name i.e. PROC MEANS DATA = WORK.TABLE;. By default, all one-level names also refer to SASWORK i.e. PROC MEANS DATA = TABLE;

CAS

To reference CAS tables using a one-level name we will issue two statements that alter which libref houses the tables referenced as one-level names. First, we will create a CAS libref (line 77) followed by the “options user” statement on line 80. It is line 80 that changes the default location from SASWORK to the CAS libref i.e. CASWORK.Figure 1. Statements to alter the default location for one-level names

How to reference one-level CAS tables

From this point on all one-level names referenced in code will be tables managed by CAS. In figure 2 we are creating a one-level CAS table called baseball by reading the two-level named table SASHELP.BASEBALL. This step executes in the SAS Programing Runtime Engine (a SAS Viya based workspace server) and creates the table CASWORK.BASEBALL. Because of the “options user” statement we can now also reference that table using a one-level name i.e. BASEBALL.

Figure 2. Loading a two-level named table into a one-level named table that is managed by CAS

In Figure 3 we will use a DATA Step to read a CAS table and write a CAS table using one-level names. We can also see by reviewing the notes in the SAS log that this DATA Step ran in CAS using multiple threads.

Figure 3. DATA Step referencing one-level named tables

In Figure 4 we observe this PROC MEANS is processing the one-level named table BASEBALL. By reviewing the notes in the SAS log we can see this PROC MEANS ran distributed in CAS.

Figure 4. PROC MEANS referencing one-level named CAS table

Because the default location for one-level names is SASWORK, all tables in SASWORK are automatically deleted when the SAS session ends. When one changes the default location for one-level names, like we just did, it is a best practice to use a PROC DELETE as the last statement in your code to delete all one-level tables managed by CAS, figure 5.

Figure 5. PROC DELETE deleting all one-level CAS tables

Conclusion

It is a very common SAS coding technique to read a source table from a non-SAS data store and write it to SASWORK. By using the technique describe in this blog one now has options on where to store the one-level tables names. As for me, I prefer storing them in CAS so I benefit from the distributed process (faster runtimes) that CAS offers.

How to reference CAS tables using a one-level name was published on SAS Users.

4月 192018
 

A very common coding technique SAS programmers use is identifying the largest value for a given column using DATA Step BY statement with the DESCENDING option. In this example I wanted to find the largest value of the number of runs (nRuns) by each team in the SASHELP.BASEBALL dataset. Using a SAS workspace server, one would write:

DESCENDING BY Variables in SAS Viya

Figure 1. Single Threaded DATA Step in SAS Workspace Server

Figure 2 shows the results of the code we ran in Figure 1:

Figure 2. Result from SAS Code Displayed in Figure 1

To run this DATA Step distributed we will leveraging the SAS® Cloud Analytic Services in SAS® Viya™. Notice in Figure 3, there is no need for the PROC SORT step which is required when running DATA Step single threaded in a SAS workspace server. This is because SAS® Cloud Analytic Services in SAS® Viya™ . Instead we will use

Figure 3. Distributed DATA Step in SAS® Cloud Analytic Services in SAS® Viya™

Figure 4 shows the results when running distributed DATA Step in SAS® Cloud Analytic Services in SAS® Viya™.

Figure 4. Results of Distributed DATA Step in SAS® Cloud Analytic Services in SAS® Viya™

Conclusion

Until the BY statement running in the SAS® Cloud Analytic Services in SAS® Viya™ supports DESCENDING use this technique to ensure your DATA Step runs distributed.

Read more SAS Viya posts.

Read our SAS 9 to SAS Viya whitepaper.

How to Simulate DESCENDING BY Variables in DATA Step Code that Runs Distributed in SAS® Viya™ was published on SAS Users.

11月 212017
 

SAS Viya provides a robust, scalable, cloud-ready, distributed runtime engine. This engine is driven by CAS (Cloud Analytic Services), providing fast processing for many data management techniques that run distributive, i.e. using all threads on all defined compute nodes.

Why

PROC APPEND is a common technique used in SAS processes. This technique will concatenate two data sets together. However, PROC APPEND will produce an ERROR if the target CAS table exists prior to the PROC APPEND.

Simulating PROC APPEND

Figure 1. SAS Log with the PROC APPEND ERROR message

Now what?

How

To explain how to simulate PROC APPEND we first need to create two CAS tables. The first CAS table is named CASUSER.APPEND_TARGET. Notice the variables table, row and variable in figure 2.

Figure 2. Creating the CAS table we need to append rows to

The second CAS table is called CASUSER.TABLE_TWO and in figure 3 we can review the variables table, row, and variable.

Figure 3. Creating the table with the rows that we need to append to the existing CAS table

To simulate PROC APPEND we will use a DATA Step. Notice on line 77 in figure 4 we will overwrite an existing CAS table, i.e. CASUSER.APEND_TARGET. On Line 78, we see the first table on the SET statement is CASUSER.APPEND_TARGET, followed by CASUSER.TABLE_TWO. When this DATA Step runs, all of the rows in CASUSER.APPEND_TARGET will be processed first, followed by the rows in CASUSER.TABLE_TWO. Also, note we are not limited to two tables on the SET statement with DATA Step; we can use as many as we need to solve our business problems.

Figure 4. SAS log validating the DATA Step ran in CAS i.e. distributed

The result table created by the DATA Step is shown in figure 5.

Figure 5. Result table from simulating PROC APPEND using a DATA Step

Conclusion

SAS Viya’s CAS processing allows us to stage data for downstream consumption by leveraging robust SAS programming techniques that run distributed, i.e. fast. PROC APPEND is a common procedure used in SAS processes. To simulate PROC APPEND when using CAS tables as source and target tables to the procedure use DATA Step.

How to simulate PROC APPEND in CAS was published on SAS Users.

3月 182017
 

Editor’s note: This is the third in a series of articles to help current SAS programmers add SAS Viya to their analytics skillset. In this post, Advisory Solutions Architect Steven Sober explores how to accomplish distributed data management using SAS Viya. Read additional posts in the series.

In my last article I explained how SAS programmers can execute distributed DATA Step, PROC DS2, PROC FEDSQL and PROC TRANSPOSE in SAS Viya’s Cloud Analytic Services (CAS) which speeds up the process of staging data for analytics, visualizations and reporting. In this article we will explore how open source programmers can leverage the same SAS coding techniques while remaining in their comfort zone.

For this post, I will utilize Jupyter Notebook to run the Python script that is leveraging the same code we used in part one of this series.

Importing Package and Starting CAS

First, we import the SAS Scripting Wrapper for Analytics Transfer (SWAT) package, which is the Python client to SAS Cloud Analytic Services (CAS). To down load the SWAT package, use this url: https://github.com/sassoftware/python-swat.

Let’s review the cell “In [16]”:

1.  Import SWAT

a.  Required statement, this loads the SWAT package into our Python client

2.  s = swat.CAS("viya.host.com", port#, "userid", "password")

a.  Required statement, for our example we will use “s” in our dot notation syntax to send our statements to CAS for processing. “s” is end-user definable (i.e. I could have used “steve =” instead of “s =”).

b.  Viya.host.com is the host name of your SAS Viya platform

c.  Port#

i.  Port number used to communicate with CAS

d.  userid

i.  Your user id for the SAS Viya platform

e.  Password

i.  Your password for your userid

3.  indata_dir = "/opt/sasinside/DemoData"

a.  Creating a variable call “indata_dir”. This is a directory on the SAS Viya platform where the source data for our examples is located.

4.  indata     = "cars"

a.  Creating a variable call “indata” which contains the name of the source table we will load into CAS

Reviewing cell “Out[16]” we see the information that CAS returns to our client when we connect successfully.

Loading our Source Table and CAS Action Sets

In order to load data into CAS we first need to create a dataSource={"srcType":"PATH"},
path = indata_dir)

a.  To send statements to CAS we use dot notation syntax where:

a.  s

i.  The CAS session that we established in cell “in[16]”

b.  table

i.  CAS action set

c.  addCaslib

i.  Action set’s action

d.  name

i.  Specifies the name of the caslib to add.

e.  dataSource

i.  Specifies the data source type and type-specific parameters.

f.  path

i.  Specifies data source-specific information. For PATH and HDFS, this is a file system path. In our example we are referencing the path using the variable “indata_dir” that we established in cell “In[16]”.

casOut={"caslib":"casuser", "name":"cars",
"replace":"True"},

)

a.  As we learned s. is our connection to CAS and “table.” is the CAS action set while “Table” is the action set’s action.

a.  path=

i.  Specifies the file, directory or table name. In our example this is the physical name of the SAS data set being loaded into CAS.

b. casOut=

i.  The CAS library we established in cell “In[17]” using the “addCaslib” action.

1  caslib.casuser

a.  “caslib” - is a reserved word and is use to reference all CAS libraries
b.  “casuser” - is the CAS library we will use in our examples
c.  “name”  - is the CAS table name
d.  “replace” - provides us an option to replace the CAS table if it already exists.

Reviewing cell “Out[17]” we see the information that CAS returns to our client when we successfully load a table into CAS.

Click here for information on the loadActionSet action.

DATA Step

We are now ready to continue by running DATA Step, PROC DS2, PROC FEDSQL and PROC TRANSPOSE via our python script.

Now that we understand the dot notation syntax used to send statements to CAS, it become extremely simple to leverage the same code our SAS programmers are using.

Reviewing cell “In[19]” we notice we are using the CAS action set “dataStep” and it’s action “runCode”.  Notice between the (“”” and  “””) we have the same DATA Step code we reviewed in part one of this series. By reviewing cell “Out19]” we can review the information CAS sent back providing information on the source (casuser.cars) and target (casuser.cars_data_step) tables used in our DATA Step.

With DS2 we utilize the CAS action set “ds2” with its action “runDS2.” In reviewing cell “In[23]” we do notice a slight difference in our code. There is no “PROC DS2” prior to the “thread mythread / overwrite = yes;” statement. With the DS2 action set we simply define our DS2 THREAD program and follow that with our DS2 DATA program. Notice in the DS2 DATA program we declare the DS2 THREAD that we just created.

Review the NOTE statements: prior to “Out[23]” These statements validate the DS2 THREAD and DATA programs executed in CAS.

With FedSQL we use the CAS action set “fedsql’ with its action “execDirect.” The “query=” parameter is where we place our FedSQL statements. By reviewing the NOTE statements we can validate our FedSQL ran successfully.

With TRANSPOSE we use the CAS action set “transpose” with its action “transpose.” The syntax is different for PROC TRANSPOSE, but it is very straight forward on mapping out the parameters to accomplish the transpose you need for your analytics, visualizations and reports.

Collaborative distributed data management with open source was published on SAS Users.