SAS Viya

February 25, 2021
 

The people, the energy, the quality of the content, the demos, the networking opportunities…whew, all of these things combine to make SAS Global Forum great every year. And this year is no exception.

Preparations are in full swing for an unforgettable conference. I hope you’ve seen the announcements that we’ve set the date (actually, multiple dates around the world), so you can enjoy the content in your region and in your time zone. No one needs to set their alarm for 1:00am to attend the conference!

Go ahead and save the date(s)…you don’t want to miss this event!

Content, content, content

We are working hard to replicate the energy and excitement of a live conference in the virtual world. But we know content is king, so we have some amazing speakers and content lined up to make the conference relevant for you. There will be more than 150 breakout sessions for business leaders and SAS users, plus the demos will allow you to see firsthand the innovative solutions from SAS, and the people who make them. I, for one, am looking forward to attending live sessions that will allow attendees the opportunity to ask presenters questions and have them respond in real time.

Our keynote speakers, while still under wraps for now, will have you on the edge of your seats (or couches…no judgement here!).

Networking and entertainment

You read that correctly. We will have live entertainment that'll have you glued to the screen. And you’ll be able to network with SAS experts and peers alike. But you don’t have to wait until the conference begins to network; the SAS Global Forum virtual community is already up and running. Join the group to start engaging with other attendees, and maybe take a guess or two at who the live entertainment might be.

A big thank you

We are working hard to bring you the best conference possible, but this isn’t a one-woman show. It takes a team, so I would like to introduce and thank the conference teams for 2021. The Content Advisory Team ensures the Users Program sessions meet the needs of our diverse global audience. The Content Delivery Team ensures that conference presenters and authors have the tools and resources needed to provide high-quality presentations and papers. And, finally, the SAS Advisers help us in a multitude of ways. Thank you all for your time and effort so far!

Registration opens in April, so stay tuned for that announcement. I look forward to “seeing” you all in May.

What makes SAS Global Forum great? was published on SAS Users.

February 19, 2021
 

This post is part of our Young Data Scientists series, featuring the motivation, work and advice of the next generation of data scientists. Be sure to check back for future posts, or read the whole series by clicking on the image to the right. This July, the inaugural Academic SAS [...]

Taking initiative leads to rewards for this young data scientist was published on SAS Voices by Jelena Stankovic

January 25, 2021
 

Safety, efficacy, speed and costs must all be prioritized and balanced in the delivery of life-changing therapies to patients. A drug that's quickly and cost-efficiently delivered to market, but isn’t effective and safe is unacceptable. An effective, safe drug that doesn’t get to patients in time to save lives has [...]

The evolving role of AI in drug safety was published on SAS Voices by Cameron McLauchlin

December 17, 2020
 

There’s nothing worse than being in the middle of a task and getting stuck. Being able to find quick tips and tricks to help you solve the task at hand, or simply entertain your curiosity, is key to maintaining your efficiency and building everyday skills. But how do you get quick information that’s ALSO engaging? By adding some personality to traditionally routine tutorials, you can learn and may even have fun at the same time. Cue the SAS Users YouTube channel.

With more than 50 personality-filled videos published to date and over 10,000 hours watched, there’s no shortage of learning going on. Our team of experts loves to share their knowledge and passion (with personal flavor!) to give you solutions to those everyday tasks.

What better way to round out the year than provide a roundup of our most popular videos from 2020? Check out these crowd favorites:

Most viewed

  1. How to convert character to numeric in SAS
  2. How to import data from Excel to SAS
  3. How to export SAS data to Excel

Most hours watched

  1. How to import data from Excel to SAS
  2. How to convert character to numeric in SAS
  3. Simple Linear Regression in SAS
  4. How to export SAS data to Excel
  5. How to Create Macro Variables and Use Macro Functions
  6. The SAS Exam Experience | See a Performance-Based Question in Action
  7. How to Import CSV files into SAS
  8. SAS Certification Exam: 4 tips for success
  9. SAS Date Functions FAQs
  10. Merging Data Sets in SAS Using SQL

Latest hits

  1. Combining Data in SAS: DATA Step vs SQL
  2. How to Concatenate Values in SAS
  3. How to Market to Customers Based on Online Behavior
  4. How to Plan an Optimal Tour of London Using Network Optimization
  5. Multiple Linear Regression in SAS
  6. How to Build Customized Object Detection Models
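
For a taste of what these tutorials cover, here is a minimal sketch of the character-to-numeric conversion from the most-viewed video (the data set and variable names are hypothetical):

/*convert a character variable to numeric with the INPUT function*/
data want;
     set have;                                    /*hypothetical input data set*/
     amount_num = input(amount_char, best32.);    /*read the character value as a number*/
run;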

Looking forward to 2021

We’ve got you covered! SAS will continue to publish videos throughout 2021. Subscribe now to the SAS Users YouTube channel, so you can be notified when we’re publishing new videos. Be on the lookout for some of the following topics:

  • Transforming variables in SAS
  • Tips for working with SAS Technical Support
  • How to use Git with SAS

2020 roundup: SAS Users YouTube channel how to tutorials was published on SAS Users.

December 7, 2020
 

In my previous blog post, I talked about using PROC CAS to accomplish various data preparation tasks. Since then, my colleague Todd Braswell and I worked through some interesting challenges implementing an Extract, Transform, Load (ETL) process that continuously updates data in CAS. (Todd is really the brains behind getting this to work – I am just along for the ride.)

In a nutshell, the process goes like this:

  1. PDF documents drop into a "receiving" directory on the server. The documents have a unique SubmissionID. Some documents are very large – thousands of pages long.
  2. A Python job runs and converts PDFs into plain text. Python calls an API that performs Optical Character Recognition (OCR) and saves the output as a CSV file, one row per page of the PDF document.
  3. A SAS program, running in batch, loads the CSV file with new records into a CAS table. SubmissionID is passed to the batch program as a macro variable, which is used as part of the CAS table name.
  4. Records loaded from the CSV file are appended to the Main table. If records with the current SubmissionID already exist in the Main table, they are deleted and replaced with new records.
    The Main table is queried by downstream processes, which extract subsets of data, apply model score code, and generate results for the customer.

Continuously update data process flow

Due to the volume of data and the amount of time it takes to OCR large PDFs, the ETL process runs in multiple sessions simultaneously. And here is a key requirement: the Main table is always available, in a promoted state, with all up-to-date records, in order for the model score code to pick up the needed records.

What does "promoted state" mean?

The concept of table scope, which was introduced with the first release of SAS Viya, presents a challenge. CAS tables are in-memory tables that can have one of two "scopes":

  • Session scope – the table exists within the scope of your current CAS session and drops from memory as soon as the session ends. Functionally, this is somewhat similar to the data you write to the WORK library in SAS 9 – once you disconnect, the data drops from the WORK library.
  • Global scope – the table is available to all sessions. If your session ends, you will still have access to it when you start a new session. Your colleagues also maintain access, assuming they have the necessary permissions. For the table to assume global scope, it needs to be promoted.

Common promotion techniques for a table are the DATA STEP, PROC CAS, or PROC CASUTIL. For example:

/*promote a table using DATA STEP*/
*this snippet copies a SAS 9 table into CAS and promotes in one step;
data mycaslib.my_table (promote=yes);
     set mylib.my_table;
     run;
 
/*promote using PROC CASUTIL*/
*this snippet promotes a session-scope CAS table to global scope;
proc casutil incaslib='mycaslib' outcaslib='mycaslib';
     promote casdata='my_table';
     quit;
 
/*Promote using PROC CAS*/
*same as above, this snippet promotes a session-scope table to global scope;
proc cas;
table.promote / 
     caslib='mycaslib'
     targetcaslib='mycaslib' 
     name='my_table' 
     target='my_table';
    quit;

Fun Facts About Table Promotion

You cannot promote a table that has already been promoted. If you need to promote a new version of the same table, you need to first drop the existing table and promote the new version.
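
For example, here is a minimal sketch of that drop-then-promote pattern, assuming the new version exists as a session-scope table named my_table_new (the table and caslib names are hypothetical):

/*drop the existing global-scope table, then promote the new version under the old name*/
proc casutil incaslib='mycaslib' outcaslib='mycaslib';
     droptable casdata='my_table' quiet;                 /*quiet suppresses an error if the table is absent*/
     promote casdata='my_table_new' casout='my_table';
     quit;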

To discover the current table state, use the table.tableInfo action:

proc cas;
     table.tableinfo / 
     caslib='mycaslib' 
     name='main';
     quit;

If you append rows to a promoted table using the DATA STEP append option, the new rows are automatically promoted. For example, in this snippet the mycaslib.main table, which is promoted, remains promoted when the rows from mycaslib.new_rows are appended to it:

data mycaslib.main(append=yes);
     set mycaslib.new_rows;
     run;

When you manipulate a promoted table using the DATA STEP apart from appending rows, it creates a new, session-scope version of the same table. You will have two versions of the table: the global-scope table, which remains unchanged, and the session-scope version which has the changes you implemented. Even if you don't change anything in the table and simply run:

data mycaslib.my_table;
     set mycaslib.my_table;
     run;

in which mycaslib.my_table is promoted, you end up with a promoted and an unpromoted version of this table in the mycaslib library – a somewhat unexpected and hardly desired result. Appendix 1 walks through a quick exercise you can try to verify this.

As you probably guessed, this is where we ran into trouble with our ETL process: the key requirement was for the Main table to remain promoted, yet we needed to continuously update it. The task was simple if we just needed to append the rows; however, we also needed to replace the rows if they already existed. If we tried to delete the existing rows using the DATA STEP, we would have to deal with the changes applied to a session-scope copy of the global-scope table.

Initially, we designed the flow to save off the session-scope table with changes, then drop the (original) global-scope version, and finally reload the up-to-date version. This was an acceptable workaround, but errors started to pop up when a session looked for the Main table to score data, while a different concurrent session reloaded the most up-to-date data. We were uncertain how this would scale as our data grew.

PROC CAS to the rescue!

After much research, we learned about the table.deleteRows action, which allows you to delete rows directly from a global-scope table. The data is never dropped to session-scope – exactly what we needed. Here's an example:

proc cas;
     table.deleteRows /
     table={caslib="mycaslib", name="Main", where="SubmissionID = 12345"};
     quit;

In case you are wondering, the Tables action set also has an update action for modifying values in place. Putting it all together, here is a simplified version of our final flow:

/*1. Load new rows. SubmissionID macro variable is a parameter passed to the batch program*/
/*New rows are written to the casuser library, but it does not really matter which caslib you choose – 
   we are not persisting them across sessions*/
proc casutil;
       load file="/path_to_new_rows/New_rows_&SubmissionID..csv" outcaslib="casuser" casout="new_rows";
       quit;
/*2. Delete rows with the current SubmissionID */
proc cas;
       table.deleteRows /
       table={caslib="prod", name="Main", where="SubmissionID = &SubmissionID."};
       quit;
/*3. Append new rows*/
data prod.main(append=yes);
	set casuser.new_rows;
	run;
/*4. Save the main table to ensure we have a disk backup of in-memory data*/
proc casutil incaslib="prod" outcaslib="prod";
	save casdata="main" replace;
	quit;

Conclusion

We learned how to continuously update data in CAS while ensuring the data remains available to all sessions accessing it asynchronously. We learned the append option in DATA STEP automatically promotes new rows but manipulating the data in a global-scope table through DATA STEP in other ways leads to data being copied to session-scope. Finally, we learned that to ensure the table remains promoted while it is updated, we can fall back on PROC CAS.

Together, these techniques enabled implementation of a robust data flow that overcomes concurrency problems due to multiple processes updating and querying the data.

Acknowledgement

We thank Brian Kinnebrew for his generous help in investigating this topic and the technical review.

Appendix 1

Try the following exercise to verify that manipulating a promoted table in DATA STEP leads to two copies of the table – session- AND global-scope.

/*copy and promote a sample SAS 9 table*/
data casuser.cars(promote=yes);
     set sashelp.cars;
     run;
/*check the number of rows and confirm that the table is promoted*/
proc cas;
     table.tableinfo / caslib='casuser' name='cars';
     quit; /*The table is promoted and has 428 rows*/
 
/*delete some rows in the promoted table*/
data casuser.cars;
     set casuser.cars;
     if make='Acura' then delete;
     run;
/*check again – how many rows does the table have? Is it promoted?*/
proc cas;
     table.tableinfo / caslib='casuser' name='cars';
     quit;  /*The table has 421 rows but it is no longer promoted*/
 
/*reset your CAS session*/
/*kill your current CAS session */
cas _all_ terminate;
/*start a new CAS session and assign caslibs*/
cas; 
caslib _all_ assign;
 
/*check again – how many rows does the table have? Is it promoted?*/
proc cas;
     table.tableinfo / caslib='casuser' name='cars';
     quit;  /*The table is promoted and has 428 rows*/

What we see here is that manipulating a global-scope table in the DATA step leads to duplication of data. CAS copies the table to session-scope and applies the changes there. The changes go away if you terminate the session. One way to get around this issue is, instead of trying to overwrite the promoted table, to create a new table, drop the old table, and promote the new table under the old table's name. Otherwise, use the table.update and table.deleteRows actions to update or delete rows in place, as described in this post.
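
For reference, here is a minimal sketch of the table.update action (the filter and expression are hypothetical); it changes values in place without creating a session-scope copy:

proc cas;
     table.update /
          table={caslib="casuser", name="cars", where="make='Acura'"},
          set={{var="msrp", value="msrp * 1.1"}};    /*value is an expression string*/
     quit;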

Append and Replace Records in a CAS Table was published on SAS Users.

November 20, 2020
 

If you’re like me and the rest of the conference team, you’ve probably attended more virtual events this year than you ever thought possible. You can see the general evolution of virtual events by watching the early ones from April or May and compare them to the recent ones. We at SAS Global Forum are studying the virtual event world, and we’re learning what works and what needs to be tweaked. We’re using that knowledge to plan the best possible virtual SAS Global Forum 2021.

Everything is virtual these days, so what do we mean by virtual?

Planning a good virtual event takes time, and we’re working through the process now. One thing is certain -- we know the importance of providing quality content and an engaging experience for our attendees. We want to provide attendees with the opportunity, as always but virtually this time, to continue to learn from other SAS users, hear about new and exciting developments from SAS, and connect and network with experts, peers, partners and SAS. Yes, I said network. We realize it won’t be the same as a live event, but we are hopeful we can provide attendees with an incredible experience where you connect, learn and share with others.

Call for content is open

One of the differences between SAS Global Forum and other conferences is that SAS users are front and center, and the soul of the conference. We can’t have an event without user content. And that’s where you come in! The call for content opened November 17 and lasts through December 21, 2020. Selected presenters will be notified in January 2021. Presentations will be different in 2021; they will be 30 minutes in length, including time for Q&A where possible. And since everything is virtual, video is a key component of your content submission. We ask for a 3-minute video along with your title and abstract.

The Student Symposium is back

Calling all postsecondary students -- there’s still time to build a team for the Student Symposium. If you are interested in data science and want to showcase your skills, grab a teammate or two and a faculty advisor and put your thinking caps on. Applications are due by December 21, 2020.

Learn more

I encourage you to visit the SAS Global Forum website for up-to-date information, follow #SASGF on social channels and join the SAS communities group to engage with the conference team and other attendees.

Connect, learn and share during virtual SAS Global Forum 2021 was published on SAS Users.

November 19, 2020
 
SAS loves data. It's our raison d'être. We've been dealing with Big Data since long before the term was first used in 2005. A brief history of Big Data*:

  • In 1887, Herman Hollerith invented punch cards and a reader to organize census data.
  • In 1937, the US government had a punch-card reading machine created to keep track of 26 M Americans and 3 M employers as a result of the Social Security Act.
  • In 1943, Colossus was created to decipher Nazi codes during World War II.
  • In 1952, the National Security Agency was created to confront decrypting intelligence signals during the Cold War.
  • In 1965, the US Government built the first data center to store 742 M tax returns and 175 M sets of fingerprints.
  • In 1989, British computer scientist Tim Berners-Lee coined the phrase "World Wide Web" combining hypertext with the Internet.
  • In 1995, the first supercomputer was built.
  • In 2005, Roger Mougalas from O'Reilly Media coined the term Big Data.
  • In 2006, Hadoop was created.

The story goes on: roughly 90 percent of the data available today has been created in the last two years!

As SAS (and the computing world) moves to the cloud, the question of "How do I deal with my data (Big and otherwise), which used to be on-prem, in the cloud?" is at the forefront for many organizations. I ran across a series of relevant articles by my colleague, Nicolas Robert, on the SAS Support Communities about SAS data access and storage on Google Cloud Storage (GCS). This post organizes the articles so you can quickly get an overview of the various options for SAS to access data in GCS.

Accessing Google Cloud Storage (GCS) with SAS Viya 3.5 – An overview

As the title suggests, this is an overview of the series. Some basic SAS terminology and capabilities are discussed, followed by an overview of GCS data options for SAS. Options include:

  • gsutil - the "indirect" way
  • REST API - the "web" way
  • gcsfuse - the "dark" way
  • BigQuery - the "smart" way.

In the overview Nicolas provides the pros and cons of each offering to help you decide which option works best for your situation. Below is a list of subsequent articles providing technical details, specific steps for usage, and sample code for each option.

Accessing files on Google Cloud Storage (GCS) using REST

The Google Cloud Platform (GCP) provides an API for manipulating objects in Google Cloud Storage. In this article, Nicolas provides step-by-step instructions on using this API to access GCS files from SAS.
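
As a rough illustration of what that can look like from SAS (not Nicolas' exact steps; the bucket, object, and access token below are hypothetical placeholders), a single object can be downloaded from the GCS JSON API with PROC HTTP:

/*download one object from GCS using an OAuth 2.0 access token*/
filename gcsfile temp;
proc http
     url="https://storage.googleapis.com/storage/v1/b/my-bucket/o/class.csv?alt=media"
     method="GET"
     oauth_bearer="&gcs_access_token."
     out=gcsfile;
run;

proc import datafile=gcsfile out=work.class dbms=csv replace;
run;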

Accessing files on Google Cloud Storage (GCS) using SAS Viya 3.5 and Cloud Storage FUSE (gcsfuse)

Cloud Storage FUSE provides a command-line utility, named “gcsfuse”, which helps you mount a GCS bucket to a local directory so the bucket’s contents are visible and accessible locally like any other file. In this article, Nicolas presents rules for CLI usage, options for mounting a GCS bucket to a local directory, and SAS code for accessing the data.
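
Once a bucket is mounted, accessing it from SAS is ordinary file access. A minimal sketch, assuming the bucket is mounted at a hypothetical /mnt/gcs-bucket path:

/*the bucket is assumed to be mounted at /mnt/gcs-bucket via gcsfuse*/
libname gcsdata "/mnt/gcs-bucket/sasdata";           /*read and write sas7bdat files in the bucket*/
filename gcscsv "/mnt/gcs-bucket/raw/class.csv";     /*or reference an individual file*/

proc import datafile=gcscsv out=work.class dbms=csv replace;
run;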

SAS Viya 3.5 and Google Cloud Storage (GCS) Performance Feedback

In this article, Nicolas provides the results of a performance test of GCS integrated with SAS when accessed from cloud instances. Newer SAS releases should only further improve this integration and its performance.

Accessing files on Google Cloud Storage (GCS) through Google BigQuery

Google BigQuery naturally interacts with Google Cloud Storage using popular big data file formats (Avro, Parquet, ORC) as well as commodity file formats like CSV and JSON. And since SAS can access Google BigQuery, SAS can access those GCS resources under the covers. In the final article, Nicolas debunks the myth that using Google BigQuery as middleware between SAS and GCS is cumbersome, not direct and requires data duplication.
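
For context, a SAS connection to BigQuery typically looks like the sketch below. The project, dataset, and credential path are hypothetical, and the option names are my recollection of the SAS/ACCESS Interface to Google BigQuery syntax, so verify them against the documentation for your release:

/*assign a library to a BigQuery dataset, then read from it*/
libname gbq bigquery project="my-gcp-project" schema="my_dataset"
        cred_path="/secure/path/service-account.json";

proc print data=gbq.sales_summary(obs=10);
run;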

Finally

Being able to access a wide variety of data on the major cloud providers' object storage technologies has become essential, if not mandatory. I encourage you to browse through the various articles, find your specific area of interest, and try out some of the detailed concepts.

* Big Data history compiled from A Short History Of Big Data, by Dr Mark van Rijmenam.

Accessing Google Cloud Storage (GCS) with SAS Viya was published on SAS Users.

October 3, 2020
 

If you're a SAS programmer who now uses SAS Viya and CAS, it's worth your time to optimize your existing programs to take advantage of the new environment. This post is a continuation of my SAS Global Forum 2020 paper Best Practices for Converting SAS® Code to Leverage SAS® Cloud Analytic Services and my SGF 2020 Super Demo.

The best approach for refactoring SAS code for SAS Viya has a few steps:

  • First, "lift and shift" your existing code to run successfully in the compute server for SAS Viya.
  • Next, create CASLIB statements for all of your data sources: sas7bdat, CSV files, parquet files, RDBMSs, cloud data sources, etc. (a sketch follows this list).
  • Finally, identify the longest running steps so you know where you have the biggest opportunities. For example, look at steps where the "real time" is 30 minutes or longer, as well as steps that are CPU bound. CPU-bound steps are steps where the CPU time is equal to or greater than the real time for that step.
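
Here is a minimal sketch of what those CASLIB statements can look like (the session name, paths, and connection options are hypothetical placeholders):

/*start a CAS session and define caslibs for two kinds of data sources*/
cas mysess;
caslib flatdata path="/data/landing" datasource=(srctype="path");    /*sas7bdat, CSV, parquet files*/
caslib pgdata datasource=(srctype="postgres" server="pg.example.com"
       database="sales" username="casuser" password="XXXXXX");       /*an RDBMS source*/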

To help us identify those steps, we can leverage a new utility that analyzes SAS logs and creates reports to help us understand the Real Time and CPU Time for each step. Read on to learn more about this final step in the code refactoring process.

Utility: SAS Log Parser

To generate these reports, I created a SAS program that reads all SAS log files in a directory and creates one report per SAS log, as well as descending Real Time (clock time) and CPU Time reports. Figure 1 is an example of the report that is generated for each SAS log. In this report we see the Real Time and CPU Time for each procedure or DATA step. The report is derived by picking up on SAS log entries like this:

NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           2.79 seconds
      cpu time            0.08 seconds

NOTE: The SAS System used:
      real time           1:08.86
      cpu time            1:18.18

Note: the Total Time and Total CPU Time fields are populated when the SAS log note “NOTE: The SAS System used:” is encountered; this note appears in the logs of SAS programs that are run in batch or through an RSUBMIT process.
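
To make the parsing idea concrete, here is a minimal sketch (not the actual utility, which is covered below) that scans a single log for these timing notes; the log path is a hypothetical placeholder:

/*scan one SAS log for step timing notes*/
data timings;
     infile "/path/to/sample.log" truncover;
     length line $256 step $256 metric $8 value $32;
     retain step;
     input line $char256.;
     if index(line, "used (Total process time)") then step = line;    /*e.g. NOTE: PROCEDURE SGPLOT used...*/
     else if left(line) =: "real time" then do; metric = "real"; value = scan(line, 3, " "); output; end;
     else if left(line) =: "cpu time"  then do; metric = "cpu";  value = scan(line, 3, " "); output; end;
     keep step metric value;
run;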

Figure 1. sampleETS.log report

Descending Real Time Report

Figure 2 contains an example of the descending real time report. In this report we observe in the Step column that the longest running step is a PROC LOGISTIC that takes over 14 hours (Real Time column) from the SAS log called Sample3.log (File Name column). The best way to use this report is to focus on steps that take longer than 30 minutes. In our case, we have 9 such steps from 3 SAS logs. Knowing this, we can review the details of each step and then benchmark whether that step would run faster by leveraging SAS® Cloud Analytic Services (CAS). Note: for CAS to process the data, all source and target tables must be CAS tables and the step must be coded using CAS-enabled steps.

Figure 2. Descending Real Time Report

Descending CPU Time Report

Figure 3 contains an example of the descending CPU time report, where we observe that the most CPU-intensive step takes over 13 hours (CPU Time column) and comes from Samples3.log (File Name column). Note: if you review the Real Time and CPU Time columns, you should notice that only observation 11 (Obs column) has a CPU Time that exceeds its Real Time, making it CPU bound. However, we would not focus on this step since its Real Time is less than 30 minutes.

Figure 3. Descending CPU Time Report

Source code for the SAS Log Parser

I've bundled my SAS code for these steps in my GitHub repository for SAS Global Forum 2020. You'll find these programs along with the other code that supports my previous topics of adapting SAS 9 code for SAS Viya.

sasLogParserMacros.sas contains three macros that drive the process. The macro program %LIST_FILES lists all files with a given extension, %CHECK checks for the existence of a file and deletes it if found, and %SASLOG parses a SAS log and provides the values found in the reports. When you save this file, ensure you name it sasLogParserMacros.sas and place it in the same directory as sasLogParser.sas.

sasLogParser.sas is the program we submit to produce the reports. This code includes sasLogParserMacros.sas and then generates the reports. The only two statements we need to modify are the first two %LET statements in this program. The first %LET statement points to the location of the two SAS programs, sasLogParserMacros.sas and sasLogParser.sas. The second %LET statement points to the directory containing all the SAS logs we want to parse. Note: other than those two %LET statements in sasLogParser.sas, do not change any statement in either program.

Conclusion

In order to understand which steps are good candidates for leveraging the in-memory engine CAS, we must first understand the real time and CPU time of each step. Then we can benchmark which engine in SAS Viya is appropriate for that step -- the compute server or CAS.  The code that I've shared for benchmarking can run within SAS 9 or SAS Viya, on both Windows and Linux.

September 26, 2020
 

Welcome back to my series on securely integrating custom applications into your SAS Viya platform. My goal today is to lay out some examples of the concepts I introduced in the previous posts. As a quick recap:

  1. In the first installment of this series I shared my experiences on a customer project, where we helped achieve value-add on top of their SAS Viya platform. We built a custom application for their buyers to use at auto auctions and embedded repeatable analytics from a SAS Viya decisioning workflow. That customization now helps their buyers make data-driven decisions based on both business rules and analytical models trained by historical data from past purchases and sales. The models are re-trained by the workflow as buyers "commit" new purchases.
  2. In part two, I outlined the security frameworks (OAuth 2.0 and OpenID Connect) and main integration points for bringing custom apps and SAS Viya together. I also introduced the main decision points before setting up this integration.
  3. The most recent post detailed the OAuth flow choices for access tokens from SAS Logon used for custom application and SAS Viya integration. I outlined some suggestions on when to use each flow in order to retrieve the appropriate scoping for your app.

I would encourage you to check out the previous parts in this series before reading on so we all start on a level playing field. In this post, I will assume you have done so and are therefore familiar with the concepts of user- and general-scoping. Before we jump into some examples for both of those scenarios, two things to note:

  • Recall that for both user-scoped and general-scoped scenarios, the Password authentication flow is technically valid, but the industry recommends phasing it out (source one) (source two). Accordingly, I won't include examples of it here.
  • This post assumes that you are familiar with the SAS Administration concepts of Custom Groups, granting access to SAS Content, and creating authorization rules. If those topics are unfamiliar to you, and you will be implementing this custom application registration, please take a look at the resources linked inline.

General-scoped access tokens

This refers to scenarios where we don't want end users to have to log in (directly to SAS Viya, at least). You don't want the tokens SAS Logon returns to your custom application to drive authorization decisions based on the individual using the application. In these cases, however, you do still want control over the endpoints/resources your app calls and which CRUD operations (which map to HTTP methods) it performs on them. The general steps are:

  • Create a Custom Group for the use of the application.
  • Grant necessary access to that Custom Group for the appropriate endpoints and resources.
  • Register a new, unique client to your SAS Viya environment, using the client_credentials grant type, and specifying the created Custom Group as the value for the authorities property.

Detailed steps

  1. As a member of your SAS Viya environment's SAS Administrators group, create a new Custom Group for your application. The naming convention of App.<your_app> is recommended. Using prefixes in this manner logically organizes Custom Groups used for similar purposes together. Consider including a description that explains the group's purpose.


  2. Determine the necessary resource(s), content, and/or endpoint(s) application users need. Next, create the authorization rules to provide appropriate access using the Custom Group you created in the previous step. I've often found this to be an iterative process which is perfectly fine. You (a SAS Admin) can adjust the authorization rules granted to this Custom Group, and thus your app, at any time.

  3. Review the general documentation provided for registering clients to your SAS Viya environment. The following is an adaptation of those instructions; please take note of the differences and be sure to include them.


    For readability, the required data for the client registration request is:

    {
        "client_id": "{{YOUR_APP_NAME}}",
        "client_secret": "{{YOUR_SECRET}}",
        "authorized_grant_types": "client_credentials",
        "authorities": "{{CUSTOM_GROUP_FROM_PREVIOUS_STEPS}}",
        "scope": "openid"
    }

    Where scope should include openid for authentication purposes.

    And where authorities will include your Custom Group linking this client with the authorization rules you created. This effectively grants those same permissions to any downstream API requests made by your application when it requests a token in the manner below (next step).

  4. Once registered, your application can request a token using the Client Credentials flow (a sketch of this request follows this list).


  5. Your application can then make API calls using that token against any endpoint or resource where an authorization rule grants the needed permission (for example, Read/GET) to the Custom Group linked to this client; see the sketch after this list.

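Because it can help to test the flow outside your application, here is a rough SAS-based sketch of the same two calls; the hostname, client ID/secret, and endpoint are hypothetical placeholders, and your app would normally issue these requests in its own language:

/*step 4: request a token using the Client Credentials flow*/
filename tokresp temp;
proc http
     url="https://myviya.example.com/SASLogon/oauth/token"
     method="POST"
     in="grant_type=client_credentials"
     webusername="my_custom_app"          /*client_id, sent via HTTP basic auth*/
     webpassword="my_app_secret"          /*client_secret*/
     out=tokresp;
     headers "Accept"="application/json";
run;

libname tok json fileref=tokresp;
data _null_;
     set tok.root;                                     /*the JSON response contains access_token*/
     call symputx('access_token', access_token);
run;

/*step 5: call an endpoint the Custom Group has been granted Read (GET) access to*/
proc http
     url="https://myviya.example.com/folders/folders"  /*hypothetical endpoint*/
     method="GET"
     oauth_bearer="&access_token.";
run;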

Some important things to note

  • You'll want to use dynamic variables instead of hard coded values as you see in the Postman calls above. This is just for illustrative purposes.
  • If you make a GET request against the client you just registered for your application, you will not get back the client_secret value you specified upon creation. Note that, as we saw in Step 4 above, your client will need this value to request its tokens, so plan accordingly.
  • In addition to any values that could change depending on where your application is deployed (including hostnames, etc.), ensure your client_id and client_secret values are secured properly. If you're using Docker, feed an environment variables file to docker-compose.yml and make sure to include that variables file in your .gitignore file. Moreover, if you're using Kubernetes (k8s) you can always use secrets. Just have a plan and never hard code these into your application.
  • When making requests to SAS Logon Manager for tokens within the context of your registered application (as shown above), the default timeout for the tokens returned is just shy of 12 hours. You can augment that by adding the access_token_validity property with a numeric value in seconds when you register your client. Any subsequent tokens obtained using your client will use that timeout value.
  • OAuth's Refresh Token flow isn't available in combination with the client_credentials grant type. Keep this in mind when deciding on the value to use for the access_token_validity property. You may want to consider adding logic to your application to request a new token before the current one expires. Parsing out the value of the expires_in property dynamically from the original token request response will be helpful in accomplishing that. Keep in mind you will want enough of a time buffer if your application makes a string of endpoint requests or any long-running requests, so your process does not get interrupted by an expired token.



User-scoped access tokens

This refers to scenarios where it does make sense for your application's end users to log in to SAS. All subsequent API calls your application makes into SAS Viya resources should be within the context of what the logged-in individual is allowed to do/see/access in SAS Viya. The authorization decisions for those requests will be based on the union of anything the logged-in user could do while logged into a native SAS Viya UI (see autoapprove below) and any authorization rules that were explicitly granted to the Custom Group(s) selected while registering this client. A SAS Administrator can place all the users of your application in that Custom Group, ensuring a similar level of functionality for those users, while still obeying any more-granular rules that might apply, such as on specific data or content. The general steps are:

  • Create a Custom Group for the use of the application
  • Grant necessary access to that Custom Group for the appropriate endpoints and resources.
  • Register a new, unique client to your SAS Viya environment, using the authorization_code grant type, specifying the created Custom Group as the value in the scope property, along with some additional properties shown below.
  • Include logic in your application to handle the redirection to SAS Logon Manager when needed and to temporarily store authorization codes and OAuth tokens per session/user.

Detailed steps: prepping and registering the client

  1. Perform steps 1 and 2 from the General-scoped tokens section above.

  2. Review the general documentation provided for registering clients to your SAS Viya environment. The following is an adaptation of those instructions; please take note of the differences and be sure to include them.


    For readability, the required data for the client registration request is:

    {
        "client_id": "{{YOUR_APP_NAME}}",
        "client_secret": "{{YOUR_SECRET}}",
        "authorized_grant_types": ["authorization_code", "refresh_token"],
        "scope": ["openid", "{{CUSTOM_GROUP_FROM_PREVIOUS_STEPS}}"],
        "redirect_uri": ["{{URL_SASLOGON_SHOULD_REDIRECT_BACK_TO}}/callback"],
        "autoapprove": "{{TRUE_OR_FALSE_IN_LOWERCASE}}" 
    }

    Where scope should include openid for authentication purposes and, for this authorization flow, must also include the Custom Group you created. This is what links your client to any authorization rules you created, effectively granting those same permissions to any downstream API requests made by your application.

    And where redirect_uri includes the URL SAS Logon Manager sends your users back to after they authenticate. It's formatted as a list above in case you want to include both HTTP and HTTPS protocols for development purposes. This is helpful in avoiding edits to this client down the road. Take note of the inclusion of the /callback postfix in the redirect_uri value above. We'll see more on this later.

    And where autoapprove is either true or false, depending on whether or not you want to present the user with a list of Custom Groups they need to approve or deny before proceeding. My personal opinion is to stick with true, not only because it more closely mimics the behavior already present with native SAS Viya applications, but also because the user likely won't know the implications of selecting (or not selecting) certain Custom Groups, which could lead to downstream authorization issues for your application.

Detailed steps: partial example for your app's logic

Your app has several requirements pertaining to authentication and authorization. First, it needs to know when to redirect users to SAS Logon Manager (when they're not already logged in or if their session expired). The app also must handle the authorization code returned to your user. Further, it needs to request an OAuth token for your user and know how to use that token to make downstream requests to other APIs. Finally, your app needs to revoke tokens when necessary.

This is not a complete example and exactly how this is implemented depends heavily on the language your application was developed in. I don't want to distract from the topic at hand, so I'm not going to include complete code blocks. The application used for this example is a Javascript React web UI and I'll focus on the main order of operations involved in getting this to work. Therefore, I've intentionally removed some syntax to illustrate this in a more agnostic way.

  1. User navigates to our application's home page. Logic determines that a boolean state variable we're setting called isAuthenticated is false (default), so it redirects to the SAS Logon page using the URL below. Note that all items in {{ }} are substituted with environment variables rather than hardcoded. Recall, in the previous step when we registered the client, I mentioned you might want to include /callback after the URL to your application. When you're developing the routes in your code logic, it helps to differentiate between someone who has navigated directly to your / or /home page versus a user that's been redirected there from SAS Logon Manager. You'll see why in the next step.
    https://{{VIYA_HOSTNAME}}/SASLogon/oauth/authorize?response_type=code&client_id={{CLIENT_ID}}&redirect_uri={{URL_POSTFIXED_WITH_/callback}}
  2. This automagically presents the user with the same SAS Logon screen they see when accessing a native SAS Viya application. After they successfully authenticate, the URL redirects them back to our application (we included the /callback postfix in the previous step, so we see that in the redirected URL below) with an extra URL parameter like this (the code will of course be dynamic):
    https://{{YOUR_APPS_HOSTNAME}}/callback?code=pe3vyay7LJ
  3. So now we parse the code off of the user's current URL path and store it temporarily in a variable, say code. Next, the conditional logic picks up the fact that the user does have a code but isAuthenticated is false, so we kick off another function, feeding the code as an argument.
  4. This function makes the second call to SAS Logon Manager; this time to exchange the user's code for an OAuth token by making a request to:
    https://{{VIYA_HOSTNAME}}/SASLogon/oauth/token

    With options similar to the curl-style snippet below, again where anything in {{ }} is dynamically set:

    -H "Accept: application/json"
    -H "Content-Type: application/x-www-form-urlencoded"
    -u "{{CLIENT_ID}}:{{CLIENT_SECRET}}" -d "grant_type=authorization_code&code={{CODE}}"
  5. If the request is successful, we then call another function, which uses the js-cookie package in our case, to set a new piece of information in the user's cookie called access_token. This value gets pulled from the result of the previous request. At the same time, we update our isAuthenticated state variable to true.
  6. Now that the state has changed (isAuthenticated is true), the page gets re-rendered in typical React fashion. If our URL included /callback then we know the user came from SAS Logon, and our application logic redirects the user to the main /home page of our application. Now we have the user's OAuth token stored in a cookie for them.

At this point, I think we've seen enough examples to know how to make downstream requests with the token by adding an Authorization header and setting its value to Bearer {{TOKEN}}, so I'll leave those pieces out from here. Don't forget each of those requests will be made for your application's individual logged-in users, so their unique identities will need to be able to access the endpoints/resources in SAS Viya through appropriate authorization rules.

On a similar note, and because this is getting really long 🙂 , it feels a little out-of-scope to include details on the refresh token logic. Just know, it follows a very similar pattern as above. If you've included refresh_token in the authorized_grant_types list when registering your client, then subsequent token requests made within the context of your client automatically include a refresh_token property in the response you can parse out and use to get a new bearer token. The logout logic includes removing the user's cookie.

Just one more thing: if you have a multi-machine SAS Viya environment then using {{VIYA_HOSTNAME}} might have been confusing. Check your Ansible inventory.ini file and look for the machine specification for the hostgroup [CoreServices].

Thank you!

Whew. That was a doozy. Thanks for sticking with me through this post and the series! I hope this content gives you a better understanding of how to integrate your custom applications with SAS Viya and how to do it securely. Remember to evaluate each application's needs and intended use separately to choose the most appropriate type of scoping. Then use the guidance presented here to choose an OAuth flow so your application makes HTTP REST API calls into the SAS Viya endpoints and resources it needs. Feel free to reach out if you have any questions.