SAS Viya

March 16, 2020
 

As a long-time SAS 9 programmer, I typically accomplish my data preparation tasks through some combination of the DATA Step, Proc SQL, Proc Transpose and some housekeeping procs like Proc Contents and Proc Datasets. With the introduction of SAS Viya, SAS released a new scripting language called CASL – a language that interacts with SAS Cloud Analytic Services (CAS).

CASL statements include actions, logically organized into action sets based on common functionality. For example, the Table action set allows you to load a table in CAS, view table metadata, change table metadata such as drop or rename a column, fetch (print) sample rows, save or drop a table from CAS, among other things. Steven Sober provides a great overview of CASL in his 2019 SAS Global Forum paper.

Learning CASL is a good idea assuming you want to leverage the power of CAS, because CASL is the language of CAS. While you can continue to use Viya-enabled procs for many of your data processing needs, certain new functionality is only available through CASL. CAS actions also provide more granular access to options which otherwise may not be available as procedure options. But old habits die hard, and for a while I found myself bouncing between SAS 9.4 and CASL. I'd pull the data down from CAS just to get it to process in the SAS Programming Runtime Environment (SPRE) because it took less effort than figuring out how to get it done properly in CAS.

Then I started a project with a seriously large data set and quickly hit the limit on how much data I could pull down to process in SPRE. And although I could adjust the DATALIMIT option to retrieve more data than the default limit, I was wasting time and server resources unnecessarily moving the data between CAS and SPRE. All this, just so I could process the data “old school.”

I decided to challenge myself to do ALL my data preparation in CASL. I love a good challenge! I started collecting various useful CASL code snippets. In this post, I am sharing the tidbits I’ve accumulated, along with some commentary. Note, you can execute CAS actions from multiple clients, including SAS, Python, R, Lua and Java. Since my objective was to transition from traditional SAS code to CASL, I’ll focus solely on CAS actions from the SAS client perspective. While I used SAS Viya 3.5 for this work, most of the code snippets should work on prior versions as well.

The sections below cover: how to submit CASL code; loading, saving, dropping and deleting data; exploring data; table metadata management; and data transformation. Feel free to jump ahead to any section of interest.

How do you submit CASL code?

You use PROC CAS to submit CASL code from a SAS client. For example:

proc cas;
   <cas action 1>;
   <cas action 2>;
   …;
quit;

Similar to other interactive procs that use run-group processing, you can separate CAS actions with run; statements. For example:

proc cas;
   <cas action 1>;
   run;
   <cas action 2>;
   run;
quit;

In fact, you can have the entire data preparation and analysis pipeline wrapped inside a single PROC CAS, passing data and results in the form of CASL variables from one action to the next. It can really be quite elegant.
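For instance, here is a minimal sketch of that pattern (the table name is hypothetical): the result of one action is captured in a CASL variable and feeds the next action inside the same PROC CAS.

proc cas;
    /* capture the row count of a table in a CASL variable */
    simple.numRows result=n / table="TABLE_NAME";
    print n.numrows;

    /* use the captured result to decide what to do next */
    if n.numrows > 0 then do;
        table.fetch / table="TABLE_NAME" to=5;
    end;
quit;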

Moving Data Using PROC CAS

Loading SASHDAT data in CAS

Your data must be in the SASHDAT format for CAS to process it. To load a SASHDAT table into CAS, use the table.loadtable CAS action. The code below assumes your SASHDAT table is saved to a directory on disk associated with your current active caslib, and you are loading it into the same caslib. (This usually occurs when you already performed the conversion to SASHDAT format, but the data has been unloaded. If you are just starting out and are wondering how to get your data into the SASHDAT format in the first place, the next section covers it, so keep reading.)

proc cas; 
     table.loadtable / path="TABLE_NAME.sashdat" casOut="TABLE_NAME"; 
     table.promote /name="TABLE_NAME" drop=true; 
quit;

The table.promote action elevates your newly loaded CAS table to global scope, making it available to other CAS sessions, including any additional sessions you start, or to other users assuming they have the right privileges. I can’t tell you how many times I forgot to promote my data, only to find that my hard-earned output table disappeared because I took a longer coffee break than expected! Don’t forget to promote or save off your data (or both, to be safe).

If you are loading from a directory other than the one associated with your active caslib, modify the path= parameter to include the relative path to the source directory – relative to your active caslib. If you are looking to load to a different caslib, modify the casOut= parameter by placing the output table name and library in curly brackets. For example:

proc cas;
    table.loadtable / path="TABLE_NAME.sashdat" casOut={name="TABLE_NAME" caslib="CASLIB2"};
    table.promote /name="TABLE_NAME" drop=true;
quit;
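If the SASHDAT file lives in a subdirectory of your active caslib's data source (and the caslib allows subdirectories), pass the relative path in path=. A minimal sketch, with a hypothetical subdirectory name:

proc cas;
    table.loadtable / path="subdir1/TABLE_NAME.sashdat" casOut={name="TABLE_NAME"};
quit;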

You can also place a promote=true option inside the casOut= curly brackets instead of calling the table.promote action, like so:

proc cas;
    table.loadtable / path="TABLE_NAME.sashdat" 
                      casOut={name="TABLE_NAME" caslib="CASLIB2" promote=true};
quit;

Curly brackets are ubiquitous in CASL (and quite unusual for SAS 9.4). If you take away one thing from this post, make it “watch your curly brackets.”

Loading SAS7BDAT, delimited data, and other file formats in CAS

If you have a SAS7BDAT file already on disk, load it in CAS with this code:

proc cas;
    table.loadtable /path="TABLE_NAME.sas7bdat" casout="TABLE_NAME" 
                     importoptions={filetype="basesas"};
quit;

Other file formats load similarly – just use the corresponding filetype= option to indicate the type of data you are loading, such as CSV, Excel, Document (.docx, .pdf, etc.), Image, Video, etc. The impressive list of supported file types is available here.

proc cas;
    table.loadtable / path="TABLE_NAME.csv" casout="TABLE_NAME" 
                      importoptions={filetype="csv"};
    run;
quit;

You can include additional parameters inside the importOptions= curly brackets, which differ by file type. If you don't need any additional parameters, use filetype="auto" and let CAS determine the best way to load the file.
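As a hedged illustration, here is a CSV load that overrides a few of the documented CSV import options; the delimiter, header and scan settings shown are assumptions about your file, not requirements:

proc cas;
    table.loadtable / path="TABLE_NAME.csv" casout="TABLE_NAME"
                      importoptions={filetype="csv" delimiter=";" getNames=true guessRows=500};
quit;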

When loading a table from SAS7BDAT, delimited or another file format, the table.loadtable action automatically converts your data to the SASHDAT format.

Loading data in CAS conditionally

Imagine you are building a script to load data conditionally – only if it’s not already loaded. This is handy if you have a reason to believe the data might already be in CAS. To check if the data exists in CAS and load conditionally, you can leverage the table.tableExists action in combination with if-then-else logic. For example:

proc cas;
    table.tableExists result=r / name="TABLE_NAME";
    if r.exists=0 then do;
        table.loadtable / path="TABLE_NAME.sashdat" casOut={name="TABLE_NAME"};
        table.promote /name="TABLE_NAME" drop=true;
    end;
    else print("Table already loaded");
quit;

Notice that the result=r syntax captures the result of the tableExists action in the CASL variable r, whose exists flag is evaluated before the loadtable and promote actions are executed. If the table is already loaded in CAS, “Table already loaded” is printed to the log. Otherwise, the loadtable and promote actions are executed.

The ability to output CAS action results to a CASL variable (such as result=r in this example) is an extremely powerful feature of CASL. I include another example of this further down, but you can learn more about this functionality from documentation or this handy blog post.

Saving your CAS data

Let’s pretend you’ve loaded your data, transformed it, and promoted it to global scope. You or your colleagues can access it from other CAS sessions. You finished your data preparation, right? Wrong. As the header of this section suggests, you also need to save your prepared CAS data. Why? Because up to this point, your processed and promoted data exists only in memory. You will lose your work if your SAS administrator reboots the server or restarts the CAS controller. If you need to quickly reload prepared data, you must back it up to a caslib’s data source. See the CAS data lifecycle for more details.

To save off CAS data, naturally, you use the table.save action. For example:

proc cas;
    table.save / table="TABLE_NAME" name="TABLE_NAME.sashdat" replace=true;
quit;

In this example, you save off the CAS table to disk as a SASHDAT file, defaulting to the location associated with your active caslib. You can modify the table.save parameters to save or export the data to an alternative data storage solution with full control over the file format (including but not limited to such popular options as HDFS, Oracle, SQL Server, Salesforce, Snowflake and Teradata), compression, partitioning and other options.
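For instance, a minimal sketch that saves the same table as a CSV file into a different path-based caslib (the caslib name is hypothetical; for a path-based caslib the file extension drives the output format):

proc cas;
    table.save / table="TABLE_NAME" name="TABLE_NAME.csv" caslib="CASLIB2" replace=true;
quit;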

Dropping and deleting data

To drop a table from CAS, execute a table.droptable action. For example:

proc cas;
    table.droptable / name="TABLE_NAME" quiet=true;
quit;

The quiet=true option prevents CAS from generating an error if the table does not exist in CAS. Dropping a table deletes it from memory. It’s a good practice to drop tables you no longer need, particularly the ones you have promoted. Local-scope tables disappear on their own when the session expires, whereas global tables stay in memory until they are unloaded.

Dropping a table does not delete the underlying source data. To delete the source of a CAS table, use the table.deleteSource action. For example:

proc cas;
    table.deletesource / source="TABLE_NAME.sashdat" quiet=true;
quit;

Exploring Data Using PROC CAS

After taking a close look at moving the data using PROC CAS, let’s look at some useful ways to start exploring and manipulating CAS data.

Fetching sample data

When preparing data, I find it useful to look at sample data. The table.fetch action is conceptually similar to PROC PRINT and, by default, outputs the first 20 rows of a CAS table:

proc cas;
table.fetch / table="Table_Name";
quit;

You can modify the table.fetch options to control which observations and variables to display and how to display them. For example:

proc cas;
table.fetch / table={name="TABLE_NAME" where="VAR1 in ('value1','value2')"},   /*1*/
              orderby={{name="VAR1"},                                          /*2*/
                       {name="VAR2", order="descending"}
                      },
              fetchvars={{name="VAR1", label="Variable 1"},                    /*3*/
                         {name="VAR2", label="Variable 2"},
                         {name="VAR3", label="Variable 3", format=comma12.1}
                        },
              to=50,                                                           /*4*/
              index=false;                                                     /*5*/
quit;

In the code snippet above:

  • #1 – where= option limits the records to those meeting the where criteria.
  • #2 – orderby= option defines the sort order. Ascending is the default and is not required. If sorting by more than one variable, put them in a list inside curly brackets, as shown in this example. If a list item has a subparameter (such as order= here), encase each item in curly brackets.
  • #3 – fetchvars= option defines the variables to print as well as their display labels and formats. If you select more than one variable, put them in a list inside curly brackets, as shown here. And again, if a list item includes a subparameter, then enclose each list item in curly brackets.
  • #4 – to= option defines the number of rows to print.
  • #5 – index= false option deactivates the index column in the output (the default is index=true). This is similar to the noobs option in PROC PRINT.

As mentioned earlier, make sure to watch your curly brackets!

Descriptive statistics and variable distributions

The next step in data exploration is looking at descriptive statistics and variable distributions. I would need a separate blog post to cover this in detail, so I only touch upon a few of the many useful CAS actions.

To look at statistics for numeric variables, use the simple.summary action, which computes standard descriptive statistics, such as minimum, maximum, mean, standard deviation, number missing, and so on. For example:

proc cas;
    simple.summary / table="TABLE_NAME";
quit;

Among its other features, the simple.summary action supports analysis by one or more group-by variables, as well as defining the list of desired descriptive statistics. For example:

proc cas;
simple.summary / table={name="TABLE_NAME", groupBy="VAR1", vars={"NUMVAR1","NUMVAR2"}},
                 subSet={"MAX", "MIN", "MEAN", "NMISS"};
quit;

Another useful action is simple.topK, which selects the top K and bottom K values for variables in a data set, based on a user-specified ranking order. The example below returns the top 5 and bottom 5 values for two variables based on their frequency:

proc cas;
simple.topk / table="TABLE_NAME" 
              aggregator="N",                        
              inputs={"VAR1","VAR2"},
              topk=5,
              bottomk=5;
quit;

Simple is a rich action set with heaps of useful options covered in the documentation.

You may be wondering – what about crosstabs and frequency tables? The simple action set includes freq and crosstab actions. In addition, the action closely imitating the functionality of the beloved PROC FREQ is freqTab.freqTab. For example, the code snippet below creates frequency tables for VAR1, VAR2 and a crosstab of the two.

proc cas;
freqtab.freqtab / table="TABLE_NAME"
                  tabulate={"VAR1","VAR2",
                  {vars={"VAR1","VAR2"}}
                  };
quit;

Managing CAS Table Variables

Changing table metadata

One of the basic tasks after exploring your data is changing table metadata, such as dropping unnecessary variables, renaming tables and columns, and changing variable formats and labels. The table.altertable action helps you with these housekeeping tasks. For example, the code snippet below renames the table, drops two variables and renames and changes labels for two variables:

proc cas;
    table.altertable / table="TABLE_NAME" rename="ANALYTIC_TABLE"
                       drop={"VAR1","VAR2"}
                       columns={{name="VAR3" rename="ROW_ID" label="Row ID"},
                                {name="VAR4" rename="TARGET" label="Outcome Variable"}
                               };
quit;

Outputting variable list to a data set

Another useful trick I frequently use is extracting table columns as a SAS data set. Having a list of variables as values in a data set makes it easy to build data-driven scripts leveraging macro programming. The code snippet below provides an example. Here we encounter another example of capturing an action result as a CASL variable and using it in further processing – I can’t stress enough how helpful this is!

proc cas;
    table.columninfo r=collinfo / table={name="TABLE_NAME"};       /*1*/
    collist=collinfo["ColumnInfo"];                                /*2*/
    saveresult collist casout="collist";                           /*3*/
quit;

In the snippet above:

  • #1 - the columninfo action collects column information. The action result is passed to a CASL variable collinfo. Notice, instead of writing out result=, I am using the alias r=.
  • #2 - the portion of the CASL variable collinfo containing column data is extracted into another CASL variable collist.
  • #3 - the saveresult statement sends the data to a CAS table collist. If you want to send the results to a SAS7BDAT data set, replace casout= with dataout= and provide the library.table_name information, as in the sketch below.
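Here is the same snippet with the last line swapped to write a SAS data set instead, assuming the WORK library is an acceptable destination:

proc cas;
    table.columninfo r=collinfo / table={name="TABLE_NAME"};
    collist=collinfo["ColumnInfo"];
    saveresult collist dataout=work.collist;    /* writes a SAS7BDAT data set to WORK */
quit;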

Transforming the Data

Lastly, let’s look at some ways to use CAS actions to transform your data. Proc SQL and the DATA step are the two Swiss Army knives in SAS 9 developers’ toolkit that take care of 90% of the data prep. The good news is you can execute both DATA step and SQL directly from PROC CAS. In addition, you can call the transpose action to transpose your data.

Executing DATA Step code

The dataStep.runCode action enables you to run DATA step code directly inside PROC CAS. You must enclose your DATA step code in quotation marks after the code= parameter. For example:

proc cas;
    dataStep.runCode /
    code="
        data table_name;
        set table_name;
        run;
        ";
quit;

Running DATA step code in CAS allows access to sophisticated group-by processing and the use of such popular programming techniques as first-dot and last-dot; see the sketch below. Refer to the documentation for important nuances related to processing in the distributed, multi-threaded environment of CAS.
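As a minimal sketch of that group-by processing (table and variable names are hypothetical), the DATA step below keeps the last record of each BY group:

proc cas;
    dataStep.runCode /
    code="
        data last_per_group;
            set table_name;
            by var1;
            /* last.var1 is true on the final record of each VAR1 group */
            if last.var1 then output;
        run;
        ";
quit;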

Executing FedSQL

To run SQL in CASL, use the fedSQL.execDirect action. Enclose the SQL query in quotation marks following the query= parameter. Optionally, you can use the casout= parameter to save the results to a CAS table. For example:

proc cas;
    fedsql.execDirect /
    query=
         "
          select *
          from TABLE1 a inner join TABLE2 b
          on a.VAR1 = b.VAR1
         "
    casout={name="TABLE3", replace=true};
quit;

Similarly to DATA step, be aware of the many nuances when executing SQL in CAS via FedSQL. Brian Kinnebrew provides an excellent overview of FedSQL in his SAS Communities article, and the documentation has up-to-date details on the supported functionality.

Transposing data

Transposing data in PROC CAS is a breeze. The example below uses transpose.transpose action to restructure rows into columns.

proc cas;
    transpose.transpose /
          table={name="TABLE_NAME", groupby={"VAR1"}}
          transpose={"VAR2"}
          id={"VAR3"}
          prefix="Prefix"
    casout={name="TRANSPOSED" replace=true};
quit;

You can transpose multiple variables in the same transpose action. Simply place additional variables inside the curly brackets following transpose=, in quotes, separated by commas, as in the sketch below.
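A minimal sketch, assuming a second hypothetical variable VAR4 to transpose alongside VAR2:

proc cas;
    transpose.transpose /
          table={name="TABLE_NAME", groupby={"VAR1"}}
          transpose={"VAR2","VAR4"}
          id={"VAR3"}
          casout={name="TRANSPOSED" replace=true};
quit;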

Conclusion

PROC CAS is a wrapper procedure enabling you to leverage SAS’ new programming language - CASL. CASL enables you to submit CAS actions directly to SAS Cloud Analytic Services engine from a SAS client. This post provided examples of loading, managing, exploring and transforming your data through CAS actions. Certain new functionality in CAS is only available through CAS actions, so getting comfortable with CASL makes sense. Fear not, and let the curly brackets guide the way 😊.

Acknowledgement

I would like to thank Brian Kinnebrew for his thoughtful review and generous help on my journey learning CASL.

Challenge accepted: Learning data prep in CASL was published on SAS Users.

March 5, 2020
 

Have you heard that SAS offers a collection of new, high-performance CAS procedures that are compatible with a multi-threaded approach? The free e-book Exploring SAS® Viya®: Data Mining and Machine Learning is a great resource to learn more about these procedures and the features of SAS® Visual Data Mining and Machine Learning. Download it today and keep reading for an excerpt from this free e-book!

In SAS Studio, you can access tasks that help automate your programming so that you do not have to manually write your code. However, there are three options for manually writing your programs in SAS® Viya®:

  1. SAS Studio provides a SAS programming environment for developing and submitting programs to the server.
  2. Batch submission is also still an option.
  3. Open-source languages such as Python, Lua, and Java can submit code to the CAS server.

In this blog post, you will learn the syntax for two of the new, advanced data mining and machine learning procedures: PROC TEXTMINE and PROC TMSCORE.

Overview

The TEXTMINE and TMSCORE procedures integrate the functionalities from both natural language processing and statistical analysis to provide essential functionalities for text mining. The procedures support essential natural language processing (NLP) features such as tokenizing, stemming, part-of-speech tagging, entity recognition, customized stop lists, and so on. They also support dimensionality reduction and topic discovery through Singular Value Decomposition.

In this example, you will learn about some of the essential functionalities of PROC TEXTMINE and PROC TMSCORE by using a text data set containing 1,830 Amazon reviews of electronic gaming systems. The data set is named Amazon. You can find similar data sets of Amazon reviews at http://jmcauley.ucsd.edu/data/amazon/.

PROC TEXTMINE

The Amazon data set has already been loaded into CAS. The review content is stored in the variable ReviewBody, and we generate a unique review ID for each review. In the proc call shown in Program 1 we ask PROC TEXTMINE to do three tasks:

  1. parse the documents in table reviews and generate the term by document matrix
  2. perform dimensionality reduction via Singular Value Decomposition
  3. perform topic discovery based on Singular Value Decomposition results

Program 1: PROC TEXTMINE

data mycaslib.amazon;
    set mylib.amazon;
run;

data mycaslib.engstop;
    set mylib.engstop;
run;

proc textmine data=mycaslib.amazon;
    doc_id id;
    var reviewbody;

 /*(1)*/  parse reducef=2 entities=std stop=mycaslib.engstop 
          outterms=mycaslib.terms outparent=mycaslib.parent
          outconfig=mycaslib.config;

 /*(2)*/  svd k=10 svdu=mycaslib.svdu outdocpro=mycaslib.docpro
          outtopics=mycaslib.topics;

run;

(1) The first task (parsing) is specified in the PARSE statement. Parameter “reducef” specifies the minimum number of times a term needs to appear in the text to be included in the analysis. Parameter “stop” specifies a list of terms to be excluded from the analysis, such as “the”, “this”, and “that”. Outparent is the output table that stores the term by document matrix, and Outterms is the output table that stores the information of terms that are included in the term by document matrix. Outconfig is the output table that stores configuration information for future scoring.

(2) Tasks 2 and 3 (dimensionality reduction and topic discovery) are specified in the SVD statement. Parameter K specifies the desired number of dimensions and number of topics. Parameter SVDU is the output table that stores the U matrix from SVD calculations, which is needed in future scoring. Parameter OutDocPro is the output table that stores the new matrix with reduced dimensions. Parameter OutTopics specifies the output table that stores the topics discovered.

Click the Run shortcut button or press F3 to run Program 1. The terms table shown in Output 1 stores the tagging, stemming, and entity recognition results. It also stores the number of times each term appears in the text data.

Output 1: Results from Program 1

PROC TMSCORE

PROC TEXTMINE is used with large training data sets. When you have new documents coming in, you do not need to re-run all the parsing and SVD computations with PROC TEXTMINE. Instead, you can use PROC TMSCORE to score new text data. The scoring procedure parses the new document(s) and projects the text data into the same dimensions using the SVD weights derived from the original training data.

In order to use PROC TMSCORE to generate results consistent with PROC TEXTMINE, you need to provide the following tables generated by PROC TEXTMINE:

  • SVDU table – provides the required information for projection into the same dimensions.
  • Config table – provides parameter values for parsing.
  • Terms table – provides the terms that should be included in the analysis.

Program 2 shows an example of TMSCORE. It uses the same input data layout as the PROC TEXTMINE code, so it will generate the same docpro and parent output tables, as shown in Output 2.

Program 2: PROC TMSCORE

proc tmscore data=mycaslib.amazon svdu=mycaslib.svdu
        config=mycaslib.config terms=mycaslib.terms
        svddocpro=mycaslib.score_docpro outparent=mycaslib.score_parent;
    var reviewbody;
    doc_id id;
run;

 

Output 2: Results from Program 2

To learn more about advanced data mining and machine learning procedures available in SAS Viya, including PROC FACTMAC, PROC TEXTMINE, and PROC NETWORK, you can download the free e-book, Exploring SAS® Viya®: Data Mining and Machine Learning. Exploring SAS® Viya® is a series of e-books that are based on content from SAS® Viya® Enablement, a free course available from SAS Education. You can follow along with examples in real time by watching the videos.

 

Learn about new data mining and machine learning procedures in SAS Viya was published on SAS Users.

March 2, 2020
 

Growing up, I was often reminded to turn off the lights. In my home, this was a way of saving on a key resource (electricity) that we had to pay for when not using it. It was a way of being a good steward with the family’s money and targeting it to run the lights and other things in our home. This allowed us to go about our daily tasks and get them done when we needed to.

These days I have the same goal in my own home, but I’ve automated the task. I have a voice assistant that, with a few select words, will turn off (or on) the lights in the rooms that I use most. The goal and the reasoning are the same, but the automation allows me to take it to another level.

In a similar way, automation today allows us to optimize the use of compute resources in a way that we haven’t been able to do in the past. The degree to which we can switch on and off the systems required to run our compute workloads in cloud environments, scale to use more, or fewer, resources depending on demand, and only pay for what we need, is a clear indicator of just how much infrastructure technology has evolved in recent years.

Like the basic utilities we rely on in our homes, SAS, with its analytics and modeling capabilities, has become an essential utility for any business that wants to not only make sense of data but turn it into the power to get things done. And, like any utility necessary to do business, we want it working quickly at the flip of a switch, easily made available anywhere we need it, and helping us be good stewards of our resources.

Containers and the related technologies can help us achieve all of this in SAS

A container can most simply be thought of as a self-contained environment with all the programs, configuration, initial data, and other supporting pieces to run applications. This environment can be treated as a stand-alone unit, ready to turn on and run at any time, much in the way your laptop is a stand-alone system. In fact, this sort of “portable machine” analogy can be a good way to think about containers at a high level – a complete virtual system containing, and configured for running, one or more targeted application(s) – or components of an application.

Docker is one of the oldest, and most well-known, applications for defining and managing containers. Defining a container is done by first defining an image. The image is an immutable (static) set of the software, environment, configuration, etc. that serves as a template for containers to be created from it. Each image, in turn, is composed of layers that apply some setting, software, or data that a container based on the image will need.

Have you ever staged a new machine for yourself, your company, a friend or relative? If so, you can relate the “layering” of software you installed and configured to the layers that go into an image. And the image itself might be like the image of software you created on the disk of the system. The applications are stored there, fully configured and ready to go, whenever you turn the system on.

Turning an image into a container is mostly just adding another layer on top of the present image layers. The difference is this “container layer” can be modified as needed – have things written to it, updated, etc. If you think about the idea of creating a user profile along with its space on a system you staged, it’s a similar idea. Like that dedicated user space on the laptop or desktop, the layer that gets added to an image to make a container, is there for the running system to use and customize as needed. This is like how the user area is there to use and customize as needed when a PC is turned on and running.

Containers, Kubernetes, and cloud environments

It is rare that any corporate system today is managed with only a single PC. Likewise, in the world of cloud and containerized environments, it is rare that any software product is run with only a single container. More commonly, applications consist of many containers organized to address specific application areas (such as web interfaces, database management, etc.) and/or architectural designs to optimize resource use and communication paths in the system (microservices).

Having the advantages that are derived from either multiple PCs or multiple containers also requires a way to manage them and ensure reliability and robustness for our applications and customers. For the machines, enterprises typically rely on data centers. Data centers play a key role, ensuring systems are kept up and running, replaced when broken, and are centrally accessible. As well, they may be responsible for bringing more systems online to address increased loads or taking some systems offline to save costs.

For containers, we have applications that function much like a “data center for containers.” The most prominent one today is Kubernetes (also known as “K8S” for the eight letters between “K” and “S”). Kubernetes’ job is to simplify deployment and management of containers and containerized workloads. It does this by automating key needs around containers, such as deployment, scaling, scheduling, healing, monitoring, and more. All of this is managed in a “declarative” way where we no longer must tell the system “how” to get to the state we want – we instead tell it “what” state we want, and it ensures that state is met and preserved.

The combination of containers, Kubernetes, and cloud environments provides an evolutionary jump in being able to control and leverage the infrastructure and runtime environments that you run your applications in. And this gives your business a similar jump in being able to provide the business value targeted to meet the environments, scale, and reliability that your customers demand - while having the automatic optimization of resources and the automatic management of workloads that you need to be competitive.

Harness decades of expertise with SAS® Viya® 4.0

Viya 4.0 provides this same evolutionary jump for SAS. Now, your SAS applications and workloads can be run in containers, Kubernetes, and cloud environments natively. Viya 4 builds on the award-winning, best-in-class analytics to allow data scientists, business executives, and decision makers at all levels to harness the decades of SAS expertise running completely in containers, and tightly integrated with Kubernetes and the cloud.

Viya 4.0 brings all the key SAS functionalities you’d expect – modeling, decision-making, forecasting, visualization, and more – to the cloud and enterprise cloud environments, along with the advantages of running in a containerized model. It also leverages the robust container management, monitoring, self-healing, scaling and other aspects of Kubernetes. All of this gives you more control and makes you less reliant on being in your data center to manage these kinds of activities.

Just remember to turn the lights off.

LEARN MORE | AN INTRO TO SAS FOR CONTAINERS

Automation with Containers: the Power to Get Things Done was published on SAS Users.

February 29, 2020
 

Stored processes were a very popular feature in SAS 9.4. They were used in reporting, analytics and web application development. In Viya, the equivalent feature is jobs. Viya 3.5 enhanced the ability to use jobs the same way as stored processes. In addition, there is preliminary support for promoting 9.4 stored processes to Viya. In this post, I will examine what is sure to be a popular feature for SAS 9.4 users starting out with Viya.

What is a job? In Viya, that is a complicated question, as there are a number of different types of jobs. In the context of this blog, we are talking about jobs that can be created in SAS Studio or in the Job Execution application. A SAS Viya job consists of a program and its definition (metadata related to the program). The job definition includes information such as the job name, the author and when it was created. After you have created a job definition, you have an execution URL that you can share with others at your site. The execution URL can be entered into a web browser and run without opening SAS Studio, or you can run a job directly from SAS Studio. When running a SAS job, SAS Studio uses the SAS Job Execution Web Application.

In addition, you can create a task prompt (XML) or an HTML form to provide a user interface to the job. When the user selects an option in the prompt and submits the job, the data specified in the form or task prompt is passed to a SAS session as global macro variables. The SAS program runs and the results are returned to the web browser. Sounds a lot like a stored process!
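For example, here is a minimal sketch of job code, assuming the form or prompt passes a field named name to the SAS session (the data set and field are illustrative only):

/* NAME arrives from the form as a global macro variable */
proc print data=sashelp.class;
    where name = "&name";
run;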

For more information on authoring jobs in SAS Studio, see SAS® Studio 5.2 Developer’s Guide: Working with Jobs.

With the release of Viya 3.5, there is preliminary support for the promotion of 9.4 Stored Processes to Viya Jobs. For more information on this functionality, see SAS® Viya® 3.5 Administration: Promotion (Import and Export).

How does it work? Stored processes are exported from SAS 9.4 using the export wizard or on the command-line and the resultant package can be imported to Viya using SAS Environment Manager or the sas-admin transfer CLI. The import process converts the stored processes to job definitions which can be run in SAS Studio or directly via a URL.

The new job definition includes the job metadata, SAS code and task prompts. To get a promoted job running in Viya, there are three areas to attend to: the data, the SAS code and the task prompts.

1. Data

The data is not included in the promotion process. However, it must be made available to the Viya compute server so that the job code and any dynamic prompts can access it. There are two Viya server contexts that need to have access to the data:

  • Job Execution Context: used when running jobs via a URL
  • SAS Studio Context: used when running jobs from SAS Studio

To make the data available, the compute server has to access it via a libname and the table must exist in the library.

To add a libname to the SAS Job Execution Compute context in SAS Environment Manager, select Contexts > Compute contexts and SAS Job Execution Compute context. Then select Edit > Advanced.

In the box labelled “Enter each autoexec statement on a new line:”, add a line for the libname. If you keep the same 8-character libname as in 9.4, you will have less to change in your code and prompts.

NOTE: the libname could be added to the /opt/sas/viya/config/etc/compsrv/default/autoexec_usermods.sas file on the compute server. While this is perhaps easier than adding to each context that requires it, this would make it available to every Viya compute process.
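For example, a single hypothetical autoexec line (the libref and path are placeholders for your own data location):

libname findata "/gelcontent/data/finance";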

2. SAS Code

The SAS code that makes up the stored process is copied without modification to the Viya job definition. In most cases, the code will need to be edited so that it will run in the Viya environment. The SAS code in the definition can be modified using SAS Studio or the SAS Job Execution Web Application. Edit the code so that:

  • Any libnames in the new job code point to the location of the data in Viya
  • Any SAS 9 metadata-related code is removed

As you test your new jobs in Viya, additional changes may be needed to get them running.

3. Task Prompts

Prompts that were defined in metadata in the SAS 9 stored process are converted to the task prompts and stored within the job definition. The xml for the prompt definition can be accessed and edited from SAS Studio or the SAS Job Execution Web App. For more information on working with task prompts in Viya, see the SAS Studio Developers guide.

If you have shared prompts you want to include, select “Include dependent objects” when exporting from SAS 9.4; if you do not select this option, any shared prompts will be omitted from the package. If the shared prompts, libnames and tables are included in the package with the stored processes, the SAS 9 metadata-based library and table definitions referenced in dynamic prompts are converted to use a library.table reference in Viya. When this happens, the XML for the prompt will include the libname.tablename of the data set used to populate the prompt values. For example:

<DataSource active="true" defaultValue="FINDATA.FINANCIAL__SUMMARY" name="DataSource2">

If the libnames and tables are not included in the package, the prompt will show a url mapping to the tables location in 9.4 metadata. For example:

<DataSource active="true" name="DataSource2" url="/gelcorp/financecontent/Data/Source Data/FINANCIAL__SUMMARY(Table)">

For the prompt to work in the latter case, you need to edit it and provide the libname.table reference in the same way as it is shown in the first example.

Including libraries and tables in a package imported to Viya results in the 9.4 folders that contained those libraries and tables being created in Viya. This may result in Viya folders that are not needed, because data does not reside in folders in Viya. As an administrator, you can choose to either:

  • Include the dependent data and tables in the package and clean up any extra folders after promotion
  • Exclude the dependent data and tables in the package and edit the data source in the prompt XML to reference the libname.table (this is not a great option if you have shared prompts)

Issues encountered converting SAS 9 prompts to the Viya SAS Prompt Interface will cause warning messages to be included at the beginning of the XML defining the prompt interface.

As I mentioned earlier, you can run the job from SAS Studio or from the Job Execution Web Application. You can also run the job from a URL, just like you could with stored processes in SAS 9.4. To get the URL for the job, view the job properties in SAS Studio.

Earlier releases of Viya provided a different way to support stored processes. This consists of enabling access to the 9.4 stored process server and its stored processes in Viya. This approach is still supported in Viya 3.5 because, while jobs can replace some stored processes, they cannot currently be embedded in a Viya Visual Analytics report.

For more information, please check out:

Jobs: Stored processes in Viya was published on SAS Users.

February 17, 2020
 

Administrators like to be able to keep track of resource usage and who is using what in a system. When an administrator has this capability, they can look out for issues of high resource usage that may have an impact on overall system performance. In Viya, data is accessed in caslibs. In this post, I will show you how an administrator can track and control resource usage of personal caslibs.

A caslib is an in-memory space to hold tables, access control lists, and data source information. In GEL enablement classes, as we discuss CAS and caslibs, one of the big asks has been related to personal caslibs and how an administrator can keep track of the resources they use. Until recently we didn’t have a great answer; the good news is that now we do.

Personal caslibs are, by default, the active caslib when a user starts a CAS session. This gives each user a default location to access and load data (sort of like a home directory). As the name suggests, they are personal and only the user who starts the session can access the data. In early releases, administrators saw this as a big problem because personal caslibs were basically invisible to them. There was no way to monitor what was going on with a personal caslib, leaving the system open to issues where one user could consume a lot of resources and have an adverse impact.

Introduced in Viya 3.4, the accessControl.accessPersonalCaslibs action brings all existing personal Caslibs into a session where an administrator has assumed the data or superuser role.

Running the accessControl.accessPersonalCaslibs action has the following effects. The administrator can:

  • See all personal caslibs that existed at the time the action was run
  • View promoted tables characteristics within the personal caslibs
  • Drop promoted tables within other users’ personal caslibs.

This elevation of access remains in effect for the duration of the session. The action does not give an administrator full access to personal caslibs: an administrator can never fetch data from other users’ personal caslibs, drop any personal caslib, set access controls on any personal caslib, or set access controls on any table in any personal caslib. What it does is give the administrator a view into personal caslibs, to monitor and troubleshoot their resource usage.

Let’s look at how it works. Logged into Viya as a Viya administrator (by default also a CAS administrator), I can use the table.caslibinfo action to see all the caslibs I have permission to view. In the output below, I see my own personal caslib, and all the other caslibs that the administrator has permission (set by the CAS authorization system) to view.

cas mysess;
proc cas;
table.caslibinfo;
quit;
cas mysess terminate;

In the code below, the super user role is assumed for this session (SAS Administrators by default can also assume the super user role in CAS). With the role assumed, the administrator can execute the accessControl.accessPersonalCaslibs action and the subsequent table.caslibinfo action returns all caslibs including any personal caslibs that existed when the session started.

cas adminsession;
proc cas;
/* need to be a super user or data administrator */
accessControl.assumeRole / adminRole="superuser";
accessControl.accessPersonalCaslibs;
table.caslibinfo ;
quit;
cas adminsession terminate;

That helps, but what about the details? How can the administrator see how many resources the tables in a personal caslib are using? To get the details, we can access an individual caslib and its tables and, for each table, execute the table.tabledetails action. The program below loops over all the personal caslibs and, for each one, loops over the in-memory tables and executes the table.tabledetails action. The output of tabledetails gives an idea of how many resources (memory, disk, etc.) a table is using.

cas adminsession;
proc cas;
    /* need to be a super user */
    accessControl.assumeRole / adminRole="superuser";
    accessControl.accessPersonalCaslibs;
    table.caslibinfo result=fileresult;
    casliblist=findtable(fileresult);

    /* loop caslibs */
    do cvalue over casliblist;

        if cvalue.name==: 'CASUSER' then do;   /* only look at caslibs that contain CASUSER */

            table.tableinfo result=tabresult / caslib=cvalue.name;
            tablelist=findtable(tabresult);
            x=dim(tablelist);

            if x>1 then do;   /* there are tables available */

                do tvalue over tablelist;   /* loop all tables in the caslib */
                    table.tabledetails / caslib=cvalue.name name=tvalue.name;
                    table.tableinfo / caslib=cvalue.name name=tvalue.name;
                end;   /* loop all tables in the caslib */

            end;   /* there are tables available */
        end;   /* only look at caslibs that contain CASUSER */
    end;   /* loop caslibs */

    accessControl.dropRole / adminRole="superuser";
quit;
cas adminsession terminate;

Two fields that can give an administrator an idea of how big a table is are:

  • Data size: the size of the SAS Dataset in memory
  • Memory Mapped: the part of the data that has been “memory mapped” and is backed up in the CAS Disk Cache memory-mapped files.

The table below shows the output for one user’s personal caslib.

If one table in particular is causing problems, it is possible for the administrator to drop the table from memory.

cas adminsession;
proc cas;

    accessControl.assumeRole / adminRole="superuser";
    accessControl.accessPersonalCaslibs;
    sessionProp.setSessOpt / caslib="CASUSER(gatedemo499)";
    table.droptable / name="TRAIN";

quit;
cas adminsession terminate;

Visibility into personal caslibs will be a big help to Viya administrators monitoring CAS resource usage. Check out the following for more details:

Viya administrators can now get personal with users' Caslibs was published on SAS Users.

January 29, 2020
 

Editor's note: This post is the first in our Young Data Scientists series, so be sure to check back for future posts!

Stefan Stoyanov is a Business Analytics and Research Intern at Boemska and a 2020 SAS Student Ambassador. He’s an MSc Business Analytics student with a passion for [...]

Data science gives you power to change the world was published on SAS Voices by Jelena Stankovic

December 4, 2019
 

Site relaunches with improved content, organization and navigation.

In 2016, a cross-divisional SAS team created developer.sas.com. Their mission: Build a bridge between SAS (and our software) and open source developers.

The initial effort made available basic information about SAS® Viya® and integration with open source technologies. In June 2018, the Developer Advocate role was created to build on that foundation. Collaborating with many of you, the SAS Communities team has improved the site by clarifying its scope and updating it consistently with helpful content.

Design is an iterative process. One idea often builds on another.

-- businessman Mark Parker

The team is happy to report that recently developer.sas.com relaunched, with marked improvements in content, organization and navigation. Please check it out and share with others.

New overview page on developer.sas.com

The developer experience

The developer experience goes beyond the developer.sas.com portal. The Q&A below provides more perspective and background.

What is the developer experience?

Think of the developer experience (DX) as equivalent to the user experience (UX), only the developer interacts with the software through code, not points and clicks. Developers expect and require an easy interface to software code, good documentation, support resources and open communication. All this interaction occurs on the developer portal.

What is a developer portal?

The white paper Developer Portal Components captures the key elements of a developer portal. Without going into detail, the portal must contain (or link to) these resources: an overview page, onboarding pages, guides, API reference, forums and support, and software development kits (SDKs). In conjunction with the Developers Community, the site’s relaunch includes most of these items.

Who are these developers?

Many developers fit somewhere in these categories:

  • Data scientists and analysts who code in open source languages (mainly Python and R in this case).
  • Web application developers who create apps that require data and processing from SAS.
  • IT service admins who manage customer environments.

All need to interact with SAS but may not have written SAS code. We want this population to benefit from our software.

What is open source and how is SAS involved?

Simply put, open source software is just what the name implies: the source code is open to all. Many of the programs in use every day are based on open source technologies: operating systems, programming languages, web browsers and servers, etc. Leveraging open source technologies and integrating them with commercial software is a popular industry trend today. SAS is keeping up with the market by providing tools that allow open source developers to interact with SAS software.

What is an API?

All communications between open source and SAS are possible through APIs, or application programming interfaces. APIs allow software systems to communicate with one another. Software companies expose their APIs so developers can incorporate functionality and send or request data from the software.

Why does SAS care about APIs?

APIs allow the use of SAS analytics outside of SAS software. By allowing developers to communicate with SAS through APIs, customer applications easily incorporate SAS functions. SAS has created various libraries to aid in open source integration. These tools allow developers to code in the language of their choice, yet still interface with SAS. Most of these tools exist on github.com/sassoftware or on the REST API guides page.

A use case for SAS APIs

A classic use of SAS APIs is for a loan default application. A bank creates a model in SAS that determines the likelihood of a customer defaulting on a loan based on multiple factors. The bank also builds an application where a bank representative enters the information for a new potential customer. The bank application code uses APIs to communicate this information to the SAS model and return a credit decision.
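To make this concrete, here is a hedged sketch of the application side of such a call using PROC HTTP; the endpoint, token and JSON body are all hypothetical placeholders, not a real SAS URL:

filename resp temp;

proc http
    url="https://viya.example.com/loanDefault/score"   /* hypothetical endpoint */
    method="POST"
    in='{"income": 52000, "loanAmount": 15000}'
    out=resp
    oauth_bearer="YOUR-ACCESS-TOKEN";                  /* hypothetical token */
    headers "Content-Type"="application/json";
run;

/* echo the JSON response to the log */
data _null_;
    infile resp;
    input;
    put _infile_;
run;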

What is a developer advocate?

A developer advocate is someone who helps developers succeed with a platform or technology. Their role is to act as a bridge between the engineering team and the developer community. At SAS, the developer advocate fields questions and comments on the Developers Community and works with R&D to provide answers. The administration of developer.sas.com also falls under the responsibility of the developer advocate.

We’re not done

The site will continue to evolve, with additions of other SAS products and offerings, and other initiatives. Check back often to see what’s new.

Now that you are an open source and SAS expert, please check out the new developer.sas.com. We encourage feedback and suggestions for content. Leave comments and questions on the site or contact Joe Furbee: joe.furbee@sas.com.

developer.sas.com 2.0: More than just a pretty interface was published on SAS Users.

November 15, 2019
 

“The future is already here — it's just not very evenly distributed.”  ~ William Gibson, author The same can be said for climate change – global warming is here, in a big way, but its effects are still an arm's length away for many of us. How is climate change [...]

Climate change: It's all about the CO2 was published on SAS Voices by Leo Sadovy

November 15, 2019
 

Designing interactive reports can be a fun and unique challenge. As user interface experience designers can attest, there are several aspects that go into developing a successful and effective self-service tool. Granted I’m not designing the actual software, but reports require a similar approach to be sure that visualizations are clear and that users can get to the answers they are looking for. Enter prompts.

Prompts help users better understand trends, see how their data points compare to the whole, and narrow the scope of data. Being able to pick the placement of these prompts quickly and easily will open up the possibilities of your report layouts! I’m specifically speaking about Report and Page level prompts. Traditionally, these global prompt controls could only be placed at the top; see the yellow highlighted areas below.

Let’s take a look at an example report with the traditional Report and Page prompt layout. The Report prompts are extremely easy to pick out, since they sit above the pages, but the Page prompts can sometimes blend in with other prompts contained in the report body.

Introduced in the SAS Visual Analytics 8.4 release is the ability to control the layout position of these prompts. Using my example report, let’s change the placement of these prompts. In Edit mode, open the Options pane and use the top level drop-down to select the report name. This will activate the report level, and the report level Options will display. Next, under the Report Controls subgroup, move the placement radio button to the west cardinal point.

Depending on the type of control objects you are using in your report, you may not like this layout yet. For instance, you can see here that my date slider is taking up too much space.

When you activate the slide control, use the Options pane to alter the Slider Direction and Layout. You can even use the Style option to change the font size. You can see that after these modifications, the Report prompt space can be configured to your liking.

Next, let’s change the placement for the Page prompts, for demonstration purposes. From the Options pane, use the top drop-down to select the page name. This will activate the page level, and the page level Options will display. Next, under the Page Controls subgroup, move the placement radio button to the west cardinal position.

You can see that the direction of the button bar control was automatically changed to vertical. Now we can clearly see which prompts belong to the page level.

If I switch to view mode, and adjust the browser size, you can get a better feel for the Report and Page prompt layout changes.

But as with many things, just because you can, doesn’t mean you should. This is where the report designer’s creativity and style can really take flight. Here is the same report, but with my preferred styling.

Notice that I kept the Report prompts along the top but moved the Page prompts to the left of the report. I also added two containers and configured a gray border for each container to better separate the objects. This helps the user quickly see that the drop-down filters only the word cloud. I also used yellow highlighting, through styling and a display rule, to emphasize the selected continent. The bar chart is fed from an aggregated data source, which is why the Report prompt is not filtering out the other continents.

Feel free to send me your latest report design ideas!

Additional material related to Report and Page prompts:

New control prompt placement option in SAS Visual Analytics was published on SAS Users.

November 5, 2019
 

Editor’s note: This is the third article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various database options available in Microsoft Azure. Access all the articles in the series here.

The series

This is the next iteration of a series covering database as a service (DBaaS) and storage offerings in the cloud, this time from Microsoft Azure. I have already published two articles on Amazon Web Services. One of those articles covers the DBaaS offerings and the other covers storage offerings for Amazon Web Services. I will cover Google Cloud Platform in future articles. The goal of these articles is to supply a breakdown of these services to better understand the business requirements of these offerings and how they relate to SAS. I would encourage you to read all the articles in the series even if you are already using a specific cloud provider. Many of the core technologies and services are offered across the different cloud providers. These articles focus primarily on SAS Data Connectors as part of SAS Viya, but all the same functionality is available using a SAS/ACCESS Interface in SAS 9.4. SAS In-Database technologies in SAS Viya, called the SAS Data Connect Accelerator, are synonymous with the SAS Embedded Process.

As companies move their computing to the cloud, they are also moving their storage to the cloud. Just like compute in the cloud, data storage in the cloud is elastic and responds to demand while only paying for what you use. As more technologies move to a cloud-based architecture, companies must consider questions like: Where do I store my data? What cloud services best meet my business requirements? Which cloud vendor should I use? Can I migrate my applications to the cloud? If you are looking to migrate your SAS infrastructure to Azure, look at the SAS Viya QuickStart Template for Azure to see a rapid deployment pattern to get the SAS Viya platform up and running in Azure.

SAS integration with Azure

SAS has extended SAS Data Connectors and SAS In-Database Technologies support to Azure database variants. A database running in Azure is much like your on-premise database, but instead Microsoft manages the software and hardware. Azure’s DBaaS offerings takes care of the scalability and high availability of the database with minimal user input. SAS integrates with your cloud database even if SAS is running on-premise or with a different cloud provider.

Azure databases

Azure offers database service technologies familiar to many users. If you read my previous article on SAS Data Connectors and Amazon Web Services, you are sure to see many parallels. It is important to understand the terminology and how the different database services in Azure best meet the demands of your specific application. Many common databases already in use are being refactored and provided as service offerings to customers in Azure. The advantages for customers are clear: no hardware to manage and no software to install. Databases that scale automatically to meet demand and software that updates and creates backups means customers can spend more time creating value from their data and less time managing their infrastructure.

For the rest of this article, I cover various database management systems, the Azure offering for each database type, and the SAS integration points. First, consider the diagram below, a decision flow chart for determining integration points between Azure database services and SAS. Trace your path through the diagram and read on to learn more about the connection details.

Integration points between Azure database services and SAS

Relational Database Management System (RDBMS)

In the simplest possible terms, an RDBMS is a collection of managed tables with rows and columns. You can divide relational databases into two functional groups: online transaction processing (OLTP) and online analytical processing (OLAP). These two methods serve distinct purposes and are optimized depending on how you plan to use the data in the database.

Transactional Databases (OLTP)

Transactional databases are good at processing reads, inserts, updates and deletes. These queries are usually simple but arrive in large volumes. Transactional databases are not optimized for business intelligence or reporting. Data processing typically involves gathering input, processing it, and updating existing records to reflect the new information. Transactional databases prevent two users from changing the same data at the same time. Examples include order entry, retail sales, and financial transaction systems. Azure offers several types of transactional database services, which you can organize into three categories: enterprise license, open source, and cloud native.

Enterprise License

Many customers have workloads built around an enterprise database. Azure is an interesting case because Microsoft is also a traditional enterprise database vendor; Amazon, for example, does not have existing on-premises enterprise database customers. Oracle Cloud is the other big player in the enterprise market looking to migrate existing customers to its cloud. Slightly off topic, but possibly of interest to some: SAS does support customers running their Oracle database on Oracle Cloud Platform using the SAS Data Connector to Oracle. Azure offers a solution for customers looking to continue their relationship with Microsoft without refactoring their existing workflows. Customers bring an existing enterprise database license to Azure and run SQL Server on Virtual Machines. SAS has extended SAS Data Connector support for SQL Server on Virtual Machines. You can also use your existing SAS license for SAS Data Connector to Oracle or SAS Data Connector to Microsoft SQL Server to interact with SQL Server on Virtual Machines.
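
To make this concrete, here is a minimal sketch of registering a caslib against SQL Server running on an Azure virtual machine and loading a table into CAS. The session name, host, credentials, and table names are all hypothetical, and data source option names can vary by SAS Viya release, so treat this as a pattern rather than a copy-and-paste recipe.

cas mysess;                                      /* start a CAS session if one is not already active */

caslib sqlvm desc="SQL Server on an Azure VM"
   dataSource=(srctype="sqlserver",              /* SAS Data Connector to Microsoft SQL Server */
               server="myazurevm.cloudapp.net",  /* hypothetical VM host name */
               database="sales",
               schema="dbo",
               username="sasuser",
               password="XXXXXX");

proc casutil;
   list files incaslib="sqlvm";                  /* browse the tables visible through the caslib */
   load casdata="ORDERS" incaslib="sqlvm"        /* copy the SQL Server table into CAS memory */
        casout="orders" outcaslib="casuser";
quit;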

Remember, you can always install and manage your own database on a virtual machine. For example, support for both SAS Data Connector to Teradata and SAS Data Connect Accelerator for Teradata is available for Teradata installed on Azure. If a database as a service offering is not available, the traditional backup and update responsibilities fall to the customer.

SQL Server Stretch Database is another service available in Azure. If you are not prepared to add more storage to your existing on-premises SQL Server database, you can add capacity using the resources available in Azure. SQL Server Stretch Database scales your data to Azure without requiring you to provision more servers on-premises; the new SQL Server capacity runs in Azure instead of in your data center.

Open Source

Azure provides service offerings for common open source databases like MySQL, MariaDB, and PostgreSQL. You can use your existing SAS license for SAS Data Connector to MySQL to connect to Azure Database for MySQL and SAS Data Connector to PostgreSQL to interface with Azure Database for PostgreSQL. SAS does not yet formally support Azure Database for MariaDB. MariaDB is a variant of MySQL, so validation of SAS Data Connector support is coming soon. If you need support for MariaDB in Azure, please comment below and I will share your feedback with product management and testing.
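
A sketch for Azure Database for PostgreSQL looks nearly identical to the SQL Server example above; only the source type and connection details change. The server, database, and credentials below are placeholders (Azure's single-server PostgreSQL deployments expect the user@servername login format), and option names can again vary by release.

caslib azpg desc="Azure Database for PostgreSQL"
   dataSource=(srctype="postgres",               /* SAS Data Connector to PostgreSQL */
               server="myserver.postgres.database.azure.com",
               database="analytics",
               username="sasuser@myserver",      /* hypothetical user@servername login */
               password="XXXXXX");

proc casutil;
   load casdata="events" incaslib="azpg" casout="events";   /* load a PostgreSQL table into CAS */
quit;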

Cloud Native

Azure SQL Database is an iteration of Microsoft SQL Server built for the cloud, combining the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases. SAS has extended SAS Data Connector support for Azure SQL Database. You can use your existing license for SAS Data Connector to Microsoft SQL Server to connect to Azure SQL Database.
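
On the SAS 9.4 side, the equivalent access path is a LIBNAME statement through the SAS/ACCESS Interface to Microsoft SQL Server. This is a sketch under the assumption that an ODBC DSN named azuresqldb has been defined (in odbc.ini, for example) pointing at your Azure SQL Database server; the libref, DSN, credentials, and table name are all hypothetical.

libname azsql sqlsvr
   datasrc=azuresqldb         /* hypothetical DSN pointing at yourserver.database.windows.net */
   user=sasuser password=XXXXXX
   schema=dbo;

proc sql;
   select count(*) from azsql.orders;   /* simple aggregations like this can be passed down to the database */
quit;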

Analytical Databases (OLAP)

Analytical databases are optimized for read performance. These databases work best with complex queries in smaller volumes. When working with an analytical database, you typically analyze multidimensional data interactively from multiple perspectives. Azure SQL Data Warehouse is the analytical database service offered by Azure. The SAS Data Connector to ODBC combined with a recent version of the Microsoft-supplied ODBC driver is currently the best way to interact with Azure SQL Data Warehouse. Look for the SAS Data Connector to Microsoft SQL Server to support SQL Data Warehouse soon.
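
A sketch of that ODBC route from SAS 9.4 follows, using a DSN-less connection string through Microsoft's ODBC Driver 17 for SQL Server. The server, database, credentials, and table name are placeholders; the same pattern applies to the SAS Data Connector to ODBC in SAS Viya, with the connection details supplied in the caslib definition instead.

libname azdw odbc
   noprompt="Driver={ODBC Driver 17 for SQL Server};Server=mydw.database.windows.net;Database=dwdb;Uid=sasuser;Pwd=XXXXXX"
   schema=dbo;                          /* hypothetical server, database, and credentials */

proc sql outobs=10;
   select * from azdw.fact_sales;       /* quick sanity check against the warehouse */
quit;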

NoSQL Databases

A non-relational or NoSQL database is any database that does not conform to the relational database model. These databases scale more easily across a cluster of machines, and their loose dependencies make NoSQL databases a natural fit for the cloud because the data is easier to distribute and scale. Each NoSQL database is designed to solve a specific business problem. Some of the most common data structures are key-value, column, document, and graph databases. If you want a brief overview of these database structures, I cover them in my AWS database blog.

For Microsoft Azure, CosmosDB is the NoSQL database option. CosmosDB is multi-model, meaning you can build out your databases to fit the NoSQL model you prefer. Use the SAS Data Connector to ODBC to interact with your data in Azure CosmosDB.

Hadoop

The traditional deployment of Hadoop is changing dramatically with the cloud. Traditional Hadoop vendors may have a tough time keeping up with the service offerings available in the cloud. Hadoop still offers reliable replicated storage across nodes and powerful parallel processing of large jobs without much data movement. Azure offers HDInsight as its Hadoop as a service offering. Azure HDInsight supports both SAS Data Connector to Hadoop and SAS Data Connect Accelerator for Hadoop.
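
Below is a sketch of a caslib definition for an HDInsight cluster. The host, paths, and table name are hypothetical: hadoopJarPath and hadoopConfigDir must point at the Hadoop JAR and configuration files gathered from the cluster, and changing dataTransferMode from "serial" to "parallel" engages the SAS Data Connect Accelerator (the SAS Embedded Process) instead of the serial data connector.

caslib hdi desc="Azure HDInsight"
   dataSource=(srctype="hadoop",                        /* SAS Data Connector to Hadoop */
               server="myhdi-head0.cloudapp.net",       /* hypothetical Hive server host */
               username="sasuser",
               hadoopJarPath="/opt/sas/hadoopjars",     /* JARs collected from the cluster */
               hadoopConfigDir="/opt/sas/hadoopconfig", /* *-site.xml files from the cluster */
               dataTransferMode="serial");              /* "parallel" uses the Data Connect Accelerator */

proc casutil;
   load casdata="weblogs" incaslib="hdi" casout="weblogs";   /* load a Hive table into CAS */
quit;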

Finally

It is important to think about the use case for your database and the type of data you plan to store before you select an Azure database service. Understanding your workloads is critical to getting the right performance and cost. When dealing with cloud databases, remember that you are charged for the storage you use and for the data you move out of the database. Performing analysis and reporting on your data may require data transfer, so be aware of these costs and think about how you can lower them by caching frequently accessed data or keeping it on-premises. Another strategy I’ve seen becoming more popular is to use SAS Micro Analytic Service to run the models you have built in the cloud provider where your data is stored. Data transfer is cheaper when data moves between services within the same cloud provider rather than out of it. SAS Micro Analytic Service lets you score data in place, without moving it out of the cloud provider and without a full SAS installation.

Additional Resources
1. Support for Databases in SAS® Viya® 3.4
2. Support for Cloud and Database Variants in SAS® 9.4

Accessing Databases in the Cloud – SAS Data Connectors and Microsoft Azure was published on SAS Users.