Mar 05, 2020

Have you heard that SAS offers a collection of new, high-performance CAS procedures that are compatible with a multi-threaded approach? The free e-book Exploring SAS® Viya®: Data Mining and Machine Learning is a great resource to learn more about these procedures and the features of SAS® Visual Data Mining and Machine Learning. Download it today and keep reading for an excerpt from this free e-book!

In SAS Studio, you can access tasks that help automate your programming so that you do not have to write the code manually. However, there are three options for writing your own programs in SAS® Viya®:

  1. SAS Studio provides a SAS programming environment for developing and submitting programs to the server.
  2. Batch submission is also still an option.
  3. Open-source languages such as Python, Lua, and Java can submit code to the CAS server.

In this blog post, you will learn the syntax for two of the new, advanced data mining and machine learning procedures: PROC TEXTMINE and PROC TMSCORE.

Overview

The TEXTMINE and TMSCORE procedures integrate natural language processing (NLP) and statistical analysis to provide essential text mining capabilities. The procedures support key NLP features such as tokenizing, stemming, part-of-speech tagging, entity recognition, customized stop lists, and so on. They also support dimensionality reduction and topic discovery through singular value decomposition (SVD).

In this example, you will learn about some of the essential functionalities of PROC TEXTMINE and PROC TMSCORE by using a text data set containing 1,830 Amazon reviews of electronic gaming systems. The data set is named Amazon. You can find similar data sets of Amazon reviews at http://jmcauley.ucsd.edu/data/amazon/.

PROC TEXTMINE

The Amazon data set has already been loaded into CAS. The review content is stored in the variable ReviewBody, and we generate a unique review ID for each review. In the procedure call shown in Program 1, we ask PROC TEXTMINE to perform three tasks (a brief setup sketch follows this list):

  1. parse the documents in the input table and generate the term-by-document matrix
  2. perform dimensionality reduction via singular value decomposition (SVD)
  3. perform topic discovery based on the SVD results
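Program 1 assumes that a CAS session is active and that the librefs mycaslib and mylib have been assigned. The following minimal sketch shows one way to set that up; the session name, caslib, and path are hypothetical, so adjust them for your site:

cas mysession;                          /* start a CAS session */
libname mycaslib cas caslib=casuser;    /* libref for CAS in-memory tables */
libname mylib "/local/path/to/data";    /* hypothetical path to the local SAS data sets */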

Program 1: PROC TEXTMINE

data mycaslib.amazon;      /* load the Amazon reviews into CAS */
    set mylib.amazon;
run;

data mycaslib.engstop;     /* load the stop list into CAS */
    set mylib.engstop;
run;

proc textmine data=mycaslib.amazon;
    doc_id id;
    var reviewbody;

 /*(1)*/  parse reducef=2 entities=std stop=mycaslib.engstop 
          outterms=mycaslib.terms outparent=mycaslib.parent
          outconfig=mycaslib.config;

 /*(2)*/  svd k=10 svdu=mycaslib.svdu outdocpro=mycaslib.docpro
          outtopics=mycaslib.topics;

run;

(1) The first task (parsing) is specified in the PARSE statement. The REDUCEF= option specifies the minimum number of times a term must appear in the text to be included in the analysis. The STOP= option specifies a table that contains terms to exclude from the analysis, such as "the", "this", and "that". The OUTPARENT= option names the output table that stores the term-by-document matrix, and the OUTTERMS= option names the output table that stores information about the terms included in that matrix. The OUTCONFIG= option names the output table that stores configuration information for future scoring.

(2) Tasks 2 and 3 (dimensionality reduction and topic discovery) are specified in the SVD statement. The K= option specifies the desired number of dimensions and the number of topics. The SVDU= option names the output table that stores the U matrix from the SVD calculations, which is needed for future scoring. The OUTDOCPRO= option names the output table that stores the document projections in the reduced dimensions, and the OUTTOPICS= option names the output table that stores the discovered topics.
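After you run Program 1 (see below), you can examine the discovered topics by printing the OUTTOPICS= table. A minimal sketch, assuming the mycaslib libref is assigned as in Program 1:

proc print data=mycaslib.topics;
run;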

Click the Run shortcut button or press F3 to run Program 1. The terms table shown in Output 1 stores the tagging, stemming, and entity recognition results. It also stores the number of times each term appears in the text data.

Output 1: Results from Program 1

PROC TMSCORE

PROC TEXTMINE is used with large training data sets. When you have new documents coming in, you do not need to re-run all the parsing and SVD computations with PROC TEXTMINE. Instead, you can use PROC TMSCORE to score new text data. The scoring procedure parses the new document(s) and projects the text data into the same dimensions using the SVD weights derived from the original training data.

In order to use PROC TMSCORE to generate results consistent with PROC TEXTMINE, you need to provide the following tables generated by PROC TEXTMINE:

  • SVDU table – provides the required information for projection into the same dimensions.
  • Config table – provides parameter values for parsing.
  • Terms table – provides the terms that should be included in the analysis.

Program 2 shows an example of PROC TMSCORE. It uses the same input data that the PROC TEXTMINE code used, so it generates docpro and parent output tables that are consistent with the earlier results, as shown in Output 2.

Program 2: PROC TMSCORE

proc tmscore data=mycaslib.amazon svdu=mycaslib.svdu
        config=mycaslib.config terms=mycaslib.terms
        svddocpro=mycaslib.score_docpro outparent=mycaslib.score_parent;
    var reviewbody;
    doc_id id;
run;

 

Output 2: Results from Program 2

To learn more about advanced data mining and machine learning procedures available in SAS Viya, including PROC FACTMAC, PROC TEXTMINE, and PROC NETWORK, you can download the free e-book, Exploring SAS® Viya®: Data Mining and Machine Learning. Exploring SAS® Viya® is a series of e-books that are based on content from SAS® Viya® Enablement, a free course available from SAS Education. You can follow along with examples in real time by watching the videos.

 

Learn about new data mining and machine learning procedures in SAS Viya was published on SAS Users.

Mar 05, 2020

Fifty years ago, as the women’s liberation movement was gaining momentum in the U.S., my maternal great-grandmother, Pearl, worked in a factory sewing American flags while volunteering with the Girl Scouts and caring for her grandchildren. My paternal grandmother, Greta, also worked in local factories while caring for her family. [...]

50 years of strong, intelligent women was published on SAS Voices by Ashley Binder.

Mar 04, 2020

Your brand is customer journey obsessed, and every interaction with your company provides a potential opportunity to make an intelligent decision, deepen engagement and meet conversion goals. The hype around martech innovation continues to build in 2020, and every technology vendor is making the same claim: "Bolster the customer [...]

SAS Customer Intelligence 360: Marketing AI vision was published on Customer Intelligence Blog.

Mar 04, 2020

I'm a planner, and my plan was to have a completely natural birth and to breastfeed only. This plan was written in stone (or so I thought). Long story short – my birthing plan flew out the window. One of my complications was very high blood pressure, which refused [...]

Breast milk, babies, NCSU students and SAS/OR was published on SAS Voices by Natalia Summerville.

Mar 04, 2020

Suppose that a data set contains a set of parameter values. For each row of parameters, you need to perform some computation. A recent discussion on the SAS Support Communities mentions an important point: if there are duplicate rows in the data, a program might repeat the same computation several times. This is inefficient. An efficient alternative is to perform the computations once and store them in a list. This article describes this issue and implements a solution by using lists in SAS/IML software. Lists were introduced in SAS/IML 14.2 (SAS 9.4M4). A natural syntax for creating lists was introduced in SAS/IML 14.3 (SAS 9.4M5).

This article assumes that the matrices are not known until you read a data set that contains the parameters. The program must be able to handle computing, storing, and accessing an arbitrary number of matrices: 4, 10, or even 100. Obviously, if you know in advance the number of matrices that you need, then you can just create those matrices and give them names such as X1, X2, X3, and X4.

Example: Using powers of a matrix

To give a concrete example, suppose the following data set contains multiple values for a parameter, p:

data Powers;
input p @@;
datalines;
9 1 5 2 2 5 9 1 2 5 2 1 
;

For each value of p, you need to compute X = A^p = A*A*...*A for a matrix, A, and then use X in a subsequent computation. Here's how you might perform that computation in a straightforward way in SAS/IML without using lists:

proc iml;
/* use a small matrix, but in a real application the matrix might be large. */
A = {2 1, 1 3};
 
use Powers;                    /* read the parameters from the data set */
read all var "p";
close; 
 
/* The naive method: Compute the power A**p[i] each time you need it. */
do i = 1 to nrow(p);
  power = p[i];
  X = A**power;                /* compute each power as needed */
  /* compute with X ... */
end;

The program computes A^p for each value of p in the data set. Because the data set contains duplicate values of p, the program computes A^2 four times, A^5 three times, and A^9 two times. If A is a large matrix, it is more efficient to compute and store these matrices and reuse them as needed.

Store all possible matrices in a list

If the values of p are uniformly likely, the easiest way to store matrix powers in a list is to find the largest value of p (call it m) and then create a list that has m items. The first item in the list is A, the second item is A^2, and so forth up to A^m. This is implemented by using the following statements:

/* Method 1: Find upper bound of power, m. 
             Compute all matrices A^p for 1 <= p <= m. 
             Store these matrices in a list so you don't need to recompute. */
m = max(p);                    /* largest power in data */
L = ListCreate( m );
do i = 1 to m;
  L$i = A**i;                  /* compute all powers up to maximum; store A^i as i_th item */
end;
 
do i = 1 to nrow(p);           /* extract and use the matrices, as needed */
  power = p[i];
  X = L$power;                 /* or   X = ListGetItem(L, power); */
  /* compute with X ... */
end;

For these data, the list contains nine elements because 9 is the largest value in the data. The following statements show how to display a compact version of the list, which shows the matrices A^p for p=1,2,...,9.

package load ListUtil;         /* load the Struct and ListPrint subroutines */
run struct(L);
Structure of a nine-item list. The p_th item stores A^p.

The table shows the first few elements of every item in the list. In this case, the p_th item stores the 2 x 2 matrix A^p. The elements of the matrix are displayed in row-major order.

Store only the necessary matrices in a list

Notice that the data contain only four values of p: 1, 2, 5, and 9. The method in the previous section computes and stores unnecessary matrices such as A^3, A^4, and A^8. This is inefficient. Depending on the maximum value of p (such as max(p)=100), this method might be wasteful in terms of memory and computational resources.

A better solution is to compute and store only the matrices that are needed for the computation. For this example, you need to compute and store only four matrices. You can use the UNIQUE function to find the unique values of p (in sorted order). The following statements compute and store A^p for only the unique values of p in the data set:

/* Method 2: Compute matrices A^p for the unique values of p in the data.
             Store only these matrices in a named list. */
u = unique(p);                 /* find the unique values of p */
nu = ncol(u);                  /* how many unique values? */
L = ListCreate( nu );
do i = 1 to nu;
  power = u[i];                /* compute only the powers in the data; store in a named list */
  L$i =  [#"Power"  = power,   /* optional: store as a named list */
          #"Matrix" = A**power]; 
end;
 
run struct(L);
Structure of a four-item list, where each item is a named list.

For this example, each item in the list L is itself a list. I used a "named list" with items named "Power" and "Matrix" so that the items in L have context. The STRUCT subroutine shows that L contains only four items.

How can you access the matrix for, say, A^5? There are several ways, but the easiest is to use the u matrix that contains the unique values of p in sorted order. You can use the LOC function to look up the index of the item that you want. For example, A^5 is stored as the third item in the list because 5 is the third element of u. Because each item in L is itself a list, you can extract the "Matrix" item as follows:

do i = 1 to nrow(p);
  power = p[i];
  k = loc(u = power);          /* the k_th item is A^p */
  X = L$k$"Matrix";            /* extract it */
  /* compute with X ... */
  *print i power X;
end;

In summary, this article describes how to use a list to store matrices so that you can compute them once and reuse them many times. The article assumes that the number of matrices is not known in advance but is obtained by reading a data set at run time. The example stores powers of a matrix, but this technique is generally applicable.

If you have never used lists in SAS/IML, see the article "Create lists by using a natural syntax in SAS/IML" or watch this video that describes the list syntax.

The post Store pre-computed matrices in a list appeared first on The DO Loop.

Mar 03, 2020

The endpoint of analytics is not a report or an alert. The endpoint is a decision. Often those decisions are related to your business and you make them to reduce risks, improve production or satisfy customers. In health care, however, the decisions made with analytics can be a matter of [...]

Improving lives through better decisions was published on SAS Voices by Oliver Schabenberger.

Mar 02, 2020

Let’s face it. Data sharing between platforms in health care just isn’t easy. Patient data privacy concerns, incompatible file formats, asynchronous identifiers … I’ve heard it all. From the electronic health record (EHR), picture archiving and communication systems (PACS) to discrete processes like pharmacy or departmental information systems, achieving some [...]

Why analytic interoperability matters in health care was published on SAS Voices by Alyssa Farrell.

Mar 02, 2020

Growing up, I was often reminded to turn off the lights. In my home, this was a way of conserving a key resource (electricity) so that we did not pay for it when we were not using it. It was a way of being a good steward of the family's money and targeting it to run the lights and other things in our home. This allowed us to go about our daily tasks and get them done when we needed to.

These days I have the same goal in my own home, but I’ve automated the task. I have a voice assistant that, with a few select words, will turn off (or on) the lights in the rooms that I use most. The goal and the reasoning are the same, but the automation allows me to take it to another level.

In a similar way, automation today allows us to optimize the use of compute resources in a way that we haven’t been able to do in the past. The degree to which we can switch on and off the systems required to run our compute workloads in cloud environments, scale to use more, or fewer, resources depending on demand, and only pay for what we need, is a clear indicator of just how much infrastructure technology has evolved in recent years.

Like the basic utilities we rely on in our homes, SAS, with its analytics and modeling capabilities, has become an essential utility for any business that wants not only to make sense of data but to turn it into the power to get things done. And, like any utility necessary to do business, we want it working quickly at the flip of a switch, easily made available anywhere we need it, and helping us be good stewards of our resources.

Containers and related technologies can help us achieve all of this in SAS

A container can most simply be thought of as a self-contained environment with all the programs, configuration, initial data, and other supporting pieces to run applications. This environment can be treated as a stand-alone unit, ready to turn on and run at any time, much in the way your laptop is a stand-alone system. In fact, this sort of “portable machine” analogy can be a good way to think about containers at a high level – a complete virtual system containing, and configured for running, one or more targeted application(s) – or components of an application.

Docker is one of the oldest, and most well-known, applications for defining and managing containers. Defining a container is done by first defining an image. The image is an immutable (static) set of the software, environment, configuration, and so on that serves as a template for containers to be created from it. Each image, in turn, is composed of layers that apply some setting, software, or data that a container based on the image will need.

Have you ever staged a new machine for yourself, your company, a friend, or a relative? If so, you can relate the "layering" of software you installed and configured to the layers that go into an image. And the image itself is like the image of software you created on the disk of the system: the applications are stored there, fully configured and ready to go, whenever you turn the system on.

Turning an image into a container is mostly just adding another layer on top of the existing image layers. The difference is that this "container layer" can be modified as needed – have things written to it, updated, and so on. It is similar to creating a user profile, along with its space, on a system you staged: like that dedicated user space on a laptop or desktop, the layer that gets added to an image to make a container is there for the running system to use and customize as needed.

Containers, Kubernetes, and cloud environments

It is rare that any corporate system today is managed with only a single PC. Likewise, in the world of cloud and containerized environments, it is rare that any software product is run with only a single container. More commonly, applications consist of many containers organized to address specific application areas (such as web interfaces, database management, etc.) and/or architectural designs to optimize resource use and communication paths in the system (microservices).

Having the advantages that are derived from either multiple PCs or multiple containers also requires a way to manage them and to ensure reliability and robustness for our applications and customers. For the machines, enterprises typically rely on data centers. Data centers play a key role, ensuring systems are kept up and running, replaced when broken, and centrally accessible. They may also be responsible for bringing more systems online to address increased loads or for taking some systems offline to save costs.

For containers, we have applications that function much like a "data center for containers." The most prominent one today is Kubernetes (also known as "K8s" for the eight letters between "K" and "s"). Kubernetes' job is to simplify the deployment and management of containers and containerized workloads. It does this by automating key needs around containers, such as deployment, scaling, scheduling, healing, monitoring, and more. All of this is managed in a "declarative" way: we no longer have to tell the system "how" to get to the state we want – we instead tell it "what" state we want, and it ensures that state is met and preserved.

The combination of containers, Kubernetes, and cloud environments provides an evolutionary jump in being able to control and leverage the infrastructure and runtime environments that you run your applications in. And this gives your business a similar jump in being able to provide the business value targeted to meet the environments, scale, and reliability that your customers demand - while having the automatic optimization of resources and the automatic management of workloads that you need to be competitive.

Harness decades of expertise with SAS® Viya® 4.0

Viya 4.0 provides this same evolutionary jump for SAS. Now your SAS applications and workloads can run natively in containers, Kubernetes, and cloud environments. Viya 4 builds on award-winning, best-in-class analytics to let data scientists, business executives, and decision makers at all levels harness decades of SAS expertise running completely in containers and tightly integrated with Kubernetes and the cloud.

Viya 4.0 brings all the key SAS functionality you'd expect – modeling, decision-making, forecasting, visualization, and more – to cloud and enterprise cloud environments, along with the advantages of running in a containerized model. It also leverages the robust container management, monitoring, self-healing, scaling, and other capabilities of Kubernetes. All of this puts you more in control and makes you less reliant on being in your data center to manage these kinds of activities.

Just remember to turn the lights off.

LEARN MORE | AN INTRO TO SAS FOR CONTAINERS

Automation with Containers: the Power to Get Things Done was published on SAS Users.

Mar 02, 2020

A colleague recently posted an article about how to use SAS Visual Analytics to create a circular graph that displays a year's worth of temperature data. Specifically, the graph shows the air temperature for each day in a year relative to some baseline temperature, such as 65F (18C). Days warmer than baseline are displayed in one color (red for warm) whereas days colder than the baseline are displayed in another color (blue for cold). The graph was very pretty. A reader posted a comment asking whether a similar graph could be created by using other graphical tools such as GTL or even PROC SGPLOT. The answer is yes, but I am going to propose a different graph that I think is more flexible and easier to read.

Let's generalize the problem. Suppose you have a time series and you want to compare the values to a baseline (or reference) value. One way to do this is to visualize the data as deviations from the baseline. Data values that are close to the baseline will be small and almost unnoticeable. The eye will be drawn to values that indicate large deviations from the baseline. A "deviation plot" like this can be used for many purposes. Some applications include monitoring blood glucose relative to a target value, showing expenditures relative to a fixed income amount, and, yes, displaying the temperature relative to some comfortable reference value. Deviation plots sometimes accompany a hypothesis test for a one-way frequency distribution.

Linear displays versus circular displays

My colleague's display shows one year's worth of temperatures by plotting the day of the year along a circle. While this makes for an eye-catching display, there are a few shortcomings to this approach:

  • It is difficult to read the data values. It is also difficult to compare values that are on opposite sides of a circle. For example, how does March data compare with October data?
  • Although a circle can show data for one year, it is less effective for showing 8 or 14 months of data.
  • Even for one year's worth of data, it has a problem: It places December 31 next to January 1. In the temperature graph, the series began on 01JAN2018. However, the graph places 31DEC2018 next to 01JAN2018 even though those values are a year apart.

As mentioned earlier, you can use SAS/GRAPH or the statistical graphics (SG) procedures in SAS to display the data in polar coordinates. Sanjay Matange's article shows how to create a polar plot. For some of my thoughts about circular versus rectangular displays, see "Smoothers for periodic data."

A deviation-from-baseline plot

The graph to the right shows an example of a deviation plot (or deviation-from-baseline plot). It is similar to a waterfall chart, but in many waterfall charts the values are shown as percentages, whereas the deviation plot shows the observed values. You can see that the values are plotted for each day. The high values are plotted in one color (red) whereas low values are plotted in a different color (blue). A reference line (in this case, at 100) is displayed.

To create a deviation plot, you need to perform these three steps:

  1. Use the SAS DATA step to encode the data as 'High' or 'Low' by using the reference value. Compute the deviations from the reference value.
  2. Create a discrete attribute map that maps values to colors. This step is optional. Alternatively, SAS will assign colors based on the current ODS style.
  3. Use a HIGHLOW plot to graph the deviations from the reference value.

Let's implement these steps on a time series for three months of daily blood glucose values. An elderly male takes oral medications to control his blood glucose level. Each morning he takes his fasting blood glucose level and records it. The doctor has advised him to try to keep the blood glucose level below 100 mg/dL, so the reference value is 100. The following DATA step defines the dates and glucose levels for a three-month period.

data Series;
informat Date date.;
format Date Date.;
input Date y @@;
label y = "Blood Glucose (mg/dL)";
datalines;
01SEP19 100 02SEP19  96 03SEP19  86 04SEP19  93 05SEP19 105 06SEP19 106 07SEP19 123 
08SEP19 121 09SEP19 115 10SEP19 108 11SEP19  94 12SEP19  96 13SEP19  95 14SEP19 120
15SEP19 112 16SEP19 104 17SEP19  97 18SEP19 101 19SEP19 108 20SEP19 108 21SEP19 117 
22SEP19 103 23SEP19 109 24SEP19  97 25SEP19  93 26SEP19 100 27SEP19  98 28SEP19 122 
29SEP19 116 30SEP19  99 01OCT19 102 02OCT19  99 03OCT19  95 04OCT19  99 05OCT19 116 
06OCT19 109 07OCT19 106 08OCT19  94 09OCT19 104 10OCT19 112 11OCT19 119 12OCT19 111 
13OCT19 104 14OCT19 101 15OCT19  99 16OCT19  92 17OCT19 101 18OCT19 115 19OCT19 109 
20OCT19  98 21OCT19  91 22OCT19  92 23OCT19 100 24OCT19 109 25OCT19 102 26OCT19 117 
27OCT19 106 28OCT19  98 29OCT19  98 30OCT19  95 31OCT19  97 01NOV19 129 02NOV19 120 
03NOV19 117 04NOV19   . 05NOV19 101 06NOV19 105 07NOV19 105 08NOV19 106 09NOV19 118 
10NOV19 109 11NOV19 102 12NOV19  98 13NOV19  97 14NOV19 .   15NOV19  92 16NOV19 114 
17NOV19 107 18NOV19  98 19NOV19  91 20NOV19  97 21NOV19 109 22NOV19  98 23NOV19 95 
24NOV19  95 25NOV19  94 26NOV19   . 27NOV19  98 28NOV19 115 29NOV19 123 30NOV19 114 
01DEC19 104 02DEC19  96 03DEC19  97 04DEC19 100 05DEC19  94 06DEC19  93 07DEC19 105 
08DEC19   . 09DEC19  88 10DEC19  84 11DEC19 101 12DEC19 122 13DEC19 114 14DEC19 108 
15DEC19 103 16DEC19  88 17DEC19  74 18DEC19  92 19DEC19 110 20DEC19 118 21DEC19 106 
22DEC19 100 23DEC19 106 24DEC19 107 25DEC19 116 26DEC19 113 27DEC19 113 28DEC19 117 
29DEC19 101 30DEC19  96 31DEC19 101  
;

Encode the data

The first step is to compute the deviation of each observed value from the reference value. If an observed value is above the reference value, mark it as 'High', otherwise mark it as 'Low'. We will plot a vertical bar that goes from the reference level to the observed value. Because we will use a HIGHLOW statement to display the graph, the DATA step computes two new variables, High and Low.

/* 1. Compute the deviation and encode the data as 'High' or 'Low' by using the reference value */
%let RefValue = 100;
 
data Center;
set Series;
if (y > &RefValue) then Group="High";
else Group="Low";
Low  = min(y, &RefValue);    /* lower end of highlow bar */
High = max(y, &RefValue);    /* upper end of highlow bar */
run;

Map high and low values to colors

If you want SAS to assign colors to the two groups, you can skip this step. However, in many cases you might want to choose which color is plotted for the high and low categories. You can map levels of a group to colors by using a discrete attribute map ("DATTR map", for short) in PROC SGPLOT. Because we are going to use a HIGHLOW statement to graph the data, we need to define a map that has the FillColor and LineColor for the vertical bars. The following DATA step maps the 'High' category to red and the 'Low' category to blue:

/* 2. Create a discrete attribute map that maps values to colors */
data DAttrs;
length c FillColor LineColor $16.;
ID = "HighLow";
Value = "High"; c="DarkRed";  FillColor=c; LineColor=c; output;
Value = "Low";  c="DarkBlue"; FillColor=c; LineColor=c; output;
run;

Create a high-low plot

The final step is to create a high-low plot that shows the deviations from the reference value. You can use the DATTRMAP= option to tell PROC SGPLOT how to assign colors for the group values. Because a data set can contain multiple maps, the ATTRID= option specifies which mapping to use.

/* 3. Use a HIGHLOW plot to graph the deviations from the reference value */
title "Deviations from Reference Value (&RefValue)";
title2 "Morning Fasting Blood Glucose";
ods graphics / width=600px height=400px;
proc sgplot data=Center DATTRMAP=DAttrs noautolegend;
   highlow x=Date low=Low high=High / group=Group ATTRID=HighLow;
   refline &RefValue / axis=y;
   yaxis grid label="Blood Glucose Level";
run;

The graph is shown at the top of this section. It is clear that on most days the patient has high blood sugar. With additional investigation, you can discover that the highest levels are associated with weekends and holidays.
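If you want to check that association yourself, one approach is to classify each date by using the WEEKDAY function and then average the readings by day of the week. Here is a minimal sketch that uses the Center data set created earlier (ByDay and DayOfWeek are names invented for this example):

data ByDay;
   set Center;
   DayOfWeek = weekday(Date);   /* 1=Sunday, 2=Monday, ..., 7=Saturday */
run;

proc means data=ByDay mean maxdec=1;
   class DayOfWeek;             /* average glucose for each day of the week */
   var y;
run;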

Note that these data would not be appropriate to plot on a circular graph because the data are not for a full year. Furthermore, on this graph it is easy to see specific values and days and to compare days in September with days in December.

A deviation plot for daily average temperatures

My colleague's graph displayed daily average temperatures. The following deviation plot shows average temperatures and a reference value of 65F. The graph shows the daily average temperature in Raleigh, NC, for 2018:

In this graph, it is easy to find the approximate temperature for any range of dates (such as "mid-October") and to compare the temperature for different time periods, such as March versus October. I think the rectangular deviation plot makes an effective visualization of how these data compare to a baseline value.
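The temperature graph follows the same three-step pattern as the glucose example. Here is a minimal sketch, assuming a data set Temps that contains Date and Temperature variables for 2018 (the data set and variable names are hypothetical) and reusing the DAttrs attribute map defined earlier:

%let RefValue = 65;

data TempCenter;
   set Temps;
   if (Temperature > &RefValue) then Group = "High";
   else Group = "Low";
   Low  = min(Temperature, &RefValue);   /* lower end of highlow bar */
   High = max(Temperature, &RefValue);   /* upper end of highlow bar */
run;

proc sgplot data=TempCenter DATTRMAP=DAttrs noautolegend;
   highlow x=Date low=Low high=High / group=Group ATTRID=HighLow;
   refline &RefValue / axis=y;
   yaxis grid label="Temperature (F)";
run;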

You can download the SAS program that creates the graphs in this article.

The post Create a deviation plot to visualize values relative to a baseline appeared first on The DO Loop.

Feb 29, 2020

Stored processes were a very popular feature in SAS 9.4. They were used in reporting, analytics, and web application development. In Viya, the equivalent feature is jobs. Viya 3.5 enhanced jobs so that they can be used the same way as stored processes. In addition, there is preliminary support for promoting 9.4 stored processes to Viya. In this post, I will examine what is sure to be a popular feature for SAS 9.4 users starting out with Viya.

What is a job? In Viya, that is a complicated question because there are a number of different types of jobs. In the context of this blog post, we are talking about jobs that can be created in SAS Studio or in the Job Execution application. A SAS Viya job consists of a program and its definition (metadata related to the program). The job definition includes information such as the job name, the author, and when it was created. After you have created a job definition, you have an execution URL that you can share with others at your site. The execution URL can be entered into a web browser and run without opening SAS Studio, or you can run a job directly from SAS Studio. When running a SAS job, SAS Studio uses the SAS Job Execution Web Application.

In addition, you can create a task prompt (XML) or an HTML form to provide a user interface to the job. When the user selects an option in the prompt and submits the job, the data specified in the form or task prompt is passed to the SAS session as global macro variables. The SAS program runs and the results are returned to the web browser. Sounds a lot like a stored process!
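For example, here is a minimal sketch of a job program. The ORIGIN macro variable is hypothetical; in practice, the name matches a field in your HTML form or task prompt, and its value arrives in the SAS session as a global macro variable:

/* ORIGIN is assumed to be supplied by the job's form or task prompt */
proc print data=sashelp.cars(obs=10);
   where Origin = "&ORIGIN";   /* filter by the value passed from the prompt */
run;

When a user submits the form, the job runs on the server and the results are returned to the browser.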

For more information on authoring jobs in SAS Studio, see SAS® Studio 5.2 Developer’s Guide: Working with Jobs.

With the release of Viya 3.5, there is preliminary support for the promotion of 9.4 Stored Processes to Viya Jobs. For more information on this functionality, see SAS® Viya® 3.5 Administration: Promotion (Import and Export).

How does it work? Stored processes are exported from SAS 9.4 by using the export wizard or the command line, and the resulting package can be imported to Viya by using SAS Environment Manager or the sas-admin transfer CLI. The import process converts the stored processes to job definitions, which can be run in SAS Studio or directly via a URL.

The new job definition includes the job metadata, SAS code and task prompts.

1. Data

The data is not included in the promotion process. However, it must be made available to the Viya compute server so that the job code and any dynamic prompts can access it. There are two Viya server contexts that need to have access to the data:

  • Job Execution Context: used when running jobs via a URL
  • SAS Studio Context: used when running jobs from SAS Studio

To make the data available, the compute server has to access it via a libname and the table must exist in the library.

To add a libname to the SAS Job Execution compute context in SAS Environment Manager, select Contexts > Compute contexts, select the SAS Job Execution compute context, and then select Edit > Advanced.

In the box labeled "Enter each autoexec statement on a new line:", add a line for the libname. If you keep the same 8-character libname as in 9.4, you will have less to change in your code and prompts.
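For example, a single autoexec line like the following makes a library available to jobs that run in this context (the path is hypothetical; point it at the location of your data in Viya):

libname findata "/opt/sas/viya/data/findata";   /* hypothetical path; reuse the 9.4 libref name */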

NOTE: the libname could be added to the /opt/sas/viya/config/etc/compsrv/default/autoexec_usermods.sas file on the compute server. While this is perhaps easier than adding to each context that requires it, this would make it available to every Viya compute process.

2. SAS Code

The SAS code that makes up the stored process is copied without modification to the Viya job definition. In most cases, the code must be edited before it will run in the Viya environment. The SAS code in the definition can be modified using SAS Studio or the SAS Job Execution Web Application. Edit the code so that (see the sketch after this list):

  • any libnames in the new job code point to the location of the data in Viya
  • any other SAS 9 metadata-related code is removed from the job
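As a rough illustration of both edits, here is a hypothetical before-and-after sketch; the libref, metadata library name, and path are invented for the example:

/* SAS 9.4 stored process code (before) */
libname findata meta library="Finance Data";   /* metadata-bound library */
%stpbegin;
proc means data=findata.financial__summary;
run;
%stpend;

/* Viya job code (after): metadata-related macros removed, libname repointed */
libname findata "/opt/sas/viya/data/findata";  /* hypothetical path */
proc means data=findata.financial__summary;
run;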

As you test your new jobs in Viya, additional changes may be needed to get them running.

3. Task Prompts

Prompts that were defined in metadata for the SAS 9 stored process are converted to task prompts and stored within the job definition. The XML for the prompt definition can be accessed and edited from SAS Studio or the SAS Job Execution Web Application. For more information about working with task prompts in Viya, see the SAS Studio Developer's Guide.

If you have shared prompts that you want to include, it is recommended that you select "Include dependent objects" when exporting from SAS 9.4. If you do not select this option, any shared prompts will be omitted from the package. If the shared prompts, libnames, and tables are included in the package with the stored processes, the SAS 9 metadata-based library and table definitions referenced in dynamic prompts are converted to use a libname.table reference in Viya. When this happens, the XML for the prompt includes the libname.tablename of the data set used to populate the prompt values. For example:

<DataSource active="true" name="DataSource2" defaultValue="FINDATA.FINANCIAL__SUMMARY">

If the libnames and tables are not included in the package, the prompt will instead show a URL that maps to the table's location in 9.4 metadata. For example:

<DataSource active="true" name="DataSource2" url="/gelcorp/financecontent/Data/Source Data/FINANCIAL__SUMMARY(Table)">

For the prompt to work in the latter case, you need to edit it and provide the libname.table reference in the same way as shown in the first example.

When libraries and tables are included in a package that is imported to Viya, the 9.4 folders that contain them are re-created as folders in Viya. This may result in Viya folders that are not needed, because data does not reside in folders in Viya. As an administrator, you can choose to either:

  • include the dependent data and tables in the package and clean up any extra folders after promotion, or
  • exclude the dependent data and tables from the package and edit the data source in the prompt XML to reference the libname.table (this is not a great option if you have shared prompts)

Issues encountered while converting SAS 9 prompts to the Viya prompt interface cause warning messages to be included at the beginning of the XML that defines the prompt interface.

As I mentioned earlier, you can run the job from SAS Studio or from the Job Execution Web Application. You can also run the job from a URL, just like you could with stored processes in SAS 9.4. To get the URL for the job, view the job's properties in SAS Studio.

Earlier releases of Viya provided a different way to support stored processes: enabling access to the 9.4 stored process server and its stored processes from Viya. This approach is still supported in Viya 3.5 because, while jobs can replace some stored processes, they cannot currently be embedded in a Viya Visual Analytics report.


Jobs: Stored processes in Viya was published on SAS Users.