3月 212019
 

SAS Decision Manager enables you to build and test decisions to use in batch processes, real-time web applications or with SAS ESP.

In this blog, I explain how to use Rulesets in an Event Stream Process project. If you are streaming data using SAS ESP and your data stream involves making decisions, you can build Rulesets in SAS Decision Manager and use them in your event stream project. ESP can invoke the code generated by SAS Decision Manager and execute it in its Micro Analytic Service (MAS) engine.

Receiving code for Rulesets

To use a Ruleset in Decision Manager within an event stream project in ESP, you need to export the DS2 code generated by Decision Manager and point ESP towards the code to execute it. To export code from Decision Manager, we use the SAS Decision Manager Viya REST API to:
• Obtain an access token to SAS Viya
• Receive the ID for the required Ruleset
• Receive the Decision Manager DS2 code via the Ruleset ID

Obtain an access token to SAS Viya

Before using SAS Viya APIs, your SAS administrator must register a client identifier. The SAS Logon OAuth API uses OAuth2 to securely identify your application before it connects to the SAS Viya platform. See Registering clients for information on how clients are registered. Once a client is successfully registered, the SAS administrator provides you with the client identifier and client secret to authenticate an API request.
To obtain an access token call:
http://{{ViyaServer}}/SASLogon/oauth/token

If successfully executed, you will get an access token for all further REST calls.

Receive the Ruleset ID

We need the ID for the Ruleset we want to use in ESP. The REST Endpoint requires the ID to receive the DS2 code.
To get the ID, call the Endpoint that lists all available Rulesets:
http://{{ViyaServer}}/businessRules/ruleSets

If successfully executed, you will receive the Ruleset ID in the field “id” in the “items” list.

Receive Ruleset code

With this new ID, we can export the DS2 code for the Ruleset.
To get the code, call the appropriate Ruleset Endpoint:
http://{{ViyaServer}}/businessRules/ruleSets//code
Set ID to the value of this new Ruleset ID.

If executed successfully, you will receive the DS2 code for the Ruleset.

Preparing the code

Copy the DS2 code from the REST call into a file, save the file with a descriptive name (i.e. the name of the Ruleset) and move it to a location where ESP can access it.

Invoke Decision Manager Code in ESP

Now that we have saved the code into a file and moved it to a location that ESP can access, we can now invoke the code from our ESP project.

We need to register the ruleset code file we saved.
Open the ESP project and go to Micro Analytic Service Modules at the project level.

Add a new Micro Analytic Service Module for the ruleset code file and fill in all required fields.


To invoke the code in the event stream, add a Calculate Window.
In Settings choose Calculation = User-specified.

Under Handlers, select the source and ensure the field values are set correctly.


Set the fields for the output schema of the calculate window. Note that the field names and types must match the names and types used in the Ruleset.
Save the project.

You are now ready to run your project in Test mode to check if it works.

Conclusion

SAS Decision Manager allows you to build decisions in an independent environment to ESP. This gives you the freedom to design and test decisions in a less technical environment without touching the event stream. After testing the decision, you can simply “hook it in” to your event stream.
Other users can work on and update decisions by just applying a new/updated code file. This will allow your event stream to be to be more flexible and easier to maintain. To learn more, please check out these sources.

Video: SAS Decision Manager
Article: Using SAS Decision Manager to enrich the data prep process

Calling SAS Decision Manager Rulesets in ESP was published on SAS Users.

3月 202019
 

An analyst was using SAS to analyze some data from an experiment. He noticed that the response variable is always positive (such as volume, size, or weight), but his statistical model predicts some negative responses. He posted the data and asked if it is possible to modify the graph so that only positive responses are displayed.

This article shows how you can truncate a surface or a contour plot so that negative values are not displayed. You could do something similar to truncate unreasonably high values in a surface plot.

Why does the model predict negative values?

Before showing how to truncate the surface plot, let's figure out why the model predicts negative values when all the observed responses are positive. The following DATA step is a simplified version of the real data. The RSREG procedure uses least squares regression to fit a quadratic response surface. If you use the PLOTS=SURFACE option, the procedure automatically displays a contour plot and surface plot for the predicted response:

data Sample;
input X Y Z @@;
datalines;
10 90 22  22 76 13  22 75  7 
24 78 14  24 76 10  25 63  5
26 62 10  26 94 20  26 63 15
27 94 16  27 95 14  29 66  7
30 69  8  30 74  8
;
 
ods graphics / width=400px height=400px ANTIALIASMAX=10000;
proc rsreg data=Sample plots=surface(fill=pred overlaypairs);
   model Z = Y X;
run;
 
proc rsreg data=Sample plots=surface(3d fill=Pred gridsize=80);
   model Z = Y X;
   ods select Surface;
   ods output Surface=Surface; /* use ODS OUTPUT to save surface data to a data set */
run;

The contour plot overlays a scatter plot of the data. You can see that the data are observed only in the upper-right portion of the plot (the red regions) and that no data are in the lower-left portion of the plot. The RSREG procedure fits a quadratic model to the data. The predicted values near the observed data are all positive. Some of the predicted values that are far from the observed data are negative.

I previously wrote about this phenomenon and showed how to compute the convex hull for these bivariate data. When you evaluate the model inside the convex hull, you are interpolating. When you evaluate the model outside the convex hull, you are extrapolating. It is well known that polynomial regression models can give nonsensical results if you extrapolate far from the data.

Truncating a response surface

The RSREG procedure is not aware that the response variable should be positive. A quadratic surface will eventually get arbitrarily big in the positive and/or negative directions. You can see this on the contour and surface plots, which show the predictions of the model on a regular grid of (X, Y) values.

If you want to display only the positive portion of the prediction surface, you can replace each negative predicted value with a missing value. The first step is to obtain the predicted values on a regular grid. You can use the "missing value trick" to score the quadratic model on a grid, or you can use the ODS OUTPUT statement to obtain the gridded values that are used in the surface plot. I chose the latter option. In the previous section, I used the ODS OUTPUT statement to write the gridded predicted values for the surface plot to a SAS data set named Surface.

As Warren Kuhfeld points out in his article about processing ODS OUTPUT data set, the names in an ODS data object can be "long and hard to type." Therefore, I rename the variables. I also combine the gridded values with the original data so that I can optionally overlay the data and the predicted values.

/* rename vars and set negative responses to missing */
data Surf2;
set Surface(rename=(
       Predicted0_1_0_0 = Pred  /* rename the long ODS names */
       Factor1_0_1_0_0  = GY    /* 'G' for 'gridded' */
       Factor2_0_1_0_0  = GX))  
    Sample(in=theData);         /* combine with original data */
if theData then Type = "Data   ";
else            Type = "Gridded";
if Pred < 0 then Pred = .;      /* replace negative predictions with missing values */
label GX = 'X'  GY = 'Y';
run;

You can use the Graph Template Language (GTL) to generate graphs that are similar to those produced by PROC RSREG. You can then use PROC SGRENDER to create the graph. Because the negative response values were set to missing, the contour plot displays a missing value color (black, for this ODS style) in the lower-left and upper-right portions of the plot. Similarly, the missing values cause the surface plot to be truncated. By using the GRIDSIZE= option, you can make the jagged edges small.

Notice that the colors in the graphs are now based on the range [0, 50], whereas previously the colors were based on the range [-60, 50]. I've added a continuous legend to the plots so that the range of the response variable is obvious.

I'd like to stress that sometimes "nonsensical values" indicate an inappropriate model. If you notice nonsensical values, you should always ask yourself why the model is predicting those values. You shouldn't modify the prediction surface without a good reason. But if you do have a good reason, the techniques in this article should help you.

You can download the complete SAS program that analyzes the data and generates the truncated graphs.

The post Truncate response surfaces appeared first on The DO Loop.

3月 182019
 

Artificial intelligence (AI) is a natural evolution of analytics. Over the years, we have seen AI add learning and automation capabilities to the predictive and prescriptive jobs of analytics. We have been building AI systems for decades, but a few things have changed to make today’s AI systems more powerful. [...]

Advancing AI with deep learning and GPUs was published on SAS Voices by Oliver Schabenberger

3月 182019
 

Artificial intelligence (AI) is a natural evolution of analytics. Over the years, we have seen AI add learning and automation capabilities to the predictive and prescriptive jobs of analytics. We have been building AI systems for decades, but a few things have changed to make today’s AI systems more powerful. [...]

Advancing AI with deep learning and GPUs was published on SAS Voices by Oliver Schabenberger

3月 182019
 

Statisticians often emphasize the dangers of extrapolating from a univariate regression model. A common exercise in introductory statistics is to ask students to compute a model of population growth and predict the population far in the future. The students learn that extrapolating from a model can result in a nonsensical prediction, such as trillions of people or a negative number of people! The lesson is that you should be careful when you evaluate a model far beyond the range of the training data.

The same dangers exist for multivariate regression models, but they are emphasized less often. Perhaps the reason is that it is much harder to know when you are extrapolating a multivariate model. Interpolation occurs when you evaluate the model inside the convex hull of the training data. Anything else is an extrapolation. In particular, you might be extrapolating even if you score the model at a point inside the bounding box of the training data. This differs from the univariate case in which the convex hull equals the bounding box (range) of the data. In general, the convex hull of a set of points is smaller than the bounding box.

You can use a bivariate example to illustrate the difference between the convex hull of the data and the bounding box for the data, which is the rectangle [Xmin, Xmax] x [Ymin, Ymax]. The following SAS DATA step defines two explanatory variables (X and Y) and one response variable (Z). The SGPLOT procedure shows the distribution of the (X, Y) variables and colors each marker according to the response value:

data Sample;
input X Y Z @@;
datalines;
10 90 22  22 76 13  22 75  7 
24 78 14  24 76 10  25 63  5
26 62 10  26 94 20  26 63 15
27 94 16  27 95 14  29 66  7
30 69  8  30 74  8
;
 
title "Response Values for Bivariate Data";
proc sgplot data=Sample;
scatter x=x y=y / markerattrs=(size=12 symbol=CircleFilled)
        colorresponse=Z colormodel=AltThreeColorRamp;
xaxis grid; yaxis grid;
run;
Bivariate data. The data are not uniformly distributed in the bounding box of the data.

The data are observed in a region that is approximately triangular. No observations are near the lower-left corner of the plot. If you fit a response surface to this data, it is likely that you would visualize the model by using a contour plot or a surface plot on the rectangular domain [10, 30] x [62, 95]. For such a model, predicted values near the lower-left corner are not very reliable because the corner is far from the data.

In general, you should expect less accuracy when you predict the model "outside" the data (for example, (10, 60)) as opposed to points that are "inside" the data (for example, (25, 70)). This concept is sometimes discussed in courses about the design of experiments. For a nice exposition, see the course notes of Professor Rafi Hafka (2012, p. 49–59) at the University of Florida.

The convex hull of bivariate points

You can use SAS to visualize the convex hull of the bivariate observations. The convex hull is the smallest convex set that contains the observations. The SAS/IML language supports the CVEXHULL function, which computes the convex hull for a set of planar points.

You can represent the points by using an N x 2 matrix, where each row is a 2-D point. When you call the CVEXHULL function, you obtain a vector of N integers. The first few integers are positive and represent the rows of the matrix that comprise the convex hull. The (absolute value of) the negative integers represents the rows that are interior to the convex hull. This is illustrated for the sample data:

proc iml;
use Sample;  read all var {x y} into points;  close;
 
/* get indices of points in the convex hull in counter-clockwise order */
indices = cvexhull( points );
print (indices`)[L="indices"];  /* positive indices are boundary; negative indices are inside */

The output shows that the observation numbers (indices) that form the convex hull are {1, 6, 7, 12, 13, 14, 11}. The other observations are in the interior. You can visualize the interior and boundary points by forming a binary indicator vector that has the value 1 for points on the boundary and 0 for points in the interior. To get the indicator vector in the order of the data, you need to use the SORTNDX subroutine to compute the anti-rank of the indices, as follows:

b = (indices > 0);               /* binary indicator variable for sorted vertices */
call sortndx(ndx, abs(indices)); /* get anti-rank, which is sort index that "reverses" the order */
onBoundary = b[ndx];             /* binary indicator data in original order */
 
title "Convex Hull of Bivariate Data";
call scatter(points[,1], points[,2]) group=onBoundary 
             option="markerattrs=(size=12 symbol=CircleFilled)";
Convex hull: Blue points are on the convex hull and red points are in the interior

The blue points are the boundary of the convex hull whereas the red points are in the interior.

Visualize the convex hull as a polygon

You can visualize the convex hull by forming the polygon that connects the first, sixth, seventh, ..., eleventh observations. You can do this manually by using the POLYGON statement in PROC SGPLOT, which I show in the Appendix section. However, there is an easier way to visualize the convex hull. I previously wrote about SAS/IML packages and showed how to install the polygon package. The polygon package contains a module called PolyDraw, which enables you to draw polygons and overlay a scatter plot.

The following SAS/IML statements extract the positive indices and use them to get the points on the boundary of the convex hull. If the polygon package is installed, you can load the polygon package and visualize the convex hull and data:

hullNdx = indices[loc(b)];         /* get positive indices */
convexHull = points[hullNdx, ];    /* extract the convex hull, in CC order */
 
/* In SAS/IML 14.1, you can use the polygon package to visualize the convex hull:
   https://blogs.sas.com/content/iml/2016/04/27/packages-share-sas-iml-programs-html */
package load polygon;    /* assumes package is installed */
run PolyDraw(convexHull, points||onBoundary) grid={x y}
    markerattrs="size=12 symbol=CircleFilled";

The graph shows the convex hull of the data. You can see that it primarily occupies the upper-right portion of the rectangle. The convex hull shows the interpolation region for regression models. If you evaluate a model outside the convex hull, you are extrapolating. In particular, even though points in the lower left corner of the plot are within the bounding box of the data, they are far from the data.

Of course, if you have 5, 10 or 100 explanatory variables, you will not be able to visualize the convex hull of the data. Nevertheless, the same lesson applies. Namely, when you evaluate the model inside the bounding box of the data, you might be extrapolating rather than interpolating. Just as in the univariate case, the model might predict nonsensical data when you extrapolate far from the data.

Appendix

Packages are supported in SAS/IML 14.1. If you are running an earlier version of SAS, you create the same graph by writing the polygon data and the binary indicator variable to a SAS data set, as follows:

hullNdx = indices[loc(b)];         /* get positive indices */
convexHull = points[hullNdx, ];    /* extract the convex hull, in CC order */
 
/* Write the data and polygon to SAS data sets. Use the POLYGON statement in PROC SGPLOT. */
p = points || onBoundary;       
poly = j(nrow(convexHull), 1, 1) || convexHull;
create TheData from p[colname={x y "onBoundary"}]; append from p;    close;
create Hull from poly[colname={ID cX cY}];         append from poly; close;
quit;
 
data All; set TheData Hull; run;    /* combine the data and convex hull polygon */
 
proc sgplot data=All noautolegend;
   polygon x=cX y=cY ID=id / fill;
   scatter x=x y=y / group=onBoundary markerattrs=(size=12 symbol=CircleFilled); 
   xaxis grid; yaxis grid;
run;

The resulting graph is similar to the one produced by the PolyDraw modules and is not shown.

The post Interpolation vs extrapolation: the convex hull of multivariate data appeared first on The DO Loop.

3月 152019
 
SAS makes it easy for you to create a large amount of procedure output with very few statements. However, when you create a large amount of procedure output with the Output Delivery System (ODS), your SAS session might stop responding or run slowly. In some cases, SAS generates a “Not Responding” message. Beginning with SAS® 9.3, the SAS windowing environment creates HTML output by default and enables ODS Graphics by default. If your code creates a large amount of either HTML output or ODS Graphics output, you can experience performance issues in SAS. This blog article discusses how to work around this issue.

Option 1: Enable the Output window instead of the Results Viewer window

By default, the SAS windowing environment with SAS 9.3 and SAS® 9.4 creates procedure output in HTML format and displays that HTML output in the Results Viewer window. However, when a large amount of HTML output is displayed in the Results Viewer window, performance might suffer. To display HTML output in the Results Viewer window, SAS uses an embedded version of Internet Explorer within the SAS environment. And because Internet Explorer does not process large amounts of HTML output well, it can slow down your results.

If you do not need to create HTML output, you can display procedure output in the Output window instead. To do so, add the following statements to the top of your code before the procedure step:

   ods _all_ close; 
   ods listing;

The Output window can show results faster than HTML output that is displayed in the Results Viewer window.

If you want to enable the Output window via the SAS windowing environment, take these steps:

    1. Choose Tools ► Options ► Preferences.
    2. Click the Results tab.
    3. In this window, select Create listing and clear the Create HTML check box.
    4. Click OK.

A large amount of output in the Output window, which typically does not cause a performance issue, might still generate an “Output window is full” message. In that case, you can route your LISTING output to a disk file. Use either the PRINTTO procedure or the ODS LISTING statement with the FILE= option. Here is an example:

   ods _all_ close; 
   ods listing file="sasoutput.lst"; 

Option 2: Disable ODS Graphics

Beginning with SAS 9.3, the SAS windowing environment enables ODS Graphics by default. Therefore, most SAS/STAT® procedures now create graphics output automatically. Naturally, graphics output can take longer to create than regular text output. If you are running a SAS/STAT procedure but you do not need to create graphics output, add the following statement to the code before the procedure step:

   ods graphics off; 

If you want to set this option via the SAS windowing environment, take these steps:

    1. Choose Tools ► Options ► Preferences.
    2. Click the Results tab.
    3. In this window, clear the Use ODS Graphics check box.
    4. Click OK.

For maximum efficiency, you can combine the ODS GRAPHICS OFF statement with the statements listed in the previous section, as shown here:

   ods _all_ close;
   ods listing;
   ods graphics off; 

Option 3: Write ODS output to disk

You can ask SAS to write ODS output to disk but not to create output in the Results Viewer window. To do so, add the following statement to your code before your procedure step:

   ods results off;

Later in your SAS session, if you decide that you want to see output in the Results Viewer window, submit this statement:

   ods results on;

If you want to disable the Results Viewer window via the SAS windowing environment, take these steps:

    1. Choose Tools ► Options ► Preferences.
    2. Click the Results tab.
    3. In this window, clear the View results as they are generated check box.
    4. Click OK.

The ODS RESULTS OFF statement is a valuable debugging tool because it enables you to write ODS output to disk without viewing it in the Results Viewer window. You can then inspect the ODS output file on disk to check the size of it (before you open it).

Option 4: Suppress specific procedure output from the ODS results

In certain situations, you might use multiple procedure steps to send output to ODS. However, if you want to exclude certain procedure output from being written to ODS, use the following statement:

   ods exclude all;

Ensure that you place the statement right before the procedure step that contains the output that you want to suppress.

If necessary, use the following statement when you want to resume sending subsequent procedure output to ODS:

   ods exclude none;

Five reasons to use ODS EXCLUDE to suppress SAS output discusses the ODS EXCLUDE statement in more detail.

Conclusion

Certain web browsers display large HTML files better than others. When you use SAS to create large HTML files, you might try using a web browser such as Chrome, Firefox, or Edge instead of Internet Explorer. However, even browsers such as Chrome, Firefox, and Edge might run slowly when processing a very large HTML file.

Instead, as a substitute for HTML, you might consider creating PDF output (with the ODS PDF destination) or RTF output (with the ODS RTF destination). However, if you end up creating a very large PDF or RTF file, then Adobe (for PDF output) and Microsoft Word (for RTF output) might also experience performance issues.

The information in this blog mainly pertains to the SAS windowing environment. For information about how to resolve ODS issues in SAS® Enterprise Guide®, refer to Take control of ODS results in SAS Enterprise Guide.

How to view or create ODS output without causing SAS® to stop responding or run slowly was published on SAS Users.

3月 142019
 

I am obsessed with jigsaw puzzles. Specifically, 1000-piece mystery puzzles, entertaining not just for their pictorial humor, but also for the challenge. Unlike traditional puzzles, you don't know what you are putting together because the completed puzzle isn't pictured on the box. Mystery puzzles are constructed so that you must [...]

Puzzle obsession ... piecing together a 360-degree view of a patient. was published on SAS Voices by Heather Hallett

3月 142019
 

I am obsessed with jigsaw puzzles. Specifically, 1000-piece mystery puzzles, entertaining not just for their pictorial humor, but also for the challenge. Unlike traditional puzzles, you don't know what you are putting together because the completed puzzle isn't pictured on the box. Mystery puzzles are constructed so that you must [...]

Puzzle obsession ... piecing together a 360-degree view of a patient. was published on SAS Voices by Heather Hallett

3月 142019
 

Why is sports analytics rapidly becoming a high priority for several universities? Because they’re beginning to truly understand the significance of a winning athletic program. The value of a win speaks volumes and takes on different forms. It trickles down to the fans filling the stadium seats, television contracts, fan [...]

It’s a game changer: Sports analytics in higher education was published on SAS Voices by Ron Righter