August 19, 2019
 

One of my friends likes to remind me that "there is no such thing as a free lunch," which he abbreviates as "TINSTAAFL" (or TANSTAAFL). The TINSTAAFL principle applies to computer programming because you often pay a cost (in performance) when you call a convenience function that simplifies your program.

I was thinking about TINSTAAFL recently when I was calling a Base SAS function from the SAS/IML matrix language. The SAS/IML language supports hundreds of built-in functions that operate on vectors and matrices. However, you can also call hundreds of functions in Base SAS and pass in vectors for the parameters. It is awesome and convenient to be able to call the virtual smorgasbord of functions in Base SAS, such as probability functions, string-matching functions, trigonometric functions, financial functions, and more. Of course, there is no such thing as a free lunch, so I wondered about the overhead costs associated with calling a Base SAS function from SAS/IML. Base SAS functions typically are designed to operate on scalar values, so the IML language has to call the underlying function many times, once for each value of the parameter vector. It is more expensive to call a function a million times (each time passing in a scalar parameter) than it is to call a function one time and pass in a vector that contains a million elements.

To determine the overhead costs, I decided to test the cost of calling the MISSING function in Base SAS. The IML language has a built-in syntax (b = (X=.)) for creating a binary variable that indicates which elements of a vector are missing. The call to the MISSING function (b = missing(X)) is equivalent, but requires calling a Base SAS function many times, once for each element of X. The native SAS/IML syntax will be faster than calling a Base SAS function (TINSTAAFL!), but how much faster?
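
As a small illustration of the equivalence, the following sketch applies both forms to a five-element numeric vector. The vector x and the names b1 and b2 are made up for this example.

```sas
proc iml;
x  = {1, ., 3, ., 5};     /* hypothetical data: elements 2 and 4 are missing */
b1 = (x = .);             /* built-in IML comparison */
b2 = missing(x);          /* Base SAS function, applied to each element */
print x b1 b2;
quit;
```

Both b1 and b2 should be the same 0/1 indicator vector, with a 1 in the second and fourth positions.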

The following program incorporates many of my tips for measuring the performance of a SAS computation. The test is run on large vectors of various sizes. Each computation (which is very fast, even on large vectors) is repeated 50 times. The results are presented in a graph. The following program measures the performance for a character vector that contains all missing values.

/* Compare performance of IML syntax
   b = (X = " ");
   to performance of calling Base SAS MISSING function 
   b = missing(X);
*/
proc iml;
numRep = 50;                            /* repeat each computation 50 times */
sizes = {1E4, 2.5E4, 5E4, 10E4, 20E4};  /* number of elements in vector */
labl = {"Size" "T_IML" "T_Missing"};
Results = j(nrow(sizes), 3);
Results[,1] = sizes;
 
/* measure performance for character data */
do i = 1 to nrow(sizes);
   A = j(sizes[i], 1, " ");            /* every element is missing */
   t0 = time();
   do k = 1 to numRep;
      b = (A = " ");                   /* use built-in IML syntax */
   end;
   Results[i, 2] = (time() - t0) / numRep;
 
   t0 = time();
   do k = 1 to numRep;
      b = missing(A);                  /* call Base SAS function */
   end;
   Results[i, 3] = (time() - t0) / numRep;
end;
 
title "Timing Results for (X=' ') vs missing(X) in SAS/IML";
title2 "Character Data";
long = (sizes // sizes) || (Results[,2] // Results[,3]);   /* convert from wide to long for graphing */
Group = j(nrow(sizes), 1, "T_IML") // j(nrow(sizes), 1, "T_Missing"); 
call series(long[,1], long[,2]) group=Group grid={x y} label={"Size" "Time (s)"} 
            option="markers curvelabel" other="format X comma8.;";

The graph shows that creating a binary indicator variable is very fast in absolute terms for both methods. Even for a vector of 200,000 elements, creating a binary indicator variable takes less than five milliseconds. However, on a relative scale, the built-in SAS/IML syntax is more than twice as fast as calling the Base SAS MISSING function.

You can run a similar test for numeric values. For numeric values, the SAS/IML syntax is about 10-20 times faster than the call to the MISSING function, but, again, the absolute times are less than five milliseconds.
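
A minimal sketch of that numeric test follows. It assumes the same sizes and repetition count as the character test and simply swaps in a vector of numeric missing values. It prints a table instead of drawing a graph, and the timings you see will depend on your machine.

```sas
proc iml;
numRep = 50;                            /* repeat each computation 50 times */
sizes = {1E4, 2.5E4, 5E4, 10E4, 20E4};  /* number of elements in vector */
labl = {"Size" "T_IML" "T_Missing"};
Results = j(nrow(sizes), 3, .);
Results[,1] = sizes;

/* measure performance for numeric data */
do i = 1 to nrow(sizes);
   A = j(sizes[i], 1, .);              /* every element is a numeric missing value */
   t0 = time();
   do k = 1 to numRep;
      b = (A = .);                     /* use built-in IML syntax */
   end;
   Results[i, 2] = (time() - t0) / numRep;

   t0 = time();
   do k = 1 to numRep;
      b = missing(A);                  /* call Base SAS function */
   end;
   Results[i, 3] = (time() - t0) / numRep;
end;

print Results[colname=labl];           /* average time (s) for each method */
quit;
```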

So, what's the cost of calling a Base SAS function from SAS/IML? It's not free, but it's very cheap in absolute terms! Of course, the cost depends on the number of elements that you are sending to the Base SAS function. However, in general, there is hardly any cost associated with calling a Base SAS function from SAS/IML. So enjoy the lunch buffet! Not only is it convenient and plentiful, but it's also very cheap!

The post Timing performance in SAS/IML: Built-in functions versus Base SAS functions appeared first on The DO Loop.

August 19, 2019
 

As you will have read in my last blog, businesses are demanding better outcomes, and through IoT initiatives big data is only getting bigger. This presents a clear opportunity for organisations to start thinking seriously about how to leverage analytics with their other investments. Demands on supply chains have also [...]

Can the artificial intelligence of things make the supply chain intelligent? was published on SAS Voices by Tim Clark

August 18, 2019
 

Have you ever thought of selling sand on the beach? Neither have I. To most people the mere idea is preposterous. But isn’t it how all great discoveries and inventions are made? Someone comes up with an outwardly crazy, outlandish idea, and despite all the skepticism, criticism, ostracism, ridicule and [...]

Selling sand at the beach was published on SAS Voices by Leonid Batkhan

August 16, 2019
 

The Output Delivery System (ODS) Graphics procedures provide many options to give you control over the look of your output. However, there are times when your output does not look like you thought it would.

This blog discusses how to solve some common output-related problems that we hear about in Technical Support.

All of the examples in this blog relate to creating scatter plots and bar charts from the same data set, SASHELP.CLASS. This data set, included in your SAS® installation, provides information about heights and ages for both male and female students.

Colors in the output are not as desired

Using the STYLEATTRS statement

The STYLEATTRS statement enables you to define attributes, such as color, for graphical elements.

In the following SGPLOT procedure example, the STYLEATTRS statement defines the colors for the marker symbols on a scatter plot as either blue or pink:

proc sgplot data=sashelp.class;
   styleattrs datacolors=(blue pink);
   scatter x=age y=height / group=sex
   markerattrs=(symbol=circlefilled);
run;

However, after you submit the code, the resulting plot does not use the specified colors. Instead, you see blue and red:

When defining colors for graphical elements, the DATACOLORS= option defines the colors for filled areas, and the DATACONTRASTCOLORS= option defines the colors for marker symbols and lines.

Because the scatter plot is creating marker symbols, you have to change the STYLEATTRS statement to use the DATACONTRASTCOLORS= option instead of the DATACOLORS= option. Here is the revised code:

proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink);
   scatter x=age y=height / group=sex 
   markerattrs=(symbol=circlefilled);
run;

Now, when you submit the updated code, the correct colors appear:

You can find more information about the STYLEATTRS statement in the STYLEATTRS section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.
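
To see the other half of the distinction, here is a sketch of a bar chart in which both options have a visible effect. The colors navy and red are arbitrary choices for this illustration. The DATACOLORS= list should fill the bars, and the DATACONTRASTCOLORS= list should color the bar outlines, provided that outlines are displayed.

```sas
proc sgplot data=sashelp.class;
   /* fills come from DATACOLORS=, outlines from DATACONTRASTCOLORS= */
   styleattrs datacolors=(blue pink)
              datacontrastcolors=(navy red);
   vbar age / response=height stat=mean
              group=sex groupdisplay=cluster;
run;
```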

Using an attribute map

An attribute map enables you to associate specific values for your plot GROUP= variable with specific graphical attributes.

The attribute map is defined in a data set that includes the following:

  • an ID variable that contains the name of the attribute map definition
  • a VALUE variable that contains the value of the plot statement GROUP= variable
  • any other variables for the attributes that you want to define

In the following example, the attribute map BARCOLORS is defined to associate the group value F with the color pink and the group value M with the color blue. Note that the FILLCOLOR variable is also used to define the colors for the bars of the bar chart.

data attrmap;
   id='barcolors';
      input value $ fillcolor $;
      datalines;
F pink
M blue
;
run;
proc format;
   value $ genderfmt
      'F'='Female'
      'M'='Male';
run;
proc sgplot data=sashelp.class dattrmap=attrmap;
   vbar age / response=height group=sex groupdisplay=cluster 
   nooutline attrid=barcolors;
   format sex $genderfmt.;
run;

However, the output shows blue and red bars, instead of using the pink and blue values that you specified:

In this case, a format is defined to display the group values as Female and Male. The attribute map associates the unformatted group values F and M with the pink and blue bar colors that you want, but the plot matches on the formatted values Female and Male, so the values do not match.

You need to change the attribute map VALUE variable so that it contains the formatted value of the GROUP= variable. Here is the first part of the code again, highlighted to show where it has changed:
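
The highlighted code is not reproduced in this copy of the post. A sketch of the revised DATA step, assuming the formatted values Female and Male that the $GENDERFMT format defines, might look like this:

```sas
data attrmap;
   id='barcolors';
      input value $ fillcolor $;
      datalines;
Female pink
Male blue
;
run;
```

The PROC FORMAT and PROC SGPLOT steps stay as they were. Only the two data lines change, so that the VALUE variable holds the formatted group values.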

With the updated values, the output now displays the correct colors:

You can find more information about attribute maps in the SG Attribute Maps section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Symbols in the output are not as desired

In the following PROC SGPLOT example, the STYLEATTRS statement defines the colors for the marker symbols on a scatter plot as either blue or pink. Also, the marker symbols should be either filled circles or filled squares:

ods html style=styles.htmlblue;
proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink) 
   datasymbols=(circlefilled squarefilled);
   scatter x=age y=height / group=sex;
run;

You submit the code using the STYLES.HTMLBLUE style. The output shows all the symbols as circles, and none are squares:

The ATTRPRIORITY ODS Graphics option determines how attributes are cycled. The default value for the ATTRPRIORITY option is defined in the style that is being used.

The STYLES.HTMLBLUE style sets the default value of the ATTRPRIORITY option to COLOR. With the COLOR value, the first symbol is paired with every color in your list before the second symbol is used. Because this plot has only two groups and two colors, the second symbol is never reached.

To fix this, set the ATTRPRIORITY ODS Graphics option to NONE in an ODS GRAPHICS statement. With ATTRPRIORITY=NONE, the symbol and the color advance together for each group:

ods graphics /attrpriority=none;
ods html style=styles.htmlblue;
proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink) 
   datasymbols=(circlefilled squarefilled);
   scatter x=age y=height / group=sex;
run;

In the updated output, squares are now seen in the correct color, pink:

You can read more about how attributes are cycled in the following blog post by Rick Wicklin:
Attrs, attrs, everywhere: The interaction between ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS in ODS graphics

Annotation is not placed in the output as desired

Adding an oval

In this example, you want the resulting scatter plot to contain an oval around the circle that represents the tallest student:

proc sql;
   create table maxheight as
   select height, age
   from sashelp.class
        having height=max(height);
quit;
 
data anno;
   set maxheight;
       drawspace='datavalue';
       function='oval';
       x=age;
       y=height;
       width=6;
       height=6;
       linecolor='red';
run;
 
proc sgplot data=sashelp.class sganno=anno;
   scatter x=age y=height;
      xaxis offsetmax=0.1 offsetmin=0.1;
      yaxis offsetmax=0.1 offsetmin=0.1;
run;

After you submit the code, you notice that the resulting plot does not include the oval:

If you have used annotation in SAS/GRAPH® software, you might be accustomed to using the X and Y variables in the annotation data set to indicate the location of the annotation. However, ODS Statistical Graphics (SG) annotation uses X1 and Y1 variables for the location of the annotation.

Therefore, you need to change the X and Y variables in the annotation data set to X1 and Y1 instead:

data anno;
   set maxheight;
       drawspace='datavalue';
       function='oval';
       x1=age;
       y1=height;
       width=6;
       height=6;
       linecolor='red';
run;

In the next version of the scatter plot, the oval now appears:

Adding a text label

In this example, you want to place a text label next to the circle that represents the tallest student.

proc sql;
   create table maxheight as
   select height, age, name
   from sashelp.class
        having height=max(height);
quit;
 
data anno;
   set maxheight;
       drawspace='datavalue';
       function='label';
       x1=age;
       y1=height;
       label=name;
       textsize=10;
       textcolor='red';
       anchor='bottom';
run;
 
proc sgplot data=sashelp.class sganno=anno;
   scatter x=age y=height;
      xaxis offsetmax=0.1 offsetmin=0.1;
      yaxis offsetmax=0.1 offsetmin=0.1;
run;

However, after you run this code, the output does not include the text label:

Again, if you are a SAS/GRAPH user, you might assume that the LABEL function can place text on a plot. However, ODS SG annotation needs to use the TEXT function instead.

In the previous DATA step, you need to change the FUNCTION variable so that it contains the value TEXT:

data anno;
   set maxheight;
       drawspace='datavalue';
       function='text';
       x1=age;
       y1=height;
       label=name;
       textsize=10;
       textcolor='red';
       anchor='bottom';
run;

After you revise the DATA step and resubmit your code, you then see that the text label appears where intended:

You can find more information about SG annotation in the SG Annotation section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Summary

These are just a few examples to demonstrate some of the common output-related problems that we hear about in Technical Support. If your graphical output does not appear as you wanted, consider the options that you are using and make sure that you are using the correct option.

How to fix common problems in output from ODS Graphics procedures was published on SAS Users.

August 14, 2019
 

Many programmers are familiar with "short-circuit" evaluation in an IF-THEN statement. Short circuit means that a program does not evaluate the remainder of a logical expression if the value of the expression is already logically determined. The SAS DATA step supports short-circuiting for simple logical expressions in IF-THEN statements and WHERE clauses (Polzin 1994, p. 1579; Gilsen 1997). For example, in the following logical-AND expression, the condition for the variable Y does not need to be checked if the condition for variable X is false:

data _null_;
set A end=eof;
if x>0 & y>0 then   /* Y is evaluated only if X>0 */
   count + 1;
if eof then 
   put count=;
run;

Order the conditions in a logical statement by likelihood

SAS programmers can optimize their IF-THEN and WHERE clauses if they can estimate the probability of each separate condition in a logical expression:

  • In a logical AND statement, put the least likely events first and the most likely events last. For example, suppose you want to find patients at a VA hospital who are male, over the age of 50, and have kidney cancer. You know that kidney cancer is a rare form of cancer. You also know that most patients at the VA hospital are male. To optimize a WHERE clause, you should put the least probable conditions first:
    WHERE Disease="Kidney Cancer" & Age>50 & Sex="Male";
  • In a logical OR statement, put the most likely events first and the least likely events last. For example, suppose you want to find all patients at a VA hospital that are either male, or over the age of 50, or have kidney cancer. To optimize a WHERE clause, you should use
    WHERE Sex="Male" | Age>50 | Disease="Kidney Cancer";

The SAS documentation does not discuss the conditions for which a logical expression does or does not short circuit. Polzin (1994, p. 1579) points out that when you put function calls in the logical expression, SAS evaluates certain function calls that produce side effects. Common functions that have side effects include random number functions and user-defined functions (via PROC FCMP) that have output arguments. The LAG and DIF functions can also produce side effects, but it appears that expressions that involve the LAG and DIF functions are short-circuited. You can force a function evaluation by calling the function prior to an IF-THEN statement. You can use nested IF-THEN/ELSE statements to ensure that functions are not evaluated unless prior conditions are satisfied.
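
The last two suggestions can be sketched as follows. The data set A, the seed, and the counter names countA and countB are made up for this illustration, and the RAND function stands in for any function that has a side effect.

```sas
data A;                              /* hypothetical test data */
   do x = -2 to 2;
      output;
   end;
run;

data _null_;
   set A end=eof;
   if _n_ = 1 then call streaminit(123);

   /* force the evaluation: call the function before the IF-THEN statement */
   u = rand("Uniform");              /* called for every observation */
   if x > 0 & u < 0.5 then
      countA + 1;

   /* nested IF-THEN: the function is called only when x > 0 */
   if x > 0 then do;
      if rand("Uniform") < 0.5 then
         countB + 1;
   end;

   if eof then
      put countA= countB=;
run;
```

In the first form, the random number stream advances on every observation regardless of X. In the second form, it advances only for observations that pass the outer condition, so the two forms consume the stream differently.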

Logical ligatures

The SAS/IML language does not support short-circuiting in IF-THEN statements, but it performs several similar optimizations that are designed to speed up your code execution. One optimization involves the ANY and ALL functions, which test whether any or all (respectively) elements of a matrix satisfy a logical condition. A common usage is to test whether a missing value appears anywhere in a vector, as shown in the following SAS/IML statement:

bAny = any(y = .);   /* TRUE if any element of Y is missing */
/* Equivalently, test   bAll = all(y ^= .);   which is FALSE if any element of Y is missing */

The SAS/IML language treats simple logical expressions like these as a single function call, not as two operations. I call this a logical ligature because two operations are combined into one. (A computer scientist might just call this a parser optimization.)

You might assume that the expression ANY(Y=.) is evaluated by using a two-step process. In the first step, the Boolean expression y=. is evaluated and the result is assigned to a temporary binary matrix, which is the same size as Y. In the second step, the temporary matrix is sent to the ANY function, which evaluates the binary matrix and returns TRUE if any element is nonzero. However, it turns out that SAS/IML does not use a temporary matrix. The SAS/IML parser recognizes that the expression inside the ANY function is a simple logical expression. The program can evaluate the function by looking at each element of Y and returning TRUE as soon as it finds a missing value. In other words, it short-circuits the computation. If no value is missing, the expression evaluates to FALSE.

Short circuiting an operation can save considerable time. In the following SAS/IML program, the vector X contains 100 million elements, all equal to 1. The vector Y also contains 100 million elements, but the first element of the vector is a missing value. Consequently, the computation for Y is essentially instantaneous whereas the computation for X takes a tenth of a second:

proc iml;
numReps = 10;      /* run computations 10 times and report average */
N = 1E8;           /* create vector with 100 million elements */
x = j(N, 1, 1);    /* all elements of x equal 1 */
y = x; y[1] = .;   /* the first element of y is missing */
 
/* the ALL and ANY functions short-circuit when the 
   argument is a simple logical expression */
/* these function calls examine only the first element */
t0 = time();
do i = 1 to numReps;
   bAny = any(y = .);   /* TRUE; stops at y[1] */
   bAll = all(y ^= .);  /* FALSE; stops at y[1] */
end;
t = (time() - t0) / numReps;
print t[F=BEST6.];
 
/* these function calls must examine all elements */
t0 = time();
do i = 1 to numReps;
   bAny = any(x = .);   
   bAll = all(x ^= .);
end;
t = (time() - t0) / numReps;
print t[F=BEST6.];

Although the evaluation of X does not short circuit, it still uses the logical ligature to evaluate the expression. Consequently, the evaluation is much faster than the naive two-step process that is shown explicitly by the following statements, which require about 0.3 seconds and twice as much memory:

   /* two-step process: slower */
   b1 = (y=.);                /* form the binary vector */
   bAny = any(b1);            /* TRUE for y[1] */
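
To time the two-step process on your own machine, you could wrap those statements in the same timing loop that is used earlier. The following sketch does that. The vector size matches the earlier program, which means that Y and the temporary vector b1 each need roughly 800 MB, so reduce N if memory is limited. The measured time will vary from system to system.

```sas
proc iml;
numReps = 10;              /* run computations 10 times and report average */
N = 1E8;                   /* reduce N if memory is limited */
y = j(N, 1, 1);
y[1] = .;                  /* the first element of y is missing */

t0 = time();
do i = 1 to numReps;
   b1 = (y = .);           /* step 1: form the full binary vector */
   bAny = any(b1);         /* step 2: scan the binary vector */
end;
t = (time() - t0) / numReps;
print t[F=BEST6.];
quit;
```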

In summary, the SAS DATA step uses short-circuit evaluation in IF-THEN statements and WHERE clauses that use simple logical expressions. If the expression contains several subexpressions, you can optimize your program by estimating the probability that each subexpression is true. In the SAS/IML language, the ANY and ALL functions not only short circuit, but when their argument is a simple Boolean expression, the language treats the function call as a logical ligature and evaluates the call in an efficient manner that does not require any temporary memory.

Short circuits can be perplexing if you don't expect them. Equally confusing is expecting a statement to short circuit when it doesn't. If you have a funny story about short-circuit operators, in any language, leave a comment.

The post Short-circuit evaluation and logical ligatures in SAS appeared first on The DO Loop.

August 13, 2019
 

Across all industries, organizations are adopting a cloud-first strategy. What does it mean to be cloud-first? Broadly speaking, cloud first means using shared infrastructure instead of building and hosting your own private storage facility, systems, etc. Benefits of adopting a cloud-first strategy include cost savings and productivity improvements. However, what [...]

Five considerations for your cloud-first strategy was published on SAS Voices by Kevon Hayati

August 13, 2019
 

Raw data doesn’t change an organization, and neither do analytics on their own. It’s making decisions based on that data and the results of analytics that drives change through a company. Every decision is important and influences an organization. Thousands of decisions need to be made every day and many decisions are dependent on other decisions in an interconnected network.

SAS Intelligent Decisioning combines business rules management, decision processing, real-time event detection, decision governance and analytics to automate and manage decisions across the enterprise. It supports customer-facing activities such as personalized marketing and next-best action, plus decisions affecting customers, including credit services and fraud prevention.

Overview

Business rules

An integrated business rule management platform enables fast rule construction, testing, governance and integration within decision flows. You can manage rule versions for tracking and governance. The solution allows users to create complex business logic supported by sophisticated functions and integration with Lookup Tables.

Decision flows

A graphical drag-and-drop interface allows users to build decisions with minimal programming effort. Decisions are created in a decision flow that orchestrates business rules, analytical models, database access, custom code objects and more.

Graphical editor to create decisions

Further, it is possible to test and maintain different versions of decisions and business rules before deploying them for production real-time or batch execution.

The high-performance, real-time SAS Micro Analytic Service (MAS) engine can handle more than 5,000 real-time transactions per second with response times of 10 milliseconds per transaction. The REST interface to call decisions or business rules in real time provides simple integration with most third-party applications.

Monitor test results through Decision Path tracking

New Features

The latest release of SAS Intelligent Decisioning came out recently, and I’d like to highlight some of the new features.

SQL Query Node

Users can now submit SQL directly into a SQL Query node without supplying any additional coding logic. The SQL Query node supports SELECT, INSERT, DELETE and UPDATE.

To link a SQL statement to a decision, just point tables and columns to the decision variables as shown below in the curly brackets. Intelligent Decisioning will then automatically pass data into the SQL as appropriate.

If you query data via a select statement, the result is returned in a Datagrid. A Datagrid is a data type for an object in Intelligent Decisioning and represents data in a table format that belongs to a single record.

Datagrids are used in many places in Intelligent Decisioning and there is a rich set of Datagrid functions to access and work with data in a Datagrid.

Python Code Node

Intelligent Decisioning provides an environment that aims to minimize the need to write code to build decisions. But if necessary, it is possible to submit code. Intelligent Decisioning supports writing code in Python as part of a decision flow. Data from a decision flow can be passed into the Python code and return values will be passed back from Python into the decision flow.

To enable coding in Python, a Python execution environment needs to be installed alongside Intelligent Decisioning. If a decision flow contains a Python Code Node, the Python code will automatically be executed in the Python environment as part of the overall decision.

Decision Flow containing Python code node

A code editor in Intelligent Decisioning allows you to edit your Python code within the environment.

A Python code editor is part of Intelligent Decisioning

Decision Node

Decision flows can call other decision flows. This opens the way to designing and building modular decisions with “pluggable” components. You can also build reusable decisions which are called by different decision flows. Building decisions in such a modular way makes it easier to read and maintain decision flows.

Drill down from one decision to the next

Treatments

Treatments are lists of attributes with fixed or dynamic values.

Treatments are used to define offers to present to a customer as a result of an inbound marketing campaign. Or treatments can be used as parameter lists to control engine settings. There are numerous use cases for treatments.

Treatment attribute list

To determine if a treatment is valid for a decision, you can set Eligibility Rules to decide when a treatment will be used. For audit reasons and to track changes over time, you can also have different versions of a treatment.

To utilize treatments, you group them together in treatment groups, which can then be called from a decision flow.

Conclusion

Managing and analysing high volumes of data to make thousands of decisions every day in an automated fashion and applying analytics to real-time customer interactions require a sophisticated and complete solution like SAS Intelligent Decisioning. It enables users to create, test, version and trace analytically driven decisions all in one solution.

By making smarter decisions, organizations become more efficient. As mentioned in the beginning: Data doesn’t change your organization, decisions do!

Learn more

Video: SAS Intelligent Decisioning | Product Overview
Documentation: SAS Intelligent Decisioning
Product: SAS Intelligent Decisioning

SAS Intelligent Decisioning: Intro and Update was published on SAS Users.

August 12, 2019
 

In part one of this blog series, we introduced the automation of AI (i.e., artificial intelligence) as a multifaceted and evolving topic for marketing and segmentation. After a discussion on maximizing the potential of a brand's first-party data, a machine learning method incorporating natural language explanations was provided in the context [...]

SAS Customer Intelligence 360: Automated AI and segmentation [Part 2] was published on Customer Intelligence Blog.