Tech

2月 072018
 

Jazz up your Geo Map or Network Analysis graph by applying icon-based display rule markers instead of color markers on the map. With SAS Visual Analytics, you may have already used display rules by populating intervals or adding color-mapped values for report objects. Now, you can jazz up your Geo Map or Network Analysis object by choosing from a curated set of icons and applying icon-based display rule markers.SAS Visual Analytics Geo Map

The set of curated icons in SAS Visual Analytics 8.2 are classified into these groups for use with icon-based display rules:

SAS Visual Analytics Geo Map
Here is an example of the display-rule icons that are available for Status:

SAS Visual Analytics Geo Map
When your mouse hovers over an icon, the name of that icon is displayed.

Applying Icon-Based Display Rules to a Geo Map

While working with a data source that included a measure for the total number of cellular mobile subscriptions per 100 in each country, I wanted to display the results in a Geo Map. Before creating the display rules, I looked at the data for the mobile cellular subscriptions for various countries, and decided that I wanted to create four display rules, each one associated with the number of mobile cellular subscriptions per 100. The icons that I wanted for my display rules were all available under Status. So here’s how I decided to set up my operators, values, and the icon style and color:


Here are the steps I followed to setup the icon-based display rules.

Create the New Geography Item

1.  In my new SAS Visual Analytics 8.2 report, I went to Objects, chose Geo Map (available under Graphs) and dragged it over to the blank canvas.
2.  From Data, I searched for my data source and added it to the report.
3.  In my data source, I highlighted the category (Country), right clicked, and selected New Geography.
4.  In the New Geography Item dialog, I entered a name for the new geographic item that I was creating: Country (Geographic Item).

Change Geo Map Type to Coordinates

5.  I select the Geo Map, go to Options in SAS Visual Analytics and scroll down to Map.
6.  By default, the Type is set to Bubbles. I change it to Coordinates (this is a requirement to create the icon-based display rules).


7.  The default size for the Marker size is 11. I change it to 14 because I would like my markers to show up slightly bigger in the Geo Map.
8.   By default, Legend is displayed for the Geo Map and Visibility is set to On. I chose not to display Legend information for the Geo Map, so I chose Off for Visibility.

Choose Role for the Geo Map

9.  I choose Roles, and I am ready to assign the geographic data item to Category. So I choose the Country (Geographic Item) that I had just created, and drag it over to Category. I now see the Country (Geographic) data role applied to the Geo Map.

Create Icon-Based Display Rules for the Geo Map

10.  I click on Rules and under Display Rules, I click on New rule.


11.  In the New Display Rule dialog, I chose <= for Operator and entered a 250 for Value.
12.  I click on Style and choose Red as the color for this display rule.


13.  I click on Icon, and I am presented with seven categories for the icons. When I hover an icon, the icon name is displayed.
14.  I click on the Significantly Lower icon and click OK.


15.  A quick review of what I just created and I click OK.


16.  I continue to create three additional display rules for my Geo Map.

Now, I have completed creating the four display rules. Here’s how they show in the SAS Visual Analytics Viewer:


17.  When all of the display rules have been created, the Geo Map displays with the colorful icon-based display rules applied to the various countries.


You’ve just seen how you can create icon-based display rules for a Geo Map. You can also create icon-based display rules for a Network Analysis object as well.

Jazz up a Geo Map with colorful icon-based display rules was published on SAS Users.

1月 272018
 

Are you interested in using SAS Visual Analytics 8.2 to visualize a state by regions, but all you have is a county shapefile?  As long as you can cross-walk counties to regions, this is easier to do than you might think.

Here are the steps involved:

Step 1

Obtain a county shapefile and extract all components to a folder. For example, I used the US Counties shapefile found in this SAS Visual Analytics community post.

Note: Shapefile is a geospatial data format developed by ESRI. Shapefiles are comprised of multiple files. When you unzip the shapefile found on the community site, make sure to extract all of its components and not just the .shp. You can get more information about shapefiles from this Wikipedia article:  https://en.wikipedia.org/wiki/Shapefile.

Step 2

Run PROC MAPIMPORT to convert the shapefile into a SAS map dataset.

libname geo 'C:\Geography'; /*location of the extracted shapefile*/
 
proc mapimport datafile="C:\Geography\UScounties.shp"
out=geo.shapefile_counties;
run;

Step 3

Add a Region variable to your SAS map dataset. If all you need is one state, you can subset the map dataset to keep just the state you need. For example, I only needed Texas, so I used the State_FIPS variable to subset the map dataset:

proc sql;
create table temp as select
*, 
/*cross-walk counties to regions*/
case
when name='Anderson' then '4'
when name='Andrews' then '9'
when name='Angelina' then '5'
when name='Aransas' then '11',
<……>
when name='Zapata' then '11'
when name='Zavala' then '8'
end as region 
from geo.shapefile_counties
/*subset to Texas*/
where state_fips='48'; 
quit;

Step 4

Use PROC GREMOVE to dissolve the boundaries between counties that belong to the same region. It is important to sort the county dataset by region before you run PROC GREMOVE.

proc sort data=temp;
by region;
run;
 
proc gremove
data=temp
out=geo.regions_shapefile
nodecycle;
by region;
id name; /*name is county name*/
run;

Step 5

To validate that your boundaries resolved correctly, run PROC GMAP to view the regions. If the regions do not look right when you run this step, it may signal an issue with the underlying data. For example, when I ran this with a county shapefile obtained from Census, I found that some of the counties were mislabeled, which of course, caused the regions to not dissolve correctly.

proc gmap map=geo.regions_shapefile data=geo.regions_shapefile all;
   id region;
   choro region / nolegend levels=1;
run;

Here’s the result I got, which is exactly what I expected:

Custom Regional Maps in SAS Visual Analytics

Step 6

Add a sequence number variable to the regions dataset. SAS Visual Analytics 8.2 needs it properly define a custom polygon inside a report:

data geo.regions_shapefile;
set geo.regions_shapefile;
seqno=_n_;
run;

Step 7

Load the new region shapefile in SAS Visual Analytics.

Step 8

In the dataset with the region variable that you want to visualize, create a new geography variable and define a new custom polygon provider.

Geography Variable:

Polygon Provider:

Step 9

Now, you can create a map of your custom regions:

How to create custom regional maps in SAS Visual Analytics 8.2 was published on SAS Users.

1月 262018
 

If you have worked with the different types of score code generated by the high-performance modeling nodes in SAS® Enterprise Miner™ 14.1, you have probably come across the Analytic Store (or ASTORE) file type for scoring.  The ASTOREfile type works very well for scoring complex machine learning models like random forests, gradient boosting, support vector machines and others. In this article, we will focus on ASTORE files generated by SAS® Viya® Visual Data Mining and Machine Learning (VDMML) procedures. An introduction to analytic stores on SAS Viya can be found here.

In this post, we will:

  1. Generate an ASTORE file for a PROC ASTORE in SAS Visual Data Mining and Machine Learning.

Generate an ASTORE file for a gradient boosting model

Our example dataset is a distributed in-memory CAS table that contains information about applicants who were granted credit for a certain home equity loan. The categorical binary-valued target variable ‘BAD’ identifies if a client either defaulted or repaid their loan. The remainder of the variables indicating the candidate’s credit history, debt-to-income ratio, occupation, etc., are used as predictors for the model. In the code below, we are training a gradient boosting model on a randomly sampled 70% of the data and validating against 30% of the data. The statement SAVESTATE creates an analytic store file (ASTORE) for the model and saves it as a binary file named “astore_gb.”

proc gradboost data=PUBLIC.HMEQ;
partition fraction(validate=0.3);
target BAD / level=nominal;
input LOAN MORTDUE DEBTINC VALUE YOJ DEROG DELINQ CLAGE NINQ CLNO /
level=interval;
input REASON JOB / level=nominal;
score out=public.hmeq_scored copyvars=(_all_);
savestate rstore=public.astore_gb;
id _all_;
run;

Shown below are a few observations from the scored dataset hmeq_scored  where YOJ (years at present job) is greater than 10 years.

PROC ASTORE

Override the scoring decision using PROC ASTORE

In this segment, we will use PROC ASTORE to override the scoring decision from the gradient boosting model. To that end, we will first make use of the DESCRIBE statement in PROC ASTORE to produce basic DS2 scoring code using the EPCODE option. We will then edit the score code in DS2 language syntax to override the scoring decision produced from the gradient boosting model.

proc astore;
    describe rstore=public.astore_gb
        epcode="/viyafiles/jukhar/gb_epcode.sas"; run;

A snapshot of the output from the above code statements are shown below. The analytic store is assigned to a unique string identifier. We also get information about the analytic engine that produced the store (gradient boosting, in this case) and the time when the store was created. In addition, though not shown in the snapshot below, we get a list of the input and output variables used.

Let’s take a look at the DS2 score code (“gb_epcode.sas”) produced by the EPCODE option in the DESCRIBE statement within PROC ASTORE.

data sasep.out;
	 dcl package score sc();
	 dcl double "LOAN";
	 dcl double "MORTDUE";
	 dcl double "DEBTINC";
	 dcl double "VALUE";
	 dcl double "YOJ";
	 dcl double "DEROG";
	 dcl double "DELINQ";
	 dcl double "CLAGE";
	 dcl double "NINQ";
	 dcl double "CLNO";
	 dcl nchar(7) "REASON";
	 dcl nchar(7) "JOB";
	 dcl double "BAD";
	 dcl double "P_BAD1" having label n'Predicted: BAD=1';
	 dcl double "P_BAD0" having label n'Predicted: BAD=0';
	 dcl nchar(32) "I_BAD" having label n'Into: BAD';
	 dcl nchar(4) "_WARN_" having label n'Warnings';
	 Keep 
		 "P_BAD1" 
		 "P_BAD0" 
		 "I_BAD" 
		 "_WARN_" 
		 "BAD" 
		 "LOAN" 
		 "MORTDUE" 
		 "VALUE" 
		 "REASON" 
		 "JOB" 
		 "YOJ" 
		 "DEROG" 
		 "DELINQ" 
		 "CLAGE" 
		 "NINQ" 
		 "CLNO" 
		 "DEBTINC" 
		;
	 varlist allvars[_all_];
	 method init();
		 sc.setvars(allvars);
		 sc.setKey(n'F8E7B0B4B71C8F39D679ECDCC70F6C3533C21BD5');
	 end;
	 method preScoreRecord();
	 end;
	 method postScoreRecord();
	 end;
	 method term();
	 end;
	 method run();
		 set sasep.in;
		 preScoreRecord();
		 sc.scoreRecord();
		 postScoreRecord();
	 end;
 enddata;

The sc.setKey in the method init () method block contains a string identifier for the analytic store; this is the same ASTORE identifier that was previously outputted as part of PROC ASTORE. In order to override the scoring decision created from the original gradient boosting model, we will edit the gb_epcode.sas file (shown above) by inserting new statements in the postScoreRecord method block; the edited file must follow DS2 language syntax. For more information about the DS2 language, see 

method postScoreRecord();
      if YOJ>10 then do; I_BAD_NEW='0'; end; else do; I_BAD_NEW=I_BAD; end;
    end;

Because we are saving the outcome into a new variable called “I_BAD_NEW,” we will need to declare this variable upfront along with the rest of the variables in the score file.

In order for this override to take effect, we will need to run the SCORE statement in PROC ASTORE and provide both the original ASTORE file (astore_gb), as well as the edited DS2 score code (gb_epcode.sas).

proc astore;
    score data=public.hmeq epcode="/viyafiles/jukhar/gb_epcode.sas"
          rstore=public.astore_gb
          out=public.hmeq_new; run;

A comparison of “I_BAD” and “I_BAD_NEW” in the output of the above code for select variables shows that the override rule for scoring has indeed taken place.

In this article we explored how to override the scoring decision produced from a machine learning model in SAS Viya. You will find more information about scoring in the Using PROC ASTORE to override scoring decisions in SAS® Viya® was published on SAS Users.

1月 262018
 

If you have worked with the different types of score code generated by the high-performance modeling nodes in SAS® Enterprise Miner™ 14.1, you have probably come across the Analytic Store (or ASTORE) file type for scoring.  The ASTOREfile type works very well for scoring complex machine learning models like random forests, gradient boosting, support vector machines and others. In this article, we will focus on ASTORE files generated by SAS® Viya® Visual Data Mining and Machine Learning (VDMML) procedures. An introduction to analytic stores on SAS Viya can be found here.

In this post, we will:

  1. Generate an ASTORE file for a PROC ASTORE in SAS Visual Data Mining and Machine Learning.

Generate an ASTORE file for a gradient boosting model

Our example dataset is a distributed in-memory CAS table that contains information about applicants who were granted credit for a certain home equity loan. The categorical binary-valued target variable ‘BAD’ identifies if a client either defaulted or repaid their loan. The remainder of the variables indicating the candidate’s credit history, debt-to-income ratio, occupation, etc., are used as predictors for the model. In the code below, we are training a gradient boosting model on a randomly sampled 70% of the data and validating against 30% of the data. The statement SAVESTATE creates an analytic store file (ASTORE) for the model and saves it as a binary file named “astore_gb.”

proc gradboost data=PUBLIC.HMEQ;
partition fraction(validate=0.3);
target BAD / level=nominal;
input LOAN MORTDUE DEBTINC VALUE YOJ DEROG DELINQ CLAGE NINQ CLNO /
level=interval;
input REASON JOB / level=nominal;
score out=public.hmeq_scored copyvars=(_all_);
savestate rstore=public.astore_gb;
id _all_;
run;

Shown below are a few observations from the scored dataset hmeq_scored  where YOJ (years at present job) is greater than 10 years.

PROC ASTORE

Override the scoring decision using PROC ASTORE

In this segment, we will use PROC ASTORE to override the scoring decision from the gradient boosting model. To that end, we will first make use of the DESCRIBE statement in PROC ASTORE to produce basic DS2 scoring code using the EPCODE option. We will then edit the score code in DS2 language syntax to override the scoring decision produced from the gradient boosting model.

proc astore;
    describe rstore=public.astore_gb
        epcode="/viyafiles/jukhar/gb_epcode.sas"; run;

A snapshot of the output from the above code statements are shown below. The analytic store is assigned to a unique string identifier. We also get information about the analytic engine that produced the store (gradient boosting, in this case) and the time when the store was created. In addition, though not shown in the snapshot below, we get a list of the input and output variables used.

Let’s take a look at the DS2 score code (“gb_epcode.sas”) produced by the EPCODE option in the DESCRIBE statement within PROC ASTORE.

data sasep.out;
	 dcl package score sc();
	 dcl double "LOAN";
	 dcl double "MORTDUE";
	 dcl double "DEBTINC";
	 dcl double "VALUE";
	 dcl double "YOJ";
	 dcl double "DEROG";
	 dcl double "DELINQ";
	 dcl double "CLAGE";
	 dcl double "NINQ";
	 dcl double "CLNO";
	 dcl nchar(7) "REASON";
	 dcl nchar(7) "JOB";
	 dcl double "BAD";
	 dcl double "P_BAD1" having label n'Predicted: BAD=1';
	 dcl double "P_BAD0" having label n'Predicted: BAD=0';
	 dcl nchar(32) "I_BAD" having label n'Into: BAD';
	 dcl nchar(4) "_WARN_" having label n'Warnings';
	 Keep 
		 "P_BAD1" 
		 "P_BAD0" 
		 "I_BAD" 
		 "_WARN_" 
		 "BAD" 
		 "LOAN" 
		 "MORTDUE" 
		 "VALUE" 
		 "REASON" 
		 "JOB" 
		 "YOJ" 
		 "DEROG" 
		 "DELINQ" 
		 "CLAGE" 
		 "NINQ" 
		 "CLNO" 
		 "DEBTINC" 
		;
	 varlist allvars[_all_];
	 method init();
		 sc.setvars(allvars);
		 sc.setKey(n'F8E7B0B4B71C8F39D679ECDCC70F6C3533C21BD5');
	 end;
	 method preScoreRecord();
	 end;
	 method postScoreRecord();
	 end;
	 method term();
	 end;
	 method run();
		 set sasep.in;
		 preScoreRecord();
		 sc.scoreRecord();
		 postScoreRecord();
	 end;
 enddata;

The sc.setKey in the method init () method block contains a string identifier for the analytic store; this is the same ASTORE identifier that was previously outputted as part of PROC ASTORE. In order to override the scoring decision created from the original gradient boosting model, we will edit the gb_epcode.sas file (shown above) by inserting new statements in the postScoreRecord method block; the edited file must follow DS2 language syntax. For more information about the DS2 language, see 

method postScoreRecord();
      if YOJ>10 then do; I_BAD_NEW='0'; end; else do; I_BAD_NEW=I_BAD; end;
    end;

Because we are saving the outcome into a new variable called “I_BAD_NEW,” we will need to declare this variable upfront along with the rest of the variables in the score file.

In order for this override to take effect, we will need to run the SCORE statement in PROC ASTORE and provide both the original ASTORE file (astore_gb), as well as the edited DS2 score code (gb_epcode.sas).

proc astore;
    score data=public.hmeq epcode="/viyafiles/jukhar/gb_epcode.sas"
          rstore=public.astore_gb
          out=public.hmeq_new; run;

A comparison of “I_BAD” and “I_BAD_NEW” in the output of the above code for select variables shows that the override rule for scoring has indeed taken place.

In this article we explored how to override the scoring decision produced from a machine learning model in SAS Viya. You will find more information about scoring in the Using PROC ASTORE to override scoring decisions in SAS® Viya® was published on SAS Users.

1月 262018
 

Batch Scripts in SASIf Necessity is the mother of Invention, then, perhaps, the father of Automation is Laziness. Automation is all about convenience, comfort, and productivity. Why do it yourself if you can devise something to do it for you!

In my previous post Running SAS programs in batch under Unix/Linux, we learned that in order to automate SAS jobs submissions, they are often run in batch mode. We also learned that we usually create batch scripts as a convenient way to run SAS programs in batch. To create a unique SAS log file generated with each batch submission, a typical batch script may look like follows:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

It will allow you to submit your SAS program /sas/code/project1/program1.sas in batch, and also capture SAS log file with a convenient date-time suffix in the same directory.

SAS program to write batch scripts

But what if we are deploying multiple SAS programs? Well, then we would need to create a batch script for each of them. They will all look similar to each other, and that is when most human errors usually occur – when we do something similar, monotonously, over and over again.  Besides, I found working with the Unix Visual Editor (“vi editor”) is not quite a 21st century experience.

What would a normal SAS programmer do in such a situation? That’s right – we would write a SAS program to write a batch script file! Let’s do it. Let’s automate the automation.

In its simplest form, to replicate the above batch script example our SAS program would look like this:

filename b '/sas/code/project1/program1.sh';
 
data _null_;
file b;
input;
put _infile_;
datalines;
#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname
;

Setting up batch file permissions

As we already know from my previous post, we need to assign certain permissions to our batch file in order to make it executable. For example, if you want to give yourself (Owner) and Group execution permissions then your script file permissions can be as:

-rwxr-x---, or 750 in octal representation.

In order to do that you can to add to your SAS code the following x-statement:

options noxwait;
x 'chmod 750 /sas/code/project1/program1.sh';

Alternatively, you can use %SYSEXEC macro statement (no quoting for the OS command) or SYSTASK statement, or CALL SYSTEM routine (used within a data step).

When you create a batch script by running the above code in SAS Enterprise Guide (EG), you don’t have to leave the comfort of your SAS environment or even touch Unix vi editor. Moreover, you can even submit your SAS job in batch mode right from your SAS EG Program Editor.

However, all of these will work fine, unless XCMD System Option is disabled (NOXCMD).

Assigning batch file permissions when XCMD System Option is disabled

ERROR: Shell escape is not valid in this SAS session.

Bummer! Have you ever seen this error message in SAS Enterprise Guide while trying to run SAS code with the X statement? It indicates that executing OS commands in the SAS environment is not allowed.

In many organizations, IT department policies do not allow enabling the SAS XCMD system option due to cyber-security concerns. This is usually done system-wide for the whole SAS Enterprise client-server installation via SAS configuration.  In this case, no operating system command is allowed to be executed from within SAS.

Of course, this substantially limits SAS’ automation power, but that is the goal and the price to pay for enhanced security.

Still, even without OS command execution at our disposal, we can set Unix script file permissions using FILENAME statement’s PERMISSION= option. Then our above filename statement will look like this:

filename b '/sas/code/project1/program1.sh'
   permission='A::u::rwx,A::g::r-x,A::o::---';

However, it is important to realize that your ability to fully control file permissions via FILENAME statement’s PERMISSION= option is still restricted by the Unix umask value set by your IT system administrator. But usually, it is not overly restrictive, at least for the purpose of creating executable files in the environments I have worked with.

The double benefit of the FILENAME statement’s PERMISSION= option is that it can be used for setting up file permissions in any SAS installation whether the XCMD system option is enabled or disabled.

SAS macro to create batch script files

Let’s wrap all the above SAS code pieces into a SAS macro that writes batch scripts. Here is the macro code definition:

%macro write_shell(code);
   %let fdir = %substr(&code,1,%sysfunc(findc(&code,/,b)));
 
   options dlcreatedir;
   libname _flib "&fdir";
   libname _flib;
 
   %let core = %substr(&code,1,%eval(%length(&code)-4));
   filename _fout "&core..sh" permission='A::u::rwx,A::g::r-x,A::o::---'; 
   data _null_;
      file _fout;
      put
         '#!/bin/sh' //
         'now=$(date +%Y.%m.%d_%H.%M.%S)' /
         "pgmname=""&code""" /
         "logname=""&core._$now.log""" /
         'sas $pgmname -log $logname'
         ;
   run;
   filename _fout;
%mend write_shell;

The single macro parameter (code) represents full path name of your SAS code. And here is a macro invocation example:

 
%write_shell(/sas/code/project1/program1.sas)

The assumption here is that the script file gets created in the same directory as the relevant SAS code and SAS logs for each of the batch runs. It will be assigned the same name as your SAS program, only with the .sh name extension. As you can see, we do some string parsing to derive directory name, script file name and SAS log file name from the single macro parameter representing full path name of your SAS code. As an added bonus, if a specified directory (/sas/code/project1/) does not yet exist, it will be created by this macro. DLCREATEDIR System Option (along with the two subsequent libname statements) are responsible for the directory creation.

If you want to create many script files for your multiple SAS programs, you just invoke the macro as many times. You can even go totally data-driven for mass script file creation.

Do you find this useful?

Please let me know in the comments section below if you find this blog post useful. Thank you for reading! I also invite you to share your ideas and experiences on the topic.

Let SAS write batch scripts for you was published on SAS Users.

1月 252018
 

Most people who work with optimization are familiar with Linear and Integer Programming, to their toolkit they could add Constraint Programming. Constraint Programming is a powerful technique that is used to solve powerful “real-world” problems in a variety of areas, such as, planning, scheduling, DNA Sequencing, computer graphics and natural language processing.

Constraint Programming is a powerful paradigm which can be used by itself or in combination with Integer Programming. In this article, I’ll show you how to implement a simple Constraint Programming example that solves Sudoku puzzles using the CLP functionality in SAS Optimization.

Have you ever wondered after working in a particularly difficult Sudoku puzzle if the puzzle can be solved? Would you like to schedule your child’s little league games like a pro using the Round-Robin tournament format, just like it is done in professional sport leagues?

If so, Constraint Programming is the answer. But what is Constraint Programming?  Let’s start answering this question by reviewing the familiar Linear and Integer Programming formulations and then comparing them with the one for Constraint Programming.

Most people have heard about Linear Programming and Integer Programming, where the typical mathematical structure for an Integer Programming model is:

Max    c1x1+ c2x2+ … + cnxn

Subject to
a11x1 + a12x2+ … + a1nxnb1
….
an1x1+ an2x2+ … + annxnbn

xj  integer for  all j = 1 to n

These equations describe a problem where the goal (or objective) is to maximize a metric that is related to a set of variables (x1, …, xn) to be determined by solving the problem. The goal (or objective) to be maximized could be, for example, profit, amount of food distributed, etc. The set of variables are related to the goal, and in a typical marketing problem would represent marketing campaigns, customer response, channels used to distribute those campaigns, etc. Constraints are the rules that relate the variables to the available resources to solve the problem. In a marketing problem, b1 could represent the available budget, …, bn could represent the capacity of the call center.

When all variables are continuous we have a linear program; when some of the variables must be integers, we have a mixed integer programming problem. Notice that the constraints in the formulation above simply describe a logical relationship among several variables. Because each variable must take an integer value, their domain is the set of integers.

In Constraint Programming the relationships between variables are stated in the form of constraints. Constraints specify the properties of a solution to be found. A key insight for Constraint Programming is to understand that a constraint is simply a logical relationship among several finite unknowns (or variables), each taking a value in a finite domain. A constraint thus restricts the possible values that the variables can simultaneously take, it represents some partial information about the variables of interest.

An example of a scheduling problem described using the Constraint Programming approach is below All tasks relationships are of type “FS” which means “finish-to-start” and can be used to indicate which task precedes another one:

Forall (j in Jobs)

/* Indicates which task precedes another one */
Forall (t in 1..nbTasks-1)

task [j,t]   FS   task[j, t+1];

forall ( j in Jobs)

/* Indicates which tools to be used */

forall ( t in Tasks)

requires task[j,t] = (tool[j,t];

In this scheduling problem, the goal is to find the task sequence for each job while satisfying the constraints on task precedence and tool availability.

More formally, a Constraint Program can be defined using a triple X, D, C, where

  • X = { X1, …, Xn}  is a finite set of variables
  • D = {D1, …, Dn}  is a finite set of domains, where Di is a finite set of possible values that the variable Xi can take. Di is known as the domain of variable Xi
  • C = {C1, …, Cn}  is a finite set of constraints that restrict the values that the variables can simultaneously take.

Constraint solvers find an assignment to the variables that satisfies all the constraints using constraint propagation, backtracking, branch and bound algorithms or local search. There are many specialized resources (books, articles, etc.) that describe these methods.

Many times for complex problems, a hybrid approach is used, that is, an approach that uses Integer Programming, Constraint Programming and Heuristic procedures.

Let’s solve the simple Send More Money and the Sudoku puzzles to make clear the formal Constraint Program formulation given above.

Send More Money Puzzle

The Send More Money puzzle consists of finding unique digits for the letters D, E, M, N, O, R, S, and Y such that S and M are different from zero (no leading zeros) and the following equation is satisfied:

S E N D
+   M O R E

M O N E Y

Step #1: Define the variables:

S, E, N, D, M, O, R, E, Y

Step #2: Define the Domain of those variables

  1. S, E, N, D, M, O, R, E, Y must take integer values between 1 and 9
  2. S can’t be zero
  3. M can’t be zero

Step #2: Define the Domain of those variables

  1. S * 1000 + E * 100 + N * 10 + D + M * 1000 + O * 100 + R * 10 + E =
    10000 * M + O * 1000 + N * 100 + E * 10 + Y
  2. All variables must be different

The unique solution to this problem is

S E N D M O R Y
9 5 6 7 1 0 8 2

 

And can be found using the CLP procedure in SAS Optimization, with this code

proc clp dom=[0,9] 		/* Define the default domain */
out=out; 			/* Name the output data set */
var S E N D M O R Y; 	        /* Declare the variables */
lincon 				/* Linear constraints */
 
/* SEND + MORE = MONEY */
1000*S + 100*E + 10*N + D + 1000*M + 100*O + 10*R + E
=
10000*M + 1000*O + 100*N + 10*E + Y,
 
 
 
S<>0,                           
M<>0;                          /* No leading zeros */
 
alldiff(); 		/* All variables have pairwise
 				   distinct values*/
run;

The Sudoku Puzzle

Step #1: Define your variables.

We are searching for 81 variables that are arranged in a 9×9 matrix, let Cij represent the value of the cell in the ith row and the jth column, where i=1, …, 9 and j=1, …, 9

Step # 2: Define the Domain of those variables

Cij can take any integer value between 1 and 9

Step # 3: Define the Constraints.

  1. For each row i, all values in that row must be different.
  2. For each column j, all values in that column must be different.
  3. For each 3×3 block Bb all values in that block must be different.

If we start with the initial values

Constraint Programming in SAS Optimization

Then the solution is

Constraint Programming in SAS Optimization

This solution can be found using the CLP procedure in SAS Optimization, with this code (note that the initial puzzle is entered in the step data indata and the final solution is nicely printed with the macro printSol).

data indata;
input C1-C9;
datalines;
. . 5 . . 7 . . 1
. 7 . . 9 . . 3 .
. . . 6 . . . . .
. . 3 . . 1 . . 5
. 9 . . 8 . . 2 .
1 . . 2 . . 4 . .
. . 2 . . 6 . . 9
. . . . 4 . . 8 .
8 . . 1 . . 5 . .
;
run;
%macro store_initial_values;
/* store initial values into macro variable C_i_j */
data _null_;
set indata;
 
array C{9};
do j = 1 to 9;
i = _N_;
call symput(compress("C_" ||  put(i,best.)  || "_"  || put(j,best.)),
put(C[j],best.));
end;
run;
%mend store_initial_values;
%store_initial_values;
 
%macro solve;
proc clp out=outdata;
 
%do i = 1 %to 9;
var (X_&i._1-X_&i._9) = [1,9];
alldiff(X_&i._1-X_&i._9);
%end;
 
%do j = 1 %to 9;
alldiff(
%do i = 1 %to 9;
X_&i._&j
%end;
);
%end;
 
%do s = 0 %to 2;
%do t = 0 %to 2;
alldiff(
%do i = 3*&s + 1 %to 3*&s + 3;
%do j = 3*&t + 1 %to 3*&t + 3;
X_&i._&j
%end;
%end;
);
%end;
%end;
 
 
%do i = 1 %to 9;
%do j = 1 %to 9;
%if &&C_&i._&j ne . %then %do;
lincon X_&i._&j = &&C_&i._&j;
%end;
%end;
%end;
run;
%put &_ORCLP_;
%mend solve;
%solve;
 
%macro printSol;
data final (keep= A1 A2 A3 A4 A5 A6 A7 A8 A9);
set outdata;
array A{9};
%do i = 1 %to 9;
%do j = 1 %to 9;
A(&j)=X_&i._&j;
%end;
output ;
%end;
run;
%mend printSol;
%printSol;

Conclusion

Every optimization person could benefit from using Constraint programming. It is a powerful tool, which can be used in hybrid approaches with Integer Programming and heuristic procedures.

Solving Sudoku puzzles using Constraint Programming in SAS Optimization was published on SAS Users.

1月 242018
 

This article and accompanying technical white paper are written to help SAS 9 users process existing SAS 9 code multi-threaded in SAS Viya 3.3.
Read the full paper, Getting Your SAS 9 Code to Run Multi-Threaded in SAS Viya 3.3.

The Future is Multi-threaded Processing Using SAS® Viya®

When I first began researching how to run SAS 9 code as multi-threaded in SAS Viya, I decided to compile all the relevant information and detail the internal processing rules that guide how the SAS Viya Cloud Analytics Services (CAS) server handles code and data. What I found was that there are a simple set of guidelines, which if followed, help ensure that most of your existing SAS 9 code will run multi-threaded. I have stashed a lot of great information into a single whitepaper that is available today called “Getting Your SAS 9 Code to Run Multi-threaded in SAS Viya”.

Starting with the basic distinctions between single and parallel processing is key to understanding why and how some of the parallel processing changes have been implemented. You see, SAS Viya technology constructs code so that everything runs in pooled memory using multiple processors. Redefining SAS for this parallel processing paradigm has led to huge gains in decreasing program run-times, as well as concomitant increases in accuracy for a variety of machine learning techniques. Using SAS Viya products helps revolutionize how we think about undertaking large-scale work because now we can complete so many more tasks in a fraction of the time it took before.

The new SAS Viya products bring a ton of value compared to other choices you might have in the analytics marketplace. Unfortunately most open source libraries and packages, especially those developed for use in Python and R, are limited to single-threading. SAS Viya offers a way forward by coding in these languages using an alternative set of SAS objects that can run as parallel, multi-threaded, distributed processes. The real difference is in the shared memory architecture, which is not the same as parallel, distributed processing that you hear claimed from most Hadoop and cloud vendors. Even though parallel, distributed processing is faster than single-threading, it proverbially hits a performance wall that is far below what pooled and persisted data provides when using multi-threaded techniques and code.

For these reasons, I believe that SAS Viya is the future of data/decision science, with shared memory running against hundreds if not thousands of processors, and returning results back almost instantaneously. And it’s not for just a handful of statistical techniques. I’m talking about running every task in the analytics lifecycle as a multi-threaded process, meaning end-to-end processing through data staging, model development and deployment, all potentially accomplished through a point-and-click interface if you choose. Give SAS Viya a try and you will see what I am talking about. Oh, and don’t forget to read my technical white paper that provides a checklist of all the things that you may need to consider when transitioning your SAS 9 code to run multi-threaded in SAS Viya!

Questions or Comments?

If you have any questions about the details found in this paper, feel free to leave them in the comments field below.

Getting your SAS 9 code to run multi-threaded in SAS Viya 3.3 was published on SAS Users.

1月 232018
 

Is it time to add SAS to the list of "romance" languages?*.

It's no secret that there are enthusiastic SAS programmers who love the SAS language. So it only makes sense that sometimes, these programmers will "nerd out" and express their adoration for fellow humans by using the code that they love. For evidence, check out these contributions from some nerdy would-be Cupids:

Next-level Nerd love: popping the question with SAS

Recently a SAS community member took this to the next level and crafted a marriage proposal using SAS. With help from the SAS community, this troubadour wrote a SAS program that popped the question in a completely new way. Our hero wanted to present a SAS program that was cryptic enough so that a casual glance would not reveal its purpose. But it had to be easy to open and run within SAS University Edition. His girlfriend is a biostatistics student; she's not an expert in SAS, but she does use it for her coursework and research. With over 70 messages exchanged among nearly 20 people, the community came through.

When Brandon's beloved ran his program in SAS University Edition, she was presented with an animated graphic that metaphorically got down on one knee and asked for her hand in marriage. And she said "Yes!" (Or maybe in SAS terms her answer was "RESPONSE=1".)

Simulated version of the proposal created with SAS

A marriage proposal that was helped along by the SAS Support Communities? That's definitely a remarkable event, but it's not completely surprising to us at SAS. SAS users regularly impress us by combining their SAS skills with their creativity. And, with their willingness to help one another.

The SAS Valentine Challenge for YOU

With Valentine's Day approaching, we thought that it would be fun to prepare some Valentine's greetings using SAS. We've started a topic on the SAS Support Communities and we invite you to contribute your programs and ideas. You can use any SAS code or tool that you want: ODS Graphics, SAS/GRAPH, SAS Visual Analytics, even text-based output. Our only request is that it's something that another SAS user could pick up and use. In other words, show your work!

It's not exactly a contest so there are no rules for "winning." Still, we hope that you'll try to one-up one another with your creative contributions. The best entries will get bragging rights and some social media love from SAS and others. And who knows? Maybe we'll have some other surprises along the way. We'll come back and update this post with some of our favorite replies.

Check out the community topic and add your idea. We can't wait to see what you come up with!

* Yes, I know that "romance" languages are named such because they stem from the language spoken by the Romans -- not because they are inherently romantic in the lovey-dovey sense. Still, it's difficult to resist the play on words.

A Valentine challenge: express your <3 with SAS was published on SAS Users.

1月 202018
 

SAS/GRAPH® Annotate FacilityThe

data myanno;                                                                                                                            
  length function color $8;                                                                                                                                                                                                                             
  function='move';                                                                                                                      
    x=0;  y=0;                                                                                                                          
    output;                                                                                                                             
  function='draw';                                                                                                                      
   x=100;  y=100;                                                                                                                       
   color='red';                                                                                                                         
   output;                                                                                                                              
run;                                                                                                                                    
 
proc gplot data=sashelp.cars;                                                                                                           
  plot mpg_highway*cylinders / vaxis=axis1 haxis=axis2 annotate=myanno;                                                                                         
  symbol1 interpol=none value=dot color=blue; 
  axis1 label=(angle=90);                                                                                                               
  axis2 offset=(2,2)pct;                                                                                                                                                                                                          
run;                                                                                                                                    
quit;

 

The following annotate errors are written to the SAS log when I run this code:

NOTE: ERROR DETECTED IN ANNOTATE= DATASET WORK.MYANNO.
NOTE: PROBLEM IN OBSERVATION     2 -
      A CALCULATED COORDINATE LIES OUTSIDE THE VISIBLE AREA           X
      A CALCULATED COORDINATE LIES OUTSIDE THE VISIBLE AREA           Y

 

Here is the resulting graph:

The annotated line is drawn outside the axis area. But why? I defined my X and Y coordinates for the MOVE and DRAW functions correctly, did I not?

The coordinates are defined correctly, but what I did not define is the coordinate system for the annotation. The XSYS and

data myanno;                                                                                                                            
  length function color $8;                                                                                                             
  retain xsys ysys '1';                                                                                                                 
  function='move';                                                                                                                      
    x=0;  y=0;                                                                                                                          
    output;                                                                                                                             
  function='draw';                                                                                                                      
   x=100;  y=100;                                                                                                                       
   color='red';                                                                                                                         
   output;                                                                                                                              
run;                                                                                                                                    
 
proc gplot data=sashelp.cars;                                                                                                           
  plot mpg_highway*cylinders / vaxis=axis1 haxis=axis2 annotate=myanno;                                                                                         
  symbol1 interpol=none value=dot color=blue;
  axis1 label=(angle=90);                                                                                                               
  axis2 offset=(2,2)pct;                                                                                                                                                                                                           
run;                                                                                                                                    
quit;

 

Here is the graph containing the correct line:

Creating Multiple Graphs with an Annotate Data Set, BY-and-BY

The need to generate multiple graphs from one procedure using a BY statement is very common. However, using an Annotate data set with a BY statement can be a little tricky. Here are the general rules for using an Annotate data set with a SAS/GRAPH procedure that creates multiple graphs with a BY statement:

  1. Make sure that the Annotate data set and the input data set for the procedure include the same BY variables. The BY variables must also be the same data type in both data sets.
  2. Both the Annotate data set and the input data set must be sorted by the BY variables.
  3. Include the ANNOTATE= (or ANNO=) option in the action statement of the SAS/GRAPH procedure.

The goal of the following program is to create two graphs using a BY statement in which the annotation is specific to each graph. The Annotate data set draws the maximum MPG_Highway value at the maximum point for each X value.

/* Compute the maximum MPG_Highway values */                                                                                            
proc sort data=sashelp.cars(where=(origin in('USA' 'Europe'))) out=cars;                                                                
  by origin cylinders;                                                                                                                  
run;                                                                                                                                    
 
proc means data=cars noprint;                                                                                                           
  by origin cylinders;                                                                                                                  
  var mpg_highway;                                                                                                                      
  output out=meansout max=max;                                                                                                          
run;                                                                                                                                    
 
data myanno;                                                                                                                            
  length function color text $8;                                                                                                        
  retain xsys ysys '2' color 'black' position '2' size 1.5;                                                                             
  set meansout;                                                                                                                         
 
  function='label';                                                                                                                     
    x=cylinders;  y=max;                                                                                                                
    text=strip(max);                                                                                                                    
    output;                                                                                                                             
run;                                                                                                                                    
 
proc gplot data=cars annotate=myanno;                                                                                                   
  by origin;                                                                                                                            
  plot mpg_highway*cylinders / vaxis=axis1 haxis=axis2;                                                                                 
  symbol1 interpol=none value=dot color=blue;                                                                                           
  axis1 label=(angle=90);                                                                                                               
  axis2 offset=(2,2)pct;                                                                                                                
run;                                                                                                                                    
quit;

 

Here are the resulting graphs:

There are two issues here. First, there should be only one maximum value displayed for each X value. There are duplicate values of the annotated text on each graph. Second, the following messages are written to the SAS log:

NOTE: ERROR DETECTED IN ANNOTATE= DATASET WORK.MYANNO.
NOTE: PROBLEM IN OBSERVATION     1 -
      DATA SYSTEM REQUESTED, BUT VALUE IS NOT ON GRAPH    'Y'
NOTE: PROBLEM IN OBSERVATION     5 -
      DATA SYSTEM REQUESTED, BUT VALUE IS NOT ON GRAPH    'X'
NOTE: The above message was for the following BY group:
      Origin=USA

 

These notes tell me that either the X or the Y coordinate in two of the observations in the Annotate data set do not exist on one of the graphs. This issue occurs because the Annotate coordinates for each of the BY values are different for each graph. The axis ranges are different on the two graphs. So, when all of the annotation, instead of the annotation for only each BY value, is drawn on each graph, some of the Annotate coordinates cannot be found on the graph.

Both of these issues occur because the ANNOTATE=Myanno option is in the PROC GPLOT statement instead of in the action (PLOT) statement. Moving the ANNOTATE=Myanno option to the PLOT statement generates the expected output:

proc gplot data=cars;                                                                                                                   
  by origin;                                                                                                                            
  plot mpg_highway*cylinders / vaxis=axis1 haxis=axis2 annotate=myanno;                                                                 
  symbol1 interpol=none value=dot color=blue;                                                                                           
  axis1 label=(angle=90);                                                                                                               
  axis2 offset=(2,2)pct;                                                                                                                
run;                                                                                                                                    
quit;

 

Off the Grid

Another common issue with using an Annotate data set is when a coordinate in the Annotate data set lies outside the range of an axis on the graph. For example, I will chart the mean MPG_Highway values with the GCHART procedure and draw a symbol at the maximum value for each country of origin using an Annotate data set:

proc sort data=sashelp.cars out=cars;                                                                                                   
  by origin;                                                                                                                            
run;                                                                                                                                    
 
/* Compute the mean and the max */                                                                                                      
proc means data=cars noprint;                                                                                                           
  by origin;                                                                                                                            
  var mpg_highway;                                                                                                                      
  output out=meansout mean=mean max=max;                                                                                                
run;                                                                                                                                    
 
data myanno;                                                                                                                            
  length function color $8 text $14;                                                                                                    
  retain xsys ysys '2' color 'red' position '2' size 2;                                                                                 
  set meansout;                                                                                                                         
 
  function='symbol';                                                                                                                    
    midpoint=origin;  y=max;                                                                                                            
    text='diamondfilled';                                                                                                               
    output;                                                                                                                             
run;                                                                                                                                    
 
proc gchart data=meansout;                                                                                                              
  vbar origin / sumvar=mean annotate=myanno raxis=axis1;                                                                                
  axis1 label=(angle=90);                                                                                                               
run;                                                                                                                                    
quit;

 

When I run this program, the following graph is produced, excluding the annotated symbols:

The following annotate error messages are written to the SAS log:

NOTE: ERROR DETECTED IN ANNOTATE= DATASET WORK.MYANNO.
NOTE: PROBLEM IN OBSERVATION     1 -
      DATA SYSTEM REQUESTED, BUT VALUE IS NOT ON GRAPH    'RESPONSE'
NOTE: PROBLEM IN OBSERVATION     2 -
      DATA SYSTEM REQUESTED, BUT VALUE IS NOT ON GRAPH    'RESPONSE'
NOTE: PROBLEM IN OBSERVATION     3 -
      DATA SYSTEM REQUESTED, BUT VALUE IS NOT ON GRAPH    'RESPONSE'

 

These messages tell me that multiple response values (Y coordinates) in the Annotate data set lie outside the range of the Y axis. The procedure does not automatically extend the Y-axis range to accommodate the annotation, so I need to do this by including the ORDER= option in the AXIS1 statement:

proc gchart data=meansout;                                                                                                              
  vbar origin / sumvar=mean annotate=myanno raxis=axis1;                                                                                
  axis1 label=(angle=90) order=(0 to 70 by 10);                                                                                         
run;                                                                                                                                    
quit;

 

The correct graph is now generated:

Annotation is a useful tool that enables you to draw features on a graph that the graphics procedure might not have the capability to draw. Using an Annotate data set is easier once you understand what the SAS log messages are telling you and can take steps to avoid common issues. Don’t be afraid to dive in!

Happy drawing!

Common annotate pitfalls and how to avoid them was published on SAS Users.

1月 182018
 

One of the most exciting features from the newest release of Visual Data Mining and Machine Learning on SAS Viya is the ability to perform Market Basket Analysis on large amounts of transactional data. Market Basket Analysis allows companies to analyze large transactional files to identify significant relationships between items. While most commonly used by retailers, this technique can be used by any company that has transactional data.

For this example, we will be looking at customer supermarket purchases over the past month. Customer is the Transaction ID; Time is the time of purchase; and Product is the item purchased. The data must be transactional in nature and not aggregated, with one row for each product purchased by each customer.

Market Basket Analysis in SAS Viya

With our data ready, we can now perform the analysis using the MBANALYSIS Procedure. As illustrated below in SAS Studio, by specifying pctsupport=1, we will only look at items, or groups of items, that appear in at least 1% of the transactions. For very large datasets this saves time by only looking at combinations of items that appear frequently. This allows extraction of the most common and most useful relationships.

The MBANALYSIS procedure outputs a list of significant relationships, called Association Rules, by calculating the LIFT metric. A lift greater than one generally indicates that a Rule is significant. By default, each relationship has two items, although this can be changed to include multiple items.

Below is a screenshot of the ten most important rules. The first item in the rule is the “Left Hand Side” and the second item after the arrow is the “Right Hand Side.” For the first rule, we can see that coke and ice cream appear together in 220 transactions and have a lift of 2.37, meaning purchasing Coke makes the purchase of ice cream about twice as likely.

Top 10 Association Rules

While Association Rules above give powerful insights into large transactional datasets, the challenge is exploring these rules visually. One way to do this is by linking the rules together via a Network Diagram. This allows users to see the relationships between multiple rules, and identify the most important items in the network. The following SQL code prepares the data for the Network Diagram.

Network Diagrams plot a set of “Source” values (T1_ITEM), and connects them to a “Target” value (ITEM2). If the source value represents the left hand side of the rule, the corresponding right hand side of the rule is listed as the Target variable. We will use the “Lift” value to link these source and target variables. If the target value is the right hand side of the rule, the target and the lift are missing. This allows us to plot the product, but no linkage will be made.

Now, my data is ready to be visualized as a Network Diagram. Using the following code, I am able to promote my Association Rules, making this dataset available via SAS® Visual Analytics.

Now, I am able to quickly and easily generate my Network Diagram without having to create any code.

Hovering over a node allows me to see specific information about that particular item. Here, we can see that Heineken was purchased in 59.9% of all transactions, which is 600 transactions.

Hovering over the linkage, we can see specific information about the rule. Below, we can see that purchasing artichoke (artichoke) makes the purchase of Heineken about 38% more likely. Many other rules link to Heineken, showing its importance in the network. Business Unit Experts can use this diagram as a starting point to analyze selling strategies to make proper adjustments for the business.

Conclusion

The Market Basket Analysis procedure in Visual Data Mining and Machine Learning on SAS Viya can help retailers quickly scan large transactional files and identify key relationships. These relationships can then be visualized in a Network Diagram to quickly and easily find important relationships in the network, not just a set of rules. As transactional data, whether in-store, online, or in any other form gets bigger, this Market Basket functionality is a must have weapon in the analytical toolkit of any business.

Visualizing the results of a Market Basket Analysis in SAS Viya was published on SAS Users.