analytics

1月 252018
 

Data doesn't always have to be 'big data' to be interesting. For example, I recently ran across a small, but very interesting, database containing all of North Korea's missile tests. The data was a bit difficult to digest in the formats provided, therefore I decided to try my hand at [...]

The post North Korea's 117 missile tests appeared first on SAS Learning Post.

1月 252018
 

Most people who work with optimization are familiar with Linear and Integer Programming, to their toolkit they could add Constraint Programming. Constraint Programming is a powerful technique that is used to solve powerful “real-world” problems in a variety of areas, such as, planning, scheduling, DNA Sequencing, computer graphics and natural language processing.

Constraint Programming is a powerful paradigm which can be used by itself or in combination with Integer Programming. In this article, I’ll show you how to implement a simple Constraint Programming example that solves Sudoku puzzles using the CLP functionality in SAS Optimization.

Have you ever wondered after working in a particularly difficult Sudoku puzzle if the puzzle can be solved? Would you like to schedule your child’s little league games like a pro using the Round-Robin tournament format, just like it is done in professional sport leagues?

If so, Constraint Programming is the answer. But what is Constraint Programming?  Let’s start answering this question by reviewing the familiar Linear and Integer Programming formulations and then comparing them with the one for Constraint Programming.

Most people have heard about Linear Programming and Integer Programming, where the typical mathematical structure for an Integer Programming model is:

Max    c1x1+ c2x2+ … + cnxn

Subject to
a11x1 + a12x2+ … + a1nxnb1
….
an1x1+ an2x2+ … + annxnbn

xj  integer for  all j = 1 to n

These equations describe a problem where the goal (or objective) is to maximize a metric that is related to a set of variables (x1, …, xn) to be determined by solving the problem. The goal (or objective) to be maximized could be, for example, profit, amount of food distributed, etc. The set of variables are related to the goal, and in a typical marketing problem would represent marketing campaigns, customer response, channels used to distribute those campaigns, etc. Constraints are the rules that relate the variables to the available resources to solve the problem. In a marketing problem, b1 could represent the available budget, …, bn could represent the capacity of the call center.

When all variables are continuous we have a linear program; when some of the variables must be integers, we have a mixed integer programming problem. Notice that the constraints in the formulation above simply describe a logical relationship among several variables. Because each variable must take an integer value, their domain is the set of integers.

In Constraint Programming the relationships between variables are stated in the form of constraints. Constraints specify the properties of a solution to be found. A key insight for Constraint Programming is to understand that a constraint is simply a logical relationship among several finite unknowns (or variables), each taking a value in a finite domain. A constraint thus restricts the possible values that the variables can simultaneously take, it represents some partial information about the variables of interest.

An example of a scheduling problem described using the Constraint Programming approach is below All tasks relationships are of type “FS” which means “finish-to-start” and can be used to indicate which task precedes another one:

Forall (j in Jobs)

/* Indicates which task precedes another one */
Forall (t in 1..nbTasks-1)

task [j,t]   FS   task[j, t+1];

forall ( j in Jobs)

/* Indicates which tools to be used */

forall ( t in Tasks)

requires task[j,t] = (tool[j,t];

In this scheduling problem, the goal is to find the task sequence for each job while satisfying the constraints on task precedence and tool availability.

More formally, a Constraint Program can be defined using a triple X, D, C, where

  • X = { X1, …, Xn}  is a finite set of variables
  • D = {D1, …, Dn}  is a finite set of domains, where Di is a finite set of possible values that the variable Xi can take. Di is known as the domain of variable Xi
  • C = {C1, …, Cn}  is a finite set of constraints that restrict the values that the variables can simultaneously take.

Constraint solvers find an assignment to the variables that satisfies all the constraints using constraint propagation, backtracking, branch and bound algorithms or local search. There are many specialized resources (books, articles, etc.) that describe these methods.

Many times for complex problems, a hybrid approach is used, that is, an approach that uses Integer Programming, Constraint Programming and Heuristic procedures.

Let’s solve the simple Send More Money and the Sudoku puzzles to make clear the formal Constraint Program formulation given above.

Send More Money Puzzle

The Send More Money puzzle consists of finding unique digits for the letters D, E, M, N, O, R, S, and Y such that S and M are different from zero (no leading zeros) and the following equation is satisfied:

S E N D
+   M O R E

M O N E Y

Step #1: Define the variables:

S, E, N, D, M, O, R, E, Y

Step #2: Define the Domain of those variables

  1. S, E, N, D, M, O, R, E, Y must take integer values between 1 and 9
  2. S can’t be zero
  3. M can’t be zero

Step #2: Define the Domain of those variables

  1. S * 1000 + E * 100 + N * 10 + D + M * 1000 + O * 100 + R * 10 + E =
    10000 * M + O * 1000 + N * 100 + E * 10 + Y
  2. All variables must be different

The unique solution to this problem is

S E N D M O R Y
9 5 6 7 1 0 8 2

 

And can be found using the CLP procedure in SAS Optimization, with this code

proc clp dom=[0,9] 		/* Define the default domain */
out=out; 			/* Name the output data set */
var S E N D M O R Y; 	        /* Declare the variables */
lincon 				/* Linear constraints */
 
/* SEND + MORE = MONEY */
1000*S + 100*E + 10*N + D + 1000*M + 100*O + 10*R + E
=
10000*M + 1000*O + 100*N + 10*E + Y,
 
 
 
S<>0,                           
M<>0;                          /* No leading zeros */
 
alldiff(); 		/* All variables have pairwise
 				   distinct values*/
run;

The Sudoku Puzzle

Step #1: Define your variables.

We are searching for 81 variables that are arranged in a 9×9 matrix, let Cij represent the value of the cell in the ith row and the jth column, where i=1, …, 9 and j=1, …, 9

Step # 2: Define the Domain of those variables

Cij can take any integer value between 1 and 9

Step # 3: Define the Constraints.

  1. For each row i, all values in that row must be different.
  2. For each column j, all values in that column must be different.
  3. For each 3×3 block Bb all values in that block must be different.

If we start with the initial values

Constraint Programming in SAS Optimization

Then the solution is

Constraint Programming in SAS Optimization

This solution can be found using the CLP procedure in SAS Optimization, with this code (note that the initial puzzle is entered in the step data indata and the final solution is nicely printed with the macro printSol).

data indata;
input C1-C9;
datalines;
. . 5 . . 7 . . 1
. 7 . . 9 . . 3 .
. . . 6 . . . . .
. . 3 . . 1 . . 5
. 9 . . 8 . . 2 .
1 . . 2 . . 4 . .
. . 2 . . 6 . . 9
. . . . 4 . . 8 .
8 . . 1 . . 5 . .
;
run;
%macro store_initial_values;
/* store initial values into macro variable C_i_j */
data _null_;
set indata;
 
array C{9};
do j = 1 to 9;
i = _N_;
call symput(compress("C_" ||  put(i,best.)  || "_"  || put(j,best.)),
put(C[j],best.));
end;
run;
%mend store_initial_values;
%store_initial_values;
 
%macro solve;
proc clp out=outdata;
 
%do i = 1 %to 9;
var (X_&i._1-X_&i._9) = [1,9];
alldiff(X_&i._1-X_&i._9);
%end;
 
%do j = 1 %to 9;
alldiff(
%do i = 1 %to 9;
X_&i._&j
%end;
);
%end;
 
%do s = 0 %to 2;
%do t = 0 %to 2;
alldiff(
%do i = 3*&s + 1 %to 3*&s + 3;
%do j = 3*&t + 1 %to 3*&t + 3;
X_&i._&j
%end;
%end;
);
%end;
%end;
 
 
%do i = 1 %to 9;
%do j = 1 %to 9;
%if &&C_&i._&j ne . %then %do;
lincon X_&i._&j = &&C_&i._&j;
%end;
%end;
%end;
run;
%put &_ORCLP_;
%mend solve;
%solve;
 
%macro printSol;
data final (keep= A1 A2 A3 A4 A5 A6 A7 A8 A9);
set outdata;
array A{9};
%do i = 1 %to 9;
%do j = 1 %to 9;
A(&j)=X_&i._&j;
%end;
output ;
%end;
run;
%mend printSol;
%printSol;

Conclusion

Every optimization person could benefit from using Constraint programming. It is a powerful tool, which can be used in hybrid approaches with Integer Programming and heuristic procedures.

Solving Sudoku puzzles using Constraint Programming in SAS Optimization was published on SAS Users.

1月 222018
 

Amazon recently announced the 20 finalists in their search for a location to house their new 2nd headquarters (nicknamed 'hq2'). They showed a map on their web page, but I wanted to create my own map, with a few improvements ... Here's their official map. It's not terrible, but it [...]

The post Here are the Amazon HQ2 finalists! appeared first on SAS Learning Post.

1月 182018
 

One of the most exciting features from the newest release of Visual Data Mining and Machine Learning on SAS Viya is the ability to perform Market Basket Analysis on large amounts of transactional data. Market Basket Analysis allows companies to analyze large transactional files to identify significant relationships between items. While most commonly used by retailers, this technique can be used by any company that has transactional data.

For this example, we will be looking at customer supermarket purchases over the past month. Customer is the Transaction ID; Time is the time of purchase; and Product is the item purchased. The data must be transactional in nature and not aggregated, with one row for each product purchased by each customer.

Market Basket Analysis in SAS Viya

With our data ready, we can now perform the analysis using the MBANALYSIS Procedure. As illustrated below in SAS Studio, by specifying pctsupport=1, we will only look at items, or groups of items, that appear in at least 1% of the transactions. For very large datasets this saves time by only looking at combinations of items that appear frequently. This allows extraction of the most common and most useful relationships.

The MBANALYSIS procedure outputs a list of significant relationships, called Association Rules, by calculating the LIFT metric. A lift greater than one generally indicates that a Rule is significant. By default, each relationship has two items, although this can be changed to include multiple items.

Below is a screenshot of the ten most important rules. The first item in the rule is the “Left Hand Side” and the second item after the arrow is the “Right Hand Side.” For the first rule, we can see that coke and ice cream appear together in 220 transactions and have a lift of 2.37, meaning purchasing Coke makes the purchase of ice cream about twice as likely.

Top 10 Association Rules

While Association Rules above give powerful insights into large transactional datasets, the challenge is exploring these rules visually. One way to do this is by linking the rules together via a Network Diagram. This allows users to see the relationships between multiple rules, and identify the most important items in the network. The following SQL code prepares the data for the Network Diagram.

Network Diagrams plot a set of “Source” values (T1_ITEM), and connects them to a “Target” value (ITEM2). If the source value represents the left hand side of the rule, the corresponding right hand side of the rule is listed as the Target variable. We will use the “Lift” value to link these source and target variables. If the target value is the right hand side of the rule, the target and the lift are missing. This allows us to plot the product, but no linkage will be made.

Now, my data is ready to be visualized as a Network Diagram. Using the following code, I am able to promote my Association Rules, making this dataset available via SAS® Visual Analytics.

Now, I am able to quickly and easily generate my Network Diagram without having to create any code.

Hovering over a node allows me to see specific information about that particular item. Here, we can see that Heineken was purchased in 59.9% of all transactions, which is 600 transactions.

Hovering over the linkage, we can see specific information about the rule. Below, we can see that purchasing artichoke (artichoke) makes the purchase of Heineken about 38% more likely. Many other rules link to Heineken, showing its importance in the network. Business Unit Experts can use this diagram as a starting point to analyze selling strategies to make proper adjustments for the business.

Conclusion

The Market Basket Analysis procedure in Visual Data Mining and Machine Learning on SAS Viya can help retailers quickly scan large transactional files and identify key relationships. These relationships can then be visualized in a Network Diagram to quickly and easily find important relationships in the network, not just a set of rules. As transactional data, whether in-store, online, or in any other form gets bigger, this Market Basket functionality is a must have weapon in the analytical toolkit of any business.

Visualizing the results of a Market Basket Analysis in SAS Viya was published on SAS Users.

1月 132018
 

Former U.S. Chief Technology officer Megan Smith stressed the importance of continued investments in science, technology, engineering and mathematics (STEM) in her keynote address at SAS Analytics Experience, sharing a quote from George Washington. In his first address to Congress, in 1790, Washington said, “There is nothing which can better [...]

Former US Chief Technology Officer encourages STEM diversity was published on SAS Voices by Paula Henderson