September 20, 2021
 

A recent report suggests that the current state of climate change is alarming. Climate change puts billions of people at risk of events like extreme hurricane seasons and rising sea levels. However, data and analytics are playing a critical role in informing us about the situation, planning what’s ahead, and [...]

How analytics is helping with climate sustainability was published on SAS Voices by Caslee Sims

September 20, 2021
 

The SAS/IML language supports lists, which are containers that store other objects, such as matrices and other lists. A primary use of lists is to pack objects of various types into a single symbol that can be passed to and from modules. A useful feature of using lists is that lists can dynamically grow or shrink as necessary. You can use the ListAddItem subroutine to add a new item to an existing list, which can be convenient if you do not know until run time how many items the list needs to hold.

A SAS/IML programmer asked whether it is possible to add items to a sublist of a list. The answer is yes, and there is even an option to make the process efficient. Intuitively, you need to copy the sublist, modify the copy, and then replace the sublist in the list. Because copying a list can be an expensive operation, the SAS/IML language provides a special option to move the list items into new symbols without copying. The move option assigns the memory to a new symbol and "nulls out" the previous symbol.

Suppose you have a SAS/IML list, L, that contains sublists. Suppose you want to add a new item to the first sublist (L$1). You can do the following:

  • Get the sublist by using the ListGetSubItem function. For example, the syntax Y = ListGetSubItem(L, 1, 'm'); gets the first sublist and assigns it to the Y symbol. Using 'm' for the third argument means "move" the memory to the new symbol. After this call, L$1 is empty and the symbol Y points to the sublist that was formerly contained in L$1.
  • The symbol Y is a list, so use ListAddItem to add a new item to Y.
  • Now set the first item of L. You can use call ListSetSubItem(L, 1, Y, 'm'); to assign L$1 to be the value of Y. Because you use the 'm' option, the memory is moved, not copied. After this call, Y is an empty symbol. The first item of L contains the list that was formerly in Y.

The schematic diagram above illustrates the process. The following SAS/IML program shows how to implement the technique:

 
proc iml;
/* create list of lists. Specify L$1$1, L$1$2, L$2$1, and L$2$2 */
L = ListCreate(2);
L$1 = ['L11', 12];    /* first  item of L is a list with two items */
L$2 = ['L21', 22];    /* second item of L is a list with two items */
 
/* Now add a third item to L$1 and access it as L$1$3 */
newVal = 23;
Y = ListGetSubItem(L, 1, 'm');     /* get the sublist */
call ListAddItem(Y, newVal);       /* add new item */
call ListSetSubItem(L, 1, Y, 'm'); /* set it back in same position */
 
/* check that L$1$3 exists and that Y is empty */
package load ListUtil;
call struct(L);
 
L13 = L$1$3;
dimY = dimension(Y);
print L13, dimY;
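As a language-neutral illustration (Python rather than SAS/IML), the three steps can be modeled as follows. The helper names `take_item` and `add_to_sublist` are hypothetical, Python indexes from 0 rather than 1, and Python's reference semantics only approximate the 'm' (move) option. Treat this as a sketch of the idea, not a description of how SAS/IML manages memory.

```python
def take_item(container, index):
    """Move the item out of container[index], leaving None behind (no copy)."""
    item = container[index]
    container[index] = None      # the old slot is "nulled out"
    return item

def add_to_sublist(container, index, new_item):
    """Get the sublist, append to it, and put it back in the same position."""
    sub = take_item(container, index)   # analogous to ListGetSubItem(L, 1, 'm')
    sub.append(new_item)                # analogous to ListAddItem(Y, newVal)
    container[index] = sub              # analogous to ListSetSubItem(L, 1, Y, 'm')
    return container

L = [['L11', 12], ['L21', 22]]
add_to_sublist(L, 0, 23)
print(L[0])   # ['L11', 12, 23]
```

Wrapping the three calls in one function mirrors the suggestion to encapsulate the operations in a user-defined function when the pattern is used repeatedly.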

Summary

A useful feature of SAS/IML lists is that a list can dynamically grow. Lists also include an option that moves (rather than copies) memory to and from list items. You can use this feature to efficiently add items to a sublist of a list. For clarity, I used three separate function calls in the example program. If you plan to use this feature repeatedly, you can encapsulate the operations into a SAS/IML user-defined function.

The post Add an item to a sublist appeared first on The DO Loop.

September 15, 2021
 

In this Q&A, Iain Brown, SAS’s head of data science for the United Kingdom and Ireland, discusses technical readiness for AI, customer adoption trends, IT’s changing role, and mission-critical considerations for technology and talent. Q: What does it mean, from both a technology and a cultural standpoint, to be ready [...]

Technical readiness for Artificial Intelligence: A Q&A with Iain Brown was published on SAS Voices by Kimberly Nevala

September 15, 2021
 

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's dive into exploring your data by viewing the number of distinct and missing values that exist in each column using the simple.distinct CAS action.

In this example, I will use the CAS procedure to execute the distinct action. Note that instead of using the CAS procedure, I could execute the same action from Python, R, and other languages with slight changes to the syntax for the specific language. Refer to the documentation for syntax from other languages.

Determine the Number of Distinct and Missing Values in a CAS Table

To begin, let's use the simple.distinct CAS action on the CARS in-memory table to view the action's default behavior.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"};
quit;

In the preceding code, I specify the CAS procedure, the action, then reference the in-memory table. The results of the call are displayed below.

The results allow us to quickly explore the CAS table and see the number of distinct and missing values. That's great, but what if you only want to see specific columns?

Specify the Columns in the Distinct Action

Sometimes your CAS tables contain hundreds of columns, but you are only interested in a select few. With the distinct action, you can specify a subset of columns using the inputs parameter. Here I'll specify the Make, Origin and Type columns.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        inputs={"Make","Origin","Type"};
quit;

After executing the code, the results return the information for only the Make, Origin and Type columns.
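To make the idea concrete outside of CAS, here is a small plain-Python sketch of what a per-column "distinct and missing" summary computes. It is not the CAS action or its Python interface: the function `distinct_summary`, its `inputs` argument, and the sample rows are all made up for illustration, and the choice to exclude missing values from the distinct count is an assumption rather than documented behavior of simple.distinct.

```python
def distinct_summary(rows, inputs=None):
    """Return {column: (n_distinct, n_missing)} for a list of dict rows.

    inputs optionally restricts the summary to a subset of columns.
    Missing values (None) are counted separately and, by assumption,
    are not included in the distinct count.
    """
    columns = inputs if inputs is not None else list(rows[0].keys())
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        n_missing = sum(1 for v in values if v is None)
        n_distinct = len({v for v in values if v is not None})
        summary[col] = (n_distinct, n_missing)
    return summary

# Made-up rows for illustration; this is not the actual CARS data
rows = [
    {"Make": "Acura", "Origin": "Asia", "Cylinders": 6},
    {"Make": "Audi", "Origin": "Europe", "Cylinders": 4},
    {"Make": "Audi", "Origin": "Europe", "Cylinders": None},
]
result = distinct_summary(rows, inputs=["Make", "Origin"])
all_cols = distinct_summary(rows)
print(result)     # {'Make': (2, 0), 'Origin': (2, 0)}
```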

Next, let's explore what we can do with the results.

Create a CAS Table with the Results

Some actions allow you to create a CAS table with the results. You might want to do this for a variety of reasons, such as using the new CAS table in a SAS Visual Analytics dashboard or in a data visualization procedure like SGPLOT.

To create a CAS table with the distinct action result, add the casOut parameter and specify new CAS table information, like name and caslib.

proc cas;
    simple.distinct /
        table={name="cars", caslib="casuser"},
        casOut={name="distinctCars", caslib="casuser"};
quit;

After executing the code, the action returns information about the name and caslib of the new CAS table, and the number of rows and columns.

Visualize the Number of Distinct Values in Every Column

Lastly, what if you want to create a data visualization to better explore the table? Maybe you want to visualize the number of distinct values for each column? This task can be accomplished with a variety of methods. However, since I know my newly created distinctCars CAS table has only 15 rows, I'll reference the CAS table directly using the SGPLOT procedure.

This method works as long as the LIBNAME statement references your caslib correctly. I recommend this method when you know the CAS table is a manageable size. This is important because the CAS server does not execute the SGPLOT procedure on a distributed CAS table. The CAS server instead transfers the entire CAS table back to the client for processing.

To begin, the following LIBNAME statement will reference the casuser caslib.

libname casuser cas caslib="casuser";

Once the LIBNAME statement is correct, all you need to do is specify the CAS table in the DATA option of the SGPLOT procedure.

title justify=left height=14pt "Number of Distinct Values for Each Column in the CARS Table";
proc sgplot data=casuser.distinctCars
            noborder nowall;
    vbar Column / 
        response=NDistinct
        categoryorder=respdesc
        nooutline
        fillattrs=(color=cx0379cd);
    yaxis display=(NOLABEL);
    xaxis display=(NOLABEL);
run;

The results show a bar chart with the number of distinct values for each column.

Summary

The simple.distinct CAS action is an easy way to explore a distributed CAS table. With one simple action, you can easily see how many distinct values are in each column, and the number of missing values!

In Part 2 of this post, I'll further explore the simple.distinct CAS action and offer more ideas on how to interpret and use the results.

Additional Resources

distinct CAS action
SAS® Cloud Analytic Services: Fundamentals
Plotting a Cloud Analytic Services (CAS) In-Memory Table
Getting started with SGPLOT - Index
Code

CAS-Action! Simply Distinct - Part 1 was published on SAS Users.

September 15, 2021
 

I previously wrote about one way to solve the partition problem in SAS. In the partition problem, you divide (or partition) a set of N items into two groups of size k and N-k such that the sum of the items' weights is the same in each group. For example, if the weights of six items are X = {0.4, 1.0, 1.2, 1.7, 2.6, 2.7} and k=3, you can put the weights {0.4, 1.7, 2.7} in one group and the weights {1.0, 1.2, 2.6} in the other group. Both groups contain 4.8 units of weight.

The previous article discussed a brute-force approach in which you explicitly generate all combinations of k items from the set of N items. The program could produce all possible solutions, provided that a solution exists. This article recasts the partition problem as an optimization problem and shows two ways to solve it:

  • A feasibility problem: Define the problem as a set of constraints. Find one or more solutions that satisfy the constraints.
  • An optimization problem: Find a partition that minimizes the difference between the weights in the two groups. When there is a solution, this method finds it. If there is not a solution, this method finds partitions that distribute the weight as equally as possible.

This article shows how to use PROC OPTMODEL in SAS/OR software. In SAS Viya, you can submit the same statements by using the OPTMODEL procedure in SAS Optimization or the runOptModel action. I thank my colleague, Rob Pratt, who wrote the programs in this article and also kindly reviewed both articles about the partition problem.

A feasible set of binary vectors

The previous article formulated the problem in terms of a binary vector. I defined an N-dimensional vector, c, that contained k elements with the value -1 and N-k elements with the value +1. The two values of the vector c indicate whether an item belongs to the first group or the second group. If X is a solution (an ordering of the weights that satisfies the partition problem), then the inner product X`*c=0.

In this article, it is more convenient to define the two groups by using a 0/1 binary indicator vector, Y. With this definition of Y, a solution satisfies the equation X`*Y = X`*(1-Y). The equation specifies that the sum of weights in the '1' group equals the sum of weights in the '0' group.
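The two encodings describe the same partition. As a quick sanity check in Python (an illustration only, not part of the SAS programs), the 0/1 vector Y can be converted to the ±1 vector c by c = 1 - 2Y, so that X`*c equals X`*(1-Y) - X`*Y. The variable names below are chosen for this sketch, and the six weights are the example from the previous article.

```python
X = [0.4, 1.0, 1.2, 1.7, 2.6, 2.7]
Y = [1, 0, 0, 1, 0, 1]            # 1 marks items in the first group

c = [1 - 2 * y for y in Y]        # Y=1 -> -1, Y=0 -> +1

sum_in = sum(x * y for x, y in zip(X, Y))          # X`*Y
sum_out = sum(x * (1 - y) for x, y in zip(X, Y))   # X`*(1-Y)
inner = sum(x * ci for x, ci in zip(X, c))         # X`*c

print(c)   # [-1, 1, 1, -1, 1, -1]
# inner equals sum_out - sum_in, which is 0 (up to rounding) for a solution
```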

To define the feasible set of Y vectors that solve the partition problem, you can define the following two constraints:

  1. 1`*Y = k, where 1 is the vector of 1s. This constraint requires that there be exactly k values of Y that are equal to 1.
  2. X`*Y = X`*(1-Y). This constraint means that the sum of the k values of X in the first set equals the sum of the N-k values of X in the second set.
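The two constraints can be checked directly for any candidate vector. The following Python sketch (an illustration, not the OPTMODEL formulation) tests whether a given 0/1 vector is feasible for the 11-item example used in this article. The function name `is_feasible` and the tolerance `tol` are assumptions of the sketch; the tolerance is there because the weights are floating-point numbers.

```python
weight = [0.8, 1.2, 1.3, 1.4, 1.6, 1.8, 1.9, 2.0, 2.2, 2.3, 2.5]
k = 5

def is_feasible(Y, weight, k, tol=1e-8):
    """Check both constraints for a 0/1 indicator vector Y."""
    count_ok = sum(Y) == k                                    # 1`*Y = k
    in_group = sum(w * y for w, y in zip(weight, Y))          # X`*Y
    out_group = sum(w * (1 - y) for w, y in zip(weight, Y))   # X`*(1-Y)
    return count_ok and abs(in_group - out_group) < tol

good = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]   # {1.6, 1.8, 1.9, 2.0, 2.2}
bad  = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # five smallest weights

print(is_feasible(good, weight, k))   # True
print(is_feasible(bad, weight, k))    # False
```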

The OPTMODEL procedure enables you to specify the problem and the constraints in a natural syntax. Instead of using vector notation, as I did above, it uses summations and indices. The following statements define a set that has 11 items that are to be partitioned into two groups that contain 5 and 6 items, respectively. This program returns an arbitrary feasible point and prints the solution:

/* feasibility problem: enforce equal group weights */
proc optmodel;
set ITEMS = 1..11;
num weight {ITEMS} = [0.8, 1.2, 1.3, 1.4, 1.6, 1.8, 1.9, 2.0, 2.2, 2.3, 2.5];
num k = 5;
 
/* Y[i] = 1 if item i is in group 1; Y[i] = 0 if item i is in group 2 */
var Y {ITEMS} binary;
 
/* group 1 must contain exactly k items */
con NumItemsInGroup1:            /* 1`*Y = k */
   sum{i in ITEMS} Y[i] = k;
 
/* constraint: each group must contain the same weight */
con SameWeightInEachGroup:      /* weight`*Y = weight`*(1-Y) */
   sum{i in ITEMS} weight[i] * Y[i] = sum{i in ITEMS} weight[i] * (1 - Y[i]);
 
/* find one feasible solution */
solve noobj;        /* default solver is MILP */
print Y weight;

As shown in the output, this solution partitions the weights {1.6, 1.8, 1.9, 2.0, 2.2} into one group and the other six weights into another group.

The solution is shown as a column vector. Because we will want to display multiple solutions, it is convenient to transpose the solution (or set of solutions) so that each solution is a row vector in a matrix. It is also helpful to standardize the solution vector so that the left-most k elements contain the items in one group, and the right-most N-k elements contain the elements in the other group. Fortunately, PROC OPTMODEL provides programming statements and the automatic variable _NSOL_, which tells us how many solutions were found. The following macro code fills a matrix that has _NSOL_ rows and N=11 columns. It then puts one group of weights into the first k columns and the other group in the last N-k columns. Because PROC OPTMODEL is an interactive procedure, it does not stop running until you specify a QUIT statement. Therefore, the procedure is still running, and you can submit the following statements:

/* Macro to print table with one row per solution. Put the group 
   with k items on the left and the group with N-k items on the right. */
%macro printSolutionTable;
for {s in 1.._NSOL_} do;
   leftCount  = 0;
   rightCount = 0;
   for {i in ITEMS} do;
      if Y[i].sol[s] > 0.5 then do;
         leftCount = leftCount + 1;
         soln[s,leftCount] = weight[i];
      end;
      else do;
         rightCount = rightCount + 1;
         soln[s,k+rightCount] = weight[i];
      end;
   end;
end;
print soln;
%mend printSolutionTable;
 
/* define the matrix soln and the index variables */
num soln {1.._NSOL_, 1..card(ITEMS)};
num leftCount, rightCount;
%printSolutionTable;
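The logic of the macro is easier to see in isolation. This Python sketch (an illustration with a hypothetical function name, `solution_row`) builds one standardized row from a 0/1 solution vector: the weights of the group-1 items come first, followed by the weights of the remaining items. As in the macro, a value greater than 0.5 is treated as 1, which guards against a solver returning values such as 0.9999999.

```python
weight = [0.8, 1.2, 1.3, 1.4, 1.6, 1.8, 1.9, 2.0, 2.2, 2.3, 2.5]

def solution_row(Y, weight):
    """Return the group-1 weights followed by the group-2 weights."""
    left = [w for w, y in zip(weight, Y) if y > 0.5]
    right = [w for w, y in zip(weight, Y) if y <= 0.5]
    return left + right

Y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
row = solution_row(Y, weight)
print(row)   # [1.6, 1.8, 1.9, 2.0, 2.2, 0.8, 1.2, 1.3, 1.4, 2.3, 2.5]
```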

Find all feasible solutions to the partition problem

The previous section found one solution to the partition problem. If you want to find additional solutions, you can add options to the SOLVE statement. The options depend on the solver that you use. The previous SOLVE statement used the MILP solver. You can obtain additional solutions by using the following options:

/* find up to 100 feasible solutions */
solve noobj with milp / maxpoolsols=100 soltype=best;
put _NSOL_=;
%printSolutionTable;

By the way, you can also use constraint logic programming (CLP) to solve this problem. The CLP solver supports the FINDALLSOLNS option, which you can specify as follows:

/* an alternate way to find all feasible solutions */
solve with clp / findallsolns;

Both methods find all 13 solutions that were found in the previous article.
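For a problem this small, the count is easy to confirm independently by enumeration. The Python sketch below is not a solver; it simply tests every set of k=5 indices against the equal-weight condition, using a tolerance (`tol`, an assumption of the sketch) because the weights are floating-point values.

```python
from itertools import combinations

weight = [0.8, 1.2, 1.3, 1.4, 1.6, 1.8, 1.9, 2.0, 2.2, 2.3, 2.5]
k = 5
tol = 1e-8
half = sum(weight) / 2          # each group must hold half the total weight

# keep every set of k indices whose weights sum to half the total
solutions = [
    combo for combo in combinations(range(len(weight)), k)
    if abs(sum(weight[i] for i in combo) - half) < tol
]
print(len(solutions))   # 13
```

Enumeration scales poorly as N grows, which is why the solver-based formulations are preferable for larger problems.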

Minimize the absolute deviations between groups

In the previous sections, the second constraint ensures that the difference between the weights in the two groups is exactly zero. However, you might have data that cannot be evenly divided. In that case, you might want to find a partition that minimizes the absolute deviation between the two groups. You can do that by using the second equation as an objective function rather than as a constraint.

Note that you must minimize the ABSOLUTE VALUE of the difference between the weights in the two groups. If you minimize the difference, the optimization will put the smallest weights in one group and the largest in the other.

The absolute value of a linear function is not linear. However, you can use a standard trick to rewrite the objective function as a linear function subject to two linear constraints. Let x be the quantity to minimize. Recall that the absolute value |x| is defined by x when x≥0 and by -x when x≤0. Introduce a new variable, z, subject to the linear constraints that z ≥ x and z ≥ -x. The graph at the right shows the geometry of this choice. Minimizing the constrained variable z is equivalent to minimizing |x|, but it can be done by using linear programming techniques. For more information about handling absolute values in the objective function, see this "open textbook" from Cornell University.
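A tiny numerical check makes the equivalence plausible. For a fixed x, the constraints z ≥ x and z ≥ -x say that z is at least as large as the bigger of x and -x, so the smallest feasible z is max(x, -x), which is |x|. The Python sketch below (with a hypothetical helper, `min_z`) verifies this for a few sample values; it illustrates the identity and is not a linear-programming solver.

```python
def min_z(x):
    """Smallest z that satisfies both z >= x and z >= -x."""
    return max(x, -x)

samples = [-2.5, -0.1, 0.0, 0.7, 3.0]
checks = [min_z(x) == abs(x) for x in samples]
print(all(checks))   # True
```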

By using this trick, you can formulate the partition problem as an optimization that minimizes the absolute deviation between the sum of weights in the two groups:

/* optimization problem: minimize absolute difference of group weights */
proc optmodel;
set ITEMS = 1..11;
num weight {ITEMS} = [0.8, 1.2, 1.3, 1.4, 1.6, 1.8, 1.9, 2.0, 2.2, 2.3, 2.5];
num k = 5;
 
/* Y[i] = 1 if item i is in group 1; Y[i] = 0 if item i is in group 2 */
var Y {ITEMS} binary;
 
/* group 1 must contain exactly k items */
con NumItemsInGroup1:
   sum{i in ITEMS} Y[i] = k;
 
/* minimize absolute difference of group weights */
/* Use a standard trick to handle absolute values in an objective function:
   z=|L*y| is z = {  L*y if L*y >= 0
                  { -L*y if L*y <  0
*/
var AbsValue;
min GroupWeightDifference = AbsValue;
 
con Linearize1:
   AbsValue >=  sum{i in ITEMS} weight[i] * Y[i] - sum{i in ITEMS} weight[i] * (1 - Y[i]);
con Linearize2:
   AbsValue >= -sum{i in ITEMS} weight[i] * Y[i] + sum{i in ITEMS} weight[i] * (1 - Y[i]);
 
/* find up to 100 optimal solutions */
solve with milp / maxpoolsols=100 soltype=best presolver=none;
put _NSOL_=;
 
num soln {1.._NSOL_, 1..card(ITEMS)};
num leftCount, rightCount;
%printSolutionTable;

This optimization also finds the 13 solutions, although the rows are displayed in a different order. The output is not shown.

An advantage of this formulation is that this technique enables you to handle the case where an exact solution is not possible. For example, if you change the smallest weight from 0.8 to 0.85, it is no longer possible to partition the weights evenly. However, the optimization continues to find 13 solutions, only now each solution defines two groups in which the weights differ by 0.05, which is the smallest possible difference.
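The claim about the modified weights can be confirmed by enumeration. The following Python sketch (an independent check, not the OPTMODEL program) replaces 0.8 with 0.85, computes the absolute difference between the group weights for every choice of k=5 items, and keeps the partitions that attain the minimum. The names `deviation`, `candidates`, and `optimal` are chosen for this sketch.

```python
from itertools import combinations

weight = [0.85, 1.2, 1.3, 1.4, 1.6, 1.8, 1.9, 2.0, 2.2, 2.3, 2.5]
k = 5
total = sum(weight)

def deviation(combo):
    """Absolute difference between the two group weights."""
    left = sum(weight[i] for i in combo)
    return abs(left - (total - left))

candidates = list(combinations(range(len(weight)), k))
best = min(deviation(c) for c in candidates)
# keep every partition whose deviation matches the minimum (up to rounding)
optimal = [c for c in candidates if deviation(c) - best < 1e-8]

print(round(best, 2), len(optimal))   # 0.05 13
```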

Easier syntax in SAS Viya

The trick that handles an absolute value in an objective function is one example of a general method that can "linearize" several types of problems. In SAS Viya, there is an automatic way to perform "linearization" for these standard problems. In SAS Optimization software, PROC OPTMODEL supports the LINEARIZE option, beginning in SAS Viya 2020.1.1 (December 2020). Thus, in Viya you can solve the partition problem by using a simpler syntax:

/* show the LINEARIZE option in SAS Viya 2020.1.1 */
min GroupWeightDifference =
   abs( sum{i in ITEMS} weight[i] * Y[i] - sum{i in ITEMS} weight[i] * (1 - Y[i]) );
solve with milp LINEARIZE / maxpoolsols=100 soltype=best;

Summary

This article shows some optimization techniques for solving the partition problem on N items. You can use PROC OPTMODEL in SAS to solve the partition problem in two ways. If the partition problem has a solution, you can formulate the problem as a feasibility problem. Alternatively, you can minimize the absolute deviations between the two groups. This second approach will also find approximate partitions when an exact solution is not possible.

This article also shows a standard trick that enables you to optimize the absolute value of a linear objective function. The trick is to replace the absolute value by using two linear constraints. In SAS Viya, the LINEARIZE option implements the trick automatically.

The post The partition problem: An optimization approach appeared first on The DO Loop.

September 14, 2021
 

As organizations increasingly use artificial intelligence to collect and analyze data and identify individuals, the topic of ethical AI often rears its head. Last year, Michigan's Integrated Data Automated System flagged over 540,000 claims as possibly fraudulent. Thousands of state residents’ accounts were inaccurately flagged, making it almost impossible to [...]

SAS helps lead ethical AI discussion at Canadian social coding event was published on SAS Voices by Alex Coop

September 13, 2021
 

The Day of the Programmer is not enough time to celebrate our favorite code-creators. That’s why at SAS, we celebrate an entire week with SAS Programmer Week! If you want to extend the fun and learning of SAS Programmer Week year-round, SAS Press is here to support you with books for programmers at every level.

2021 has been a big year for learning, so we wanted to share the six most popular books for programmers this year. There are some old favorites on this list as well as some brand-new books on a variety of topics. Check out the list below, and see what your fellow programmers are reading this year!

  1. Little SAS Book: A Primer, Sixth Edition

This book is at the top of almost every list of recommended books for anyone who wants to learn SAS. And for good reason! It breaks down the basics of SAS into easy-to-understand chunks with tons of practice questions. If you are new to SAS or are interested in getting your basic certification, this is the book for you.

  2. Learning SAS by Example: A Programmer’s Guide, Second Edition

Whether you are learning SAS for the first time or just need a quick refresher on a single topic, this book is well-organized so that you can read start to finish or skip to your topic of interest. Filled with real-world examples, this is a book that should be on every SAS programmer’s bookshelf!

  3. Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS

If you work with big data, then you probably work with a lot of text. The third book on our list is for anyone who handles unstructured data. This book focuses on practical solutions to real-life problems. You’ll learn how to collect, cleanse, organize, categorize, explore, analyze, and interpret your data.

  4. End-to-End Data Science with SAS: A Hands-On Programming Guide

This book offers a step-by-step explanation of how to create machine learning models for any industry. If you want to learn how to think like a data scientist, wrangle messy code, choose a model, and evaluate models in SAS, then this book has the information that you need to be a successful data scientist.

  5. Cody's Data Cleaning Techniques Using SAS, Third Edition

Every programmer knows that garbage in = garbage out. Take out the trash with this indispensable guide to cleaning your data. You’ll learn how to find and correct errors and develop techniques for correcting data errors.

  6. SAS Graphics for Clinical Trials by Example

If you are a programmer who works in the health care and life sciences industry and want to create visually appealing graphs using SAS, then this book is designed specifically for you. You’ll learn how to create a wide range of graphs using Graph Template Language (GTL) and statistical graphics procedures to solve even the most challenging clinical graph problems.

An honorable mention also goes to the SAS Certification Guides. They are a great way to study for the certification exams for the SAS Certified Specialist: Base Programming and SAS Certified Professional: Advanced Programming credentials.

We have many books available to support you as you develop your programming skills – and some of them are free! Browse all our available titles today.

Top Books for SAS Programmers was published on SAS Users.

September 13, 2021
 

Photograph by Poussin Jean, license CC BY-SA 3.0, via Wikimedia Commons

The partition problem has many variations, but recently I encountered it as an interactive puzzle on a computer. (Try a similar game yourself!) The player is presented with an old-fashioned pan-balance scale and a set of objects of different weights. The challenge is to divide (or partition) the objects into two groups. You put one group of weights on one side of the scale and the remaining group on the other side so that the scale balances.

Here's a canonical example of the partition problem for two groups. The weights of six items are X = {0.4, 1.0, 1.2, 1.7, 2.6, 2.7}. Divide the objects into two groups of three items so that each group contains half the weight, which is 4.8 for this example. Give it a try! I'll give a solution in the next section.

As is often the case, there are at least two ways to solve this problem: the brute-force approach and an optimization method that minimizes the difference in weights between the two groups. One advantage of the brute-force approach is that it is guaranteed to find all solutions. However, the brute-force method quickly becomes impractical as the number of items increases.

This article considers brute-force solutions. A more elegant solution will be discussed in a future article.

Permutations: A brute-force approach

Let's assume that the problem specifies the number of items that should be placed in each group. One way to specify groups is to use a vector of ±1 values to encode the group to which each item belongs. For example, if there are six items, the vector c = {-1, -1, -1, +1, +1, +1} indicates that the first three items belong to one group and the last three items belong to the other group.

One way to solve the partition problem for two groups is to consider all permutations of the items. For example, Y = {0.4, 1.7, 2.7, 1.0, 1.2, 2.6} is a permutation of the six weights in the previous section. The vector c = {-1, -1, -1, +1, +1, +1} indicates that the items {0.4, 1.7, 2.7} belong to one group and the items {1.0, 1.2, 2.6} belong to the other group. Both groups have 4.8 units of weight, so Y is a solution to the partition problem.

Notice that the inner product Y`*c = 0 for this permutation. Because c is a vector of ±1 values, the inner product is the difference between the sum of weights in the first group and the sum of weights in the second group. An inner product of 0 means that the sums are the same in both groups.

A program to generate all partitions

Let's see how you can use the ALLPERM function in SAS/IML to solve the two-group partition problem. Since the number of permutations grows very fast, let's use an example that contains only four items. The weights of the four items are X = {1.2, 1.7, 2.6, 3.1}. We want two items in each group, so define c = {-1, -1, 1, 1}. We search for a permutation Y = π(X), such that Y`*c = 0. The following SAS/IML program generates ALL permutations of the integers {1,2,3,4}. For each permutation, the sum of the first two weights is compared to the sum of the last two weights. The permutations for which Y`*c = 0 are solutions to the partition problem.

proc iml;
X = {1.2, 1.7, 2.6, 3.1};
c = { -1,  -1,   1,   1};  /* k = 2 */
 
/* Brute Force Method 1: Generate all permutations of items */
N = nrow(X);
P = allperm(N);            /* all permutations of 1:N */
Y = shape( X[P], nrow(P), ncol(P) );   /* reshape to N! x N matrix */
z = Y * c;                 /* want to find z=0 (or could minimize z) */
idx = loc(abs(z) < 1E-8);  /* indices where z=0 in finite precision */
Soln = Y[idx, ];           /* return all solutions; each row is solution */

This program generates P, a matrix whose rows contain all permutations of four elements. The matrix Y is a matrix where each row is a permutation of the weights. Therefore, Y*c is the vector of all differences. When a difference is zero, the two groups contain the same weights. The following statements count how many solutions are found and print the first solution:

numSolns = nrow(soln);
s = soln[1,];
print numSolns, s[c={'L1' 'L2' 'R1' 'R2'}];

There are a total of eight solutions. One solution is to put the weights {1.2, 3.1} in one group and the weights {1.7, 2.6} in the other group. What are the other solutions? For this set of weights, the other solutions are trivial variations of the first solution. The following statement prints all the solutions:

print soln[c={'L1' 'L2' 'R1' 'R2'}];

I have augmented the output so that it is easier to see the structure. In the first four rows, the values {1.2, 3.1} are in the first group and the values {1.7, 2.6} are in the second group. In the last four rows, the values switch groups. Thus, this method, which is based on generating all permutations, generates a lot of solutions that are qualitatively the same, in practice.
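The same experiment is easy to reproduce outside of SAS/IML. The following Python sketch (an illustration that uses `itertools.permutations` in place of the ALLPERM function) counts the permutations whose inner product with c is zero and then collapses them into distinct partitions by ignoring the order of the items and the order of the groups. The rows might be generated in a different order than in the SAS/IML output.

```python
from itertools import permutations

X = [1.2, 1.7, 2.6, 3.1]
c = [-1, -1, 1, 1]

# keep permutations Y for which the inner product Y`*c is zero
solutions = [
    Y for Y in permutations(X)
    if abs(sum(y * ci for y, ci in zip(Y, c))) < 1e-8
]
print(len(solutions))   # 8

# ignore ordering within and between groups to count distinct partitions
unique = {frozenset([frozenset(Y[:2]), frozenset(Y[2:])]) for Y in solutions}
print(len(unique))      # 1
```

All eight rows describe the same split, {1.2, 3.1} versus {1.7, 2.6}, which is the redundancy that the combination-based method avoids.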

Combinations: Another brute-force approach

The all-permutation method generates N! possible partitions and, as we have seen, not all the partitions are qualitatively different. Thus, using all permutations is inefficient. A more efficient (but still brute-force) method is to use combinations instead of permutations. Combinations are essentially a sorted version of a permutation. The values {1, 2, 3}, {2, 1, 3}, and {3, 2, 1} are different permutations, whereas there is only one combination ({1, 2, 3}) that contains these three numbers. If there are six items and you want three items in each group, there are 6! = 720 permutations to consider, but only "6 choose 3" = 20 combinations.
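The difference in search-space size grows quickly. This short Python check (standard-library `math` functions, Python 3.8 or later) reproduces the counts quoted above and shows the corresponding numbers for the 11-item example that appears later in this article:

```python
import math

# six items, three in each group
print(math.factorial(6), math.comb(6, 3))     # 720 20

# eleven items, groups of five and six
print(math.factorial(11), math.comb(11, 5))   # 39916800 462
```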

The following SAS/IML function uses combinations to implement a brute-force solution of the two-group partition problem. The Partition2_Comb function takes a vector of item weights and the number of items, k, that you want in the first group; the second group receives the remaining N-k items. The ALLCOMB function creates the complete set of combinations of k indices. The SETDIF function computes the complementary set of indices. For example, if there are six items and {1, 4, 5} is a set of indices, then {2, 3, 6} is the complementary set of indices. After the various combinations are defined, the equation Y*c = 0 is used to find solutions, if any exist, where the first k elements of c are -1 and the last N-k elements are +1.

/* Brute Force Method 2: Generate all combinations of size k, N-k */
start Partition2_Comb(_x, k, tol=1e-8);
   x = colvec(_x);
   N = nrow(x);
   call sort(x);                 /* Optional: standardize the output */
 
   c = j(k, 1, -1) // j(N-k, 1, 1); /* construct +/-1 vector */
   L = allcomb(N, k);            /* "N choose k" possible candidates in "left" group */
   R = j(nrow(L), N-k, .);
   do i = 1 to nrow(L);
      R[i,] = setdif(1:N, L[i,]);  /* complement indices in "right" group */
   end;
   P = L || R;                   /* combine the left and right indices */
 
   Y = shape( X[P], nrow(P) );   /* reshape X[P] into an (N choose k) x N matrix */
   z = Y * c;                    /* want to find z=0 (or could minimize z) */
   solnIdx = loc(abs(z) < tol);  /* indices where z=0 in finite precision */
   if ncol(solnIdx) = 0 then 
      soln = j(1, N, .);         /* no solution */
   else
      soln = Y[solnIdx, ];       /* each row is solution */
   return soln;                  /* return all solutions */
finish;
 
/* test the function on a set that has 11 items, partitioned into groups of 5 and 6 */
x = {0.8, 1.2, 1.3, 1.4, 1.6, 1.8, 1.9, 2.0, 2.2, 2.3, 2.5}; 
soln = Partition2_Comb(x, 5);
numSolns = nrow(soln);
print numSolns, soln[c=(('L1':'L5') || ('R1':'R6'))];

The output shows 13 solutions for a set of 11 items that are partitioned into two groups, one with five items and the other with 11-5=6 items. For the solution, each partition has 9.5 units of weight.

Summary

This article shows some SAS/IML programming techniques for using a brute-force method to solve the partition problem on N items. In the partition problem, you split items into two groups that have k and N-k items so that the weight of the items in each group is equal. This article introduced the solution in terms of all permutations of the items, but implemented a solution in terms of all combinations, which eliminates redundant orderings of the items.

In many applications, you don't need to find all solutions; you just need any solution. A subsequent article will discuss formulating the partition problem as an optimization that seeks to minimize the difference between the weights of the two groups.

The post The partition problem appeared first on The DO Loop.

September 10, 2021
 

As its thousands of users know, SAS Analytics Pro consists of three core elements of the SAS system: Base SAS®, SAS/GRAPH® and SAS/STAT®. It provides the fundamental capabilities of data handling, data visualization, and statistical analysis either through coding or through the SAS Studio interface. For many years, SAS Analytics Pro has been deployed on-site as the entry-level workhorse to the SAS system.

Now, SAS Analytics Pro includes a new option for containerized cloud-native deployment. In addition, the containerized option comes with the full selection of SAS/ACCESS engines, which makes it even easier to work with data from virtually any source. For organizations that are considering a move to the cloud, or are already there, SAS Analytics Pro provides an exciting new deployment option.

What is SAS Analytics Pro?

SAS Analytics Pro is an easy-to-use, yet powerful package for accessing, manipulating, analyzing and presenting information. It lets organizations improve productivity with all the tools and methods needed for desktop data analysis – in one package.

  • Organizations can get analysis, reporting and easy-to-understand visualizations from one vendor. Rather than having to piece together niche software packages from different vendors, this consolidated portfolio reduces the cost of licensing, maintenance, training and support – while ensuring that consistent information is available across your enterprise.
  • Innovative statistical techniques are provided with procedures constantly being updated to reflect the latest advances in methodology. Organizations around the world rely on SAS to provide accurate answers to data questions, along with unsurpassed technical support.
  • SAS software integrates into virtually any computing environment, unifying your computing efforts to get a single view of your data, and freeing analysts to focus on analysis rather than data issues.
  • Easily build analytical-style graphs, maps and charts with virtually any style of output that is needed, so you can deliver analytic results where they’re needed most.

Why should data scientists care about cloud and containers?

The new containerized cloud-native deployment option for SAS Analytics Pro raises the question of why data scientists should care about cloud and containers. This question was addressed in a SAS Users blog post by SAS R&D director Brent Laster.

This post characterizes a container as a “self-contained environment with all the programs, configuration, initial data, and other supporting pieces to run applications.” The nice thing for data scientists is that this environment can be treated as a stand-alone unit, to turn on and run at any time – sort of a “portable machine.” It provides a complete virtual system configured to run your targeted application.

Using popular container runtime environments (e.g., Docker), containers can be an efficient way for individual users to deploy and manage software applications. This is especially useful for applications like SAS Analytics Pro, which participates in SAS Viya’s continuous delivery approach, releasing updates on a regular basis.

For large, IT-managed environments, containers call for an orchestration layer, something like a “data center” for containers, to simplify deployment and management of dynamic workloads. The most prominent orchestrator today is Kubernetes, which automates key needs around containers including deployment, scaling, scheduling, healing, and monitoring – so the data scientist doesn’t have to.

The combination of containers and cloud environments provides an evolutionary jump in the infrastructure and runtime environments where data scientists run their applications. And this gives them a similar jump in being able to provide the business value their customers demand. Containerized and cloud-native deployment of SAS Analytics Pro provides the automatic optimization of resources and the automatic management of workloads that your organization needs to be competitive.

Note that existing customers can continue programming in SAS in a small footprint environment while availing themselves of the SAS container-based continuous delivery process. And if you aren’t already a SAS customer, cloud deployment gives you one more good reason to start letting SAS Analytics Pro deliver value to your organization.

Learn more

Website: SAS® Analytics Pro
Training: SAS® Global Certification Program


SAS Analytics Pro now available for on-site or containerized cloud-native deployment was published on SAS Users.