
February 15, 2018
 

How effective is your organization at leveraging data and analytics to power your business models?

This question is surprisingly hard for many organizations to answer.  Most organizations lack a roadmap against which they can measure their effectiveness at using data and analytics to optimize key business processes, uncover new business opportunities or deliver a differentiated customer experience. They do not understand what’s possible with respect to integrating big data and data science into the organization’s business model (see Figure 1).


Figure 1: Big Data Business Model Maturity Index

My SAS Global Forum 2018 presentation on Tuesday April 10, 2018 will discuss the transformative potential of big data and advanced analytics, and will leverage the Big Data Business Model Maturity Index as a guide for helping organizations understand where and how they can leverage data and analytics to power their business models.

Digital Twins, Analytics Profiles and the Power of One

We all understand that the volume and variety of data are increasing exponentially.  Your customers are leaving their digital fingerprints across the Internet via their website, social media, and mobile device usage.  The Internet of Things will unleash an estimated 44 Zettabytes of data across 7 billion connected people by 2020.

However, big data isn’t really about big; it’s about small. It’s about understanding your customer and product behaviors at the level of the individual.  Big Data is about building detailed behavioral or analytic profiles for each individual (see Figure 2).

Figure 2: Building Individual Behavioral or Analytic Profiles

If you want to better serve your customers, you need to understand their tendencies, behaviors, inclinations, preferences, interests and passions at the level of each individual customer.

Customers’ expectations of their vendors are changing due to their personal experiences.  From recommending products, services, movies, music, routes and even spouses, customers are expecting their vendors to understand them well enough that these vendors can provide a hyper-personalized customer experience.

Demystifying Data Science (AI | ML | DL)

Too many organizations are spending too much time confusing too many executives on the capabilities of data science.  The concept of data science is simple; data science is about identifying the variables and metrics that might be better predictors of business and operational performance (see Figure 3).

Figure 3: A Moneyball Definition of Data Science

Whether using basic statistics, predictive analytics, data mining, machine learning, or deep learning, almost all data science benefits are achieved through the simple formula: Input (A) → Response (B).

Source: Andrew Ng, “What Artificial Intelligence Can and Can’t Do Right Now”

By collaborating closely with business subject matter experts to choose Input (A), those variables and metrics that might be better predictors of performance, the data science team can achieve a more accurate, more granular, lower-latency Response (B).  The creative selection of Input (A) has already revolutionized many industries, and is poised to revolutionize more.
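As a purely illustrative sketch of the Input (A) → Response (B) pattern in SAS, the dataset, input variables, and response flag below are hypothetical:

/* Input (A): hypothetical customer behavior variables            */
/* Response (B): whether the customer responded to an offer (0/1) */
proc logistic data=work.customer_profiles;
   class channel / param=ref;
   model responded(event='1') = recency frequency monetary channel;
run;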

Data Monetization and the Economic Value of Data

Data is an unusual asset – it doesn’t deplete, it doesn’t wear out and it can be used across an infinite number of use cases at near zero marginal cost.  Organizations have no other assets with those unique characteristics.  And while traditional accounting methods of valuing assets work well with physical assets, those methods fall horribly – dangerously – short in properly determining the economic value of data.

Instead of using traditional accounting techniques to determine the value of the organization’s data, apply economic and data science concepts to determine the economic value of the data based upon its ability to optimize key business and operational processes, reduce compliance and security risks, uncover new revenue opportunities and create a more compelling, differentiated customer experience (see Figure 4).

Figure 4: Data Lake 3.0: Collaborative Value Creation Platform

The data lake, which can house both data and analytic models, is transformed from a simple data repository into a “collaborative value creation platform” that facilitates the capture, refinement and sharing of the data and analytic digital assets across the enterprise.

Creating the Intelligent Enterprise

When you add up all of these concepts and advancements – Big Data, Analytic Profiles, Data Science and the Economic Value of Data – organizations are poised for digital transformation (see Figure 5).

Figure 5: Achieving Digital Transformation

And what is Digital Transformation?

Digital Transformation is the application of digital capabilities to processes, products, and assets to improve efficiency, enhance customer value, manage risk, and uncover new monetization opportunities.

Looking forward to seeing you at my SAS Global Forum 2018 session and to helping your organization on its digital transformation!

Data monetization and the economic value of data was published on SAS Users.

February 15, 2018
 

In this article, I will set out clear principles for how SAS Viya 3.3 will interoperate with Kerberos. My aim is to present some overview concepts for how we can use Kerberos authentication with SAS Viya 3.3. We will look at both SAS Viya 3.3 clients and SAS 9.4M5 clients. In future blog posts, we’ll examine some of these use cases in more detail.

With SAS Viya 3.3 clients we have different use cases for how we can use Kerberos with the environment. In the first case, we use Kerberos delegation throughout the environment.

Use Case 1 – SAS Viya 3.3

The diagram below illustrates the use case where Kerberos delegation is used into, within, and out from the environment.


In this diagram, we show the end-user relying on Kerberos or Integrated Windows Authentication to log onto the SAS Logon Manager as part of their access to the visual interfaces. SAS Logon Manager is provided with a Kerberos keytab and HTTP principal to enable the Kerberos connection. In addition, the HTTP principal is flagged as “trusted for delegation” so that the credentials sent by the client include the delegated or forwardable Ticket-Granting Ticket (TGT). The configuration of SAS Logon Manager with SAS Viya 3.3 includes a new option to store this delegated credential. The delegated credential is stored in the credentials microservice, and secured so that only the end-user to which the credential belongs can access it.
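As a side note, the keytab supplied to SAS Logon Manager can be sanity-checked with standard Kerberos tools; the keytab path, principal, and realm below are hypothetical examples:

# List the entries stored in the keytab
klist -kt /etc/sas/http.keytab

# Verify that a ticket can be obtained with the HTTP principal
kinit -kt /etc/sas/http.keytab HTTP/viya.example.com@EXAMPLE.COM
klist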

When the end-user accesses SAS CAS from the visual interfaces the initial authentication takes place with the standard internal OAuth token. However, since the end-user stored a delegated credential when accessing the SAS Logon Manager an additional Origin attribute is set on the token of “Kerberos.” The internal OAuth token also contains the groups the end-user is a member of within the Claims. Since we want this end-user to run the SAS CAS session as themselves they must have been added to a custom group with the ID=CASHostAccountRequired. When the SAS CAS Controller receives the OAuth token with the additional Kerberos Origin, it requests the visual interface to make a second Kerberized connection. So, the visual interface retrieves the delegated credential from the credentials microservice and uses this to request a Service Ticket to connect to SAS CAS.

SAS CAS has been provided with a Kerberos keytab and a sascas principal to enable the Kerberos connection. Since the sascas principal is flagged as “trusted for delegation,” the credentials sent by the visual interfaces include a delegated or forwardable Ticket-Granting Ticket (TGT). SAS CAS validates the Service Ticket, which in turn authenticates the end-user. The SAS CAS Controller then launches the session as the end-user and constructs a Kerberos ticket cache containing the delegated TGT. Now, within their SAS CAS session the end-user can connect to the Secured Hadoop environment as themselves since the SAS CAS session has access to a TGT for the end-user.

This means in this first use case all access to, within, and out from the SAS Viya 3.3 environment leverages strong Kerberos authentication. This is our “gold-standard” for authenticating the end-user to each part of the environment.

But, it is strictly dependent on the end-user being a member of the custom group with ID=CASHostAccountRequired, and on the two principals (HTTP and sascas) being trusted for delegation. Without both, Kerberos delegation will not take place.

Use Case 1a – SAS Viya 3.3

The diagram below illustrates a slight deviation on the first use case.

Here, either through choice or by omission, the end-user is not a member of the custom group with the ID=CASHostAccountRequired. Now, even though the end-user connects with Kerberos, and irrespective of whether SAS Logon Manager is configured to store delegated credentials, the second Kerberized connection to SAS CAS is not made. The SAS CAS session instead runs as the account that launched the SAS CAS controller, cas by default. Since the session is not running as the end-user and SAS CAS did not receive a Kerberos connection, the Kerberos ticket cache that is generated for the session does not contain the credentials of the end-user. Instead, the Kerberos keytab and principal supplied to SAS CAS are used to establish the credentials in the Kerberos ticket cache.

This means that even though Kerberos was used to connect to SAS Logon Manager the connection to the Secured Hadoop environment is as the sascas principal and not the end-user.

The same situation could be arrived at if the HTTP principal for SAS Logon Manager is not trusted for delegation.

Use Case 1b – SAS Viya 3.3

A final deviation to the initial use case is shown in the following diagram.

In this case the end-user connects to SAS Logon Manager with any other form of authentication. This could be the default LDAP authentication, external OAuth, or external SAML authentication. Just as in use case 1a, this means that the connection to SAS CAS from the visual interfaces only uses the internal OAuth token. Again, since no delegated credentials are used to connect to SAS CAS the session is run as the account that launched the SAS CAS controller. Also, the ticket cache created by the SAS Cloud Analytic Service Controller contains the credentials from the Kerberos keytab, i.e. the sascas principal. This means that access to the Secured Hadoop environment is as the sascas principal and not the end-user.

Use Case 2 – SAS Viya 3.3

Our second use case covers those users entering the environment via the programming interfaces, for example SAS Studio. In this case, the end-users have entered a username and password, a credential set, into SAS Studio. This credential set is used to start their individual SAS Workspace Session and to connect to SAS CAS from the SAS Workspace Server. This is illustrated in the following figure.

Since the end-users are providing their username and password to SAS CAS it behaves differently. SAS CAS uses its own Pluggable Authentication Modules (PAM) configuration to validate the end-user’s credentials and hence launch the SAS CAS session process running as the end-user. However, in addition the SAS CAS Controller also uses the username and password to obtain an OAuth token from SAS Logon Manager and then can obtain any access control information from the SAS Viya 3.3 microservices. Obtaining the OAuth token from the SAS Logon Manager ensures any restrictions or global caslibs defined in the visual interfaces are observed in the programming interfaces.

With the SAS CAS session running as the end-user and any access controls validated, the SAS CAS session can access the Secured Hadoop cluster. Now since the SAS CAS session was launched using the PAM configuration, the Kerberos credentials used to access Hadoop will be those of the end-user. This means the PAM configuration on the machines hosting SAS CAS must be linked to Kerberos. This PAM configuration then ensures the Kerberos Ticket-Granting Ticket is available to the CAS session as it is launched.

Next, we consider three further use cases where the client is SAS 9.4 maintenance 5. Remember that SAS 9.4 maintenance 5 can make a direct connection to SAS CAS without requiring SAS/CONNECT. The use cases we will discuss will illustrate the example with a SAS 9.4 maintenance 5 web application, such as SAS Studio. However, the statements and basic flows remain the same if the SAS 9.4 maintenance 5 client is a desktop application like SAS Enterprise Guide.

Use Case 3 – SAS 9.4 maintenance 5

First, let’s consider the case where our SAS 9.4 maintenance 5 end-user enters their username and password to access the SAS 9.4 environment. This is illustrated in the following diagram.

In this case, since the SAS 9.4 Workspace Server is launched using a username and password, these are cached on the launch of the process. This enables the SAS 9.4 Workspace Server to use these cached credentials when connecting to SAS CAS. However, the same process occurs if instead of the cached credentials being provided by the launching process, they are provided by another mechanism. These credentials could be provided from SAS 9.4 Metadata Server or from an authinfo file in the user’s home directory on the SAS 9.4 environment. In any case, the process on the SAS Cloud Analytic Server Controller is the same.
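Since an authinfo file is one of those mechanisms, here is a sketch of what such an entry might look like; the host, port, and credentials shown are hypothetical placeholders:

host cas.example.com port 5570 user myuserid password MySecret123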

The username and password used to connect are validated through the PAM stack on the SAS CAS Controller, as well as being used to generate an internal OAuth token from the SAS Viya 3.3 Logon Manager. The PAM stack, just as in the SAS Viya 3.3 programming interface use case 2 above, is responsible for initializing the Kerberos credentials for the end-user. These Kerberos credentials are placed into a Kerberos Ticket cache which makes them available to the SAS CAS session for the connection to the Secured Hadoop environment. Therefore, all the different sessions within SAS 9.4, SAS Viya 3.3, and the Secured Hadoop environment run as the end-user.

Use Case 4 – SAS 9.4 maintenance 5

Now what about the case where the SAS 9.4 maintenance 5 environment is configured for Kerberos authentication throughout? The case where we have Kerberos delegation configured in SAS 9.4 is shown here.

Here, the SAS 9.4 Workspace Server is launched with Kerberos credentials; the Service Principal for the SAS 9.4 Object Spawner will need to be trusted for delegation. This means that a Kerberos credential for the end-user is available to the SAS 9.4 Workspace Server. The SAS 9.4 Workspace Server can use this end-user Kerberos credential to request a Service Ticket for the connection to SAS CAS. SAS CAS is provided with a Kerberos keytab and principal it can use to validate this Service Ticket. Validating the Service Ticket authenticates the SAS 9.4 end-user to SAS CAS. The principal for SAS CAS must also be trusted for delegation, because we need the SAS CAS session to have access to the Kerberos credentials of the SAS 9.4 end-user.

These Kerberos credentials made available to the SAS CAS are used for two purposes. First, they are used to make a Kerberized connection to the SAS Viya Logon Manager, this is to obtain the SAS Viya internal OAuth token. Therefore, the SAS Viya Logon Manager must be configured to accept Kerberos connections. Second, the Kerberos credentials of the SAS 9.4 end-user are used to connect to the Secure Hadoop environment.

In this case since all the various principals are trusted for delegation, our SAS 9.4 end-user can perform multiple authentication hops using Kerberos with each component. This means that through the use of Kerberos authentication the SAS 9.4 end-user is authenticated into SAS CAS and out to the Secure Hadoop environment.

Use Case 5 – SAS 9.4 maintenance 5

Finally, what about cases where the SAS 9.4 maintenance 5 session is not running as the end-user but as a launch credential; this is illustrated here.

The SAS 9.4 session in this case could be a SAS Stored Process Server, a Pooled Workspace Server, or a SAS Workspace Server leveraging a launch credential such as sassrv. The key point is that the SAS 9.4 session is not running as the end-user and has no access to the end-user credentials. In this case we can still connect to SAS CAS and from there out to the Secured Hadoop environment; however, this requires some additional configuration. The setup leverages One-Time-Passwords generated by the SAS 9.4 Metadata Server, so the SAS 9.4 Metadata Server must be made aware of the SAS CAS. This is done by adding a SAS 9.4 metadata definition for the SAS CAS. Our connection from SAS 9.4 must then be “metadata aware,” achieved by using authdomain=_sasmeta_ on the connection, as sketched below.
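The following is a minimal sketch of such a metadata-aware connection from the SAS 9.4 maintenance 5 session; the CAS host name and port are hypothetical, and the exact options depend on your deployment:

options cashost="cas.example.com" casport=5570;
cas mysess authdomain="_sasmeta_";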

Equally, the SAS Viya 3.3 side of the environment must be able to validate the One-Time-Password used to connect to SAS CAS. When SAS CAS receives the One-Time-Password on the connection, it is sent to the SAS Viya Logon Manager for validation and to obtain a SAS Viya internal OAuth token. We need to add some configuration to the SAS Viya Logon Manager so that it can validate the One-Time-Password: we configure the SAS Viya Logon Manager with the details of where the SAS 9.4 Web Infrastructure Platform is running. The SAS Viya Logon Manager passes the One-Time-Password to the SAS 9.4 Web Infrastructure Platform for validation. After the One-Time-Password is validated, a SAS Viya internal OAuth token is generated and passed back to SAS CAS.

Since SAS CAS does not have access to the end-user credentials, the session that is created will be run using the account used to launch the controller process, cas by default. Since the end-user credentials are not available, the Kerberos credentials that are initialized for the session are from the Kerberos keytab provided to SAS CAS. Then the connection to the Secured Hadoop environment will be made using those Kerberos credentials of the principal assigned to the SAS CAS.

Summary

We have presented several use cases above. The table below can be used to summarize and differentiate the use cases based on key factors.

SAS Viya 3.3 - Some Kerberos principles was published on SAS Users.

February 8, 2018
 

In my last article, I worked with an example of using custom polygon data to create a regional geo map in SAS Visual Analytics 7.4. In this article, I will use almost the same example to illustrate the ease of implementing custom polygons to produce the same regional map in SAS Visual Analytics 8.2.

In this example, as in my last blog, the site has sales data for each sales region in the US and would like to display a geo map of the regions.

The six sales regions are:


We will again start with the MAPSGFK.US_STATES dataset, which contains the data required to overlay all states of the US on a VA region geomap and has these columns:

As in my last post, we will add the sales regions (REGION) column and values using data step code, and then use GREMOVE to remove the state boundaries, leaving the region boundary points.  For a look at that code, see my previous blog.

The following DATA step adds the necessary columns/values to the polygon dataset so that the form of the data is what is expected by VA.  Note that the LAT and LONG columns are already in unprojected form, so we just assign those values to Y and X; our column names will then more closely match what we will see in the VA interface when creating the geographic data item.  We also create a SEQUENCE column, required by VA 8.2, using the values of the automatic variable _N_.

data mydata.regions;
   set mydata.regions;
   sequence=_n_;
   id=region;
   x=long;
   y=lat;
   keep ID SEQUENCE SEGMENT X Y ;
   run;

The polygon table, REGIONS,  now has the following columns.

The dataset containing the region and measure data, REGIONSALES contains these columns:

Both datasets should be loaded into memory. Sign in to SAS Visual Analytics – Explore and Visualize Data and create a new report with data source REGIONSALES.
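As a side note, if you prefer to load both tables with code rather than through the interface, a PROC CASUTIL sketch such as the following could be used; the session name, caslib, and libref are assumptions:

cas mysess;
proc casutil sessref=mysess;
   load data=mydata.regions casout="REGIONS" outcaslib="Public" promote;
   load data=mydata.regionsales casout="REGIONSALES" outcaslib="Public" promote;
quit;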

Create a new Geography data item from REGION as shown below, also specifying a New Polygon Provider with values shown on the next several screen shots.  Give the new provider a name and label, and specify the CAS server, library, and table name.

Scroll down to add the ID, Sequence, Segment, latitude and longitude columns.

The new geography data item, after clicking OK:

Now create a Geo Map of type Regions as shown:

Creating a regional map with custom polygons in SAS Visual Analytics 8.2 was published on SAS Users.

February 8, 2018
 

By default, SAS Visual Analytics 7.4 supports country and state level polygons for regional geomaps. In SAS Visual Analytics 7.4, custom shape files are now supported, as well. This means that if a site has their own custom polygon data that defines custom regions, it’s possible to create a region geomap that displays those regions.

Implementing the process requires completing some preparatory steps, specifically the execution of some SAS code, but the steps are explained in Appendix 2 of the SAS Visual Analytics 7.4: Administration Guide. The SAS program that completes the steps is provided for download at http://support.sas.com/rnd/datavisualization/vageo/va74polygons.sas.

Two examples using the program are provided in Appendix 2 for US counties and German provinces. The instructions in Appendix 2 assume that the custom polygon data is provided in ESRI shape file format, which is likely the most common use-case. The site will need access to a SAS programming environment and SAS/GRAPH software, and whoever completes the process will need access to the SAS Visual Analytics configuration directory and the ability to restart services—so an administrator-type person will be required.

One common request is to provide a regional geomap, where the regions are site-defined groups of states or provinces of a country. In this example problem, the site has sales data for each sales region in the US and would like to display a geo map of the regions.

For this type of region/province example, you will likely be able to use one of the maps already provided by SAS in the MAPSGFK library to produce your region boundaries. For more information on the datasets in the MAPSGFK library, see this paper.

The MAPSGFK.US_STATES dataset contains the data required to overlay all states of the US on a VA region geomap and has these columns:

The highlighted columns, STATECODE, LONG, and LAT, will be particularly useful, but first, the sales region (REGION) column and values must be added using simple DATA step code. The unnecessary FIPS code (STATE) can be dropped in the same DATA step.  Note that the region values are assigned in upper case, as these will later be converted to ID values, which VA expects to be in upper case.

data regions;
   length region $ 12;
   drop state;
   set mapsgfk.us_states;
      if statecode in ('AK','HI','PR') then delete;
      else if statecode in ('WA','MT','OR','ID','WY')
         then region='NORTHWEST';
      else if statecode in ('CA','NV','UT','AZ','CO','NM')
         then region='SOUTHWEST'; 
      else if statecode in ('ND','SD','NE','MN','WI','MI','IA','IL','IN')
         then region='NORTHCENTRAL'; 
      else if statecode in ('KS','OK','TX','MO','AR')
         then region='SOUTHCENTRAL'; 
      else if statecode in ('ME','NH','VT','MA','RI','CT','NY','PA','NJ','OH','DE','MD','DC')
         then region='NORTHEAST';
      else if statecode in ('KY','WV','VA','TN','NC','MS','AL','LA','GA','SC','FL')
         then region='SOUTHEAST';
      run;

The data is then sorted by the REGION values, a requirement of the SAS/GRAPH GREMOVE procedure, which is used to remove the internal state boundary data points, leaving the region boundary points only.

proc sort data=regions;
   by region;
run;
proc gremove data=regions out=mapscstm.regions1;
   by region;
   id statecode;
run;

To complete the process, since the LAT and LONG values are already in the form that VA needs (unprojected) and we are using a SAS dataset rather than the ESRI shape file format, we’ll only use a part of the code from the downloadable program mentioned at the beginning of the blog.

First, create a mapscstm directory under /SASHome/SASFoundation/9.4 to store the custom polygon dataset.  Make sure that the library is accessible to the SAS session by including a libname statement in the appserver_autoexec_usermods.sas file, found in config/Lev1/SASApp, and then restarting the Object Spawner.

Example:

libname MAPSCSTM "/SASHome/SASFoundation/9.4/mapscstm";

Tip:  Be sure to back up the original ATTRLOOKUP and CENTLOOKUP datasets before running any additional code, as you will be modifying the originals.

To complete creation of the polygon dataset, you will need to execute only a part of the downloadable program to:
• Make sure that your polygon dataset has all of the columns expected by SAS Visual Analytics.
• Add the region attributes to the ATTRLOOKUP.
• Add the region center point locations to the CENTLOOKUP dataset.

%let REGION_LABEL=USRegions;   /* The label for the custom region */
 %let REGION_PREFIX=R1; /* unique ISO 2-Letter Code  */
 %let REGION_ISO=000; /* unique ISO Code  */
 %let REGION_DATASET=MAPSCSTM.REGIONS1;  /* Polygon data set to be 
              created - be sure to use suffix "1" */

Note that the downloadable program includes additional macro assignments and additional code, but since our data is already in the form of a SAS dataset, rather than ESRI shape file format, we won’t be using all of the code.

The following DATA step adds the necessary columns/values to the polygon dataset so that the form of the data is what is expected by VA.  Note that the LAT and LONG columns are already in unprojected form, so we simply assign those values to X and Y.  (VA doesn’t actually use the X,Y columns from the polygon dataset.)

data &REGION_DATASET.;
   set &REGION_DATASET.;
   where density <= 3; 
   id=region;
   idname=region;
   x=long;  
   y=lat;
   ISO = "&REGION_ISO.";
   RESOLUTION = 1;
   LAKE = 0;
   ISOALPHA2 = "&REGION_PREFIX.";
   AdminType = "regions";
   keep ID SEGMENT IDNAME LONG LAT X Y ISO DENSITY RESOLUTION LAKE ISOALPHA2 AdminType;
   run;

Then PROC SQL steps are executed to add rows relative to the custom polygons to the ATTRLOOKUP and CENTLOOKUP datasets:

This step adds the USRegions row to ATTRLOOKUP:

proc sql;
   insert into valib.attrlookup
      values ( 
         "&REGION_LABEL.",         /* IDLABEL=State/Province Label */
         "&REGION_PREFIX.",        /* ID=SAS Map ID Value */
         "&REGION_LABEL.",         /* IDNAME=State/Province Name */
         "",                       /* ID1NAME=Country Name */
         "",                       /* ID2NAME */
         "&REGION_ISO.",           /* ISO=Country ISO Numeric Code */
         "&REGION_LABEL.",         /* ISONAME */
         "&REGION_LABEL.",         /* KEY */
         "",                       /* ID1=Country ISO 2-Letter Code */
         "",                       /* ID2 */
         "",                       /* ID3 */
         "",                       /* ID3NAME */
         0                         /* LEVEL (0=country level, 1=state level) */
         );
quit;

This step adds a row to ATTRLOOKUP for each individual region:

proc sql;
   insert into valib.attrlookup
      select distinct 
         IDNAME,            /* IDLABEL=State/Province Label */
         ID,                /* ID=SAS Map ID Value */
         IDNAME,            /* IDNAME=State/Province Name */
 
         "&REGION_LABEL.",  /* ID1NAME=Country Name */
         "",                /* ID2NAME */
         "&REGION_ISO.",    /* ISO=Country ISO Numeric Code */
         "&REGION_LABEL.",  /* ISONAME */
         trim(IDNAME) || "|&REGION_LABEL.",  /* KEY */
         "&REGION_PREFIX.",   /* ID1=Country ISO 2-Letter Code */
         "",                  /* ID2 */
         "",                  /* ID3 */
         "",                  /* ID3NAME */
         1                    /* LEVEL (1=state level) */
   from &REGION_DATASET.;
quit;

This step calculates and adds the central location point for each of the regions to the CENTLOOKUP dataset.   The site data contains only the 48 contiguous states (no Alaska or Hawaii). If Alaska and Hawaii had been included, a different algorithm would need to be used to calculate the central location.

proc sql;
   /* Add custom region */
   insert into valib.centlookup
      select distinct
         "&REGION_DATASET." as mapname,
         "&REGION_PREFIX." as ID,
         avg(x) as x,
         avg(y) as y
      from &REGION_DATASET.;
 
   /* Add custom provinces */
   insert into valib.centlookup
      select distinct
         "&REGION_DATASET." as mapname,
         ID as ID,
         avg(x) as x,
         avg(y) as y
      from &REGION_DATASET.
         group by id;
quit;

After executing the code above, you will need to restart the Web Application server, so that SAS Visual Analytics has access to the new polygons.

Code is also included in the downloadable program to create a dataset for validating your results. The validate dataset includes a column for the ID and IDNAME of the regions, in addition to two randomly calculated measures.  In our case, we will instead just use our original REGIONSALES dataset containing the regional sales data.

1. Sign into SAS Visual Analytics and create a new exploration with data source REGIONSALES.
2. Create a Geo data item from State: Right-click Regions, select Geography > Subdivision (State, Province) Names. From the Country or Region drop-down list, select the USRegions region label.
3. Create a geo map visualization. Select Regions for the map style, Regions for the Geography role, and salesamt for the Color role.

Your regions should display, similar to this:

You can also include the region data item in a hierarchy with the state data item to produce a drill-down region map:

Or a bubble or coordinate map:

I hope this example has been helpful to users of SAS Visual Analytics 7.4.  In my next blog, you will see that this process is tremendously simplified by new mapping features in SAS Visual Analytics 8.2.

Creating a custom regional map in SAS Visual Analytics 7.4 was published on SAS Users.

February 7, 2018
 

Jazz up your Geo Map or Network Analysis graph by applying icon-based display rule markers instead of color markers on the map. With SAS Visual Analytics, you may have already used display rules by populating intervals or adding color-mapped values for report objects. Now, you can jazz up your Geo Map or Network Analysis object by choosing from a curated set of icons and applying icon-based display rule markers.

The set of curated icons in SAS Visual Analytics 8.2 is classified into these groups for use with icon-based display rules:

Here is an example of the display-rule icons that are available for Status:

When your mouse hovers over an icon, the name of that icon is displayed.

Applying Icon-Based Display Rules to a Geo Map

While working with a data source that included a measure for the total number of cellular mobile subscriptions per 100 in each country, I wanted to display the results in a Geo Map. Before creating the display rules, I looked at the data for the mobile cellular subscriptions for various countries, and decided that I wanted to create four display rules, each one associated with the number of mobile cellular subscriptions per 100. The icons that I wanted for my display rules were all available under Status. So here’s how I decided to set up my operators, values, and the icon style and color:


Here are the steps I followed to setup the icon-based display rules.

Create the New Geography Item

1.  In my new SAS Visual Analytics 8.2 report, I went to Objects, chose Geo Map (available under Graphs) and dragged it over to the blank canvas.
2.  From Data, I searched for my data source and added it to the report.
3.  In my data source, I highlighted the category (Country), right-clicked, and selected New Geography.
4.  In the New Geography Item dialog, I entered a name for the new geographic item that I was creating: Country (Geographic Item).

Change Geo Map Type to Coordinates

5.  I select the Geo Map, go to Options in SAS Visual Analytics and scroll down to Map.
6.  By default, the Type is set to Bubbles. I change it to Coordinates (this is a requirement to create the icon-based display rules).


7.  The default Marker size is 11. I change it to 14 because I would like my markers to show up slightly bigger in the Geo Map.
8.   By default, Legend is displayed for the Geo Map and Visibility is set to On. I chose not to display Legend information for the Geo Map, so I chose Off for Visibility.

Choose Role for the Geo Map

9.  I choose Roles, and I am ready to assign the geographic data item to Category. So I choose the Country (Geographic Item) that I had just created, and drag it over to Category. I now see the Country (Geographic) data role applied to the Geo Map.

Create Icon-Based Display Rules for the Geo Map

10.  I click on Rules and under Display Rules, I click on New rule.


11.  In the New Display Rule dialog, I chose <= for Operator and entered 250 for Value.
12.  I click on Style and choose Red as the color for this display rule.


13.  I click on Icon, and I am presented with seven categories for the icons. When I hover an icon, the icon name is displayed.
14.  I click on the Significantly Lower icon and click OK.


15.  A quick review of what I just created and I click OK.


16.  I continue to create three additional display rules for my Geo Map.

Now, I have completed creating the four display rules. Here’s how they show in the SAS Visual Analytics Viewer:


17.  When all of the display rules have been created, the Geo Map displays with the colorful icon-based display rules applied to the various countries.


You’ve just seen how you can create icon-based display rules for a Geo Map. You can also create icon-based display rules for a Network Analysis object as well.

Jazz up a Geo Map with colorful icon-based display rules was published on SAS Users.

January 27, 2018
 

Are you interested in using SAS Visual Analytics 8.2 to visualize a state by regions, but all you have is a county shapefile?  As long as you can cross-walk counties to regions, this is easier to do than you might think.

Here are the steps involved:

Step 1

Obtain a county shapefile and extract all components to a folder. For example, I used the US Counties shapefile found in this SAS Visual Analytics community post.

Note: Shapefile is a geospatial data format developed by ESRI. Shapefiles are comprised of multiple files. When you unzip the shapefile found on the community site, make sure to extract all of its components and not just the .shp. You can get more information about shapefiles from this Wikipedia article:  https://en.wikipedia.org/wiki/Shapefile.

Step 2

Run PROC MAPIMPORT to convert the shapefile into a SAS map dataset.

libname geo 'C:\Geography'; /*location of the extracted shapefile*/
 
proc mapimport datafile="C:\Geography\UScounties.shp"
out=geo.shapefile_counties;
run;

Step 3

Add a Region variable to your SAS map dataset. If all you need is one state, you can subset the map dataset to keep just the state you need. For example, I only needed Texas, so I used the State_FIPS variable to subset the map dataset:

proc sql;
create table temp as select
*, 
/*cross-walk counties to regions*/
case
when name='Anderson' then '4'
when name='Andrews' then '9'
when name='Angelina' then '5'
when name='Aransas' then '11'
<……>
when name='Zapata' then '11'
when name='Zavala' then '8'
end as region 
from geo.shapefile_counties
/*subset to Texas*/
where state_fips='48'; 
quit;

Step 4

Use PROC GREMOVE to dissolve the boundaries between counties that belong to the same region. It is important to sort the county dataset by region before you run PROC GREMOVE.

proc sort data=temp;
by region;
run;
 
proc gremove
data=temp
out=geo.regions_shapefile
nodecycle;
by region;
id name; /*name is county name*/
run;

Step 5

To validate that your boundaries resolved correctly, run PROC GMAP to view the regions. If the regions do not look right when you run this step, it may signal an issue with the underlying data. For example, when I ran this with a county shapefile obtained from Census, I found that some of the counties were mislabeled, which of course, caused the regions to not dissolve correctly.

proc gmap map=geo.regions_shapefile data=geo.regions_shapefile all;
   id region;
   choro region / nolegend levels=1;
run;

Here’s the result I got, which is exactly what I expected:

Custom Regional Maps in SAS Visual Analytics

Step 6

Add a sequence number variable to the regions dataset. SAS Visual Analytics 8.2 needs it to properly define a custom polygon inside a report:

data geo.regions_shapefile;
set geo.regions_shapefile;
seqno=_n_;
run;

Step 7

Load the new region shapefile in SAS Visual Analytics.

Step 8

In the dataset with the region variable that you want to visualize, create a new geography variable and define a new custom polygon provider.

Geography Variable:

Polygon Provider:

Step 9

Now, you can create a map of your custom regions:

How to create custom regional maps in SAS Visual Analytics 8.2 was published on SAS Users.

January 26, 2018
 

If you have worked with the different types of score code generated by the high-performance modeling nodes in SAS® Enterprise Miner™ 14.1, you have probably come across the Analytic Store (or ASTORE) file type for scoring.  The ASTORE file type works very well for scoring complex machine learning models like random forests, gradient boosting, support vector machines and others. In this article, we will focus on ASTORE files generated by SAS® Viya® Visual Data Mining and Machine Learning (VDMML) procedures. An introduction to analytic stores on SAS Viya can be found here.

In this post, we will:

  1. Generate an ASTORE file for a gradient boosting model in SAS Visual Data Mining and Machine Learning.
  2. Override the scoring decision from that model using PROC ASTORE.

Generate an ASTORE file for a gradient boosting model

Our example dataset is a distributed in-memory CAS table that contains information about applicants who were granted credit for a certain home equity loan. The categorical binary-valued target variable ‘BAD’ identifies if a client either defaulted or repaid their loan. The remainder of the variables indicating the candidate’s credit history, debt-to-income ratio, occupation, etc., are used as predictors for the model. In the code below, we are training a gradient boosting model on a randomly sampled 70% of the data and validating against 30% of the data. The statement SAVESTATE creates an analytic store file (ASTORE) for the model and saves it as a binary file named “astore_gb.”

proc gradboost data=PUBLIC.HMEQ;
partition fraction(validate=0.3);
target BAD / level=nominal;
input LOAN MORTDUE DEBTINC VALUE YOJ DEROG DELINQ CLAGE NINQ CLNO /
level=interval;
input REASON JOB / level=nominal;
score out=public.hmeq_scored copyvars=(_all_);
savestate rstore=public.astore_gb;
id _all_;
run;

Shown below are a few observations from the scored dataset hmeq_scored  where YOJ (years at present job) is greater than 10 years.


Override the scoring decision using PROC ASTORE

In this segment, we will use PROC ASTORE to override the scoring decision from the gradient boosting model. To that end, we will first make use of the DESCRIBE statement in PROC ASTORE to produce basic DS2 scoring code using the EPCODE option. We will then edit the score code in DS2 language syntax to override the scoring decision produced from the gradient boosting model.

proc astore;
    describe rstore=public.astore_gb
        epcode="/viyafiles/jukhar/gb_epcode.sas"; run;

A snapshot of the output from the above code statements is shown below. The analytic store is assigned a unique string identifier. We also get information about the analytic engine that produced the store (gradient boosting, in this case) and the time when the store was created. In addition, though not shown in the snapshot below, we get a list of the input and output variables used.

Let’s take a look at the DS2 score code (“gb_epcode.sas”) produced by the EPCODE option in the DESCRIBE statement within PROC ASTORE.

data sasep.out;
	 dcl package score sc();
	 dcl double "LOAN";
	 dcl double "MORTDUE";
	 dcl double "DEBTINC";
	 dcl double "VALUE";
	 dcl double "YOJ";
	 dcl double "DEROG";
	 dcl double "DELINQ";
	 dcl double "CLAGE";
	 dcl double "NINQ";
	 dcl double "CLNO";
	 dcl nchar(7) "REASON";
	 dcl nchar(7) "JOB";
	 dcl double "BAD";
	 dcl double "P_BAD1" having label n'Predicted: BAD=1';
	 dcl double "P_BAD0" having label n'Predicted: BAD=0';
	 dcl nchar(32) "I_BAD" having label n'Into: BAD';
	 dcl nchar(4) "_WARN_" having label n'Warnings';
	 Keep 
		 "P_BAD1" 
		 "P_BAD0" 
		 "I_BAD" 
		 "_WARN_" 
		 "BAD" 
		 "LOAN" 
		 "MORTDUE" 
		 "VALUE" 
		 "REASON" 
		 "JOB" 
		 "YOJ" 
		 "DEROG" 
		 "DELINQ" 
		 "CLAGE" 
		 "NINQ" 
		 "CLNO" 
		 "DEBTINC" 
		;
	 varlist allvars[_all_];
	 method init();
		 sc.setvars(allvars);
		 sc.setKey(n'F8E7B0B4B71C8F39D679ECDCC70F6C3533C21BD5');
	 end;
	 method preScoreRecord();
	 end;
	 method postScoreRecord();
	 end;
	 method term();
	 end;
	 method run();
		 set sasep.in;
		 preScoreRecord();
		 sc.scoreRecord();
		 postScoreRecord();
	 end;
 enddata;

The sc.setKey call in the init() method block contains a string identifier for the analytic store; this is the same ASTORE identifier that was previously output as part of PROC ASTORE. In order to override the scoring decision created from the original gradient boosting model, we will edit the gb_epcode.sas file (shown above) by inserting new statements in the postScoreRecord method block; the edited file must follow DS2 language syntax. For more information about the DS2 language, see the SAS DS2 Language Reference.

method postScoreRecord();
      if YOJ>10 then do; I_BAD_NEW='0'; end; else do; I_BAD_NEW=I_BAD; end;
    end;

Because we are saving the outcome into a new variable called “I_BAD_NEW,” we will need to declare this variable upfront along with the rest of the variables in the score file.
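For example, a declaration such as the following could be added alongside the existing dcl statements, with "I_BAD_NEW" also added to the Keep list; the label text is an assumption:

dcl nchar(32) "I_BAD_NEW" having label n'Into: BAD (override)';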

In order for this override to take effect, we will need to run the SCORE statement in PROC ASTORE and provide both the original ASTORE file (astore_gb), as well as the edited DS2 score code (gb_epcode.sas).

proc astore;
    score data=public.hmeq epcode="/viyafiles/jukhar/gb_epcode.sas"
          rstore=public.astore_gb
          out=public.hmeq_new; run;

A comparison of “I_BAD” and “I_BAD_NEW” in the output of the above code for select variables shows that the override rule for scoring has indeed taken place.

In this article we explored how to override the scoring decision produced from a machine learning model in SAS Viya. You will find more information about scoring in the SAS Viya documentation.

Using PROC ASTORE to override scoring decisions in SAS® Viya® was published on SAS Users.

January 26, 2018
 

If Necessity is the mother of Invention, then, perhaps, the father of Automation is Laziness. Automation is all about convenience, comfort, and productivity. Why do it yourself if you can devise something to do it for you!

In my previous post Running SAS programs in batch under Unix/Linux, we learned that in order to automate SAS job submissions, they are often run in batch mode. We also learned that we usually create batch scripts as a convenient way to run SAS programs in batch. To create a unique SAS log file generated with each batch submission, a typical batch script may look as follows:

#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname

It will allow you to submit your SAS program /sas/code/project1/program1.sas in batch, and also capture SAS log file with a convenient date-time suffix in the same directory.

SAS program to write batch scripts

But what if we are deploying multiple SAS programs? Well, then we would need to create a batch script for each of them. They will all look similar to each other, and that is when most human errors usually occur – when we do something similar, monotonously, over and over again.  Besides, I found working with the Unix Visual Editor (“vi editor”) is not quite a 21st century experience.

What would a normal SAS programmer do in such a situation? That’s right – we would write a SAS program to write a batch script file! Let’s do it. Let’s automate the automation.

In its simplest form, to replicate the above batch script example our SAS program would look like this:

filename b '/sas/code/project1/program1.sh';
 
data _null_;
file b;
input;
put _infile_;
datalines;
#!/usr/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
/sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname
;

Setting up batch file permissions

As we already know from my previous post, we need to assign certain permissions to our batch file in order to make it executable. For example, if you want to give yourself (Owner) and Group execution permissions, then your script file permissions would be:

-rwxr-x---, or 750 in octal representation.

In order to do that, you can add the following X statement to your SAS code:

options noxwait;
x 'chmod 750 /sas/code/project1/program1.sh';

Alternatively, you can use %SYSEXEC macro statement (no quoting for the OS command) or SYSTASK statement, or CALL SYSTEM routine (used within a data step).
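For illustration, here is how the same chmod command could be issued with each of those alternatives (a sketch using the file path from the example above):

/* %SYSEXEC macro statement - note: no quotes around the OS command */
%sysexec chmod 750 /sas/code/project1/program1.sh;

/* SYSTASK statement */
systask command "chmod 750 /sas/code/project1/program1.sh" wait;

/* CALL SYSTEM routine within a DATA step */
data _null_;
   call system('chmod 750 /sas/code/project1/program1.sh');
run;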

When you create a batch script by running the above code in SAS Enterprise Guide (EG), you don’t have to leave the comfort of your SAS environment or even touch Unix vi editor. Moreover, you can even submit your SAS job in batch mode right from your SAS EG Program Editor.

However, all of these will work fine, unless XCMD System Option is disabled (NOXCMD).

Assigning batch file permissions when XCMD System Option is disabled

ERROR: Shell escape is not valid in this SAS session.

Bummer! Have you ever seen this error message in SAS Enterprise Guide while trying to run SAS code with the X statement? It indicates that executing OS commands in the SAS environment is not allowed.

In many organizations, IT department policies do not allow enabling the SAS XCMD system option due to cyber-security concerns. This is usually done system-wide for the whole SAS Enterprise client-server installation via SAS configuration.  In this case, no operating system command is allowed to be executed from within SAS.

Of course, this substantially limits SAS’ automation power, but that is the goal and the price to pay for enhanced security.

Still, even without OS command execution at our disposal, we can set Unix script file permissions using FILENAME statement’s PERMISSION= option. Then our above filename statement will look like this:

filename b '/sas/code/project1/program1.sh'
   permission='A::u::rwx,A::g::r-x,A::o::---';

However, it is important to realize that your ability to fully control file permissions via FILENAME statement’s PERMISSION= option is still restricted by the Unix umask value set by your IT system administrator. But usually, it is not overly restrictive, at least for the purpose of creating executable files in the environments I have worked with.

The double benefit of the FILENAME statement’s PERMISSION= option is that it can be used for setting up file permissions in any SAS installation whether the XCMD system option is enabled or disabled.

SAS macro to create batch script files

Let’s wrap all the above SAS code pieces into a SAS macro that writes batch scripts. Here is the macro code definition:

%macro write_shell(code);
   %let fdir = %substr(&code,1,%sysfunc(findc(&code,/,b)));
 
   options dlcreatedir;
   libname _flib "&fdir";
   libname _flib;
 
   %let core = %substr(&code,1,%eval(%length(&code)-4));
   filename _fout "&core..sh" permission='A::u::rwx,A::g::r-x,A::o::---'; 
   data _null_;
      file _fout;
      put
         '#!/bin/sh' //
         'now=$(date +%Y.%m.%d_%H.%M.%S)' /
         "pgmname=""&code""" /
         "logname=""&core._$now.log""" /
         'sas $pgmname -log $logname'
         ;
   run;
   filename _fout;
%mend write_shell;

The single macro parameter (code) represents full path name of your SAS code. And here is a macro invocation example:

 
%write_shell(/sas/code/project1/program1.sas)

The assumption here is that the script file gets created in the same directory as the relevant SAS code and the SAS logs for each of the batch runs. It will be assigned the same name as your SAS program, only with the .sh name extension. As you can see, we do some string parsing to derive the directory name, script file name and SAS log file name from the single macro parameter representing the full path name of your SAS code. As an added bonus, if a specified directory (/sas/code/project1/) does not yet exist, it will be created by this macro. The DLCREATEDIR System Option (along with the two subsequent libname statements) is responsible for the directory creation.

If you want to create many script files for your multiple SAS programs, you just invoke the macro as many times as needed. You can even go fully data-driven for mass script file creation, as sketched below.
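As a sketch of that data-driven approach (the control dataset WORK.SAS_PROGRAMS and its PGM_PATH column are hypothetical), CALL EXECUTE can invoke the macro once for every program listed:

data _null_;
   set work.sas_programs;   /* one row per SAS program; PGM_PATH holds the full path */
   call execute(cats('%nrstr(%write_shell(', pgm_path, '))'));
run;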

Do you find this useful?

Please let me know in the comments section below if you find this blog post useful. Thank you for reading! I also invite you to share your ideas and experiences on the topic.

Let SAS write batch scripts for you was published on SAS Users.

January 25, 2018
 

Most people who work with optimization are familiar with Linear and Integer Programming; to that toolkit they could add Constraint Programming. Constraint Programming is a powerful technique that is used to solve challenging “real-world” problems in a variety of areas, such as planning, scheduling, DNA sequencing, computer graphics and natural language processing.

Constraint Programming is a powerful paradigm which can be used by itself or in combination with Integer Programming. In this article, I’ll show you how to implement a simple Constraint Programming example that solves Sudoku puzzles using the CLP functionality in SAS Optimization.

Have you ever wondered after working in a particularly difficult Sudoku puzzle if the puzzle can be solved? Would you like to schedule your child’s little league games like a pro using the Round-Robin tournament format, just like it is done in professional sport leagues?

If so, Constraint Programming is the answer. But what is Constraint Programming?  Let’s start answering this question by reviewing the familiar Linear and Integer Programming formulations and then comparing them with the one for Constraint Programming.

The typical mathematical structure of an Integer Programming model is:

Max    c1x1 + c2x2 + … + cnxn

Subject to
a11x1 + a12x2 + … + a1nxn ≤ b1
…
an1x1 + an2x2 + … + annxn ≤ bn

xj integer for all j = 1 to n

These equations describe a problem where the goal (or objective) is to maximize a metric that depends on a set of variables (x1, …, xn) to be determined by solving the problem. The objective to be maximized could be, for example, profit, the amount of food distributed, etc. In a typical marketing problem, the variables would represent marketing campaigns, customer responses, channels used to distribute those campaigns, etc. Constraints are the rules that relate the variables to the available resources: in a marketing problem, b1 could represent the available budget and bn could represent the capacity of the call center.

When all variables are continuous we have a linear program; when some of the variables must take integer values, we have a mixed integer programming problem. Notice that the constraints in the formulation above simply describe logical relationships among several variables, and because each variable must take an integer value, its domain is the set of integers.
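To make this concrete, here is a minimal sketch of a tiny mixed integer program written in PROC OPTMODEL (the objective coefficients and resource limits are made-up numbers, not from any real problem):

proc optmodel;
   var x{1..2} integer >= 0;              /* decision variables x1, x2 */
   max Profit = 3*x[1] + 5*x[2];          /* objective: c1*x1 + c2*x2 */
   con Budget:   2*x[1] + 4*x[2] <= 40;   /* a11*x1 + a12*x2 <= b1 */
   con Capacity: 6*x[1] + 1*x[2] <= 30;   /* a21*x1 + a22*x2 <= b2 */
   solve with milp;                       /* solve as a mixed integer program */
   print x;
quit;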

In Constraint Programming, the relationships between variables are stated in the form of constraints. Constraints specify the properties of a solution to be found. A key insight is that a constraint is simply a logical relationship among several finite unknowns (or variables), each taking a value in a finite domain. A constraint thus restricts the possible values that the variables can simultaneously take; it represents some partial information about the variables of interest.

An example of a scheduling problem described using the Constraint Programming approach is shown below. All task relationships are of type “FS,” which means “finish-to-start” and indicates which task precedes another one:

forall (j in Jobs)
   /* Indicates which task precedes another one */
   forall (t in 1..nbTasks-1)
      task[j,t]   FS   task[j,t+1];

forall (j in Jobs)
   /* Indicates which tool is to be used */
   forall (t in Tasks)
      requires(task[j,t], tool[j,t]);

In this scheduling problem, the goal is to find the task sequence for each job while satisfying the constraints on task precedence and tool availability.

More formally, a Constraint Program can be defined as a triple (X, D, C), where

  • X = { X1, …, Xn}  is a finite set of variables
  • D = {D1, …, Dn}  is a finite set of domains, where Di is a finite set of possible values that the variable Xi can take. Di is known as the domain of variable Xi
  • C = {C1, …, Cn}  is a finite set of constraints that restrict the values that the variables can simultaneously take.

Constraint solvers find an assignment to the variables that satisfies all the constraints using constraint propagation, backtracking, branch-and-bound algorithms, or local search. For example, if X and Y both have the domain {1, …, 9} and a constraint requires X + Y = 4, propagation immediately reduces both domains to {1, 2, 3}. There are many specialized resources (books, articles, etc.) that describe these methods.

For complex problems, a hybrid approach is often used, that is, one that combines Integer Programming, Constraint Programming, and heuristic procedures.

To make the formal Constraint Program formulation above concrete, let’s solve two simple puzzles: Send More Money and Sudoku.

Send More Money Puzzle

The Send More Money puzzle consists of finding unique digits for the letters D, E, M, N, O, R, S, and Y such that S and M are different from zero (no leading zeros) and the following equation is satisfied:

    S E N D
+   M O R E
-----------
  M O N E Y

Step #1: Define the variables:

S, E, N, D, M, O, R, Y

Step #2: Define the Domain of those variables

  1. S, E, N, D, M, O, R, Y must take integer values between 0 and 9
  2. S can’t be zero
  3. M can’t be zero

Step #3: Define the Constraints

  1. S * 1000 + E * 100 + N * 10 + D + M * 1000 + O * 100 + R * 10 + E =
    10000 * M + O * 1000 + N * 100 + E * 10 + Y
  2. All variables must be different

The unique solution to this problem is

S E N D M O R Y
9 5 6 7 1 0 8 2

 

It can be found using the CLP procedure in SAS Optimization with this code:

proc clp dom=[0,9]              /* Define the default domain */
         out=out;               /* Name the output data set */
   var S E N D M O R Y;         /* Declare the variables */
   lincon                       /* Linear constraints */
      /* SEND + MORE = MONEY */
      1000*S + 100*E + 10*N + D + 1000*M + 100*O + 10*R + E
      =
      10000*M + 1000*O + 100*N + 10*E + Y,
      S<>0,
      M<>0;                     /* No leading zeros */
   alldiff();                   /* All variables have pairwise distinct values */
run;

The Sudoku Puzzle

Step #1: Define your variables.

We are searching for 81 variables arranged in a 9×9 matrix. Let Cij represent the value of the cell in the ith row and jth column, where i = 1, …, 9 and j = 1, …, 9.

Step #2: Define the Domain of those variables

Cij can take any integer value between 1 and 9

Step #3: Define the Constraints.

  1. For each row i, all values in that row must be different.
  2. For each column j, all values in that column must be different.
  3. For each of the nine 3×3 blocks, all values in that block must be different.

If we start with the initial values

[Image: initial Sudoku puzzle grid]

Then the solution is

[Image: solved Sudoku puzzle grid]

This solution can be found using the CLP procedure in SAS Optimization with the following code (note that the initial puzzle is entered in the DATA step that creates the data set indata, and the final solution is nicely printed with the macro printSol).

data indata;
input C1-C9;
datalines;
. . 5 . . 7 . . 1
. 7 . . 9 . . 3 .
. . . 6 . . . . .
. . 3 . . 1 . . 5
. 9 . . 8 . . 2 .
1 . . 2 . . 4 . .
. . 2 . . 6 . . 9
. . . . 4 . . 8 .
8 . . 1 . . 5 . .
;
run;
%macro store_initial_values;
   /* store initial values into macro variables C_i_j */
   data _null_;
      set indata;
      array C{9};
      do j = 1 to 9;
         i = _N_;
         call symput(compress("C_" || put(i,best.) || "_" || put(j,best.)),
                     put(C[j],best.));
      end;
   run;
%mend store_initial_values;
%store_initial_values;
 
%macro solve;
   proc clp out=outdata;
 
      /* Declare the variables row by row with domain [1,9]
         and require distinct values within each row */
      %do i = 1 %to 9;
         var (X_&i._1-X_&i._9) = [1,9];
         alldiff(X_&i._1-X_&i._9);
      %end;
 
      /* Require distinct values within each column */
      %do j = 1 %to 9;
         alldiff(
         %do i = 1 %to 9;
            X_&i._&j
         %end;
         );
      %end;
 
      /* Require distinct values within each 3x3 block */
      %do s = 0 %to 2;
         %do t = 0 %to 2;
            alldiff(
            %do i = 3*&s + 1 %to 3*&s + 3;
               %do j = 3*&t + 1 %to 3*&t + 3;
                  X_&i._&j
               %end;
            %end;
            );
         %end;
      %end;
 
      /* Fix the cells given in the initial puzzle */
      %do i = 1 %to 9;
         %do j = 1 %to 9;
            %if &&C_&i._&j ne . %then %do;
               lincon X_&i._&j = &&C_&i._&j;
            %end;
         %end;
      %end;
   run;
   %put &_ORCLP_;
%mend solve;
%solve;
 
%macro printSol;
   /* Arrange the solution values into a 9x9 grid for printing */
   data final (keep=A1-A9);
      set outdata;
      array A{9};
      %do i = 1 %to 9;
         %do j = 1 %to 9;
            A(&j) = X_&i._&j;
         %end;
         output;
      %end;
   run;
%mend printSol;
%printSol;

Conclusion

Anyone who works with optimization could benefit from using Constraint Programming. It is a powerful tool that can be used on its own or in hybrid approaches with Integer Programming and heuristic procedures.

Solving Sudoku puzzles using Constraint Programming in SAS Optimization was published on SAS Users.