With the first debate between the two candidates behind us and the culmination of the US presidential election drawing near, who wouldn’t love to predict the winner? I don't have a crystal ball, but I do have the power of unstructured text analytics at my fingertips. With the help of […]

Presidential election quiz was published on SAS Voices.


Last week I showed how to find the nearest neighbors for a set of d-dimensional points. A SAS user wrote to ask whether something similar could be done when you have two distinct groups of points and you want to find the elements in the second group that are closest to each element in the first group.

The answer is yes, and this problem occurs frequently in applications. Typically the first group contains the locations of randomly placed individuals and the second contains the locations of resources. For each individual, you want to find the closest resources. Examples include:

  • The first group contains the positions of houses. The second group contains the locations of stores. A retailer might want to identify the stores that are closest to customers, so that shipping costs are minimized.
  • The first group contains the locations of vehicle accidents on a highway. The second group contains the locations of hospitals that can treat the injured victims.
  • The first group contains the locations of cell phone users. The second group contains the locations of cell towers.

This article solves the problem in a straightforward way: compute all the pairwise distances between the points in each group. Distances within groups are not needed, so they are not computed.

Compute the pairwise distances between points

I previously showed how to compute the pairwise distance between points in different sets. Let S be a set of n d-dimensional points and let R be another set of m d-dimensional points. The PairwiseDist function in SAS/IML (shown below) returns an n x m matrix, D, of distances such that D[i,j] is the distance from the i_th point in S to the j_th point in R. The PairwiseNearestNbr subroutine uses the same algorithm that was used to find nearest neighbors, so I will not re-explain how it works. Suffice it to say that it returns two matrices: a matrix of row numbers and a matrix of distances.

proc iml;
/* http://blogs.sas.com/content/iml/2013/03/27/compute-distance.html */
/* compute Euclidean distance between points in S and points in R.
   S is a n x d matrix, where each row is a point in d dimensions.
   R is a m x d matrix.
   The function returns the n x m matrix of distances, D, such that
   D[i,j] is the distance between S[i,] and R[j,].
*/
start PairwiseDist(S, R);
   if ncol(S)^=ncol(R) then return (.);       /* different dimensions */
   n = nrow(S);  m = nrow(R);
   idx = T(repeat(1:n, m));                   /* index matrix for S   */
   jdx = shape(repeat(1:m, n), n);            /* index matrix for R   */
   diff = S[idx,] - R[jdx,];
   return( shape( sqrt(diff[,##]), n ) );     /* sqrt(sum of squares) */
finish;

/* Compute indices (row numbers) of k nearest neighbors.
   INPUT:  S    an (n x d) data matrix
           R    an (m x d) matrix of reference points
           k    specifies the number of nearest neighbors (k>=1) 
   OUTPUT: idx  an (n x k) matrix of row numbers. idx[,j] contains the
                row numbers (in R) of the j_th closest elements to S
           dist an (n x k) matrix. dist[,j] contains the distances
                between S and the j_th closest elements in R
*/
start PairwiseNearestNbr(idx, dist, S, R, k=1);
   n = nrow(S);
   idx = j(n, k, .);
   dist = j(n, k, .);
   D = PairwiseDist(S, R);      /* n x m */
   do j = 1 to k;
      dist[,j] = D[ ,><];       /* smallest distance in each row */
      idx[,j] = D[ ,>:<];       /* column of smallest distance in each row */
      if j < k then do;         /* prepare for next closest neighbors */
         ndx = sub2ndx(dimension(D), T(1:n)||idx[,j]);
         D[ndx] = .;            /* set elements to missing */
      end;
   end;
finish;

An example of finding nearest points

To illustrate how you can use these functions, consider two sets S and R. The set R is the set of resource locations (stores, hospitals, towers,...). For this example, R contains four points that are the vertices of a square of side length 2 centered at the origin. For each point in S, the following SAS/IML statements compute the closest vertex in R and the second-closest vertex in R:

/* R = vertices of a square */
R = {-1 -1,      /* Pt 1: lower left corner   */
      1 -1,      /* Pt 2: lower right corner  */
      1  1,      /* Pt 3: upper  right corner */
     -1  1};     /* Pt 4: upper left corner   */
S = {-0.3 -0.2,  /* points inside square */
      0.9  0.6,
     -0.5  0.8};
k = 2;          /* find nearest and second nearest points */
run PairwiseNearestNbr(idx, dist, S, R, k);
print idx[c={"Closest" "2nd Closest"} r=("S1":"S3")];
Points in a reference group that are closest to points in another group

The first column in the table shows the indices (row numbers) of the vertices that are nearest to each point of S. The second column shows the second-closest vertices. For example, Pt 1 (the lower left corner) is the nearest vertex to the first row of S, and Pt 4 (upper left corner) is the second nearest. In a similar way, Pt 3 is the closest vertex to the second row of S, and Pt 4 is the closest vertex to the third row of S.

The PairwiseNearestNbr module also returns the dist matrix, which contains the corresponding distances. The i_th row of dist contains the distances from the i_th row of S to the two closest vertices.
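If you want to check the table without SAS, the same computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names pairwise_dist and pairwise_nearest_nbr are hypothetical, and the sketch sorts each row of the distance matrix instead of repeatedly masking the minimum as the SAS/IML module does. Row numbers are reported 1-based so that they match the SAS/IML output.

```python
import math

def pairwise_dist(S, R):
    """Return the n x m list of Euclidean distances; D[i][j] = dist(S[i], R[j])."""
    return [[math.dist(s, r) for r in R] for s in S]

def pairwise_nearest_nbr(S, R, k=1):
    """For each point in S, return the 1-based row numbers in R of the
    k nearest points, together with the corresponding distances."""
    D = pairwise_dist(S, R)
    idx, dist = [], []
    for row in D:
        # Column positions ordered from smallest to largest distance
        order = sorted(range(len(row)), key=lambda j: row[j])[:k]
        idx.append([j + 1 for j in order])   # 1-based, like SAS/IML
        dist.append([row[j] for j in order])
    return idx, dist

# Same data as the SAS/IML example
R = [(-1, -1), (1, -1), (1, 1), (-1, 1)]        # vertices of the square
S = [(-0.3, -0.2), (0.9, 0.6), (-0.5, 0.8)]     # points inside the square

idx, dist = pairwise_nearest_nbr(S, R, k=2)
print(idx)    # [[1, 4], [3, 2], [4, 3]]
```

Each inner list of dist holds the distances to the closest and second-closest vertices, in that order, which corresponds to a row of the dist matrix described above.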

Coloring points by the closest reference point

In a similar way, you can generate random points in the square and color each point in a scatter plot according to the nearest point in the reference set. The ODS GRAPHICS statement with ATTRPRIORITY=NONE forces the ODS style to cycle through colors and symbols.

ods graphics / attrpriority=NONE width=400px height=400px; /* cycle symbols and colors */
/* Generate 500 random points in square [-1,1] x [-1,1] */
call randseed(12345);
S = R // randfun({500 2}, "Uniform", -1, 1);
k = 2;
run PairwiseNearestNbr(idx, dist, S, R, k);
Corner = idx[,1];                            /* index of nearest vertex */
title "Points colored by nearest corner";
call scatter(S[,1], S[,2]) group=Corner grid={x y} procopt="aspect=1";
Random points colored according to the nearest vertex of a square

The scatter plot shows what you already knew: points in the first quadrant (green Xs) are closest to the vertex at (1,1). Points in the second quadrant (brown triangles) are closest to the vertex at (-1, 1), and so forth. Any points that are equidistant from two or more vertices (such as the origin) are assigned one of the colors arbitrarily. The points were sorted (not shown) so that the order in the legend agrees with the order of the points in R.
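The quadrant rule can also be checked numerically. The following Python sketch is an assumption-laden illustration rather than a translation of the SAS/IML program: the helper names are hypothetical, and Python's random stream does not reproduce the points that the SAS seed generates. It draws 500 uniform points in the square and confirms that the nearest vertex is always the one in the same quadrant as the point.

```python
import math
import random

R = [(-1, -1), (1, -1), (1, 1), (-1, 1)]   # Pt 1..Pt 4, as in the article

def nearest_corner(p):
    """1-based index of the vertex in R that is closest to p."""
    d = [math.dist(p, r) for r in R]
    return d.index(min(d)) + 1

def corner_by_quadrant(p):
    """Vertex that shares a quadrant with p, or None if p lies on an axis."""
    x, y = p
    if x < 0 and y < 0:
        return 1
    if x > 0 and y < 0:
        return 2
    if x > 0 and y > 0:
        return 3
    if x < 0 and y > 0:
        return 4
    return None

random.seed(12345)   # arbitrary seed; not the same stream as SAS
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]

# Points for which the two rules disagree (points on an axis are skipped)
mismatches = [p for p in pts
              if corner_by_quadrant(p) is not None
              and nearest_corner(p) != corner_by_quadrant(p)]
print(len(mismatches))
```

An empty mismatches list means that the distance-based coloring and the quadrant rule agree for every simulated point.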

The call to the PairwiseNearestNbr subroutine used k=2, which requests the closest and the second closest points. The second-closest indices are located in the second column of the idx matrix. If you change the title and set Corner = idx[,2], you can create a scatter plot in which each point is colored by the second-closest vertex. That coloration is shown in the scatter plot below. See if you can understand why each marker in the following scatter plot is colored the way it is. For example, for the points in the first quadrant, the second-closest vertex is either the second vertex (1, -1) or the fourth vertex (-1, 1).

Random points colored according to the second-nearest vertex of a square
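One way to reason about the first quadrant: for a point (x, y), the squared distance to (1, -1) minus the squared distance to (-1, 1) equals 4(y - x). So the second-closest vertex should be Pt 2 when the point lies below the diagonal y = x and Pt 4 when it lies above. The short Python sketch below checks this on a grid of first-quadrant points. It is an illustration only, and the helper name second_closest is hypothetical.

```python
import math

R = [(-1, -1), (1, -1), (1, 1), (-1, 1)]   # Pt 1..Pt 4, as in the article

def second_closest(p):
    """1-based index of the second-closest vertex in R to the point p."""
    order = sorted(range(len(R)), key=lambda j: math.dist(p, R[j]))
    return order[1] + 1

# Grid of first-quadrant points, excluding the diagonal y = x (ties)
grid = [(i / 10, j / 10) for i in range(1, 10) for j in range(1, 10) if i != j]

labels = {second_closest(p) for p in grid}
below_diag = all(second_closest((x, y)) == 2 for x, y in grid if x > y)
above_diag = all(second_closest((x, y)) == 4 for x, y in grid if y > x)
print(sorted(labels), below_diag, above_diag)   # [2, 4] True True
```

The same argument, reflected across the axes, accounts for the pattern in the other three quadrants.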

In summary, last week I showed how you can find the nearest neighbors among a single set of observations. This article solves a similar problem in which observations are split into two different groups. The functions in this article find the nearest observation in a "reference" or "resource" group for each observation in another group.

I'll conclude by mentioning that you can use these computations to compute the nearest distance between two groups of observations. Simply use k=1 and take the minimum distance that is computed by the PairwiseNearestNbr subroutine. This gives the smallest distance between any point in the first group and any point in the second group.
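As a rough illustration of that last step, here is a minimal Python sketch (the function names are hypothetical) that finds the nearest distance for each point in the first group, which is the k=1 case, and then takes the minimum over those values:

```python
import math

def nearest_dist(S, R):
    """For each point in S, the distance to its closest point in R (k=1)."""
    return [min(math.dist(s, r) for r in R) for s in S]

def group_distance(S, R):
    """Smallest distance between any point in S and any point in R."""
    return min(nearest_dist(S, R))

# Same data as the earlier example
R = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
S = [(-0.3, -0.2), (0.9, 0.6), (-0.5, 0.8)]

print(round(group_distance(S, R), 4))   # 0.4123
```

For the example data, the minimum is attained by the second point of S and the vertex (1, 1).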

tags: Data Analysis

The post Distances between observations in two groups appeared first on The DO Loop.


With my first open source software (OSS) experience over a decade ago, I was ecstatic. It was amazing to learn how easy it was to download the latest version on my personal computer, with no initial license fee. I was quickly able to analyse datasets using various statistical methods.

Organisations might feel similar excitement when they first employ people with predominantly open source programming skills. However, it becomes tricky to organize an enterprise-wide approach based solely on open source software. Decision makers within many organisations are now coming to realize the value of investing in both OSS and vendor-provided, proprietary software. Very often, open source is used widely to prototype models, whilst proprietary software, such as SAS, provides a stable platform to deploy models in real time or for batch processing, and to monitor changes and update models directly in any database or on a Hadoop platform.

Industries such as pharma and finance have realised the advantages of complementing open source software usage with enterprise solutions such as SAS.

A classic example is when pharmaceutical companies conduct clinical trials, which must follow international good clinical practice (GCP) guidelines. Some pharma organisations use SAS for operational analytics, taking advantage of standardized macros and automated statistical reporting, whilst R is used for the planning phase (i.e. simulations), for the peer-validation of the results (i.e. double programming) and for certain specific analyses.

In finance, ever-demanding regulators require transparency, and scrutiny has intensified since the recent financial crisis. Changing regulations, security and compliance are factors that weigh against using open source technology exclusively. Basel metrics such as PD, LGD and EAD must be computed properly. A very well-known bank in the Nordics, for example, uses open source technology to build all types of models, including ensemble models, but relies on SAS’ ability to co-exist with and extend open source on its platform to deploy and operationalise open source models.

Open source software and SAS working together – An example

The appetite for deriving actionable insight from data is strong. It is often believed that when data is thoroughly tortured, the insight required to drive business growth will become obvious. Various organisations use SAS and open source technology together to maximise business opportunities and the ROI on their analytics investments.

By combining the flexibility of prototyping predictive models in R with the power and stability of the SAS platform, which can handle massive datasets and parallelize analytic workloads, a well-known financial institution delivers instant results from analytics and takes quick action.

How does this work?

SAS embraces and extends open source in different ways, following the complete analytics lifecycle of Data, Discovery and Deployment.


An ensemble model built in R is brought into SAS Enterprise Miner through the ‘open source integration node’ for objective comparison with other models. (Enterprise Miner is a drag-and-drop workflow modelling application that is easy to use without the need to code.)


Once this model has been compared and the best model identified from automatically generated fit statistics, the model can be registered in the metadata repository, making it available for use on all SAS platforms.

We used SAS Model Manager to monitor the Probability of Default (PD) and Loss Given Default (LGD) models. All models are also visible to everyone within the organization, depending on system rights and privileges, and can be used to score new datasets and be retrained when necessary. Alerts can also be set to monitor model degradation, with automated messages sent for real-time intervention.


Once the champion model was set and published, it was used in a Real Time Decision Manager (RTDM) flow to score new customers applying for a loan. RTDM is a web application that allows instant assessment of new applications without the need to score the entire database.

As a result of this flexibility, the bank was able to manage its workload and modernize its platform in order to make better hedging decisions and cost-saving investments. Complex algorithms can now be integrated into SAS to make better predictions and manage exploding data volumes.


tags: open source, SAS Enterprise Miner

Operationalising Open Source Models Using SAS was published on SAS Users.


Discovery Summit 2016 wrapped up with an announcement of the top-rated papers and posters. The selection is based on the ratings attendees submitted in the conference app. After the last breakout presentations of the conference were finished, we exported the ratings data from the app and brought it into JMP (of course), where we ran an add-in that did the calculation.

We wanted to recognize the top conference content here in the JMP Blog and share it with the larger JMP community. So, below you will find the list with links to the top-rated content, which most of the presenters uploaded to the Discovery Summit 2016 site in the JMP User Community. Contributed papers are those by customers, while invited papers are those by SAS employees. For the first time ever, we gave awards for best student posters.

While you're at the Discovery Summit site, look around. In addition to papers and posters uploaded by presenters, you'll see full-length videos of some plenary and breakout sessions. Many presenters have shared their JMP files, in addition to their slides.

Congratulations to all!

Top 3: Best Contributed Paper

Top 3: Best Invited Paper

Top 3: Best Poster

Best Student Posters

P.S. If you missed the conference and want a sense of what went on, visit our highlights page for photos, tweets and links to live blogs.

tags: Discovery Summit, JMP User Community

The post Best in show at Discovery Summit 2016 appeared first on JMP Blog.


“Every morning in Africa, a gazelle wakes up. It knows it must run faster than the fastest lion, or it will be killed. Every morning a lion wakes up. It knows it must outrun the slowest gazelle, or it will starve to death. It doesn't matter whether you are a […]

Our world is flat, but our employees aren’t: How to manage and hire with analytics was published on SAS Voices.


I recently read a very interesting article describing how analytics is being used to detect cheating/copying/re-use in crossword puzzle creation, in some of the major news publications. This inspired me to try my hand at creating a totally new & unique crossword puzzle ... of course using SAS software! :) My grandmother […]

The post A statistical crossword puzzle to exercise your brain appeared first on SAS Learning Post.


Editor’s note: This post is part of a series excerpted from Adele Sweetwood’s book, The Analytical Marketer: How to Transform Your Marketing Organization. Each post is a real-world case study of how to improve your customers’ experience and optimize your marketing campaigns.

When in doubt, one of the easiest things marketers can do is send an email blast. The approach is predicated on a strength-in-numbers mentality. If you send out enough messages, somebody, somewhere, will receive it and take the desired action.

While marketers still use blast messages, their value is waning. Why? You are competing for attention with your emails, website, advertisements, collateral, events and any other initiative. People are using their phones, computers, tablets and TVs to consume information. It’s harder than ever to reach, much less sway, a customer.

The challenge

By 2010, SAS marketing efforts included a blend of blasts and more personalized emails. The marketing team’s goal was to find the right mix of messages and communications methods that would anticipate customers’ needs and turn emails into a conversation with them on their journeys.

The advent of a new customer-journey approach at SAS gave us an opportunity to rethink our email strategy and see what approaches worked best at different phases of the journey.

The marketing team looked at historical data and asked some questions. For example, where along the path is thought leadership more effective than something product-specific? And where is third-party content more compelling than internal content?

The approach

The marketing team members began assembling data on the customer journey and behavior across each phase. They found examples of customers receiving messages that were out of sync with their actual buying stage. For instance, a contact would receive messages designed for the early stages of a journey even after the deal was won (or lost).

Marketing analysts also evaluated and identified content gaps across the customer journey. Looking at the totality of interactions, it was clear that building a conversation with the customer would require an overhaul of the email marketing strategy. Here are some key takeaways from the analysis:

  • Scoring allowed the team to assign a value to all actions, not just registrations. Each interaction with SAS was tracked and added to the score. With more pervasive – and more realistic – scoring of these behaviors, the team could further analyze the relative value of different messages and offers.
  • Segmentation identified the stage of the customer journey. Once scoring was complete and applied to contacts, the team could choose which message to send based on the stage.
  • Automation provided the foundation for faster, analytics-driven communications. With segments in place, the team created targeted and relevant email communications to provide the right message at the right stage of the customer journey.
  • Analytics delivered the right business strategy based on the desired outcome. Marketing analysts could evaluate how the entire marketing mix was working to move customers through different stages.

The results

After this analysis, the team created and refined email campaigns to fit the stages of the customer journey. The content for the phases included:

  • Need. High-level messaging, including industry-specific content and thought leadership strategies. Blogs and articles at this phase explain the problem and provide a path forward.
  • Research. Content that validates the customer’s need to solve the problem. Material here focuses on specific business issues and includes third-party resources like analyst reviews and research reports.
  • Decide. Deeper content that provides more product-specific information. This material validates the proposed solution through customer success stories, research reports, product fact sheets and so on.
  • Adopt. On-board and self-service content. This stage focuses on introducing customers to support resources and online communities, as well as do-it-yourself material that introduces the customer to the solution.
  • Use. Adoption content, such as advanced educational information, user conferences, and product-specific webinars. At this stage, users turn to more technical resources to expand their knowledge.
  • Recommend. Content specific to extending the relationship with the customer. This includes speaking opportunities, focus group participation and sales references.

When customers reach the buy phase, interactions occur primarily between sales and the customer. As a result, customers are typically excluded from email communications.

Eventually, our entire online experience will be personalized as a way to best engage our customers and prospects and to help ensure we are communicating with them in a way that they prefer. How do we do this? By using customer experience analytics to track, analyze and then take action when appropriate based on behavior, instead of simply when we want to promote something. In other words, we have adopted an analytical mindset.

How SAS can help

We've created a practical ebook on modernizing a marketing organization with marketing analytics: Your guide to modernizing the marketing organization.

SAS Customer Intelligence 360 enables the delivery of contextually relevant emails, ensuring their content is personalized and timely.  Emails sent with SAS Customer Intelligence 360 are backed by segmentation, analytics and scoring behind the scenes to help ensure messaging matches the customer journey.

Whether you're just getting started or want to add new skills, we offer a variety of free tutorials and other training options: Learn SAS Customer Intelligence 360


tags: customer journey, email marketing, marketing analytics, marketing campaigns, SAS Customer Intelligence 360, segmentation, The Analytical Marketer

Moving from blasts to conversations was published on Customer Intelligence.


Recently, I was talking to a director of analytics from a large telecommunications company, and I asked her, “Do you think we have a skills shortage?” She replied, “NO, I think we’re just looking in the wrong place.” I wanted to hear more as this analytics expert may have just […]

More data journalists – not data scientists was published on SAS Voices.


Are you ready to broaden your programming skills to land a new job or be a more versatile programmer at your current job? Then this new (and free!) course might be for you. SAS Programming for R Users is a free course developed to allow you to easily transfer your […]

The post Free training course: SAS Programming for R Users appeared first on SAS Learning Post.


In JMP 11, we built interactive HTML technology into JMP to enable customers to share results. You can publish JMP results to the Web, post them to a corporate intranet or shared drive, or share them with colleagues via e-mail.

In JMP 12, we added support for Bubble Plots, Profilers, and Mobile devices.  Unlike Flash applications, the interactive HTML reports in JMP can run on iPads and similar devices.

In JMP 13, we've added support for reports created with Graph Builder.  The most frequently used features are enabled, for Points, Smoothers, Ellipses, Lines, Bars, Areas, Box Plots, Histograms, Heatmaps, Mosaic Plots, Caption Boxes, and Map Shapes. These Graph Builder elements are highlighted in the figure below.


In addition, we've received a number of feature requests from customers over the years. You know who you are ;-)  So beyond Graph Builder, we've added the following in JMP 13:

  1. Dashboard Support
  2. More Profilers
  3. Reference Ranges
  4. Value Labels
  5. Value Ordering
  6. Pinned Tooltips
  7. Hover Pictures

In this post, I'll give a high-level overview of some of these features.

Graph Builder Bar Charts

Bar charts are among the most frequently used graphs. Graph Builder provides a dozen different styles of bar charts, of which six are supported in interactive HTML.

The example below shows the same data drawn with stacked and side-by-side bars.  For this market share example, the bar sections all sum to 100%, so arguably the stacked style communicates more clearly.

Bullet charts provide a highly space-efficient presentation. These charts were developed specifically for use in dashboards. The example below shows a dashboard for a hospital interested in patients' and doctors' wait times and emergency room occupancy.


Range bars are useful for showing two values. In the stock market example below, hover tips display the high and low prices for each date.

Graph Builder supports many combinations of layouts for grouping. The example below shows diamond prices vs. carat weights, grouped by cut, in a wrapped layout.  The larger diamonds are more expensive, and the cut also matters. The ideal cut is considered to give the most brilliant sparkle to the diamond, so it generally fetches higher prices.


Bar Charts Outside Graph Builder

Implementing bar charts for Graph Builder gave us a bonus: Bar charts are now also supported in all other JMP reports. In the Partial Least Squares analysis, for example, bar charts interactively display X and Y coordinates.

Improved Dashboard Support

Dashboards combine related information in custom layouts for efficient communication. Dashboards are popular on the web, so we've improved support for custom layouts in Interactive HTML. The dashboard below uses a particularly efficient layout with tab controls to show profits per employee for different types of companies.


Besides improving layout, we can support many more kinds of dashboards in JMP 13, because we support many more kinds of graphs. This dashboard of regional air quality combines four separate Graph Builder reports. Along with Bar Charts, Heatmaps, Mosaic Plots and Map Shapes are all new graph types supported in JMP 13. My colleagues John Powell and Josh Markwordt will describe these new graphs in future blog posts.


Interactive Examples

In this blog post, I've shown static images and simple animations, but that is no substitute for interacting with the web pages themselves. All of our examples are available as Interactive HTML pages at http://www.jmp.com/jmphtml5/.

 JMP 13 HTML5 Examples

We built these examples with our colleague Michael Goff's excellent new web report generator, available in JMP 13 under the View menu "Create Web Report."

We hope our work helps you, and we look forward to your comments and suggestions!

tags: Data Visualization, Graph Builder, Interactive HTML, JMP 13

The post Interactive HTML: Graph Builder and more appeared first on JMP Blog.