Clustering made simple
In my new book, I explain how segmentation and clustering can be accomplished in three ways: coding in SAS, point-and-click in SAS Visual Statistics, and point-and-click in SAS Visual Data Mining and Machine Learning using SAS Model Studio. These three analytical tools support many types of segmentation, and one of the most common methods is clustering. According to several surveys from around the globe, clustering is still among the top 10 machine learning methods in use.
One of the best methods for learning about your customers, patrons, clients, or patients (or simply observations in almost any data set) is clustering: finding groups whose members share similar characteristics, while each group differs from the others in its combination of attributes. You can use this method to better understand your customers or to profile various data sets. This can be done in an environment where SAS and open-source software work together seamlessly on a unified platform. (While open source is not discussed in my book, stay tuned for future blog posts where I will discuss more fun and exciting things that should be of interest to you for clustering and segmentation.)
Let’s look at an example of clustering. Being able to look at your data quickly and easily is a real benefit of using SAS Visual Statistics.
Initial data exploration and preparation
To demonstrate the simplicity of clustering in SAS Visual Statistics, the data set CUSTOMERS is used here and also throughout the book. I have loaded the CUSTOMERS data set into memory, and it is now listed in the active tab. I can easily explore and visualize this data by right-clicking it and selecting Actions and then Explore and Visualize. This opens the SAS Visual Analytics page.
I have added four new computed items by taking the natural logarithm of four attributes, and I will use these newly transformed attributes in the clustering.
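Outside the point-and-click interface, the same kind of transformation can be sketched in a few lines of Python. This is only an illustration: the attribute names and values below are made up and are not taken from the CUSTOMERS data set, and `safe_log` is a hypothetical helper. It shows that the natural logarithm is undefined for zero and negative values, so those rows have no usable transformed value.

```python
import math

def safe_log(value):
    """Return ln(value), or None when the logarithm is undefined."""
    if value is None or value <= 0:
        return None
    return math.log(value)

# Hypothetical rows; names and values are invented for illustration.
rows = [
    {"revenue": 1200.0, "orders": 14},
    {"revenue": 0.0, "orders": 3},    # ln(0) is undefined
    {"revenue": 350.5, "orders": 2},
    {"revenue": -20.0, "orders": 1},  # ln of a negative value is undefined
]
attributes = ["revenue", "orders"]

# One new log-transformed item per original attribute.
transformed = [
    {f"log_{name}": safe_log(row[name]) for name in attributes}
    for row in rows
]

# Keep only rows in which every transformed attribute is defined.
usable = [t for t in transformed if all(v is not None for v in t.values())]

print(len(rows), "rows in total,", len(usable), "usable after the transform")
```

In this toy data, two of the four rows lose a value in the transform and would drop out of any analysis that needs all of the transformed attributes.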
Performing simple clustering
Clustering in SAS Visual Statistics can be found by selecting the Objects icon on the left and scrolling down to the SAS Visual Statistics menus, as seen below. Dragging the Cluster icon onto the Report template area lets you use that statistical object and visualize the clusters.
Once the Cluster object is on the template, adding data items to the Data Roles is as simple as checking the four computed data items.
Click the OK icon, and the clustering of the four data items is displayed immediately. In the report below, five clusters were found using the four data items.
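For readers who want a feel for what a clustering step does, here is a minimal k-means sketch in Python. It is a generic illustration and an assumption on my part, not a description of how the Cluster object works internally. The six observations are made up, each with four values standing in for the four transformed attributes, and only two clusters are requested to keep the example short. The `farthest_first` starting rule is a simple deterministic heuristic chosen so that the example gives the same result every time.

```python
import math

def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iterations=50):
    """Plain k-means on a list of equal-length tuples."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(centroids[c])  # leave an empty cluster in place
        if updated == centroids:
            break  # assignments are stable, so stop early
        centroids = updated
    return centroids, labels

# Hypothetical observations: two well-separated groups of three.
points = [
    (0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0), (0.0, 0.1, 0.0, 0.0),
    (5.0, 5.0, 5.0, 5.0), (5.1, 5.0, 5.0, 5.0), (5.0, 5.1, 5.0, 5.0),
]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each observation ends up with a cluster label, and each centroid summarizes the typical attribute values of its cluster, which is the kind of profile a cluster report lets you inspect visually.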
There are 105,456 total observations in the data set; however, only 89,998 were used for the analysis. Some observations were not used because the natural logarithm could not be computed for them (the logarithm is undefined for zero and negative values). To see how to handle that situation easily, please pick up a copy of Segmentation Analytics with SAS Viya. Let me know if you have any questions or comments.
Clustering made simple was published on SAS Users.