SAS Visual Statistics

July 21, 2021
 

In my new book, I explain how segmentation and clustering can be accomplished in three ways: coding in SAS, point-and-click in SAS Visual Statistics, and point-and-click in SAS Visual Data Mining and Machine Learning using SAS Model Studio. These three analytical tools allow you to do many diverse types of segmentation, and one of the most common methods is clustering. According to several surveys across the globe, clustering is still among the top 10 machine learning methods in use.

One of the best methods for learning about your customers, patrons, clients, or patients (or simply observations in almost any data set) is clustering: grouping observations so that members of a cluster share similar characteristics while each cluster has a different combination of attributes. You can use this method to better understand your customers or to profile various data sets. This can be done in an environment where SAS and open-source software work together seamlessly in a unified platform. (While open source is not discussed in my book, stay tuned for future blog posts where I will discuss more fun and exciting things that should be of interest to you for clustering and segmentation.)

Let’s look at an example of clustering. Being able to look at your data quickly and easily is a real benefit of using SAS Visual Statistics.

Initial data exploration and preparation

To demonstrate the simplicity of clustering in SAS Visual Statistics, the data set CUSTOMERS is used here and throughout the book. I have loaded the CUSTOMERS data set into memory, and it is now listed in the active tab. I can easily explore and visualize this data by right-clicking and selecting Actions and then Explore and Visualize. This takes you to the SAS Visual Analytics page.

I have added four new computed items by taking the natural logarithm of four attributes, and I will use these newly transformed attributes in the clustering.
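As a rough illustration of what such a transformation does, here is a minimal Python sketch. The attribute names (`revenue`, `orders`) and the rows are hypothetical placeholders, not the actual CUSTOMERS columns, and the helper `safe_log` is my own. The natural logarithm is undefined for zero and negative values, so the sketch returns a missing value in those cases.

```python
import math

def safe_log(value):
    """Return ln(value), or None when the logarithm is undefined."""
    if value is None or value <= 0:
        return None
    return math.log(value)

# Hypothetical customer rows; the attribute names are placeholders.
customers = [
    {"revenue": 1200.0, "orders": 14},
    {"revenue": 0.0, "orders": 3},   # ln(0) is undefined
    {"revenue": 87.5, "orders": 1},  # ln(1) is 0.0, which is valid
]
attributes = ("revenue", "orders")

# Add one log-transformed item per attribute.
for row in customers:
    for name in attributes:
        row["log_" + name] = safe_log(row[name])

# Rows where every transformed item could be computed.
usable = [
    row for row in customers
    if all(row["log_" + name] is not None for name in attributes)
]
print(len(usable), "of", len(customers), "rows have all log values")
```

Only rows with a valid value for every transformed item can take part in an analysis that needs all of them.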

Performing simple clustering

Clustering in SAS Visual Statistics can be found by selecting the Objects icon on the left and scrolling down to the SAS Visual Statistics menus, as seen below. Dragging the Cluster icon onto the Report template area lets you use that statistics object and visualize the clusters.

Once the Cluster object is on the template, adding data items to the Data Roles is as simple as checking the four computed data items.

Click the OK icon, and the clustering runs immediately. The result looks like the report below, where five clusters were found using the four data items.
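For readers who want to see the idea in code, here is a minimal sketch of k-means clustering (Lloyd's algorithm) in plain Python. It assumes a k-means style method; this post does not say which algorithm the Cluster object runs. The data are synthetic, and the function name `kmeans` and its parameters are illustrative only.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Synthetic data: five groups in four dimensions, 30 points each.
rng = random.Random(42)
points = [
    tuple(center + rng.gauss(0, 0.1) for _ in range(4))
    for center in (0, 5, 10, 15, 20)
    for _ in range(30)
]

centroids, labels = kmeans(points, k=5)
sizes = [labels.count(c) for c in range(5)]
print(sizes)
```

The result of k-means depends on the starting centroids, so cluster sizes can differ between seeds.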

There are 105,456 total observations in the data set; however, only 89,998 were used for the analysis. Some observations were not used because the natural logarithm could not be computed for them. To see how to handle that situation easily, please pick up a copy of Segmentation Analytics with SAS Viya. Let me know if you have any questions or comments.

 

 

Clustering made simple was published on SAS Users.

December 6, 2017
 

Are you struggling to kick start your organization’s analytics journey, especially when it comes to leveraging advanced analytics and machine learning techniques? If the answer is yes then you’re definitely not alone. Whilst most organisations today recognise the benefit of analytics and data science, many are still struggling to kick [...]

4 Reasons to Kick Start your Analytics Journey with SAS Visual Analytics was published on SAS Voices by Felix Liao

January 26, 2017
 

SAS® Viya™ 3.1 represents the third generation of high performance computing from SAS. Our journey started a long time ago and, along the way, we have introduced a number of high performance technologies into the SAS software platform:

Introducing Cloud Analytic Services (CAS)

SAS Viya introduces Cloud Analytic Services (CAS) and continues this story of high performance computing. CAS is the runtime engine and microservices environment for data management and analytics in SAS Viya and introduces some new and interesting innovations for customers. CAS is an in-memory technology and is designed for scale and speed. Whilst it can be set up on a single machine, it is more commonly deployed across a number of nodes in a cluster of computers for massively parallel processing (MPP). The parallelism is further increased when we consider using all the cores within each node of the cluster for multi-threaded, analytic workload execution. In an MPP environment, having a number of nodes doesn’t mean that using all of them is always the most efficient choice for analytic processing. CAS maintains node-to-node communication in the cluster and uses an internal algorithm to determine the optimal distribution and number of nodes to run a given process.

However, processing in-memory can be expensive, so what happens if your data doesn’t fit into memory? Well, CAS has that covered. CAS will automatically spill data to disk in such a way that only the data that are required for processing are loaded into the memory of the system. The rest of the data are memory-mapped to the filesystem in an efficient way for loading into memory when required. This way of working means that CAS can handle data that are larger than the available memory that has been assigned.
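To make the memory-mapping idea concrete, here is a small Python sketch of the general operating-system mechanism. It is not CAS code and says nothing about CAS internals; the file layout, the `RECORD` format and the `read_rows` helper are assumptions made for illustration. The file is mapped into the address space, and only the slice that is actually read has to be brought into memory.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<d")  # one 8-byte little-endian float per record

# Write a hypothetical table of 100,000 values to disk.
path = os.path.join(tempfile.mkdtemp(), "table.bin")
with open(path, "wb") as f:
    for i in range(100_000):
        f.write(RECORD.pack(float(i)))

def read_rows(path, start, count):
    """Read `count` records starting at row `start` via a memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages backing this slice need to be loaded.
            chunk = mm[start * RECORD.size:(start + count) * RECORD.size]
    return [RECORD.unpack_from(chunk, i * RECORD.size)[0] for i in range(count)]

rows = read_rows(path, 50_000, 3)
print(rows)  # [50000.0, 50001.0, 50002.0]
```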

The CAS in-memory engine is made up of a number of components - namely the CAS controller and, in an MPP distributed environment, CAS worker nodes. Depending on your deployment architecture and data sources, data can be read into CAS either in serial or parallel.

What about resilience to data loss if a node in an MPP cluster becomes unavailable? Well, CAS has that covered too. CAS maintains a replicate of the data within the environment. The number of replicates can be configured, but the default is to maintain one extra copy of the data within the environment. This is done efficiently by having the replicate data blocks cached to disk as opposed to consuming resident memory.
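The sketch below shows, in simplified form, why one extra copy of each data block is enough to survive the loss of a single node. The round-robin placement, the node names and the functions `place_blocks` and `survivors` are all hypothetical; how CAS actually places replicates is not described here.

```python
def place_blocks(num_blocks, nodes, copies=1):
    """Map each block to a primary node plus `copies` replica nodes."""
    if copies >= len(nodes):
        raise ValueError("need more nodes than replicas")
    placement = {}
    for block in range(num_blocks):
        primary = block % len(nodes)
        # Put each replica on a different node than the primary.
        replicas = [(primary + offset) % len(nodes) for offset in range(1, copies + 1)]
        placement[block] = {
            "primary": nodes[primary],
            "replicas": [nodes[r] for r in replicas],
        }
    return placement

def survivors(placement, failed_node):
    """Return, per block, a node that still holds the data after one failure."""
    result = {}
    for block, where in placement.items():
        holders = [where["primary"]] + where["replicas"]
        alive = [node for node in holders if node != failed_node]
        result[block] = alive[0] if alive else None
    return result

nodes = ["worker1", "worker2", "worker3"]
placement = place_blocks(6, nodes, copies=1)
after_failure = survivors(placement, "worker2")
print(all(node is not None for node in after_failure.values()))  # True
```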

One of the most interesting developments with the introduction of CAS is the way that an end user can interact with SAS Viya. CAS actions are a new programming construct and with CAS, if you are a Python, Java, SAS or Lua developer you can communicate with CAS using an interactive computing environment such as a Jupyter Notebook. One of the benefits of this is that a Python developer, for example, can utilize SAS analytics on a high performance, in-memory distributed architecture, all from their Python programming interface. In addition, we have introduced open REST APIs which means you can call native CAS actions and submit code to the CAS server directly from a Web application or other programs written in any language that supports REST.

Whilst CAS represents the most recent step in our high performance journey, SAS Viya does not replace SAS 9. These two platforms can co-exist, even on the same hardware, and indeed can communicate with one another to leverage the full range of technology and innovations from SAS. To find out more about CAS, take a look at the early preview trial. Or, if you would like to explore the capabilities of SAS Viya with respect to your current environment and business objectives, speak to your local SAS representative about arranging a ‘Path to SAS Viya workshop’ with SAS.

Many thanks to Fiona McNeill, Mark Schneider and Larry LaRusso for their input and review of this article.

 

tags: global te, Global Technology Practice, high-performance analytics, SAS Grid Manager, SAS Visual Analytics, SAS Visual Statistics, SAS Viya

A journey of SAS high performance was published on SAS Users.

December 12, 2014
 

In case you haven’t read about SAS Visual Statistics, let’s start with a quick overview.

  • First, it’s an add-on to SAS Visual Analytics.
  • Second, it’s a web-based solution with an interactive, drag-and-drop interface that helps you rapidly build descriptive and predictive models.
  • Lastly, SAS Visual Analytics and SAS Visual Statistics are a powerful duo, supporting a logical flow of analysis from exploration tasks to modeling tasks.

Because both products can access shared data stored in SAS LASR Analytic Server, working with data across SAS Visual Analytics and SAS Visual Statistics is streamlined. To model your data in SAS Visual Statistics, it’s a recommended practice to start your analysis in SAS Visual Analytics Explorer. At first glance, this step may seem odd. But remember, in the usual course of predictive analytics, you’ll want to handle tasks such as investigating the distribution of the different variables, understanding relationships among variables or handling data manipulations before you do any actual modeling.

For one-step exploratory modeling, I find it simpler to start with the data from particular visualization types that you have already created in SAS Visual Analytics Explorer. Once you switch to a view of that data in SAS Visual Statistics, you can continue to refine the baseline model, add more variables, evaluate the model’s fitness and perform model comparisons.

Let’s look at some simple examples of how this process works using just a few visualizations that were created in SAS Visual Analytics and then modeled in SAS Visual Statistics.

Scatter Plot

Figure 1 is a scatter plot visualization created in SAS Visual Analytics Explorer. To model the data from the scatter plot in SAS Visual Statistics, right-click on the scatter plot, or use the drop-down list to select Extended Features => Model Responses in SAS Visual Statistics. The variable on the y axis will be assigned to the response role.

[Figure 1: Scatter plot in SAS Visual Analytics Explorer]

Figure 2 shows the initial linear model displayed in SAS Visual Statistics. It is important to note that you must start with a scatter plot, not a heat map. With large volumes of data, SAS Visual Analytics Explorer creates heat maps rather than scatter plots, so be sure that the visualization type is a scatter plot.

[Figure 2: Initial linear model in SAS Visual Statistics]
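To see what a linear model of a scatter plot amounts to, here is a minimal sketch of ordinary least squares with one predictor, where the y-axis variable is the response. It assumes a simple straight-line fit; the data and the function `fit_line` are invented for illustration and are not taken from the figures.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical scatter plot data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)  # 1.0 2.0
```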

Box Plot

Figure 3 is a box plot visualization created in SAS Visual Analytics Explorer. The box plot has income_group on the x axis, which has eight distinct levels or values, and house_income and age on the y axis. To model the data from the box plot, right-click on the box plot, or use the drop-down list to select Extended Features => Model Responses in SAS Visual Statistics.

In SAS Visual Statistics, the category variable on the x axis (income_group) will be assigned to the response role, and house_income and age will be assigned as continuous effects. When you model data where the response variable has more than two levels, you are prompted to select an event level to model.

[Figure 3: Box plot in SAS Visual Analytics Explorer]

Figure 4 shows the initial logistic model displayed in SAS Visual Statistics. In this example, there are some missing values for the variable age, and the analysis will not use any observations where there are missing values for any of the model variables. Once the initial model is complete, you can improve on it by adding other effects.

[Figure 4: Initial logistic model in SAS Visual Statistics]
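The way missing values are handled in the model above is often called complete-case analysis. A minimal sketch follows, using the variable names from the box plot example; the values and the helper `complete_cases` are made up for illustration.

```python
# Hypothetical observations; None marks a missing value.
rows = [
    {"income_group": "A", "house_income": 52000.0, "age": 41},
    {"income_group": "B", "house_income": 38000.0, "age": None},
    {"income_group": "A", "house_income": None, "age": 29},
    {"income_group": "C", "house_income": 71000.0, "age": 55},
]
model_vars = ("income_group", "house_income", "age")

def complete_cases(rows, variables):
    """Keep only observations with no missing value in any model variable."""
    return [
        row for row in rows
        if all(row.get(name) is not None for name in variables)
    ]

used = complete_cases(rows, model_vars)
print(f"{len(used)} of {len(rows)} observations used")  # 2 of 4 observations used
```

Adding an effect with missing values to the model can therefore reduce the number of observations used.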

Correlation Matrix

Figure 5 is a correlation matrix visualization created in SAS Visual Analytics Explorer 6.4. The correlation matrix was created using several measures, making sure to select donation_amount first as that will be the response variable in SAS Visual Statistics.

To model the data from the correlation matrix, select one or more cells in the correlation matrix from the same row or column. They do not have to be contiguous cells, but they must be from the same row or column. With the cells selected, right-click on the correlation matrix, or use the drop-down list to select Extended Features => Model Responses in SAS Visual Statistics.

[Figure 5: Correlation matrix in SAS Visual Analytics Explorer]
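Selecting cells from one row of a correlation matrix means looking at how one measure relates to several others. As a sketch, the snippet below computes that single row of Pearson correlations for a response. Only the name donation_amount comes from the example above; the other measure names, the values and the function `pearson` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measures; donation_amount plays the role of the response.
measures = {
    "donation_amount": [10.0, 20.0, 30.0, 40.0, 50.0],
    "income": [1.0, 2.0, 3.0, 4.0, 5.0],
    "months_since_gift": [5.0, 4.0, 3.0, 2.0, 1.0],
}
response = "donation_amount"

# One row of the correlation matrix: the response against every other measure.
row = {
    name: pearson(measures[response], values)
    for name, values in measures.items()
    if name != response
}
print(row)
```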

Figure 6 shows the initial logistic model. Note: there are missing values for the variable donation_amount.

[Figure 6: Initial model in SAS Visual Statistics]

Additional resources

Interested in a trial run? Visit the SAS Visual Statistics Try Before You Buy site.

You can learn more about SAS Visual Statistics from these resources:

tags: SAS Visual Analytics, SAS Visual Statistics
October 6, 2014
 
One of the hottest trends today in the business intelligence and analytics spaces is “self-service”. The word self-service is thrown around lightly in many situations and often carries different expectations for different people and organizations. Before we go into the details of self-service analytics it is important to have a […]