April 17, 2018

If you think machine learning will replace demand planners, then don’t read this post. If you think machine learning will automate and unleash the power of insights allowing demand planners to drive more value and growth, then this article is a must read. The impressive advances in artificial intelligence (AI) and machine learning [...]

How machine learning is disrupting demand planning was published on SAS Voices by Charlie Chase

March 30, 2018

Gradient boosting is one of the most widely used machine learning models in practice, and more and more people use it in Kaggle competitions. Are you interested in seeing how to use a gradient boosting model for classification in SAS Visual Data Mining and Machine Learning? Here I classify Fisher’s Iris flower dataset using gradient boosting, which may serve as a starting point for those interested in trying the classification models in SAS Visual Data Mining and Machine Learning.

Fisher’s Iris data is a well-known dataset in data mining. Per Wikipedia, Fisher developed a linear discriminant model to distinguish the species from each other using the features provided in the dataset. You may have already seen people run other classification models on this dataset, such as neural networks. What I am interested in is how well the SAS gradient boosting model does at classifying the species.

#1  Explore the dataset

We can easily load Fisher’s Iris dataset from SASHelp.Iris into SAS Viya. The dataset consists of 50 samples from each of three species (Iris Setosa, Virginica, and Versicolor), 150 records in total, with five attributes: Petal Length, Petal Width, Sepal Length, Sepal Width, and Iris Species. The dataset is already well formed, with neither missing values nor outliers. Here is a quick look at the dataset in SAS Visual Analytics.

From the chart, we see that the ‘Setosa’ species can easily be distinguished from the ‘Versicolor’ and ‘Virginica’ species by the length and width of their petals and sepals. However, this is not the case for the latter two species: some of their samples lie close together, which makes them a little harder to tell apart by these features.

#2  Prepare Data

Little effort is needed to prepare the data for prediction, but one thing worth mentioning is the standardization of measure variables. The measure details in SAS Visual Analytics show that neither the Petal Length nor the Petal Width distribution is normal. You may wonder whether we need to normalize the data before feeding it to the model, and this leads to one thing I really like about the gradient boosting model: users do not need to explicitly standardize quantitative data. Tree-based models are robust to the scale of an input feature, because each node split depends only on the ordering of the feature’s values, not on their magnitudes. (Here is an article discussing a similar problem.)
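The claim that tree splits ignore scaling can be checked on a toy example. The sketch below is a hypothetical, single-split ("stump") search by Gini impurity, not SAS’s split algorithm, and the petal lengths are illustrative values, not rows from SASHelp.Iris. Because standardizing is a strictly increasing transformation, it preserves the order of the values, so the best split sends exactly the same rows to the left child.

```python
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x, y):
    """Return the set of row indices sent to the left child by the
    threshold that minimizes the weighted Gini impurity."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_score, best_left = float("inf"), None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # no threshold can separate tied values
        left, right = order[:k], order[k:]
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(x)
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left


def standardize(x):
    """Center to mean 0 and scale to standard deviation 1."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


# Illustrative toy values, not the actual iris rows.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9]
species = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3

raw = best_split(petal_length, species)
scaled = best_split(standardize(petal_length), species)
print(raw == scaled)  # True: the same rows go left either way
```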

So my data preparation here is just partitioning the data before classifying the iris species. I need to make sure each partition follows the same species distribution as the full iris dataset. This is easy to achieve in SAS Visual Analytics by adding a partition data item, setting the Sampling method to ‘Stratified sampling’, and adding ‘Iris Species’ as the column to stratify by. I define two partitions, training and validation, with 60% of the rows for training, 40% for validation, and random seed 1234. This adds a categorical data item ‘Partition’ with the value 0 for the validation partition and 1 for the training partition. (For easier understanding in the charts, I’ve created a custom category called ‘Partitions’ based on the ‘Partition’ data item values.)
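For readers who want to see what stratified sampling does, here is a minimal sketch of a 60/40 stratified split using the same coding as above (1 = training, 0 = validation). The function name `stratified_partition` is hypothetical, and although it also uses seed 1234, Python’s random generator differs from SAS’s, so it will not pick the same rows that SAS Visual Analytics picks.

```python
import random
from collections import Counter, defaultdict


def stratified_partition(labels, train_frac=0.6, seed=1234):
    """Return partition flags (1 = training, 0 = validation), sampling
    train_frac of the rows separately within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    flags = [0] * len(labels)
    for idx in by_class.values():
        n_train = round(train_frac * len(idx))  # 30 of 50 per species
        for i in rng.sample(idx, n_train):
            flags[i] = 1
    return flags


species = ["Setosa"] * 50 + ["Versicolor"] * 50 + ["Virginica"] * 50
partition = stratified_partition(species)
print(Counter(zip(species, partition)))
```

Sampling within each species separately guarantees that every species gets 30 training rows and 20 validation rows, which is exactly the property the charts below check.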

The charts below show that the 150 rows in Fisher’s Iris dataset are distributed equally across the three species, and that each partition contains the same percentage of each species.

#3  Train the gradient boosting model

Training models in SAS Visual Data Mining and Machine Learning lets us take advantage of visualization, and it’s very straightforward. In the ‘Objects’ tab, drag and drop ‘Gradient Boosting’ onto the canvas. Assign ‘Iris Species’ as the response variable, and ‘Petal Length’, ‘Petal Width’, ‘Sepal Length’, and ‘Sepal Width’ as predictors. Then set the ‘Partition’ data item as the Partition ID. The system then trains the model and shows the model assessment. Below is a screenshot for the ‘Virginica’ event.
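The GUI hides what happens during training. As a rough illustration of the gradient boosting mechanism only, here is a minimal sketch on a one-dimensional regression toy problem with squared-error loss: each round fits a one-split tree (stump) to the current residuals, which are the negative gradient of that loss, and adds a shrunken copy to the ensemble. The classification model in this post uses a different loss and SAS’s own implementation; the function names and data here are hypothetical.

```python
def fit_stump(x, r):
    """Fit a one-split regression stump to residuals r by least squares."""
    pairs = sorted(zip(x, r))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between equal x values
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - lm) ** 2 for v in left)
               + sum((v - rm) ** 2 for v in right))
        thr = (pairs[k - 1][0] + pairs[k][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda v: lm if v <= thr else rm


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on residuals; return a prediction function."""
    base = sum(y) / len(y)  # start from the mean prediction
    stumps = []

    def predict(v):
        return base + learning_rate * sum(s(v) for s in stumps)

    for _ in range(n_rounds):
        # Residuals = negative gradient of squared-error loss.
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return predict


def mse(model, x, y):
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]
print(mse(gradient_boost(x, y, n_rounds=1), x, y),
      mse(gradient_boost(x, y, n_rounds=50), x, y))
```

The training error shrinks as rounds are added; the learning rate controls how much each stump contributes.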

The response variable, Iris Species, has three event levels: ‘Setosa’, ‘Versicolor’, and ‘Virginica’, and we can choose the desired event level to look at the model output. We can also switch the assessment plot from Lift to ROC or to Misclassification. (Note: the misclassification plot is based on the event level, so it shows ‘Setosa’ and ‘NOT Setosa’ if we choose the ‘Setosa’ event.) Below is a screenshot with the ROC plot and the model assessment statistics.
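The event-level view collapses the three species into a binary problem. The sketch below mimics that idea; the helper names and the four toy rows are hypothetical, and this is not how SAS computes its plot internally. With the ‘Setosa’ event, a Versicolor row predicted as Virginica counts as correct, because both labels become ‘NOT Setosa’.

```python
def to_event(labels, event):
    """Collapse labels to the event level or 'NOT <event>'."""
    return [lab if lab == event else f"NOT {event}" for lab in labels]


def event_misclassification(actual, predicted, event):
    """Misclassification rate of the binary event / NOT-event problem."""
    a, p = to_event(actual, event), to_event(predicted, event)
    return sum(u != v for u, v in zip(a, p)) / len(a)


actual = ["Setosa", "Versicolor", "Virginica", "Virginica"]
predicted = ["Setosa", "Virginica", "Virginica", "Versicolor"]
for event in ("Setosa", "Versicolor", "Virginica"):
    print(event, event_misclassification(actual, predicted, event))
```

Here the ‘Setosa’ view reports no errors, while the ‘Versicolor’ and ‘Virginica’ views each report 0.5, which is why it is worth checking more than one event level.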

In practice, training models usually takes a lot of effort in tuning model parameters. SAS Visual Data Mining and Machine Learning provides an ‘Autotune’ feature to help with this: users set limits such as the maximum number of iterations, seconds, and evaluations, and the product searches for the best hyperparameter values it can find within those limits. Because this dataset has only 150 samples, I won’t bother with hyperparameter tuning.
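To make the idea of a budgeted search concrete, here is a minimal sketch that stops after a maximum number of evaluations or seconds. It uses plain random search, a simpler stand-in that is not necessarily the strategy Autotune uses; the parameter grid and the `fake_validation_error` objective are hypothetical placeholders for "train a model and return its validation error".

```python
import random
import time


def random_search(objective, space, max_evals=20, max_seconds=5.0, seed=1234):
    """Evaluate random configurations until either budget runs out.
    Returns (best_score, best_config); lower scores are better."""
    rng = random.Random(seed)
    start = time.monotonic()
    best = (float("inf"), None)
    for _ in range(max_evals):
        if time.monotonic() - start > max_seconds:
            break  # time budget exhausted
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score < best[0]:
            best = (score, config)
    return best


# Hypothetical search space for a gradient boosting model.
space = {"leaf_size": [1, 2, 5, 10],
         "learning_rate": [0.05, 0.1, 0.2],
         "n_trees": [50, 100, 200]}


def fake_validation_error(cfg):
    # Placeholder: a real objective would train and score a model.
    return abs(cfg["leaf_size"] - 2) * 0.01 + abs(cfg["learning_rate"] - 0.1)


print(random_search(fake_validation_error, space))
```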

#4  Make predictions with the model

Now I can start making predictions with the gradient boosting model. There are several ways to do this. In SAS Visual Data Mining and Machine Learning, from the right-click menu, choose either ‘Export model…’ or ‘Derive predicted…’. The first exports the model code, so you can run it in SAS Studio against the data you want to score. The second is very straightforward in SAS Visual Data Mining and Machine Learning: it opens the ‘New Prediction Items’ page, where you can choose to get the predicted value and the probabilities for all levels of Iris Species. These data items are added to the iris CAS table for further evaluation. Since the iris dataset has three species, I need to select ‘All levels’ so that the prediction gives the classification into the three species along with their probabilities.
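How do a predicted level and per-level probabilities relate? A common scheme in multiclass gradient boosting keeps one raw score per class, converts the scores to probabilities with a softmax, and predicts the level with the highest probability. The sketch below assumes that scheme (whether SAS computes its probability items exactly this way is an assumption), and the raw scores are made up.

```python
import math


def predict_all_levels(scores):
    """scores: dict mapping level -> raw ensemble score for one row.
    Returns (predicted level, dict of probabilities per level)."""
    m = max(scores.values())  # subtract the max for numerical stability
    exp = {lvl: math.exp(s - m) for lvl, s in scores.items()}
    total = sum(exp.values())
    probs = {lvl: e / total for lvl, e in exp.items()}
    predicted = max(probs, key=probs.get)  # most probable level
    return predicted, probs


row_scores = {"Setosa": -1.2, "Versicolor": 0.4, "Virginica": 1.1}
label, probs = predict_all_levels(row_scores)
print(label, {k: round(v, 3) for k, v in probs.items()})
```

The probabilities for the three levels always sum to 1, and asking for ‘All levels’ corresponds to keeping the whole `probs` dictionary rather than only the winning label.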

#5  Review the prediction result

In the model assessment tab, we already see the model assessment statistics. We can also switch to the ‘Variable Importance’, ‘Lift’, ‘ROC’, or ‘Misclassification’ tab to see more about the model. Here I’d like to visually compare the predicted species with the iris species provided in the dataset.

To show the misclassifications visually, I perform the following actions:

  • In SAS Visual Analytics, create a list table to show all 150 rows of the iris dataset. Since there is no primary key in the dataset, the SAS Visual Analytics list table aggregates measure variables by default, so be sure to set the ‘Detail data’ option in the Options tab.
  • Create a calculated item (named ‘equals’) that checks whether the values of the ‘Iris Species’ and ‘Predicted: Iris Species’ columns are equal: {IF ( 'Iris Species'n = 'Predicted: Iris Species'n ) RETURN 1 ELSE 0}
  • Define a display rule with the calculated item to highlight the misclassified rows. I’ve sorted the table by the ‘equals’ value so that rows where ‘Iris Species’ and ‘Predicted: Iris Species’ differ are shown at the top.
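The same check can be sketched outside SAS Visual Analytics: compute an ‘equals’ flag per row, sort the mismatches to the top, and count them per partition (1 = training, 0 = validation). The four rows below are hypothetical placeholders, not the model’s actual output.

```python
from collections import Counter

rows = [
    {"species": "Setosa", "predicted": "Setosa", "partition": 1},
    {"species": "Versicolor", "predicted": "Virginica", "partition": 1},
    {"species": "Virginica", "predicted": "Virginica", "partition": 0},
    {"species": "Virginica", "predicted": "Versicolor", "partition": 0},
]

# Same logic as the calculated item: 1 when the values match, else 0.
for r in rows:
    r["equals"] = 1 if r["species"] == r["predicted"] else 0

rows.sort(key=lambda r: r["equals"])  # misclassified rows (0) first
misses = Counter(r["partition"] for r in rows if r["equals"] == 0)
accuracy = sum(r["equals"] for r in rows) / len(rows)
print(misses, accuracy)
```

Applied to the result reported in this post (4 of 150 rows misclassified), the same accuracy formula gives 146/150 ≈ 0.973.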

Four rows are misclassified by the model: three from the training partition and one from the validation partition. So far, the result doesn’t look bad, right?

We can continue tuning the parameters of the gradient boosting model in SAS Visual Data Mining and Machine Learning to improve it. For example, if I set the leaf size to 2 instead of the default value of 5, the model accuracy improves (too good to be true?). See the screenshot below for a comparison.
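Why does a smaller leaf size change the model? Assuming leaf size means the minimum number of observations allowed in each leaf, a split is only permitted if both children keep at least that many rows. This hypothetical sketch counts the permitted thresholds on ten made-up values for leaf sizes 5 and 2.

```python
def allowed_splits(x, leaf_size):
    """Return the thresholds (midpoints between distinct sorted values)
    whose left and right children both keep >= leaf_size rows."""
    xs = sorted(x)
    out = []
    for k in range(1, len(xs)):
        if xs[k - 1] == xs[k]:
            continue  # no threshold between equal values
        if k >= leaf_size and len(xs) - k >= leaf_size:
            out.append((xs[k - 1] + xs[k]) / 2)
    return out


x = [4.5, 4.7, 4.8, 4.9, 5.0, 5.1, 5.6, 5.8, 6.1, 6.3]
print(len(allowed_splits(x, 5)), len(allowed_splits(x, 2)))  # prints: 1 7
```

More permitted splits let each tree fit the training rows more closely, so an accuracy gain after shrinking the leaf size is worth confirming on the validation partition.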

Of course, you may want to try tuning other parameters, or generating more features to refine the model. In any case, classification with a gradient boosting model is easy and straightforward in SAS Visual Data Mining and Machine Learning. The product also offers many other models you can use for classification. Would you like to practice with some of the other models?

Play with classification of Iris data using gradient boosting was published on SAS Users.

March 10, 2018

I bet many of you didn’t even know the term machine learning five years ago. But Gartner did. The Gartner Magic Quadrant for Data Science and Machine Learning Platforms, 2018 was just released, and SAS has been in the leader’s quadrant for five years straight. According to Gartner, “This Magic [...]

Gartner names data science and machine learning leaders was published on SAS Voices by David Tareen

February 15, 2018

How effective is your organization at leveraging data and analytics to power your business models?

This question is surprisingly hard for many organizations to answer. Most organizations lack a roadmap against which they can measure their effectiveness at using data and analytics to optimize key business processes, uncover new business opportunities, or deliver a differentiated customer experience. They do not understand what’s possible with respect to integrating big data and data science into the organization’s business model (see Figure 1).

Figure 1: Big Data Business Model Maturity Index

My SAS Global Forum 2018 presentation on Tuesday, April 10, 2018 will discuss the transformative potential of big data and advanced analytics, and will leverage the Big Data Business Model Maturity Index as a guide for helping organizations understand where and how they can leverage data and analytics to power their business models.

Digital Twins, Analytics Profiles and the Power of One

We all understand that the volume and variety of data are increasing exponentially. Your customers are leaving their digital fingerprints across the Internet via their website, social media, and mobile device usage. The Internet of Things will unleash an estimated 44 zettabytes of data across 7 billion connected people by 2020.

However, big data isn’t really about big; it’s about small. It’s about understanding your customer and product behaviors at the level of the individual.  Big Data is about building detailed behavioral or analytic profiles for each individual (see Figure 2).

Figure 2: Building Individual Behavioral or Analytic Profiles

If you want to better serve your customers, you need to understand their tendencies, behaviors, inclinations, preferences, interests and passions at the level of each individual customer.

Customers’ expectations of their vendors are changing due to their personal experiences. From recommending products, services, movies, music, routes, and even spouses, customers expect their vendors to understand them well enough to provide a hyper-personalized customer experience.

Demystifying Data Science (AI | ML | DL)

Too many organizations are spending too much time confusing too many executives about the capabilities of data science. The concept of data science is simple: data science is about identifying the variables and metrics that might be better predictors of business and operational performance (see Figure 3).

Figure 3: A Moneyball Definition of Data Science

Whether using basic statistics, predictive analytics, data mining, machine learning, or deep learning, almost all data science benefits come from the simple formula: Input (A) → Response (B).

Source: Andrew Ng, “What Artificial Intelligence Can and Can’t Do Right Now”

By collaborating closely with business subject matter experts to choose Input (A), the variables and metrics that might be better predictors of performance, the data science team can achieve a more accurate, more granular, lower-latency Response (B). The creative creation and selection of Input (A) has already revolutionized many industries, and is poised to revolutionize more.

Data Monetization and the Economic Value of Data

Data is an unusual asset: it doesn’t deplete, it doesn’t wear out, and it can be used across an infinite number of use cases at near-zero marginal cost. Organizations have no other assets with those unique characteristics. And while traditional accounting methods of valuing assets work well for physical assets, they fall horribly, even dangerously, short in properly determining the economic value of data.

Instead of using traditional accounting techniques to determine the value of the organization’s data, apply economic and data science concepts to determine the economic value of the data based upon its ability to optimize key business and operational processes, reduce compliance and security risks, uncover new revenue opportunities, and create a more compelling, differentiated customer experience (see Figure 4).

Figure 4: Data Lake 3.0: Collaborative Value Creation Platform

The data lake, which can house both data and analytic models, is transformed from a simple data repository into a “collaborative value creation platform” that facilitates the capture, refinement, and sharing of the data and analytic digital assets across the enterprise.

Creating the Intelligent Enterprise

When you add up all of these concepts and advancements – Big Data, Analytic Profiles, Data Science and the Economic Value of Data – organizations are poised for digital transformation (see Figure 5).

Figure 5: Achieving Digital Transformation

And what is Digital Transformation?

Digital Transformation is the application of digital capabilities to processes, products, and assets to improve efficiency, enhance customer value, manage risk, and uncover new monetization opportunities.

I look forward to seeing you at my SAS Global Forum 2018 session and helping your organization with its digital transformation!

Data monetization and the economic value of data was published on SAS Users.

January 26, 2018

Let’s lay down some fundamentals. In business you want to achieve the highest revenues with the best margins and the lowest costs. More specifically, in manufacturing, you want your products to be the highest quality (relative to specification) when you make the item. And you want it shipped to the [...]

How do you take your manufacturing business to the next level? was published on SAS Voices by Tim Clark

January 19, 2018

Technology is changing rapidly: autonomous vehicles, connected devices, digital transformation, the Internet of Things (IoT), machine learning, artificial intelligence (AI), automation. The list goes on. And it has only begun. I do not try to predict the future. Instead, I examine the trends in technology and look for disruptive forces [...]

Two tech trends shaping 2018 and beyond was published on SAS Voices by Oliver Schabenberger

January 4, 2018

What will 2018 unveil for the data management market? I searched expert opinions on technology trends for 2018 and matched them against my own to uncover the five major trends that I think we’ll see in data management this year: 1. Data movement becomes more important. Cloud providers have proven [...]

Data management predictions for 2018 was published on SAS Voices by Helmut Plinke

December 19, 2017

If you’ve ever used Amazon or Netflix, you’ve experienced the value of recommendation systems firsthand. These sophisticated systems identify recommendations autonomously for individual users based on past purchases and searches, as well as other behaviors. By supporting an automated cross-selling approach, they empower brands to offer additional products or services [...]

Customer Intelligence 360: The digital shapeshifter of recommendation systems was published on Customer Intelligence Blog.

September 26, 2017

In Part 1 and Part 2 of this blog posting series, we discussed: Our current viewpoints on marketing attribution and conversion journey analysis in 2017. The selection criteria of the best measurement approach. Introduced our vision on handling marketing attribution and conversion journey analysis. We would like to conclude this [...]

Algorithmic marketing attribution and conversion journey analysis [Part 3] was published on Customer Intelligence Blog.

September 19, 2017

In Part 1 of this blog posting series, we discussed our current viewpoints on marketing attribution and conversion journey analysis in 2017. We concluded on a cliffhanger, and would like to return to our question of which attribution measurement method should we ultimately focus on. As with all difficult questions [...]

Algorithmic marketing attribution and conversion journey analysis [Part 2] was published on Customer Intelligence Blog.