SAS Event Stream Processing

June 29, 2020

Companies have recognized that the key to maintaining customer loyalty and increasing engagement is to anticipate customers' needs and desires. To that end, they have invested heavily in AI technologies to create recommendation engines that present offers, communications, and products to fulfill those needs. Nowadays, recommendation engines can be found just about everywhere, from news websites to e-commerce sites to online streaming services. However, customers frequently encounter recommendations that are inappropriate or repetitive. Just because you accidentally clicked on something six months ago doesn't mean you should receive a hundred variations of that same article or product today.

The challenge is that companies often focus so heavily on building advanced AI algorithms to power their recommendation systems that they miss rapid changes in the marketplace. Because recommendation algorithms rely heavily on historical behavior, they can fail when new products are introduced, consumer tastes shift quickly, or market conditions deteriorate.

Recommendation engines also frequently fail to account for real-time events and context. For example, during holidays, people’s tastes can be highly seasonal. Using recommendations based on purchases made during other times of the year may have no relevance to what people want today.

Companies are also under pressure to recommend products and content that are most profitable and highest in value. Unfortunately, the most profitable products may not be the ones customers prefer. This can create conflict between marketers who want to push products and data scientists who want to build good recommendation engines. And if marketers override the recommendations, consumers may lose trust in them.

To address these limitations, companies need to develop an approach to recommendation engines that accounts for the following factors:

  • Business objectives: Which products are most profitable? How do we optimize the recommendations to generate the highest revenue?
  • Context: What do we know about the customer before we deliver a recommendation? Are they at home/school/work/vacation? Did a significant life event occur such as getting married, having a baby, or buying a house?
  • Historical behavior: Analyzing past transactions to generate recommendations. This involves using AI and machine learning techniques such as collaborative filtering, market basket analysis, and factorization machines to examine previous purchase history and compare it with that of customers who purchased similar products.
  • Real-time trends: Using real-time information to address sudden changes in consumer demand. This real-time information can come from social media feeds or by analyzing real-time streaming data.

To show how these factors work together, let me walk you through an example of building a recommendation engine that incorporates all of them.

Scenario: A cable company would like to develop an app for subscribers that provides real-time recommendations for live TV. To improve the relevancy of the recommendations, they would like to consider several factors. First, they would like to predict whether a family or child is watching at that moment and make age-appropriate recommendations. Second, if a show is extremely popular right now with other viewers (such as a breaking news event or a sports event in their area), they would like to override the default recommendation with that show. This can be accomplished in five steps:

1. Use factorization machines to analyze historical viewing behavior and generate personalized recommendations

Factorization machines are one of the most powerful recommendation algorithms currently available. They use matrix factorization to estimate the missing entries of a very sparse matrix of users and products. SAS Viya provides a powerful distributed in-memory engine to train factorization machines on extremely large, sparse datasets consisting of thousands of products (or TV shows) and millions of users.

In the example below, we trained a factorization machine on set-top box viewing data. Our target variable was viewing seconds of the show. The factorization machine attempts to predict how long people will watch a program that they haven’t seen before based on the viewing habits of similar viewers. After training the factorization machine, we can generate a prediction for every program that an individual hasn’t watched before. Using this prediction, we can then rank-order all shows by the predicted viewing time from the factorization machine algorithm.
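As a rough illustration, training such a model from Python might look like the sketch below, which uses the SWAT package to call the factmac action on a CAS (SAS Viya) server. This is a hedged sketch, not the code behind the example: the table, column, and output names are illustrative assumptions, and the parameter names should be checked against the factmac action documentation.

```python
import swat

# Connect to a CAS (SAS Viya) server -- host, port, and table names are placeholders
conn = swat.CAS('cas-server.example.com', 5570)
conn.loadactionset('factmac')

# viewing_history is assumed to hold one row per (user, show) pair,
# with accumulated watch_seconds as the implicit-feedback target
conn.factmac.factmac(
    table='viewing_history',
    target='watch_seconds',
    inputs=['user_id', 'show_id'],
    nominals=['user_id', 'show_id'],
    nFactors=20,       # latent dimensions learned per user and per show
    maxIter=50,
    learnStep=0.01,
    output=dict(casOut=dict(name='fm_scores', replace=True), copyVars='ALL')
)

# Rank each user's unseen shows by predicted watch time
scores = conn.CASTable('fm_scores').to_frame()
top10 = (scores.sort_values('Pred', ascending=False)   # 'Pred' column name is an assumption
               .groupby('user_id')
               .head(10))
```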

2. Build predictive models to determine who's watching

To determine who is watching at any given time, we can use predictive models. By collecting data on when and where users have historically watched family-friendly content, we can train a model that predicts the likelihood that a family or a child is watching TV at that moment. Using SAS Visual Data Mining and Machine Learning, users can build scalable modeling pipelines that take in historical viewing data, transform it for modeling, and build out a series of candidate models (gradient boosting, neural networks, random forests, and so on). After evaluating model performance on a hold-out sample, the champion model can be published to production and leveraged within a decisioning flow.
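The article builds this model with SAS Visual Data Mining and Machine Learning; purely to make the modeling step concrete, here is a minimal generic equivalent in Python with scikit-learn. The data file and feature names (hour_of_day, device_type_code, and so on) are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical historical viewing data: one row per viewing session,
# labeled 1 when a child or family was known to be watching
df = pd.read_csv('viewing_sessions.csv')
features = ['hour_of_day', 'day_of_week', 'device_type_code', 'genre_code']
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df['family_watching'], test_size=0.3, random_state=42)

model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train)

# Evaluate on the hold-out sample before promoting a champion model
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f'Hold-out AUC: {auc:.3f}')
```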

3. Use SAS Event Stream Processing to capture what is popular right now

To calculate what's happening right now, we need a tool that can analyze real-time streaming data and act on it. SAS Event Stream Processing was designed precisely to analyze streaming data before it lands in a data lake or database. Tuning records from set-top boxes, mobile apps, websites, and smart TVs can be aggregated and analyzed as they arrive to determine the most popular shows currently playing for a demographic, region, or genre.
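ESP does this with aggregate windows over the event stream. As a toy illustration of the underlying idea (plain Python, not ESP code), a sliding-window popularity counter might look like this:

```python
from collections import Counter, deque
import time

class TrendingShows:
    """Counts tuning events per show over a sliding time window."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()      # (timestamp, show_id) pairs, oldest first
        self.counts = Counter()

    def record(self, show_id, ts=None):
        ts = time.time() if ts is None else ts
        self.events.append((ts, show_id))
        self.counts[show_id] += 1
        self._expire(ts)

    def _expire(self, now):
        # Drop tuning events that have aged out of the window
        while self.events and self.events[0][0] < now - self.window:
            _, old_show = self.events.popleft()
            self.counts[old_show] -= 1
            if self.counts[old_show] <= 0:
                del self.counts[old_show]

    def top(self, n=5):
        return self.counts.most_common(n)

# Usage: feed it tuning events as they arrive, then ask what is hot right now
trending = TrendingShows(window_seconds=300)
trending.record('breaking_news_channel')
print(trending.top(3))
```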

4. Use SAS Intelligent Decisioning to deploy business rules

SAS Intelligent Decisioning is a solution for orchestrating real-time decisions that incorporate business rules and predictive models. It allows non-technical users to design decision flows in an intuitive graphical interface. After a decision flow is created, it can be published as a REST API and called in real time from edge devices (such as set-top boxes, mobile apps, and smart TVs) to deliver a recommendation. These decisions can also be exported and embedded directly within the SAS Event Stream Processing engine. For more sophisticated users with a strong programming background, the business rules can also be coded directly in SAS Event Stream Processing without using Intelligent Decisioning.

In the example below, we can orchestrate a decision flow that determines what to recommend given certain circumstances. If the predictive model predicts that a child or family is watching, then a family-friendly recommendation will be presented. If event stream processing determines that a certain show is extremely popular right now, then it will override the baseline recommendation with the popular show. Otherwise, it will send the recommendation that was generated by the factorization machine.
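Stripped of the tooling, the decision logic described here reduces to a few lines. The field names and thresholds below are hypothetical placeholders, not values from the actual flow:

```python
def recommend(event):
    """Plain-Python paraphrase of the decision flow for one enriched event."""
    # The predictive model (step 2) says a child or family is likely watching
    if event['family_score'] > 0.8:
        return event['family_friendly_rec']
    # ESP (step 3) says another show is spiking right now
    if event['trending_score'] > 0.9:
        return event['trending_show']
    # Otherwise, fall back to the factorization machine's baseline (step 1)
    return event['factorization_rec']
```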

5. Orchestrate the entire decisioning process using SAS Event Stream Processing

To bring this all together into a single flow that can work in real-time, we need a tool that can ingest real-time streaming data, enrich the data with all the relevant information we need to make an intelligent recommendation, aggregate real-time data, and execute the decisioning flow. This will result in a final recommendation. Event stream processing can orchestrate all these elements into a single project.

In the example below, SAS Event Stream Processing takes in real-time streaming set-top box records and enriches them with data from the customer data warehouse. The event stream processing engine then aggregates real-time TV viewing across all devices and determines which shows are most popular right now. Next, it scores the data using the predictive model to determine whether a child or family is watching. Finally, it executes the decision flow created in SAS Intelligent Decisioning to determine the final recommendation. SAS Event Stream Processing has a REST API that allows third-party applications or devices to connect to this flow and receive the requested recommendation.
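From the client side, a Python application can attach to such a project with the SAS ESPPy package. Below is a minimal sketch assuming a running server; the host, project name, and window name are all placeholders:

```python
import esppy

# Connect to the ESP server hosting the recommendation project
esp = esppy.ESP('esp-server.example.com', 9900)

# Look up the running project and its final decision window
proj = esp.get_project('tv_recommendations')
decisions = proj.windows['final_recommendation']

# Subscribing turns the window into a live, DataFrame-like view of events
decisions.subscribe()

# ... later: inspect the most recent recommendations that streamed through
print(decisions.tail())
```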

Conclusion

The example above demonstrates how an organization can design a sophisticated recommendation engine that incorporates not only AI algorithms but also business rules, real-time streaming, and predictive models. This allows businesses to provide far better recommendations than AI algorithms alone, because context, real-time information, and business objectives are all incorporated into the final recommendation. By leveraging tools like SAS Event Stream Processing and SAS Intelligent Decisioning, business users can design, orchestrate, and operationalize the entire recommendation process.

How to improve recommendation engines with real-time context and business rules was published on SAS Users.

July 26, 2019

In a previous post, Zero to SAS in 60 Seconds- SAS Machine Learning on SAS Analytics Cloud, I documented my experience with a SAS free trial on the SAS Analytics Cloud. Well, the engineers at SAS have been busy and created another free trial. The new trial covers SAS Event Stream Processing (ESP).

This time last year (when I was just starting at SAS), I only knew ESP as extrasensory perception. I'm more enlightened now. Working through this exercise showed me how event stream processing serves as a powerful, effective tool that applies machine learning and streaming analytics to uncover insights for real-time decision making. In a nutshell, you create a model, stream your data, process the results, and make timely decisions based on those results.

The trial uses SAS ESPPy, allowing you to embed an ESP project inside a Python pipeline. To see ESPPy in action, take a look at this video. To learn more about ESP and IoT, see this article on the SAS Communities Library. In this article, I chronicle my journey through the trial while introducing key concepts and operations of ESP.

Register and get started

The registration and initial login process is identical to the one for the machine learning trial. You must have a SAS Profile to participate. The only difference is that you need to follow this link to sign up for the ESP trial. Please refer to the machine learning article for the detailed steps of signing up and logging in.

The use case

SAS Solar Farm in Cary

The SAS Solar Farm sits on almost 12 acres of SAS Headquarters property. There are 10,276 solar panels producing more than 3.6 million kilowatt hours annually. That's enough power for more than 325 average-sized U.S. homes.

As part of managing the environment, it is important to continuously monitor the operation of the solar panels to optimize configuration parameters, detect potential equipment failure, and accurately forecast the amount of energy generated. Factors considered include panel angles, time of day, seasons, and weather patterns, as the energy generated depends directly on the amount of sun available to the panels.

The ESP project in this demo is pre-loaded in the trial and is run through a Jupyter notebook. The project monitors the energy (kWh) and power (kW) generated during a specific time interval, eliminating localized outlier effects and triggering alerts when the energy generated in subsequent time intervals differs by more than a pre-defined amount.

Solar Farm Data represented as digital art

Take two minutes and watch this video on how SAS uses SAS software to create a work of art with solar farm data.

Disclaimer: no sheep were harmed during data collection or writing of this article.

Navigating the trial

Once logged into the trial, you see the Applications screen.

ESP trial Applications screen

The Data and Team options in the left pane behave exactly as they do in the machine learning trial. These sections allow you to access data and manage a multi-user system. Select the SAS Event Stream Processing icon to start a JupyterLab session.

JupyterLab home screen

I will not go into the details of JupyterLab here. The left pane contains menus, file management, and other options. The pane on the right displays three options:

  • Python 3 Notebook - a blank Jupyter notebook: a document that combines live, runnable code with narrative text (Markdown), equations (LaTeX), images, interactive visualizations, and other rich output
  • Python 3 Console - a blank Python console: enables you to run code interactively in a kernel
  • Text File - a basic text editor: enables you to edit text files in JupyterLab

For this article, we're going to follow along and interact with the pre-loaded Solar Farm ESP demo project. To locate the Jupyter notebook, double-click the demo directory in the left pane.

Select the demo directory from the left pane

Next, select Event_Stream_Processing. Before proceeding with the demo, I'd highly suggest opening the README.ipynb file.

Contents of the README notebook

Here you will find an overview of the trial and how the environment is organized. The trial uses SAS ESPPy for designing, testing, and deploying projects on ESP servers.

Step through the demo

Before starting the trial, I needed a little background on event stream processing. I located the SAS ESP product documentation. I recommend referring to it for details on the ESP model, objects, and workflow.

To access the demo, double-click the demo directory in the left pane. The trial comes with five pre-loaded demos; feel free to try any or all of them. Double-click on ESP Basic Project - Solar Farm.ipynb to display the Solar Farm notebook. The notebook walks you through ESP model creation and execution. To run a command, place the cursor in a command cell and select the 'Run' button (the triangle-shaped button at the top of the notebook). If no response returns when running a cell block, assume the commands ran successfully.

Below is a brief description of the steps in the project:

  1. Create the project and query used - this creates dedicated space and objects where the ESP process takes place
  2. Create input and aggregate windows - this action extracts desired data and creates data subsets from the stream
  3. Add a join window - this brings together lag and current values into the project
  4. Add a compute window - this calculates the difference between the previous and current event
  5. Add a filter window - this action filters occurrences outside a threshold value; this creates an alert for potential mechanical issues
  6. Define workflow connections - this defines the workflow between the various windows in the project
  7. Save the project - this generates an XML file for the project
  8. Load the project to the ESP Server - this loads the project and produces a graphical representation of the workflow

    Solar Farm project workflow

  9. Start streaming data - in this example, rather than streaming data in real time, the stream derives from the solar farm table data
  10. View solar farm data - this creates a graphical representation of streaming data

    Solar Farm graph for kW and kWh

While not included in the demo, the streaming data would pass through the filter, and if a threshold breach occurred, an alert would be created. Considering the graph above, alerts could very well have occurred just before 1:15 pm (IntkW drops from 185 to 150) and just before 2:30 pm (IntkW drops from 125 to 35).
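Condensed into ESPPy code, the heart of the demo follows the pattern below. This is a hedged reconstruction rather than the notebook's actual code: window names, schema fields, expressions, and thresholds are illustrative, the aggregate and join steps are elided, and the expression methods should be checked against the ESPPy documentation.

```python
import esppy

# Connect to the trial's ESP server (host and port are placeholders)
esp = esppy.ESP('localhost', 9900)

# Step 1: create the project
proj = esp.create_project('solar_farm')

# Step 2: source window for the raw panel measurements (the aggregate and
# join windows from steps 2-3 are elided to keep the sketch short)
src = esp.SourceWindow(schema=('id*:int64', 'ts:stamp', 'kW:double', 'kWh:double'))
proj.windows['w_source'] = src

# Step 4: compute window deriving the change between consecutive intervals
diff = esp.ComputeWindow(schema=('id*:int64', 'ts:stamp', 'kW_diff:double'))
diff.add_field_expression('kW - kW_lag')        # expression is illustrative
proj.windows['w_diff'] = diff

# Step 5: filter window that only passes threshold breaches (potential faults)
alerts = esp.FilterWindow()
alerts.set_expression('abs(kW_diff) > 50')      # threshold is illustrative
proj.windows['w_alerts'] = alerts

# Step 6: define the workflow connections
src.add_target(diff, role='data')
diff.add_target(alerts, role='data')

# Steps 7-8: load the project onto the ESP server
esp.load_project(proj)

# Steps 9-10: replay the solar farm table as a stream and watch the results
alerts.subscribe()
src.publish_events('solar_farm.csv')            # file name is a placeholder
```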

Your turn

Now that you have a taste of ESP, feel free to step through the rest of the demos. You may also load your own data and create your own ESP models. Feel free to share your experience and what you create by leaving a comment.

SAS Event Stream Processing on SAS Analytics Cloud - my journey was published on SAS Users.

December 15, 2017

What is blockchain and how can you analyze data in a blockchain? This article will discuss various forms of blockchain analytics from a tactical or heuristic perspective. I’ll explain how SAS technologies can provide advanced analytics for operational, value/asset and regulatory viewpoints in the diverse world of open source blockchain [...]

A practical approach to blockchain analytics was published on SAS Voices by Sam Penfield

April 4, 2017

Two minutes in, I knew the 2017 SAS Global Forum Technology Connection would be anything but typical or average. Maybe that’s because SAS’ Chief Technology Officer Oliver Schabenberger was running the show, and nothing he does is ever typical or average. His first surprise of the morning was his entrance. [...]

Impressive technology, surprising connections was published on SAS Voices by Marcie Montague

October 20, 2016

In a previous blog post, I demonstrated combining the power of SAS Event Stream Processing (ESP) and the SAS Quality Knowledge Base (QKB), a key component of our SAS Data Quality offerings. In this post, I will expand on the topic and show how you can work with data from multiple QKB locales in your event stream.

To illustrate how to do this, I will review an example where I have event stream data containing North American postal codes. I need to standardize the values appropriately depending on where they are from – the United States, Canada, or Mexico – using the Postal Code Standardization definition from the appropriate QKB locale. Note: This example assumes that the QKB for Contact Information has been installed and that the license file that the DFESP_QKB_LIC environment variable points to contains a valid license for these locales.

In an ESP Compute window, I first need to initialize the call to the BlueFusion Expression Engine Language function and load the three QKB locales needed – ENUSA (English – United States), ENCAN (English – Canada), and ESMEX (Spanish – Mexico).

Initializing BlueFusion and loading the ENUSA, ENCAN, and ESMEX locales (screenshot)

Next, I need to call the appropriate Postal Code QKB Standardization definition based on the country the data is coming from.  However, to do this, I first need to standardize the Country information in my streaming data; therefore, I call the Country (ISO 3-character) Standardization definition.

Calling the Country (ISO 3-character) Standardization definition (screenshot)

After that is done, I do a series of if/else statements to standardize the Postal Codes using the appropriate QKB locale definition based on the Country_Standardized value computed above.  The resulting standardized Postal Code value is returned in the output field named PostalCode_STND.

If/else logic calling the locale-specific Postal Code Standardization definitions (screenshot)
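The screenshots above carry the actual DataFlux expression code. Purely to convey the shape of that if/else dispatch, here is the same logic as a small Python sketch; the standardize callable stands in for the BlueFusion QKB function and is not a real API:

```python
# Locale lookup mirroring the expression logic described above.
# standardize() is a stand-in for the BlueFusion QKB call, not a real function.
LOCALE_BY_COUNTRY = {'USA': 'ENUSA', 'CAN': 'ENCAN', 'MEX': 'ESMEX'}

def standardize_postal_code(event, standardize):
    country = event['Country_Standardized']   # output of the Country definition
    locale = LOCALE_BY_COUNTRY.get(country)
    if locale is None:
        return event['Postal_Code']           # pass through unknown countries
    return standardize(locale, 'Postal Code', event['Postal_Code'])

# The returned value would populate the output field PostalCode_STND.
```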

I can review the output of the Compute window by testing the ESP Studio project and subscribing to the Compute window.

Testing the project and subscribing to the Compute window output (screenshot)

Here is the XML code for the SAS ESP project reviewed in this blog:

XML code for the ESP project (screenshot)

Now that the Postal Code values for the various locales have been standardized for the event stream, I can add analyses to my ESP Studio project based on those standardized values.

For more information, please refer to the product documentation for SAS Event Stream Processing, the DataFlux Expression Engine Language, and the SAS Quality Knowledge Base.

Learn more about a sustainable approach to data quality.

tags: data management, SAS Data Quality, SAS Event Stream Processing, SAS Professional Services

Using multiple SAS Quality Knowledge Base locales in a SAS Event Stream Processing compute window was published on SAS Users.

October 5, 2016

SAS Event Stream Processing (ESP) can not only process structured streaming events (a collection of fields) in real time, but also has very advanced features for collecting and analyzing unstructured events. Twitter is one of the most well-known social network applications and probably the first that comes to mind when thinking about streaming data sources. On the other hand, SAS has powerful solutions to analyze unstructured data with SAS Text Analytics. This post is about merging two needs: collecting unstructured data coming from Twitter and doing text analytics processing on tweets (contextual extraction, content categorization, and sentiment analysis).

Before moving forward, note that SAS ESP is based on a publish-and-subscribe model. Events are injected into an ESP model using an "adapter" or a "connector," or using Python and the publisher API. Target applications consume enriched events output by ESP through the same technology, adapters and connectors. SAS ESP provides lots of them in order to integrate with static and dynamic applications.

An ESP model flow is composed of "windows," which define the type of transformation we want to perform on streaming events. These can be basic data management (join, compute, filter, aggregate, etc.) as well as advanced processing (data quality, pattern detection, streaming analytics, etc.).

SAS ESP Twitter Adapters background

SAS ESP 4.2 provides two adapters to connect to Twitter as a data source and to publish events from Twitter (one event per tweet) to a running ESP model. There are no equivalent connectors for Twitter.

Both adapters are publishers only:

  • Twitter Publisher Adapter
  • Twitter Gnip Publisher Adapter

The second one is more advanced, using a different API (Gnip, acquired by Twitter) and providing additional capabilities (such as access to the history of tweets) and better performance. The adapter builds event blocks from a Twitter Gnip firehose stream and publishes them to a source window. Access to this Twitter stream is restricted to Twitter-approved parties and requires a signed agreement.

In this article, we will focus on the first adapter. It consumes Twitter streams and injects event blocks into source windows of an ESP engine. This adapter can be used free of charge: the default access level of a Twitter account allows us to use the following methods:

  • Sample: Starts listening on random sample of all public statuses.
  • Filter: Starts consuming public statuses that match one or more filter predicates.

SAS ESP Text Analytics background

SAS ESP 4.1/4.2 provides three window types (event transformation nodes) to perform Text Analytics in real time on incoming events.

The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.

Text analytics on Twitter (overview)

Here are the SAS ESP Text Analytics features:

  • "Text Category" window:
    • Content categorization or document classification into topics
    • Automatically identifies or extracts content that matches predefined criteria, making it easier to search by, report on, and model/segment by important themes
    • Relies on ".mco" binary files coming from the SAS Contextual Analysis solution
  • "Text Context" window:
    • Contextual extraction of named entities (people, titles, locations, dates, companies, etc.) or facts of interest
    • Relies on ".li" binary files coming from the SAS Contextual Analysis solution
  • "Text Sentiment" window:
    • Sentiment analysis of text coming from documents, social networks, emails, etc.
    • Classifies documents and specific attributes/features as having a positive, negative, or neutral/mixed tone
    • Relies on ".sam" binary files coming from the SAS Sentiment Analysis solution

Binary files (“.mco”, “.li”, “.sam”) cannot be reverse engineered. The original projects in their corresponding solutions (SAS Contextual Analysis or SAS Sentiment Analysis) should be used to perform modifications on those binaries.
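For readers who prefer code to screenshots, a project using these window types can also be assembled from Python with the ESPPy package. The sketch below is a hedged illustration: the window classes mirror the ESP window types, but the constructor parameter names (mco_file, text_field, and so on) are assumptions to verify against the ESPPy documentation, and the file names are placeholders.

```python
import esppy

# Connect to a running ESP server (host and port are placeholders)
esp = esppy.ESP('localhost', 9900)
proj = esp.create_project('twitter_text_analytics')

# Source window: one event per tweet, with the text in tw_Text
src = esp.SourceWindow(schema=('tw_Id*:int64', 'tw_User:string', 'tw_Text:string'))
proj.windows['sourceTwitter'] = src

# Text analytics windows, each bound to its binary model file
cat = esp.TextCategoryWindow(mco_file='topics.mco', text_field='tw_Text')
ctx = esp.TextContextWindow(li_file='entities.li', text_field='tw_Text')
snt = esp.TextSentimentWindow(sam_file='sentiment.sam', text_field='tw_Text')
proj.windows['category'] = cat
proj.windows['context'] = ctx
proj.windows['sentiment'] = snt

# Fan the tweet stream out to the three analyses
for win in (cat, ctx, snt):
    src.add_target(win, role='data')

esp.load_project(proj)
```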

The ESP project

The following ESP project aims to:

  • Wait for events coming from Twitter in the sourceTwitter window (a source window is the only entry point for streaming events)
  • Perform basic event processing and counting
  • Perform text analytics on tweets (in the input stream, the tweet text arrives as a single field)

The Twitter text analytics ESP project flow

Let’s have a look at potential text analytics results.

Here is a sample of the Twitter stream that SAS ESP is able to catch (the tweet text is collected in a field called tw_Text):

Sample tweets captured in the tw_Text field

The “Text Category” window, with an associated “.mco” file, is able to classify tweets into topics/categories with a related score:

Text Category results: categories and scores

The “Text Context” window, with an associated “.li” file, is able to extract terms and their corresponding entity (person, location, currency, etc.) from a tweet:

Text Context results: extracted terms and entities

The “Text Sentiment” window, with an associated “.sam” file, is able to determine a sentiment with a probability from a tweet:

Text Sentiment results: sentiment and probability

Run the Twitter adapter

In order to inject events into a running ESP model, the Twitter adapter must be started; it then publishes live tweets into the sourceTwitter window of our model.

Starting the Twitter publisher adapter from the command line

Here we search for tweets containing "iphone", but you can change it to any keyword you want to track (assuming people are tweeting about that keyword…).

There are many additional options: -f allows you to follow specific user IDs, -p allows you to specify locations of interest, and so on.

Consume enriched events with SAS ESP Streamviewer

SAS ESP Streamviewer provides a way to render events graphically in real time. Here is an example of how to consume real-time events in a powerful dashboard.

A SAS ESP Streamviewer dashboard

Conclusion

With SAS ESP, you can bring the power of SAS Analytics into the real-time world. Performing text analytics (content categorization, sentiment analysis, reputation management, etc.) on the fly on text coming from tweets, documents, and emails, and triggering relevant actions as a result, has never been so simple and so fast.

tags: SAS Event Stream Processing, SAS Professional Services, SAS Text Analytics, twitter

How to perform real time Text Analytics on Twitter streaming data in SAS ESP was published on SAS Users.

September 16, 2016

The SAS Quality Knowledge Base (QKB) is a collection of files which store data and logic that define data cleansing operations such as parsing, standardization, and generating match codes to facilitate fuzzy matching. Various SAS software products reference the QKB when performing data quality operations on your data. One of these products is SAS Event Stream Processing (ESP). SAS ESP enables programmers to build applications that quickly process and analyze streaming events. In this blog, we will look at combining the power of these two products – SAS ESP and the SAS QKB.

SAS Event Stream Processing (ESP) Studio can call definitions from the SAS Quality Knowledge Base (QKB) in its Compute window. The Compute window transforms input events into output events through computed manipulations of the input event stream fields. One such manipulation is calling QKB definitions by using the BlueFusion Expression Engine Language function.

Before QKB definitions can be used in ESP projects, the QKB must be installed on the SAS ESP server. Also, two environment variables must be set: DFESP_QKB and DFESP_QKB_LIC. DFESP_QKB should be set to the path where the QKB data was installed, and DFESP_QKB_LIC should be set to the path and filename of the file that contains the license(s) for the QKB locale(s).
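Since a missing or misconfigured variable is a common stumbling block, a quick Python sanity check (an illustrative convenience, not part of the product) can verify both before you start:

```python
import os

# Verify the two QKB environment variables exist and point at real paths
for var in ('DFESP_QKB', 'DFESP_QKB_LIC'):
    path = os.environ.get(var)
    if not path or not os.path.exists(path):
        raise RuntimeError(f'{var} is unset or points to a missing path: {path!r}')
    print(f'{var} = {path}')
```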

In this post, I will explore the example of calling the State/Province (Abbreviation) Standardization QKB definition from the English – United States locale in the ESP Compute window. The Source window reads in events that contain US state data that may or may not be standardized to the two-character US state abbreviation.

The ESP project with a Source window streaming US state data (screenshot)

As part of the event stream analysis I want to perform, I need the US_State values to be in a standard format. To do this, I will use the State/Province (Abbreviation) Standardization QKB definition from the English – United States locale.

First, I need to initialize the call to the BlueFusion Expression Engine Language function and load the ENUSA (English – United States) locale. Note: The license file that the DFESP_QKB_LIC environment variable points to must contain a license for this locale.

Initializing BlueFusion and loading the ENUSA locale (screenshot)

Next, I need to call the QKB definition and return its result. In this case, I am calling the BlueFusion standardize function, which expects the following inputs: the definition name, the input field to standardize, and the output field for the standardized value. Here, the definition name is State/Province (Abbreviation), the input field is US_State, and the output field is result. Note: The field result was declared in the Initialize expression pictured above. This result value is returned in the output field named US_State_STND.

Calling the BlueFusion standardize function (screenshot)
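To make concrete what the definition does to the data (this illustrates the effect, not how BlueFusion is actually invoked), here is a toy Python stand-in with a deliberately tiny lookup table:

```python
def standardize(definition, value):
    """Toy stand-in for the QKB State/Province (Abbreviation) definition:
    maps free-form US state values to two-character abbreviations."""
    assert definition == 'State/Province (Abbreviation)'
    STATES = {'north carolina': 'NC', 'n carolina': 'NC', 'texas': 'TX'}
    return STATES.get(value.strip().lower().replace('.', ''), value)

# US_State -> US_State_STND, as in the Compute window described above
print(standardize('State/Province (Abbreviation)', 'North Carolina'))  # NC
print(standardize('State/Province (Abbreviation)', 'TX'))   # TX (already standard)
```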

I can review the output of the Compute window by testing the ESP Studio project and subscribing to the Compute window.

Testing the project and subscribing to the Compute window output (screenshot)

Here is the XML code for the SAS ESP project reviewed in this blog:

XML code for the ESP project (screenshot)

Now that the US_State values have been standardized for the event stream, I can add analyses to my ESP Studio project based on those standardized values.

For more information, please refer to the product documentation:
  • SAS Event Stream Processing
  • DataFlux Expression Engine Language
  • SAS Quality Knowledge Base

tags: DataFlux Data Management Studio, SAS Event Stream Processing, SAS Professional Services

Using SAS Quality Knowledge Base Definitions in a SAS Event Stream Processing Compute Window was published on SAS Users.

September 29, 2015
SAS has updated its data management suite of software to help data scientists and IT personnel easily identify and use the right data at the right time.