10月 052016
 

streamviewerSAS Event Stream Processing (ESP) cannot only process structured streaming events (a collection of fields) in real time, but has also very advanced features regarding the collection and the analysis of unstructured events. Twitter is one of the most well-known social network application and probably the first that comes to mind when thinking about streaming data source. On the other hand, SAS has powerful solutions to analyze unstructured data with SAS Text Analytics. This post is about merging 2 needs: collecting unstructured data coming from Twitter and doing some text analytics processing on tweets (contextual extraction, content categorization and sentiment analysis).

Before moving forward, SAS ESP is based on a publish and subscribe model. Events are injected into an ESP model using an “adapter” or a “connector.” or using Python and the publisher API Target applications consume enriched events output by ESP using the same technology, “adapters” and “connectors.” SAS ESP provides lots of them, in order to integrate with static and dynamic applications.

Then, an ESP model flow is composed of “windows” which are basically the type of transformation we want to perform on streaming events. It can be basic data management (join, compute, filter, aggregate, etc.) as well as advanced processing (data quality, pattern detection, streaming analytics, etc.).

SAS ESP Twitter Adapters background

SAS ESP 4.2 provides two adapters to connect to Twitter as a data source and to publish events from Twitter (one event per tweet) to a running ESP model. There are no equivalent connectors for Twitter.

Both two adapters are publisher only and include:

  • Twitter Publisher Adapter
  • Twitter Gnip Publisher Adapter

The second one is more advanced, using a different API (GNIP, bought by Twitter) and providing additional capabilities (access to history of tweets) and performance. The adapter builds event blocks from a Twitter Gnip firehose stream and publishes them to a source window. Access to this Twitter stream is restricted to Twitter-approved parties. Access requires a signed agreement.

In this article, we will focus on the first adapter. It consumes Twitter streams and injects event blocks into source windows of an ESP engine. This adapter has free capabilities. The default access level of a Twitter account allows us to use the following methods:

  • Sample: Starts listening on random sample of all public statuses.
  • Filter: Starts consuming public statuses that match one or more filter predicates.

SAS ESP Text Analytics background

SAS ESP 4.1/4.2 provides three window types (event transformation nodes) to perform Text Analytics in real time on incoming events.

The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.

text-analytics-on-twitter

Here are the SAS ESP Text Analytics features:

  • Text Category” window:
    • Content categorization or document classification into topics
    • Automatically identify or extract content that matches predefined criteria to more easily search by, report on, and model/segment by important themes
    • Relies on “.mco” binary files coming from SAS Contextual Analysis solution
  • Text Context” window:
    • Contextual extraction of named entities (people, titles, locations, dates, companies, etc.) or facts of interest
    • Relies on “.li” binary files coming from SAS Contextual Analysis solution
  • Text Sentiment” window:
    • Sentiment analysis of text coming from documents, social networks, emails, etc.
    • Classify documents and specific attributes/features as having positive, negative, or neutral/mixed tone
    • Relies on “.sam” binary files coming from SAS Sentiment Analysis solution

Binary files (“.mco”, “.li”, “.sam”) cannot be reverse engineered. The original projects in their corresponding solutions (SAS Contextual Analysis or SAS Sentiment Analysis) should be used to perform modifications on those binaries.

The ESP project

The following ESP project is aimed to:

  • Wait for events coming from Twitter in the source Twitter window (this is a source window, the only entry point for streaming events)
  • Perform basic event processing and counting
  • Perform text analytics on tweets (in the input streaming, the tweet text is injected as a single field)

text-analytics-on-twitter02

Let’s have a look at potential text analytics results.

Here is a sample of the Twitter stream that SAS ESP is able to catch (the tweet text is collected in a field called tw_Text):

text-analytics-on-twitter03

The “Text Category” window, with an associated “.mco” file, is able to classify tweets into topics/categories with a related score:

text-analytics-on-twitter04

The “Text Context” window, with an associated “.li” file, is able to extract terms and their corresponding entity (person, location, currency, etc.) from a tweet:

text-analytics-on-twitter05

The “Text Sentiment” window, with an associated “.sam” file, is able to determine a sentiment with a probability from a tweet:

text-analytics-on-twitter06

Run the Twitter adapter

In order to inject events into a running ESP model, the Twitter adapter should be started and is going to publish live tweets into the sourceTwitter window of our model.

text-analytics-on-twitter07

Here we search for tweets containing “iphone”, but you can change to any keyword you want to track (assuming people are tweeting on that keyword…).

There are many additional options: -f allows to follow specific user ids, -p allows to specify locations of interest, etc.

Consume enriched events with SAS ESP Streamviewer

SAS ESP provides a way to render events in real-time graphically. Here is an example of how to consume real-time events in a powerful dashboard.

streamviewer

Conclusion

With SAS ESP, you can bring the power of SAS Analytics into the real-time world. Performing Text Analytics (content categorization, sentiment analysis, reputation management, etc.) on the fly on text coming from tweets, documents, emails, etc. and triggering consequently some relevant actions have never been so simple and so fast.

tags: SAS Event Stream Processing, SAS Professional Services, SAS Text Analytics, twitter

How to perform real time Text Analytics on Twitter streaming data in SAS ESP was published on SAS Users.

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

(required)

(required)