SAS Text Analytics

October 5, 2016
 

SAS Event Stream Processing (ESP) can not only process structured streaming events (a collection of fields) in real time, but also has advanced features for collecting and analyzing unstructured events. Twitter is one of the best-known social network applications and probably the first that comes to mind as a streaming data source. SAS, for its part, has powerful solutions for analyzing unstructured data with SAS Text Analytics. This post brings the two needs together: collecting unstructured data from Twitter and running text analytics on the tweets (contextual extraction, content categorization and sentiment analysis).

Before moving forward, some background: SAS ESP is based on a publish and subscribe model. Events are injected into an ESP model using an “adapter” or a “connector,” or using Python and the publisher API. Target applications consume the enriched events output by ESP using the same technology: “adapters” and “connectors.” SAS ESP provides many of them in order to integrate with static and dynamic applications.
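To make the publish and subscribe idea concrete, here is a minimal in-memory sketch in Python. It is not the SAS ESP publisher API: the `ToyBroker` class and its methods are hypothetical names invented for illustration, and only the window name `sourceTwitter` and the field `tw_Text` come from this post.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]


class ToyBroker:
    """Hypothetical in-memory stand-in for a publish/subscribe engine (not SAS ESP)."""

    def __init__(self) -> None:
        # Maps a window name to the callbacks subscribed to it.
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, window: str, callback: Callable[[Event], None]) -> None:
        """Register a consumer that is called for every event published to `window`."""
        self._subscribers[window].append(callback)

    def publish(self, window: str, event: Event) -> None:
        """Deliver one event to every consumer subscribed to `window`."""
        for callback in self._subscribers[window]:
            callback(event)


broker = ToyBroker()
received: List[Event] = []

# The consumer side: collect every event that reaches the window.
broker.subscribe("sourceTwitter", received.append)

# The publisher side: inject one event (one tweet) into the window.
broker.publish("sourceTwitter", {"tw_Text": "Loving my new iphone"})
```

The publisher and the subscriber only share the window name; neither needs to know about the other, which is what lets adapters and connectors be swapped freely on either side.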

An ESP model flow is composed of “windows,” each of which represents a type of transformation to perform on streaming events. A window can do basic data management (join, compute, filter, aggregate, etc.) or advanced processing (data quality, pattern detection, streaming analytics, etc.).
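As a rough analogy only, a chain of windows can be pictured as a chain of small functions applied to a stream of events. The sketch below uses plain Python generators; the function names and the `lang` and `length` fields are assumptions made for the example and say nothing about how ESP implements its windows.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator

Event = Dict[str, object]


def filter_window(events: Iterable[Event], predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Pass through only the events that satisfy the predicate."""
    return (e for e in events if predicate(e))


def compute_window(events: Iterable[Event], fn: Callable[[Event], Event]) -> Iterator[Event]:
    """Add derived fields to each event."""
    for e in events:
        yield {**e, **fn(e)}


def aggregate_count(events: Iterable[Event], key: str) -> Counter:
    """Count events per distinct value of `key`."""
    return Counter(str(e[key]) for e in events)


# Hypothetical incoming events; `lang` is an invented field.
stream = [
    {"tw_Text": "iphone battery is great", "lang": "en"},
    {"tw_Text": "j'adore mon iphone", "lang": "fr"},
    {"tw_Text": "", "lang": "en"},
]

non_empty = filter_window(stream, lambda e: bool(e["tw_Text"]))
with_length = compute_window(non_empty, lambda e: {"length": len(str(e["tw_Text"]))})
counts_by_lang = aggregate_count(with_length, "lang")
```

Each stage consumes the output of the previous one, so events flow through filter, compute and aggregate in order, much as they would move from window to window in a model.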

SAS ESP Twitter Adapters background

SAS ESP 4.2 provides two adapters to connect to Twitter as a data source and to publish events from Twitter (one event per tweet) to a running ESP model. There are no equivalent connectors for Twitter.

Both adapters are publisher-only:

  • Twitter Publisher Adapter
  • Twitter Gnip Publisher Adapter

The second one is more advanced: it uses a different API (Gnip, a company acquired by Twitter) and offers additional capabilities (access to the history of tweets) and better performance. The adapter builds event blocks from a Twitter Gnip firehose stream and publishes them to a source window. Access to this Twitter stream is restricted to Twitter-approved parties and requires a signed agreement.

In this article, we will focus on the first adapter. It consumes Twitter streams and injects event blocks into source windows of an ESP engine. Some of its capabilities are available at no cost: the default access level of a Twitter account allows us to use the following methods:

  • Sample: Starts listening on a random sample of all public statuses.
  • Filter: Starts consuming public statuses that match one or more filter predicates.
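The difference between the two methods can be sketched in a few lines of Python. This is an illustration of the concepts only, not a Twitter or SAS API: `sample` and `filter_by_keywords` are hypothetical helpers, and the substring match is a simplification of how a real keyword predicate would be evaluated.

```python
import random
from typing import Dict, List, Sequence

Tweet = Dict[str, str]


def sample(tweets: Sequence[Tweet], rate: float, seed: int = 0) -> List[Tweet]:
    """Keep each tweet with probability `rate`, regardless of its content."""
    rng = random.Random(seed)
    return [t for t in tweets if rng.random() < rate]


def filter_by_keywords(tweets: Sequence[Tweet], keywords: Sequence[str]) -> List[Tweet]:
    """Keep only tweets whose text contains at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [t for t in tweets if any(k in t["tw_Text"].lower() for k in lowered)]


tweets = [
    {"tw_Text": "My iPhone screen cracked"},
    {"tw_Text": "Great weather today"},
    {"tw_Text": "Trading my iphone for a bike"},
]

matching = filter_by_keywords(tweets, ["iphone"])
```

Sampling answers "what are people talking about in general?", while filtering answers "what are people saying about this topic?", which is the mode used in the rest of this post.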

SAS ESP Text Analytics background

SAS ESP 4.1/4.2 provides three window types (event transformation nodes) to perform Text Analytics in real time on incoming events.

The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.

[Image: text-analytics-on-twitter]

Here are the SAS ESP Text Analytics features:

  • “Text Category” window:
    • Content categorization or document classification into topics
    • Automatically identify or extract content that matches predefined criteria to more easily search by, report on, and model/segment by important themes
    • Relies on “.mco” binary files coming from SAS Contextual Analysis solution
  • “Text Context” window:
    • Contextual extraction of named entities (people, titles, locations, dates, companies, etc.) or facts of interest
    • Relies on “.li” binary files coming from SAS Contextual Analysis solution
  • “Text Sentiment” window:
    • Sentiment analysis of text coming from documents, social networks, emails, etc.
    • Classify documents and specific attributes/features as having positive, negative, or neutral/mixed tone
    • Relies on “.sam” binary files coming from SAS Sentiment Analysis solution

Binary files (“.mco”, “.li”, “.sam”) cannot be reverse engineered. The original projects in their corresponding solutions (SAS Contextual Analysis or SAS Sentiment Analysis) should be used to perform modifications on those binaries.

The ESP project

The following ESP project aims to:

  • Wait for events coming from Twitter in the source Twitter window (this is a source window, the only entry point for streaming events)
  • Perform basic event processing and counting
  • Perform text analytics on tweets (in the input stream, the tweet text is injected as a single field)

[Image: text-analytics-on-twitter02]

Let’s have a look at potential text analytics results.

Here is a sample of the Twitter stream that SAS ESP is able to catch (the tweet text is collected in a field called tw_Text):

[Image: text-analytics-on-twitter03]

The “Text Category” window, with an associated “.mco” file, is able to classify tweets into topics/categories with a related score:

[Image: text-analytics-on-twitter04]

The “Text Context” window, with an associated “.li” file, is able to extract terms and their corresponding entity (person, location, currency, etc.) from a tweet:

[Image: text-analytics-on-twitter05]

The “Text Sentiment” window, with an associated “.sam” file, is able to determine a sentiment with a probability from a tweet:

[Image: text-analytics-on-twitter06]
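To give a feel for the three result shapes described above (a category with a score, a term with its entity type, a sentiment with a confidence value), here is a deliberately naive Python sketch. It does not reproduce what the “.mco”, “.li” or “.sam” models do: the keyword lists, the regular expression and the scoring formulas are all invented for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical keyword lists standing in for compiled models.
CATEGORY_TERMS = {"phones": {"iphone", "android"}, "finance": {"stock", "shares"}}
POSITIVE = {"love", "great"}
NEGATIVE = {"hate", "broken"}


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def categorize(text: str) -> List[Tuple[str, float]]:
    """Return (category, score) pairs; the score is the share of matching tokens."""
    words = tokens(text)
    results = []
    for category, terms in CATEGORY_TERMS.items():
        hits = sum(1 for w in words if w in terms)
        if hits:
            results.append((category, hits / len(words)))
    return results


def extract_entities(text: str) -> List[Tuple[str, str]]:
    """Return (term, entity type) pairs; the only toy rule tags dollar amounts."""
    return [(m, "CURRENCY") for m in re.findall(r"\$\d+(?:\.\d+)?", text)]


def sentiment(text: str) -> Tuple[str, float]:
    """Return (label, score) from a naive count of positive and negative words."""
    words = tokens(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos == neg:
        return ("neutral", 0.5)
    label = "positive" if pos > neg else "negative"
    return (label, max(pos, neg) / (pos + neg))


tweet = "Love my new iphone, paid $699"
category_result = categorize(tweet)
entity_result = extract_entities(tweet)
sentiment_result = sentiment(tweet)
```

In the real project, each of these functions corresponds to one text analytics window reading the tw_Text field, and the quality of the results depends entirely on the models built in SAS Contextual Analysis or SAS Sentiment Analysis.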

Run the Twitter adapter

To inject events into a running ESP model, start the Twitter adapter; it then publishes live tweets into the sourceTwitter window of our model.

[Image: text-analytics-on-twitter07]

Here we search for tweets containing “iphone”, but you can change it to any keyword you want to track (assuming people are tweeting about that keyword…).

There are many additional options: -f lets you follow specific user IDs, -p lets you specify locations of interest, etc.
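The three kinds of criteria mentioned here (keywords, user IDs, locations) can be pictured as predicates on a tweet. The Python sketch below is a conceptual illustration, not the adapter's implementation: the `Tweet` class, the `matches` function and the bounding-box format are hypothetical, and combining the criteria with a logical OR is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical bounding box: (west, south, east, north) in degrees.
BoundingBox = Tuple[float, float, float, float]


@dataclass
class Tweet:
    text: str
    user_id: int
    coordinates: Optional[Tuple[float, float]] = None  # (longitude, latitude)


def matches(
    tweet: Tweet,
    keywords: Sequence[str] = (),
    user_ids: Sequence[int] = (),
    boxes: Sequence[BoundingBox] = (),
) -> bool:
    """Return True if the tweet satisfies at least one criterion (assumed OR logic)."""
    by_keyword = any(k.lower() in tweet.text.lower() for k in keywords)
    by_user = tweet.user_id in user_ids
    by_location = tweet.coordinates is not None and any(
        west <= tweet.coordinates[0] <= east and south <= tweet.coordinates[1] <= north
        for west, south, east, north in boxes
    )
    return by_keyword or by_user or by_location


paris_area: BoundingBox = (2.2, 48.8, 2.5, 48.9)  # rough, illustrative values

keyword_hit = matches(Tweet("New iPhone day", user_id=1), keywords=["iphone"])
user_hit = matches(Tweet("Hello", user_id=42), user_ids=[42])
location_hit = matches(Tweet("Bonjour", user_id=7, coordinates=(2.35, 48.85)), boxes=[paris_area])
no_hit = matches(Tweet("Hello", user_id=7), keywords=["iphone"], user_ids=[42], boxes=[paris_area])
```

Whatever the exact combination rule, the point is that the adapter narrows the public stream before events ever reach the source window, so the model only processes tweets of interest.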

Consume enriched events with SAS ESP Streamviewer

SAS ESP Streamviewer provides a way to render events graphically in real time. Here is an example of consuming real-time events in a dashboard.

[Image: streamviewer]

Conclusion

With SAS ESP, you can bring the power of SAS Analytics into the real-time world. Performing text analytics (content categorization, sentiment analysis, reputation management, etc.) on the fly on text coming from tweets, documents, emails, etc., and then triggering relevant actions, has never been so simple or so fast.

tags: SAS Event Stream Processing, SAS Professional Services, SAS Text Analytics, twitter

How to perform real time Text Analytics on Twitter streaming data in SAS ESP was published on SAS Users.

February 20, 2016
 

Don’t get me wrong. I have no doubt in the capabilities of our SAS products and SAS solutions! But I wanted to get a firsthand experience of our new solution for text analytics, SAS Contextual Analysis 14.1. And the result is very convincing! But let’s start from the beginning. Functions […]

The post SAS® Text Analytics and Text Mining in Action: Experiences From a ‘Self-Trial’ With SAS® Contextual Analysis appeared first on The Text Frontier.

February 11, 2016
 

This is the first of two articles looking at how to listen to what your customers are saying and act upon it – that is, how to understand the voice of the customer. Over the last few years, one of the big uses for SAS® Text Analytics has been to […]

The post Voice of the customer analysis (Part 1) appeared first on The Text Frontier.

February 2, 2016
 

Is cognitive computing an application of text mining? If you have asked this question, you are not alone. In fact, lately I have heard it quite often. So what is cognitive computing, really? A cognitive computing system, as stated by Dr. John E. Kelly III, is one that has the […]

The post Cognitive Computing - Part 1 appeared first on The Text Frontier.

January 23, 2016
 

Hi, there! First of all, let me introduce myself, as this is my first blog. I am Simran Bagga, and three weeks ago I became the Product Manager for Text Analytics at SAS. This role might be new to me, but text analytics is not. For the past 12 years, […]

The post To data scientists and beyond! One of many applications of text analytics appeared first on The Text Frontier.

December 23, 2015
 

In today’s world of instant gratification, consumers want – and expect – immediate answers to their questions. Quite often, that help comes in the form of a live chat session with a customer service agent. The logs from these chats provide a unique analysis opportunity. Like a call center transcript, […]

The post Come chat with us! appeared first on The Text Frontier.

December 1, 2015
 

Recently, I have been thinking about how search can play more of a part in discovery and exploration with SAS Text Miner. Unsupervised text discovery usually begins with a look at the frequent or highly weighted terms in the collection, perhaps includes some edits to the synonym and stop lists, […]

The post Focusing your Text Mining with Search Queries appeared first on The Text Frontier.

November 4, 2015
 

Analyzing text is like a treasure hunt. It is hard to tell what you will end up with before you start digging and the things you find out can be quite unique, invaluable and in many cases full of surprises. It requires a good blend of instruments like business knowledge, […]

The post Widening the Use of Unstructured Data appeared first on The Text Frontier.

September 1, 2015
 

When I ask people what they know about Denmark they often mention Hans Christian Andersen. He was born in Denmark in 1805 and is one of the most adored children’s authors of all time. Many of his fairy tales are known worldwide as they have been translated into more than […]

The post Detect the expected and discover the unexpected - Text analytics in health care appeared first on The Text Frontier.

July 14, 2015
 

~ This article is co-authored by Biljana Belamaric Wilsey and Teresa Jade, both of whom are linguists in SAS' Text Analytics R&D. When I learned to program in Python, I was reminded that you have to tell the computer everything explicitly; it does not understand the human world of nuance […]

The post Text analytics through linguists’ eyes: When is a period not a full stop? appeared first on The Text Frontier.