June 12, 2009
I saw the following comment on Twitter yesterday about sentiment analysis limitations and decided it would make a good topic for a blog update:

@concannon: Can anybody explain to me why automated sentiment analysis is anything more than flaky, snake-oil BS? The technology just isn't ready yet.

I’m going to make a bold statement here: automated sentiment analysis, using the right methodology, is actually superior to human sentiment analysis. Bear with me and read through.

The available approaches to analyzing sentiment/satisfaction vary based on the data provided. I would categorize the approaches based on the availability of three types of data:
1. Customer feedback (free-form text) with customer-ranked satisfaction (discrete value), like Amazon product reviews.
2. Customer feedback (free-form text) with manually ranked satisfaction (discrete value), where human readers subjectively score the content.
3. Customer feedback only, with no ranked satisfaction, as with blog posts and comments.

For the first data type, machine learning algorithms do a good job of measuring overall sentiment (say, positive/neutral/negative). Examples of data suitable for this approach are survey data and product review forums. The problem is that not a lot of text is gathered this way (with a purpose in mind). Even when it is, machine learning algorithms struggle to distinguish the positive elements of a document from the negative ones. It's one thing to know that a customer is dissatisfied; it is another to know what they are dissatisfied about!

For the second data type, where there is no customer-ranked satisfaction, it is possible to build a statistical model using a sample of manually ranked documents, then automatically score the remaining unranked documents. Not many companies are willing to do this. It also doesn't truly represent the customer’s opinion, just the reader’s interpretation of what the customer thinks.
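To make the "train on a manually ranked sample, then score the rest" idea concrete, here is a minimal sketch using a multinomial Naive Bayes classifier with add-one smoothing. The choice of Naive Bayes, the function names `train` and `classify`, and the four sample documents are all my own assumptions for illustration; the post does not say which statistical model any vendor uses.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Build word counts per label from (text, label) pairs scored by human readers."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for an unranked document."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in label_counts.items():
        logp = math.log(n_docs / total_docs)
        # Add-one (Laplace) smoothing so unseen words do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[label][word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical manually ranked sample (invented for this sketch).
sample = [
    ("love this phone great screen", "positive"),
    ("great battery love it", "positive"),
    ("terrible support broken screen", "negative"),
    ("broken on arrival terrible", "negative"),
]
model = train(sample)
print(classify("great phone love the battery", model))
print(classify("terrible broken screen", model))
```

Note that the model can only reproduce the judgments of whoever ranked the sample, which is exactly the limitation described above.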

For the third option, customer opinion with no ranking, you can derive sentiment from the context of the text using natural language processing (NLP). This type of data is the most common, and so are the approaches to analyzing it. It’s not easy, but it’s the sweet spot for gaining value from the massive volumes of consumer-generated text.

One widely available, cheap technology assigns an overall positive or negative sentiment based on assigning positive or negative values to individual words then summing them to get an overall sentiment rating. This approach fails in situations like the following:
"It's not bad" (two negatives that actually suggest a positive)
"I'm not going to say this sucks" (sarcasm or humor)
“The keyboard is impossibly small but the display is the best I’ve seen.” (combination)
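The word-summing approach is easy to sketch, which also makes its failures easy to see. The tiny lexicon below is a made-up assumption (real products use much larger word lists, and may treat "not" differently); the function name `word_sum_score` is hypothetical as well.

```python
import re

# Hypothetical toy lexicon: each word carries a fixed positive or negative value.
LEXICON = {"not": -1, "bad": -1, "sucks": -1, "small": -1, "best": 1}

def word_sum_score(text):
    """Sum the per-word values; words missing from the lexicon count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

examples = [
    "It's not bad",
    "I'm not going to say this sucks",
    "The keyboard is impossibly small but the display is the best I've seen.",
]
for sentence in examples:
    print(word_sum_score(sentence), sentence)
```

With this lexicon the first two sentences both score -2 even though neither is a clear complaint, and the third scores 0, which hides the fact that it contains one strong negative opinion (keyboard) and one strong positive opinion (display).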

The most recent advances in sentiment analysis technology use a combination of techniques:
(1) statistics
(2) rule-based definitions and
(3) human intervention, e.g. a final review of the machine scoring.

The results are less expensive than human-only sentiment analysis, and more consistent. Why? Because the automation adds consistency, while the human verifies the result. Put in the right workflow, this combination also scales far better than human-only scoring.
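As a rough illustration of how rules and human review can sit on top of automated scoring, the sketch below applies one crude negation rule and flags mixed or neutral results for a human reader. Everything here is an assumption made for the example (the word lists, the "flip the next sentiment word" rule, and the name `score_with_rules`); it is not a description of how any particular product works.

```python
import re

# Hypothetical word lists, invented for this sketch.
POLARITY = {"bad": -1, "sucks": -1, "small": -1, "best": 1, "good": 1}
NEGATORS = {"not", "never", "no"}

def score_with_rules(text):
    """Return (score, needs_review) for one piece of feedback."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    saw_positive = saw_negative = False
    negate = False
    for word in words:
        if word in NEGATORS:
            negate = True  # crude rule: flip the next sentiment word we meet
            continue
        value = POLARITY.get(word, 0)
        if value:
            if negate:
                value = -value
                negate = False
            score += value
            saw_positive |= value > 0
            saw_negative |= value < 0
    # Send neutral or mixed documents to a human for the final review.
    needs_review = score == 0 or (saw_positive and saw_negative)
    return score, needs_review

print(score_with_rules("It's not bad"))
print(score_with_rules("The keyboard is impossibly small but the display is the best I've seen."))
```

The design point is the second return value: the machine scores every document the same way each time, and the human only spends time on the documents the rules cannot settle.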

Teragram, a division of SAS, announced the Teragram Sentiment Analysis Manager at the Text Analytics Summit in early June. More to come on that!
June 11, 2009
I mentioned the buzz around Social Media Analysis (SMA) at the Text Analytics Summit. If we took all the speakers' content and produced a tag cloud, Twitter would have the biggest 'floor space'. I don't think there was a single presentation that did NOT mention Twitter.

While doing some background research for SMA, I ran across a report entitled State of the Twittersphere that HubSpot blogged about just this week (that's @HubSpot for the 55.5% of Twitter users that don't follow anyone). There are a lot of really great Twitter usage statistics in this report. It's amazing how many people sign up with Twitter but are very inactive (I have multiple Twitter accounts and one is definitely contributing to inactivity). I'm more interested in those users that are very active. It would be good to connect with other users who post materials similar to my own (like a document recommendation system), and Text Mining can definitely help with this. I'd also like to see something like "users who posted materials like this also connected with these users:", like the recommendations you get from Amazon. Ranking the tweets of users you follow based on content would also be fabulous. Some users post both personal and business-related materials. I personally prefer not to read the personal posts (sorry y'all). Having personal tweets, or topics less interesting to me, appear further down the list (if at all) would be another desirable feature...
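Ranking tweets by how closely they match what I post myself could be sketched with plain bag-of-words cosine similarity. This is only one possible approach and an assumption on my part; the function names and the sample tweets are invented, and nothing here reflects how Twitter actually orders content.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 when either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_tweets(my_posts, tweets):
    """Order tweets so those most similar to my own posts come first."""
    profile = vectorize(" ".join(my_posts))
    return sorted(tweets, key=lambda t: cosine(profile, vectorize(t)), reverse=True)

# Hypothetical data, invented for this sketch.
my_posts = ["text mining for sentiment analysis", "text analytics summit notes"]
tweets = ["had a great lunch today", "new text mining tutorial posted"]
ranked = rank_tweets(my_posts, tweets)
print(ranked)
```

The personal tweet shares no words with my posts, so it drops to the bottom of the list; a real system would need stemming, stop-word handling, and much more data.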

I have a bunch of other recommendations for Twitter product management - as do many other Twitter users. How about using Text Analytics/Text Mining for managing product requirements...
June 4, 2009
I am back in my office after a thoroughly enjoyable time at the annual Text Analytics Summit in Boston. I have to admit I was in my element rubbing shoulders with thought leaders, end users, analysts and press.

Jim Cox and I arrived Sunday afternoon to attend two preconference presentations: "Text Analytics for Dummies" by Conference Chair, Seth Grimes of Alta Plana, and a vendor comparison presentation by Nick Patience of technology industry analyst company, 451group.

The themes dominating the conference were: sentiment analysis, social media analysis, social network analysis, voice of the customer, eDiscovery, Web search, visualization, SaaS and Cloud.

We heard keynote presentations:
“Discover and Drive Brand Activity in Social Networks” by Emmanuel Roche, Teragram and Jim Cox, SAS

“A Tale of Two Search Engines – The Evolution of Search Technology and the Role of Social Networking in Marketing” – Usama Fayyad, Open Insights

“Sentiment Analysis” – Bing Liu, University of Illinois

We also saw end user case studies, analyst and end user panels, a Text Analytics Market Report by IDC, vendor presentations and a group of very active roundtable discussions.

Sentiment analysis. Key capabilities focused on product-level and feature-level sentiment extraction. Sentiment is also considered a key component of Social Media Analysis. While many vendors play in the social media analysis space, not many provide all the necessary capabilities on their own. Tracking social networks, reach, promoters, detractors, key influencers/key opinion leaders (KOL) and key themes/trends were put forth as valuable.

Voice of the customer / customer feedback continued to play a key role as text input to text analytics models that look for key issues being reported by customers.

eDiscovery was probably the top text analytics application area at this year’s summit. Several law firms were represented, and the ability to mine legal documents is crucial to them.

Web search in relation to advertising was shown to be very powerful because a search indicates the user's intent. Advertising based on Web search and user behavior improves click-through ratio (CTR) by an average of 652%! Also mentioned was the mammoth effort required to tag massive volumes of rapidly changing Web content. Numerous Web sites rely on their user bases to do this for them. The new look of Web search goes far beyond providing lists of documents. Document facets, snippets, images, sentiment and more can be derived from search results.

Sue Feldman of IDC indicated that the Text Analytics and Search market is moving in direct opposition to the current economic market. The analysts represented at the summit all agreed that visualization of huge volumes of text should be an area that all vendors pay more attention to. Other sentiments echoed by the analysts included the desirability of Software as a Service (SaaS) applications and the overwhelming need for Cloud Computing offerings (along with amazement that Text Analytics vendors had not provided them yet).

On the whole, conference goers imparted a great amount of valuable information. I will wrap up my commentary with these overheard statements:
“Search doesn’t help you discover things you are unaware of.”
“TA technology can solve problems we don’t even know about yet.”
“Text analytics puts humanity into statistics.” (Thanks to Chris Bowman for that one!)
“The most common search on Monster is: Find me a job!” (followed by another that Blog Administrator refuses to post)
"Missing a piece of a puzzle is frustrating, can anyone spot the missing piece to my wardrobe?" [shoes]

Additional conference commentary can be found on twitter.com under #textsummit. My colleague Anne Milley also summarized Day 1 and Day 2 on our sascom voices blog.
Curt Monash, we missed you this year!
SAS and Teragram would like to thank conference goers. It was a pleasure seeing you all!
June 2, 2009
The last time I visited blogs.sas.com, there were a handful of interesting blogs listed down the right side of the page. Over the last few weeks, I have seen and heard about new blogs coming online but it didn't really sink in. Then today, I visited blogs.sas to find a long and growing list of bloggers. The following highlights some of the new bloggers:

  • In Other Words (by Senior Vice President and Chief Marketing Officer, Jim Davis)
    Davis' initial posts offer insight into how to retain and motivate employees and how those efforts create an environment for innovation and loyalty. Read Davis' most recent post Loyalty insurance.

  • The Business Forecasting Deal
    Michael Gilliland is a product marketing manager at SAS who focuses on business forecasting. Gilliland's blog offers practical solutions for common mistakes and bad practices. He is currently blogging about interesting activity at F2009. Visit Gilliland's forecasting blog.

  • In the Final Analysis (by Executive Vice President, EMEA and Asia Pacific, SAS, Mikael Hagstroem)
    Hagstroem is interested in optimizing business performance and uses his blog to discuss issues facing organizations today. Visit the blog for a global view into business.

You can access all blogs by SAS employees by visiting blogs.sas.com. And don't forget the blogs written by SAS users. I'm sure that we don't have a comprehensive list, but Alison Bolen does her best to keep a current list of blogs by SAS users.
May 23, 2009
The SAS campus has this great art installation titled Frightened Deer by Richard Rothschild. (That's it in the picture below.) As you can see, this is a large art installation. I run or drive by these deer almost every day. Most days I am oblivious to their existence, but some mornings I find myself pulling up short and sucking in my breath, startled by what is about to attack me. How can I forget that these deer are there? It's easy really. They are a part of the background of my life.

Sometimes the things we are the most blind to are the things that we look at every day. I haven't found a trick that helps me open my eyes and mind and take in my everyday surroundings. But I'm trying.

What fades into the background on support.sas.com

My team monitors the comments that come from visitors to the Website and actively solicits input from you. From your comments, I have created a list of support.sas.com elements that have faded into the background for many site visitors. The remainder of this post will introduce you to a few of the features that you might be missing.
May 21, 2009
We're on a roll with discussion forums. We launched yet another customer-requested forum this week; it focuses on data mining and text mining. Mining is all about digging through vast amounts of data to find trends that enable creating predictive and descriptive models. The SAS Data Mining and Text Mining forum will be most helpful to those who have SAS Enterprise Miner, SAS Text Miner, and SAS Credit Scoring. However, you don't have to have these products to explore large data sets. Note: If you are unfamiliar with the data and text mining offerings from SAS, review the material provided in the Products & Solutions section of the SAS Web site.

As always, we hope that you will use this forum to share experiences, post questions and suggestions, offer solutions, and interact with other SAS data miners. Remember that you can follow the conversation in e-mail by setting a watch or in an RSS feed by subscribing to items that interest you. Instructions for both of these tasks are provided in Watching a forum.
May 19, 2009
I wrote a piece for eWeek about the Voice of the Customer. In it, I talk about how conversational data collected in call centers is growing faster than our ability to deal with it. Those who don't want to miss insights buried in their data can now turn to predictive modeling (data mining and text mining) to help them perform voice analytics. Armed with these emerging technologies, you can decipher key messages from all the noise and really listen to what customers are saying. Those who learn quickly can respond first (before competitors do) and can deliver better service and better products, resulting in happier customers!

Component No. 3: Voice mining your own business

The best way to overcome this obstacle is to secure access to analytic experts at the same time you address any voice mining software purchases. A trained analytical expert will ensure you not only "see" insights, but actually move on them and get the value out of predictive analytic workbenches.

Where have you seen these technologies implemented? Do share!
May 10, 2009
If you are applying these technologies today, or are considering implementing Text Analytics in your organization in the near future, we invite you to take a few moments to complete a survey here.

As Manya and others have stated, interest in this field is indeed growing. However, there remain many unanswered challenges for our R&D groups to pursue. With your input here, you can help shape the next enhancements and guide future application direction. This is an opportunity for all of you out there to share your Perceptions & Plans for text analytics.

Seth Grimes' text-analytics survey will close tomorrow, May 10. He'll write up his findings on how organizations are dealing with unstructured sources, and the role text mining/analytics plays, as a free report available in early June.

The survey will take you 5-10 minutes. Thanks for responding!

PS: New members are welcome in the Yahoo group on text analytics.

Read about it and join us here: http://tech.groups.yahoo.com/group/TextAnalytics/
May 4, 2009
I was reading Alison's Friday Fast Links post in the sascom voices blog and was amazed at the growing list of blogs by SAS users. If you are interested in what other SAS users are doing with SAS, be sure to check out the running list of blogs that Alison maintains.

You will also find user blogs in the Bloggers Corner on sasCommunity.org.

Note: If you have a blog that includes SAS tips, ideas, and musings, leave a comment so that you can be added to the list.