text analytics

July 20, 2010
Back in April we hosted Text Analytics 101, the first in this year’s Applying Business Analytics Webinar Series. As many of you know, organizations today are faced with a flood of text-based content, a shortage of domain and subject-matter experts and an inability to analyze data in an automated, consistent manner. So we created this 101 session to provide practical, accessible advice about the methods and technologies that will enable you to improve efficiencies, ease staff resources and seamlessly incorporate text-based insights for better decisions. SAS’ Kathy Lange & Fiona McNeill walked 350+ attendees through a Text Analytics overview.

We had registrations from the US and 40 other countries, and attendees joined us from a vast array of industries including Communications, Education, Federal Government, Financial Services, Health and Life Sciences, Retail & Manufacturing, and State and Local Government. Of the 350+ attendees, about 40% responded to the post-webinar survey and shared some interesting feedback about their stage of text analytics adoption. From the results, it looks to me like the majority are in the initial investigation stage, and a few more 101 sessions might be just what folks need to understand the landscape.

Other attendees were assessing vendors, enhancing existing methods or reviewing technologies; several were already implementing, and a couple were already in a testing phase. How about your organization? What phase are you in?
July 10, 2010
You would think that in a discipline focused on information retrieval, where understanding the meaning of words and phrases is critical, everyone would know the difference between a taxonomy, an ontology, a thesaurus and semantics. Oddly, this is not always the case.

It's not that the terms are poorly defined; the terms are extremely well defined - with fun words like entity, hyponym, or zeugma. Now you have some new words for Scrabble. I think, could be wrong, the confusion comes from the fact that information retrieval relies on each term in the acronym TOTS.

Some asides:
  • The phrase “I think, could be wrong,” is a zeugma: the word "I" links "could be wrong" with "I think," because the "I" is implied in "could be wrong."

  • The game Scrabble is an example of an entity, a separate and distinct object or concept.

  • Information Retrieval is a hyponym of the more general term Search.
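For the curious, the hyponym relation in that last aside can be sketched as a tiny lookup table. This is only an illustration, not how any particular product works: the `BROADER` mapping and the `is_hyponym` helper are hypothetical names, and every entry other than "information retrieval" pointing to "search" is made up for the example.

```python
# Toy taxonomy: each term points to its broader (parent) term.
# Only "information retrieval" -> "search" comes from the post;
# the Scrabble entries are invented purely for illustration.
BROADER = {
    "information retrieval": "search",
    "scrabble": "board games",
    "board games": "games",
}


def is_hyponym(term, broader_term):
    """Return True if `term` sits anywhere below `broader_term` in the toy taxonomy."""
    current = BROADER.get(term)
    while current is not None:
        if current == broader_term:
            return True
        current = BROADER.get(current)
    return False


print(is_hyponym("information retrieval", "search"))  # direct parent
print(is_hyponym("scrabble", "games"))                # two levels up
print(is_hyponym("search", "information retrieval"))  # wrong direction
```

Walking up the chain of broader terms is what lets "scrabble" count as a kind of "games" even though the table never says so directly.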

Now, I will use one of my favorite guilty pleasures, tater tots, to help explain the differences between a taxonomy, an ontology, a thesaurus and semantics, aka TOTS.

I recently read an article about new, trendy gourmet tots: everything from blue cheese tots to truffle tots. Another trendy thing is foraging for wild food. Depending on where you live this might include: clams, wild mushrooms, herbs, thistle, blueberries and much more. In order to understand the definitions contained in the acronym TOTS, I will try to discover a recipe for the ultimate trendy food "Gourmet forage tater tots," and define what I'm doing as I go.

To start, how do search engines and text analytics products know what I am talking about when I type the word “tots”? Am I talking about Tater Tots? Or an 8th grade class? What are the semantics of what I am talking about in the document?
Continue reading "TOTS: Taxonomy, Ontology, Thesauri and Semantics"

Corpus Callosum .. Where Right and Left Brain Meet

Barry DeVille, linguistics, text, text analytics
June 4, 2010
The Corpus Callosum is a huge switching station in the middle of our brains that connects the right and left hemisphere. Without it we would not be able to reason about what we are looking at (reasoning is a left brain function while vision is in the right brain).

Similarly, in Text Analytics, the "Corpus" is the "huge switching station" that tells us the meaning of words and how to associate different forms of words to the items of interest that we are trying to extract from text.

The Wall Street Journal’s “Numbers Guy,” Carl Bialik, quoted Mike Calcagno, general manager of the Microsoft group that manages Word. Calcagno says, "Text corpora is the lifeblood of most of our development and testing processes."

"Microsoft has licensed over one trillion words of English text in each of the past two years, and bolsters its collection with emails exchanged on its Hotmail program, with identifying details removed", according to a Microsoft spokeswoman.

SAS's own Enterprise Content Categorization maintains huge corpora in various languages. As Bialik notes in the Wall Street Journal article ("Making Every Word Count," Sept. 12, 2008): "Without enough spoken-language data, subtleties may not emerge."

"The word 'rife' only occurs in negative contexts," says Anne O'Keeffe, a linguist at Mary Immaculate College, the University of Limerick, Ireland. "We are never rife with money," despite that affliction's appeal.
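A quick way to check a claim like this against a corpus is a keyword-in-context listing, which shows each occurrence of a word with its neighbors. Here is a minimal sketch under stated assumptions: `keyword_in_context` is a hypothetical helper, and the three sentences in `toy_corpus` are invented stand-ins for a real corpus, not actual corpus data.

```python
import re


def keyword_in_context(sentences, keyword, window=3):
    """Return (left, keyword, right) tuples with up to `window` words on each side."""
    hits = []
    for sentence in sentences:
        # Lowercase and keep only alphabetic tokens (plus apostrophes).
        words = re.findall(r"[a-z']+", sentence.lower())
        for i, word in enumerate(words):
            if word == keyword:
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                hits.append((left, word, right))
    return hits


# Invented example sentences; a real study would use millions of words.
toy_corpus = [
    "The city was rife with rumors.",
    "The report was rife with errors.",
    "The garden was full of flowers.",
]

for left, word, right in keyword_in_context(toy_corpus, "rife"):
    print(f"{left:>20} [{word}] {right}")
```

With only three made-up sentences this proves nothing about "rife," of course; the point is that patterns like the one O'Keeffe describes only become visible once the corpus is large enough.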

In spite of their utility, publicly available corpora are hard to come by and even harder to update. The largest public collection may be the British National Corpus (BNC), which was assembled in the early 1990s. The BNC included the recorded conversations of 200 Britons. The intended American counterpart to the BNC, the American National Corpus (ANC), is a collection of text that includes the 9/11 Commission Report and Berlitz travel guides. With only 22 million words, the ANC is small compared to the BNC.

Corpora and associated taxonomies are extremely valuable components of a robust text mining/text analytics solution. We are fortunate to have these assets available to us in support of our text mining/analytics tasks.
April 21, 2010

No more teachers, no more books!! Now in its second year, SAS' "Applying Business Analytics" Webinar series has proven to be a powerful resource for many people. In 2009, more than 4,500 customers and prospects, representing more than 3,200 organizations around the world, participated in the nine live and on-demand webinars on topics including Analytics, Data Management & Reporting.

So, why should you pay attention to the series this year? Beginning April 21 with Text Analytics 101 and running through November, the Applying Business Analytics Webinar series will enable you to learn from the best business, product marketing and tech experts as they highlight the value of a complete business analytics framework.

You are welcome to join the sessions live or view on demand at your leisure. The Webinars will each follow a "101" format, helping set a foundation around each of the Webinar topics.

Hope you can join us & I'll see you online soon!


The SGF Text Analytics Conversation — SAS Global Forum and Text Analytics

search, sentiment analysis, sgf, text analytics, text mining
April 19, 2010
WOW, just got back from the SAS Users Group meeting, SAS Global Forum 2010. What makes SAS Global Forum (SGF) so special are the people, the networking and the conversations. Between the conversations I have with customers and the papers that are presented, SGF is always very informative, and getting reacquainted with customers I’ve met previously and meeting new ones is always exciting to me. SGF10 is going to be hard to beat.

Faisal Dosani in his Blog My Little Piece of the Internet said it best:

After 4 days of learning, listening and sharing knowledge SAS Global forum is over :-(
I have to say it was an awesome conference this time around and there were some really interesting topics.
Faisal goes on to discuss some of the papers he attended (thanks for the review, Faisal) and also gives a good amount of coverage to SAS’ new offering, SAS Social Media Analytics.

SAS launched SAS Social Media Analytics; the new Customer Intelligence blog—Get, Grow, Keep—has great information about the launch of SAS Social Media Analytics and how it uses SAS Text Analytics, and a great amount of information on SAS Global Forum.

Social Media involves a lot of Text Analytics, which is one reason I was so busy talking to people that I missed the presentations I was hoping to see.

There is a lot to Text Analytics. For a great overview, sign up and attend the Text Analytics 101 Webinar on Wednesday, Apr 21st, by Kathy Lange and Fiona McNeill.

Oddly enough, my paper “The Twitter Conversation – Analyzing Twitter” was a very appropriate and timely topic. In the paper I compare looking at topics on Twitter to attending a conference, like SGF. Why? Because everyone is talking about various things, from the latest paper to the best restaurant in Seattle, but they all have one thing in common: they are mentioning things about SGF, even if it is reconnecting and meeting new people and talking about shared interests.

This year my paper really reflected what was going on at the conference. I got to talk to some really great people. I am always amazed at how smart and ingenious people are. I get to hear the problems customers are trying to solve; the things I hear that people want to do with Text Analytics, to quote Faisal again, are awesome. So just like Twitter, I am going to let you listen in on a little bit of the conversation:

Improve healthcare by looking at medical records: The analysis of what people are saying to their doctors, what the actual effects are, based on structured data. How does Sentiment Analysis play a role in medicine? What are other factors influencing healthcare and categorizing that information?

The use of search to find the proper information: How can ontology and search help my people improve their information retrieval? We have problems finding what we need and many times there is information out there that we miss, because we don’t know what to ask. Ontology to the rescue, I say.

Understanding collaborative networks within and outside the organization: The use of log files from IT systems, reports, help desk information and social media, both internal and external. It's an interesting twist and uses Content Categorization, Sentiment Analysis and Text Mining, not to mention other SAS products like Social Link Analysis.

Improving Collaboration: Discover topics of research from disjointed areas: “If I know what group of people are working on topics that could help from collaboration then bringing the people together can really help all parties...”
And this is only a fraction of what people are talking about when they refer to Text Analytics.

In my paper I talked about understanding influencers by how focused they are on the topic of interest. And these customers were not only focused on Text Analytics, they were passionate about it.

I can’t wait to connect with them and have them be a part of the Text Analytics “re-tweet network.” OK, to understand that comment, see the paper, The Twitter Conversation – Analyzing Twitter, my plug for Text Analytics everywhere.