text analytics

Corpus Callosum .. Where Right and Left Brain Meet

Barry DeVille, linguistics, text, text analytics
June 4, 2010
 
The Corpus Callosum is a huge switching station in the middle of our brains that connects the right and left hemispheres. Without it we would not be able to reason about what we are looking at (reasoning is a left brain function while vision is in the right brain).

Similarly, in Text Analytics, the "Corpus" is the "huge switching station" that tells us the meaning of words and how to associate different forms of words to the items of interest that we are trying to extract from text.
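To make the "switching station" idea concrete, here is a minimal Python sketch of a lookup that routes different word forms to one item of interest. The `LEXICON` entries and the `extract_items` helper are invented for illustration; a real corpus-backed product would derive far richer mappings than this toy dictionary.

```python
import re
from collections import Counter

# Hypothetical, hand-made lexicon: surface form -> canonical item of interest.
# In practice such mappings would be derived from a large corpus.
LEXICON = {
    "physician": "doctor",
    "physicians": "doctor",
    "doctor": "doctor",
    "doctors": "doctor",
    "dr": "doctor",
    "medication": "drug",
    "medications": "drug",
    "meds": "drug",
    "drug": "drug",
    "drugs": "drug",
}


def extract_items(text, lexicon=LEXICON):
    """Count canonical items mentioned in text, whatever surface form was used."""
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(lexicon[t] for t in tokens if t in lexicon)


counts = extract_items("The physicians changed her meds; Dr. Lee said the drug helped.")
print(counts)
```

Four different surface forms ("physicians", "Dr.", "meds", "drug") collapse into two items of interest, which is the association job the corpus does at a much larger scale.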

The Wall Street Journal’s “Numbers Guy”, Carl Bialik, quoted Mike Calcagno, general manager of the Microsoft group that manages Word. Calcagno says "Text corpora is the lifeblood of most of our development and testing processes."

"Microsoft has licensed over one trillion words of English text in each of the past two years, and bolsters its collection with emails exchanged on its Hotmail program, with identifying details removed", according to a Microsoft spokeswoman.

SAS's own Enterprise Content Categorization maintains huge corpora in various languages. As Bialik notes in the Wall Street Journal article ("Making Every Word Count", Sept. 12, 2008): "Without enough spoken-language data, subtleties may not emerge."

"The word 'rife' only occurs in negative contexts," says Anne O'Keeffe, a linguist at Mary Immaculate College, the University of Limerick, Ireland. "We are never rife with money," despite that affliction's appeal.
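An observation like the one about "rife" comes from looking at the words that surround a term across many texts. The sketch below is a rough Python illustration of that idea, often called keyword-in-context. The `contexts` helper and the two `SAMPLE` sentences are made up for this example and are not drawn from any real corpus.

```python
import re


def contexts(word, sentences, window=3):
    """Return (left, right) neighbor words for every occurrence of `word`."""
    hits = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok == word:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                hits.append((left, right))
    return hits


# Invented sample sentences; a real study would scan millions of words.
SAMPLE = [
    "The city was rife with corruption.",
    "Rumours are rife with error and speculation.",
]

hits = contexts("rife", SAMPLE)
for left, right in hits:
    print(" ".join(left), "[rife]", " ".join(right))
```

With only two sentences nothing can be concluded, of course. The point is that the pattern only becomes visible once the corpus is large enough to show which neighbors keep turning up.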

In spite of their utility, publicly available corpora are hard to come by and even harder to update. The largest public collection may be the British National Corpus (BNC), which was assembled in the early 1990s. The BNC included the recorded conversations of 200 Britons. The intended American counterpart to the BNC, the American National Corpus (ANC), is a collection of text that includes the 9/11 Commission Report and Berlitz travel guides. With only 22 million words, the ANC is small when compared to the BNC.

Corpora and associated taxonomies are extremely valuable components of a robust text mining/text analytics solution. We are fortunate to have these assets available to us in support of our text mining/analytics tasks.
April 21, 2010
 

No more teachers, no more books!! Now in its second year, SAS' "Applying Business Analytics" Webinar series has proven to be a powerful resource for many people. In 2009, more than 4,500 customers and prospects, representing more than 3,200 organizations around the world, participated in the nine live and on-demand webinars on topics including Analytics, Data Management & Reporting.

So, why should you pay attention to the series this year? Beginning April 21 with Text Analytics 101 and running through November, the Applying Business Analytics Webinar series will enable you to learn from the best business, product marketing and tech experts as they highlight the value of a complete business analytics framework.

You are welcome to join the sessions live or view them on demand at your leisure. The Webinars will each follow a "101" format, helping set a foundation for each of the Webinar topics.

Hope you can join us & I'll see you online soon!

@kristinevick

The SGF Text Analytics Conversation — SAS Global Forum and Text Analytics

search, sentiment analysis, sgf, text analytics, text mining
April 19, 2010
 
WOW, just got back from the SAS Users Group meeting, SAS Global Forum 2010. What makes SAS Global Forum (SGF) so special are the people, the networking and the conversations. The conversations I have with customers and the papers that are presented make SGF very informative, and talking to customers, getting reacquainted with customers I’ve met previously and meeting new ones is always exciting to me. But SGF10 is going to be hard to beat.

Faisal Dosani, in his blog My Little Piece of the Internet, said it best:

After 4 days of learning, listening and sharing knowledge SAS Global forum is over :-(
I have to say it was an awesome conference this time around and there were some really interesting topics.
Faisal goes on to discuss some of the papers he attended (thanks for the review, Faisal) and gives a good amount of coverage to SAS’ new offering, SAS Social Media Analytics.

SAS launched SAS Social Media Analytics, and the new Customer Intelligence blog, "Get, Grow, Keep", has great information about the launch, about how the product uses SAS Text Analytics, and about SAS Global Forum.

Social Media involves a lot of Text Analytics, which is one reason I was so busy talking to people that I missed the presentations I was hoping to see.

There is a lot to Text Analytics. For a great overview, sign up for and attend the Text Analytics 101 Webinar on Wednesday, April 21, by Kathy Lang and Fiona McNeil.

Oddly enough, my paper “The Twitter Conversation – Analyzing Twitter” was a very appropriate and timely topic. In the paper I compare looking at topics on Twitter to attending a conference like SGF. Why? Because everyone is talking about various things, from the latest paper to the best restaurant in Seattle, but they all have one thing in common: they are mentioning things about SGF, even if it is reconnecting, meeting new people and talking about shared interests.

This year my paper really reflected what was going on at the conference. I got to talk to some really great people. I am always amazed at how smart and ingenious people are. I get to hear the problems customers are trying to solve, and the things I hear that people want to do with Text Analytics are, to quote Faisal again, awesome. So just like Twitter, I am going to let you listen in on a little bit of the conversation:

Improve healthcare by looking at medical records: The analysis of what people are saying to their doctors, what the actual effects are, based on structured data. How does Sentiment Analysis play a role in medicine? What are other factors influencing healthcare and categorizing that information?

The use of search to find the proper information: How can ontology and search help my people improve their information retrieval? We have problems finding what we need and many times there is information out there that we miss, because we don’t know what to ask. Ontology to the rescue, I say.

Understanding collaborative networks within and outside the organization: The use of log files from IT systems, reports, help desk information and social media, both internal and external. It's an interesting twist and uses Content Categorization, Sentiment Analysis and Text Mining, not to mention other SAS products like Social Link Analysis.

Improving Collaboration: Discover topics of research from disjointed areas: “If I know what group of people are working on topics that could help from collaboration then bringing the people together can really help all parties...”

And this is only a fraction of what people are talking about when they refer to Text Analytics.

In my paper I talked about understanding influencers by how focused they are on the topic of interest. And these customers were not only focused on Text Analytics, they were passionate about it.

I can’t wait to connect with them and have them be a part of the Text Analytics “re-tweet network”… OK, to understand that comment, see the paper, The Twitter Conversation – Analyzing Twitter, my plug for Text Analytics everywhere.