012017
 

In 2011, Loughran and McDonald applied a general sentiment word list to accounting and finance topics, and this led to a high rate of misclassification. They found that about three-fourths of the negative words in the Harvard IV TagNeg dictionary of negative words are typically not negative in a financial context. For example, words like “mine”, “cancer”, “tire” or “capital” are often used to refer to a specific industry segment. These words are not predictive of the tone of documents or of financial news and simply add noise to the measurement of sentiment and attenuate its predictive value. So, it is not recommended to use any general sentiment dictionary as is.

Extracting domain-specific sentiment lexicons in the traditional way is time-consuming and often requires domain expertise. Today, I will show you how to extract a domain-specific sentiment lexicon from movie reviews through a machine learning method and construct SAS sentiment rules with the extracted lexicon to improve sentiment classification performance. I did the experiment with help from my colleagues Meilan Ji and Teresa Jade. Our experiment with the Stanford Large Movie Review Dataset showed an increase of around 8% in overall accuracy with the extracted lexicon. It also showed that both the coverage and the accuracy of the lexicon improve considerably with more training data.

SAS Sentiment Analysis Studio released domain-independent Taxonomy Rules for 12 languages and domain-specific Taxonomy Rules for a few languages. For English, SAS has covered 12 domains, including Automotive, Banking, Health and Life Sciences, Hospitalities, Insurance, Telecommunications, Public Policy Countries and others. If the domain of your corpus is not covered by these industry rules, your first choice is to use the general rules, which sometimes lead to poor classification performance, as Loughran and McDonald found.

Researchers have studied how to build domain-specific sentiment lexicons and have proposed three methods. The first method is to have linguistic experts or domain experts create a domain-specific word list, which may be expensive or time-consuming. The second method is to derive non-English lexicons based on English lexicons and other linguistic resources such as WordNet. The last method is to leverage machine learning to learn lexicons from a domain-specific corpus. This article will show you the third method.

Thanks to the emergence of social media, researchers can get sentiment data from the internet relatively easily for their experiments. Dr. Saif Mohammad, a researcher in Computational Linguistics at the National Research Council Canada, proposed a method to automatically extract sentiment lexicons from tweets. His method provided the best results in SemEval13 by leveraging emoticons in a large set of tweets, using the PMI (pointwise mutual information) between words and tweet sentiment to define the sentiment attributes of words. It is a simple method, but quite powerful. At the ACL 2016 conference, one paper introduced how to use neural networks to learn sentiment scores, and in this paper I found the following simplified formula to calculate a sentiment score.

Given a set of tweets with their labels, the sentiment score (SS) for a word w was computed as:
SS(w) = PMI(w, pos) − PMI(w, neg), (1)

where pos represents the positive label and neg represents the negative label. PMI stands for pointwise mutual information, which is
PMI(w, pos) = log2((freq(w, pos) * N) / (freq(w) * freq(pos))), (2)

Here freq(w, pos) is the number of times the word w occurs in positive tweets, freq(w) is the total frequency of word w in the corpus, freq(pos) is the total number of words in positive tweets, and N is the total number of words in the corpus. PMI(w, neg) is calculated in a similar way. Thus, Equation 1 is equal to:
SS(w) = log2((freq(w, pos) * freq(neg)) / (freq(w, neg) * freq(pos))), (3)
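
As an illustration of Equation 3, here is a minimal Python sketch (not the SAS program used in the experiment). The function name `sentiment_scores`, the `(tokens, label)` input format and the toy documents are assumptions made for this example. The formula is undefined for a word that occurs in only one class; the sketch simply skips such words, which is also an assumption, since the handling of that case is not described above.

```python
import math
from collections import Counter


def sentiment_scores(docs):
    """Compute SS(w) from Equation 3 for every word seen in both classes.

    docs: iterable of (tokens, label) pairs, where label is "pos" or "neg".
    """
    freq = {"pos": Counter(), "neg": Counter()}
    for tokens, label in docs:
        freq[label].update(tokens)

    # freq(pos) and freq(neg): total number of word tokens in each class
    total_pos = sum(freq["pos"].values())
    total_neg = sum(freq["neg"].values())

    scores = {}
    for word in set(freq["pos"]) | set(freq["neg"]):
        f_pos = freq["pos"][word]
        f_neg = freq["neg"][word]
        if f_pos == 0 or f_neg == 0:
            # Equation 3 is undefined here; skipping is an assumption
            continue
        scores[word] = math.log2((f_pos * total_neg) / (f_neg * total_pos))
    return scores


# Toy data, made up purely for illustration
demo_docs = [
    (["great", "fun", "movie"], "pos"),
    (["great", "movie"], "pos"),
    (["dull", "movie", "great"], "neg"),
]
demo_scores = sentiment_scores(demo_docs)
for word, score in sorted(demo_scores.items()):
    print(word, round(score, 3))
```

A positive score means the word is relatively more frequent in positive documents, and a negative score means the opposite. In practice, a smoothing constant could be used instead of skipping one-sided words.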

The movie review data I used was downloaded from Stanford; it is a collection of 50,000 reviews from IMDB. I used 25,000 reviews each for the train and test datasets. Each dataset contains an equal number of positive and negative reviews. I used SAS Text Mining to parse the reviews into tokens and wrote a SAS program to calculate sentiment scores.

In my experiment, I used the train dataset to extract sentiment lexicons and the test dataset to evaluate sentiment classification performance at each sentiment score cutoff value from 0 to 2, in increments of 0.25. Data-driven learning methods frequently have an overfitting problem, so I used the test data to filter out all weakly predictive words, that is, words whose sentiment scores have an absolute value less than 0.75. In Figure-1, there is an obvious drop in the accuracy line plot of the test data when the cutoff value is less than 0.75.
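
The cutoff sweep described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure run in SAS: the helper names are hypothetical, and the document classifier (sum the scores of the lexicon words in a document and call it positive when the sum is not negative) is an assumed stand-in, since the experiment itself classified documents with SAS sentiment rules.

```python
def filter_lexicon(scores, cutoff):
    """Keep only words whose absolute sentiment score reaches the cutoff."""
    return {w: s for w, s in scores.items() if abs(s) >= cutoff}


def classify(tokens, lexicon):
    """Assumed classifier: sign of the summed scores (ties count as positive)."""
    total = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return "pos" if total >= 0 else "neg"


def accuracy_by_cutoff(scores, docs, cutoffs):
    """Return {cutoff: accuracy} for labeled docs given as (tokens, label)."""
    results = {}
    for cutoff in cutoffs:
        lexicon = filter_lexicon(scores, cutoff)
        correct = sum(classify(tokens, lexicon) == label for tokens, label in docs)
        results[cutoff] = correct / len(docs)
    return results


# Cutoff values from 0 to 2 in increments of 0.25, as in the experiment
cutoffs = [i * 0.25 for i in range(9)]

# Hypothetical scores and documents, made up for illustration
example_scores = {"great": 1.5, "dull": -1.0, "movie": 0.1}
example_docs = [(["great", "movie"], "pos"), (["dull", "movie"], "neg")]
for cutoff, acc in accuracy_by_cutoff(example_scores, example_docs, cutoffs).items():
    print(cutoff, acc)
```

Plotting accuracy against the cutoff values gives a curve of the kind shown in Figure-1.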

Figure-1 Sentiment Classification Accuracy by Sentiment Score Cutoff

Finally, I got a huge list of 14,397 affective words from the movie reviews: 7,850 positive words and 6,547 negative words. Figure-2 shows the top 50 lexical items from each sentiment category.

Figure-2 Sentiment Score of Top 50 Lexical Items

Now I have automatically derived the sentiment lexicon, but how accurate is it, and how can the accuracy be evaluated? I googled movie vocabulary and got two lists, Useful Adjectives for Describing Movies and Words for Movies & TV, with 329 adjectives categorized as positive or negative. Of these, 279 adjectives have vector data in the GloVe word embedding model downloaded from http://nlp.stanford.edu/projects/glove/, and Figure-3 shows their T-SNE plot. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus. T-SNE is a machine learning algorithm for dimensionality reduction developed by Geoffrey Hinton and Laurens van der Maaten. It is a nonlinear dimensionality reduction technique that is particularly well-suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. So two words are located close together in the scatter plot if their meanings are close or if they frequently occur in the same contexts. Besides semantic closeness, the plot also shows sentiment polarity by color: red stands for negative and blue stands for positive.

Figure-3 T-SNE Plot of Movie Vocabularies

From Figure-3, I find the positive vocabulary and the negative vocabulary are clearly separated into two big clusters, with very little overlap in the plot.

Now, let me check the sentiment scores of the terms. 175 of them were included in my result, and Figure-4 displays the top 50 terms of each category. I compared the sentiment polarity of my result with the list: 168 of 175 terms are correctly labelled as negative or positive, so the overall accuracy is 96%.
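
The comparison above amounts to checking the sign of each score against the reference label. A minimal Python sketch follows; the function name `polarity_agreement` and the small example dictionaries are assumptions for illustration (the score values are made up and are not taken from the experiment).

```python
def polarity_agreement(scores, reference):
    """Compare predicted polarity (sign of the score) with reference labels.

    scores: dict word -> sentiment score
    reference: dict word -> "pos" or "neg"
    Returns (covered, matched, accuracy, mismatches).
    """
    covered = [w for w in reference if w in scores]
    mismatches = [
        w for w in covered
        if ("pos" if scores[w] > 0 else "neg") != reference[w]
    ]
    matched = len(covered) - len(mismatches)
    accuracy = matched / len(covered) if covered else 0.0
    return len(covered), matched, accuracy, mismatches


# Hypothetical scores and reference labels, for illustration only
example_scores = {"brilliant": 1.2, "tedious": -1.4, "coherent": -0.9}
example_reference = {
    "brilliant": "pos",
    "tedious": "neg",
    "coherent": "pos",
    "uneven": "neg",  # not in the lexicon, so it is not counted
}
print(polarity_agreement(example_scores, example_reference))

# The figures reported above: 168 correct out of 175 covered terms
print(round(168 / 175, 2))
```

Words from the reference list that are missing from the learned lexicon are left out of the accuracy, which mirrors how 175 of the listed adjectives were evaluated.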

Figure-4 Sentiment Score of Top 50 Movie Terms

There are 7 polarity differences between my predictions and the list, as Table-1 shows.

Table-1 Sentiment Polarity Difference between Predictions and Actual Labels

One obvious prediction mistake is “coherent”. I checked the raw movie reviews that contain “coherent”, and only 25 of 103 reviews are positive. This is why its sentiment score is negative rather than positive. I went through these reviews and found that most of them had a sentiment polarity reversal, such as “The Plot - Even for a Seagal film, the plot is just stupid. I mean it’s not just bad, it’s barely coherent. …” Possible ways to make the sentiment scores more accurate are to use more data or to add special handling for polarity reversals. I tried the first approach, and it did improve the accuracy significantly.

So far, I have evaluated the accuracy of the sentiment scores with public linguistic resources. Next, I will test the prediction effect with SAS Sentiment Analysis Studio. I ran sentiment analysis against the test data with the domain-independent sentiment rules developed by SAS and with the domain-specific sentiment rules constructed by machine learning, and compared the performance of the two methods. The results showed an 8% increase in the overall accuracy. Table-2 and Table-3 show the detailed information.

Test data (25,000 docs)

Table-2 Performance Comparison with Test Data

Table-3 Overall Performance Comparison with Test Data

After you get domain-specific sentiment lexicons from your corpora, only a few steps are required in SAS Sentiment Analysis Studio to construct the domain-specific sentiment rules. So, next time you are processing domain-specific text for sentiment, you may want to try this method to get a list of terms with positive or negative polarity to augment your SAS domain-independent model.

The detailed steps to construct domain-specific sentiment rules are as follows.

Step 1. Create a new Sentiment Analysis project.
Step 2. Create intermediate entities named “Positive” and “Negative”, then put the learned positive and negative lexicons into the two entities respectively.

Step 3. Besides the learned lexicons, you may add an entity named “Negation” to handle negated expressions. You can list some negations you are familiar with, such as “not”, “don’t”, “can’t”, etc.

Step 4. Create positive and negative rules in the Tonal Keyword. Add the rule “CONCEPT: _def{Positive}” to the Positive tab, and the rules “CONCEPT: _def{Negative}” and “CONCEPT: _def{Negation} _def{Positive}” to the Negative tab.

Step 5. Build the rule-based model. You can now use this model to predict the sentiment of documents.
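
To make the intent of the three rules in Step 4 concrete, here is a small Python sketch of the matching logic they express. It only illustrates the idea and is not how SAS Sentiment Analysis Studio evaluates rules. The function name `count_hits` and the word sets are hypothetical, and the sketch assumes that a positive word directly preceded by a negation counts only as a negative hit.

```python
def count_hits(tokens, positive, negative, negation):
    """Count positive and negative hits in a tokenized document.

    Mirrors the three rules from Step 4:
      Positive tab:  a word from the Positive entity
      Negative tab:  a word from the Negative entity
      Negative tab:  a Negation word immediately followed by a Positive word
    """
    pos_hits = 0
    neg_hits = 0
    for i, tok in enumerate(tokens):
        if tok in negative:
            neg_hits += 1
        elif tok in positive:
            negated = i > 0 and tokens[i - 1] in negation
            if negated:
                neg_hits += 1  # assumption: the negated match wins
            else:
                pos_hits += 1
    return pos_hits, neg_hits


# Hypothetical entity contents, for illustration only
positive_words = {"good", "brilliant"}
negative_words = {"dull", "stupid"}
negation_words = {"not", "don't", "can't"}

print(count_hits("this movie is not good".split(),
                 positive_words, negative_words, negation_words))
print(count_hits("a good but dull film".split(),
                 positive_words, negative_words, negation_words))
```

Note that no rule is given above for a negated negative word (for example “not dull”), so the sketch counts it as a plain negative hit.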

How to extract domain-specific sentiment lexicons was published on SAS Users.

252017
 

Machine learning is a type of artificial intelligence that uses algorithms to iteratively learn from data and finds hidden insights in data without being explicitly programmed where to look or how to find the answer. Here at SAS, we hear questions every day about machine learning: what it is, how it compares to [...]

12 machine learning articles to catch you up on the latest trend was published on SAS Voices by Alison Bolen

112017
 

People come from all over the world to attend this highlight of the season. It’s been a tradition for decades. Hotels book months in advance. Traffic is horrendous in the city center. The coveted tickets can cost thousands of dollars, but tens of thousands of people are lucky enough to score them. In […]

It's February. Game On! was published on SAS Voices.

282017
 

Digital intelligence is a trending term in the space of digital marketing analytics that needs to be demystified. Let's begin by defining what a digital marketing analytics platform is:

Digital marketing analytics platforms are technology applications used by customer intelligence ninjas to understand and improve consumer experiences. Prospecting, acquiring, and holding on to digital-savvy customers depends on understanding their multidevice behavior, and derived insight fuels marketing optimization strategies. These platforms come in different flavors, from stand-alone niche offerings, to comprehensive end-to-end vehicles performing functions from data collection through analysis and visualization.

However, not every platform is built equally from an analytical perspective. According to Brian Hopkins, a Forrester analyst, firms that excel at using data and analytics to optimize their digital businesses will together generate $1.2 trillion per annum in revenue by 2020. And digital intelligence — the practice of continuously optimizing customer experiences with online and offline data, advanced analytics and prescriptive insights — supports every insights-driven business. Digital intelligence is the antidote to the weaknesses of analytically immature platforms, leaving the world of siloed reporting behind and maturing towards actionable, predictive marketing. Here are a couple of items to consider:

  • Today's device-crazed consumers flirt with brands across a variety of interactions during a customer life cycle. However, most organizations seem to focus on website activity in one bucket, mobile in another, and social in . . . you see where I'm going. Strategic plans often fall short in applying digital intelligence across all channels — including offline interactions like customer support or product development.
  • Powerful digital intelligence uses timely delivery of prescriptive insights to positively influence customer experiences. This requires integration of data, analytics and the systems that interact with the consumer. Yet many teams manually apply analytics and deliver analysis via endless reports and dashboards that look retroactively at past behavior — begging business leaders to question the true value and potential impact of digital analysis.

As consumer behavioral needs and preferences shift over time, the proportion of digital to non-digital interactions is growing. With the recent release of Customer Intelligence 360, SAS has carefully considered feedback from our customers (and industry analysts) to create technology that supports a modern digital intelligence strategy in guiding an organization to:

  • Enrich your first-party customer data with user level data from web and mobile channels. It's time to graduate from aggregating data for reporting purposes to the collection and retention of granular, customer-level data. It is individual-level data that drives advanced segmentation and continuous optimization of customer interactions through personalization, targeting and recommendations.
  • Keep up with customers through machine learning, data science and advanced analytics. The increasing pace of digital customer interactions requires analytical maturity to optimize marketing and experiences. By enriching first-party customer data with infusions of web and mobile behavior, and more importantly, in the analysis-ready format for sophisticated analytics, 360 Discover invites analysts to use their favorite analytic tool and tear down the limitations of traditional web analytics.
  • Automate targeting, channel orchestration and personalization. Brands struggle with too few resources to support the manual design and data-driven design of customer experiences. Connecting first-party data that encompasses both offline and online attributes with actionable propensity scores and algorithmically-defined segments through digital channel interactions is the agenda. If that sounds mythical, check out a video example of how SAS brings this to life.

The question now is: are you ready? Learn more here about why we are so excited about enabling digital intelligence for our customers, and how this benefits testing, targeting, and optimization of customer experiences.

 

tags: Customer Engagement, customer intelligence, Customer Intelligence 360, customer journey, data science, Digital Intelligence, machine learning, marketing analytics, personalization, predictive analytics, Predictive Personalization, Prescriptive Analytics

Digital intelligence for optimizing customer engagement was published on Customer Intelligence.

092017
 

I've long been fascinated by both science and the natural world around us, inspired by the amazing Sir David Attenborough with his ever-engaging documentaries and boundless enthusiasm for nature, and also by the late, great Carl Sagan and his ground-breaking documentary series, COSMOS. The relationships between the creatures, plants and […]

Intelligent ecosystems and the intelligence of things was published on SAS Voices.

Dec 06 2016
 

As data-driven marketers, you are now challenged by senior leaders to have a laser focus on the customer journey and optimize the path of consumer interactions with your brand. Within that journey there are three trends (or challenges) to focus on:

  • Deeply understanding your target audience to anticipate their needs and desires.
  • Meeting customers’ expectations (although aiming higher can help differentiate your brand from the pack).
  • Addressing their pain points to increase your brand's relevance.

No matter who you chat with, or what marketing conference you recently attended, it's safe to say that the intersection of digital marketing, analytics, optimization and personalization is a popular subject of conversation. Let's review the popular buzzwords at the moment:

  • Predictive personalization
  • Data science
  • Machine learning
  • Self-learning algorithms
  • Segment of one
  • Contextual awareness
  • Real time
  • Automation
  • Artificial intelligence

It's quite possible you have encountered these words at such a high frequency that you could make a drinking game out of them.

There’s a lot of confusion about these terms and what they mean. For instance, there is hubbub around so-called ‘easy button’ solutions that marketing cloud companies are selling for customer analytics and data-driven personalization. In reaction to this, I set off on a personal quest to research questions like:

  1. Does every technology perform analytics and personalization equally?
    • What are the benefits and drawbacks to analytic automation?
    • What are the downstream impacts to the predictive recommendations marketers depend on for personalized interactions across channels?
    • Should I be comfortable trusting a black-box algorithm and how it impacts the facilitated experiences my brand delivers to customers and prospects?
  2. Do you need a data scientist to be successful in modern marketing?
    • Is high quality analytic talent extremely difficult to find?
    • How valid is the complaint of a data science talent shortage?
    • How do I balance the needs of my marketing organization with recent analytic technology trends?

Have I captivated your interest? If yes, check out this on-demand webcast.

It's time to dive deep into these questions. In the video, I share the results of my investigation and my viewpoints in response. In addition, you will be introduced to new SAS Customer Intelligence 360 technology addressing these challenges. I believe in a future where approachable technology and analytically curious people come together to deliver intelligent customer interactions. Analytically curious people can be data scientists, citizen data scientists, statisticians, marketing analysts, digital marketers, creative super forces and more. Building teams of these individuals armed with modern customer analytics software tools will help you differentiate and compete in today's marketing ecosystem.

tags: artificial intelligence, Context-aware, customer intelligence, customer journey, Data Driven Marketing, data science, digital marketing, Digital Personalization, machine learning, marketing analytics, Predictive Personalization, Real time Automation, segment of one, Self-learning algorithms

Customer analytics: Think outside the black box was published on Customer Intelligence.

072016
 

Machine learning is taking a significant role in many big data initiatives today. Large retailers and consumer packaged goods (CPG) companies are using machine learning combined with predictive analytics to help them enhance consumer engagement and create more accurate demand forecasts as they expand into new sales channels like the […]

Machine learning changes the way we forecast in retail and CPG was published on SAS Voices.

152016
 

“What we are experiencing from analytics today is nothing short of a revolution,” said CEO Jim Goodnight, who spoke at Analytics Experience 2016 and set the stage for the conference’s executive panel. “Right now, my primary mission is to ensure people understand the limitless possibilities that lie before us, given […]

An analytics revolution underway: Insights from Analytics Experience 2016 was published on SAS Voices.