Apr 05, 2017

Success in the retail space boils down to one simple function: the conversion of sales. However, retailers can only do this if they have stock readily available. Missing a sales opportunity due to poor stock management just won’t cut it in today’s marketplace. How can we resolve this basic problem [...]

In stock we trust was published on SAS Voices by Andrew Fowkes

Apr 05, 2017

Emma Warrillow, President of Data Insight Group, Inc., believes analysts add business value when they ask questions of the business, the data and the approach. “Don’t be an order taker,” she said.

Emma Warrillow at SAS Global Forum.

Warrillow held to her promise that attendees wouldn’t see a stitch of SAS programming code in her session Monday, April 3, at SAS Global Forum.

That’s not to say she considers programming skills and SAS Certifications unimportant. She doesn’t.

Why you need communication skills

But Warrillow believes that as technology takes on more of the heavy lifting on the analysis side, communication, interpretation and storytelling skills are quickly becoming the data analyst’s magic wand.

Warrillow likened it to the centuries-old question: If a tree falls in a forest, and no one is around to hear it, did it make a sound? “If you have a great analysis, but no one gets it or takes action, was it really a great analysis?” she asked.


To create real business value and be the unicorn – that rare breed of marketing technologist who understands both marketing and marketing technology – analysts have to understand the business and its goals and operations.

She offered several actionable tips to help make the transition, including:

1. Never just send the spreadsheet.

Or the PowerPoint or the email. “The recipient might ignore it, get frustrated or, worse yet, misinterpret it,” she said. “Instead, communicate what you’ve seen in the analysis.”

2. Be a POET.

Warrillow is a huge fan of the work of Laura Warren of Storylytics.ca, who recommends an acronym-based approach to data storytelling in which every presentation offers:

  • Purpose: The purpose of this chart is to …
  • Observation: To illustrate that …
  • Explanation: What this means to us is …
  • Take-away or Transition: As a next step, we recommend …

3. Brand your work.

“Many of us suffer from a lack of profile in our organizations,” she said. “Take a lesson from public relations and brand yourselves. Just make sure you’re a brand people can trust. Have checks and balances in place to make sure your data is accurate.”

4. Don’t be an order taker.

Be consultative and remember that you are the expert when it comes to knowing how to structure the campaign modeling. It can be tough in some organizations, Warrillow admitted, but asking some questions and offering suggestions can be a great way to begin.

5. Tell the truth.

“Storytelling can be associated with big, tall tales,” she said. “You have to have stories that are compelling but also have truth and resonance.” One of her best resources is “The Four Truths of the Storyteller” by Peter Guber, which first appeared in the December 2007 issue of Harvard Business Review.

6. Go higher.

Knowledge and comprehension are important, “but we need to start moving further up the chain,” Warrillow said. She used Bloom’s Taxonomy to describe the importance of making data move at the speed of business – getting people to take action by moving into application, analysis, synthesis and evaluation phases.

7. Prepare for the future.

“Don’t become the person who says, ‘I’m this kind of analyst,’” she said. “We need to explore new environments, prepare ourselves with great skills. In the short term, we’re going to need more programming skills. Over time, however, we’re going to need interpretation, communication and storytelling skills.” She encouraged attendees to answer the SAS Global Forum challenge of becoming a #LifeLearner.

For more from Warrillow, read the post, Making data personal: big data made small.

7 tips for becoming a data science unicorn was published on SAS Users.

Apr 05, 2017

In an industry full of word people, it's not uncommon to hear journalists lament, “Data, what are you doing here!?” But today, data is a tool in the newsroom, and reporters need to know how to analyze and present data to readers as part of their role in communicating information to the public. Amanda [...]

What can a BBC data journalist teach you about data visualization? was published on SAS Voices by Becky Graebe

Apr 04, 2017


1. Word vector representations

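distributional representation vs. distributed representation: a distributional representation is a class of representation methods grounded in statistics, while a distributed representation maps words from a high-dimensional space X to a low-dimensional space Y. The distributional hypothesis provides the theoretical foundation for this idea: words that appear in similar contexts have similar meanings.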
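A minimal sketch of the distributed idea, assuming a toy vocabulary of 10 words and a randomly initialized 3-dimensional matrix `E` (in word2vec or GloVe this matrix is learned from a corpus): multiplying a one-hot vector from the high-dimensional space X by `E` simply selects one row, which is a dense vector in the low-dimensional space Y.

```python
import numpy as np

# Assumed toy sizes: V words (high-dimensional space X) mapped to d dimensions (space Y).
V, d = 10, 3
rng = np.random.default_rng(0)

# Hypothetical embedding matrix; a real model learns these values.
E = rng.normal(size=(V, d))

# Row i of the identity matrix is the one-hot vector for word i.
i = 4
x = np.eye(V)[i]

# Projecting X -> Y is a matrix product, which reduces to selecting row i of E.
y = x @ E
print(y.shape)               # (3,)
print(np.allclose(y, E[i]))  # True
```

This equivalence is why embedding layers are usually implemented as table lookups rather than full matrix multiplications.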


1.1 one-hot encoding

In vector space terms, this is a vector with a single 1 and zeros everywhere else, one position per word in the vocabulary:

[0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
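A short illustration, assuming a hypothetical four-word vocabulary: each word owns one position, and because any two distinct one-hot vectors are orthogonal, this encoding cannot express that "car" is closer to "automobile" than to "river".

```python
import numpy as np

# Hypothetical four-word vocabulary; each word owns one position.
vocab = ["car", "automobile", "truck", "river"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a vector of len(vocab) zeros with a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

car = one_hot("car")
print(car)                          # [1. 0. 0. 0.]
# Distinct one-hot vectors never overlap, so every pair has dot product 0:
# "car" looks exactly as unrelated to "automobile" as to "river".
print(car @ one_hot("automobile"))  # 0.0
print(car @ one_hot("river"))       # 0.0
```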

1.2 count-based
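This section gives no details, so the following is only a sketch of one common count-based approach under assumed choices (a three-sentence toy corpus, a window of one word on each side, raw counts): build a word-by-context co-occurrence matrix and compare its rows with cosine similarity. Practical count-based methods usually reweight the counts (for example with PPMI) and reduce dimensionality with SVD, which this sketch omits.

```python
import numpy as np

# Assumed toy corpus and a symmetric context window of 1 word on each side.
corpus = [
    "i drive a car to work",
    "i drive a truck to work",
    "the river flows to the sea",
]
window = 1

tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# counts[i, j] = how often word j appears within `window` positions of word i.
counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for pos, word in enumerate(sent):
        lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:
                counts[idx[word], idx[sent[ctx_pos]]] += 1

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "car" and "truck" share the contexts "a" and "to", so they score near 1;
# "car" and "river" share no context words, so they score 0.
print(cosine(counts[idx["car"]], counts[idx["truck"]]))
print(cosine(counts[idx["car"]], counts[idx["river"]]))
```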




1.3 word embedding

Neural-network-based word vector representation: word2vec offers four (2 × 2 = 4) training configurations.

Network architectures: CBOW, skip-gram
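The two architectures differ in what they predict. A minimal sketch, assuming a toy five-word sentence and a fixed window of 2 (the reference word2vec tool also samples the window size and subsamples frequent words, both omitted here): skip-gram turns each center word into (center, context) pairs, while CBOW pairs the whole context with the center word.

```python
# Assumed toy sentence and fixed window size of 2 words on each side.
sentence = "the quick brown fox jumps".split()
window = 2

def context_positions(n, i, window):
    """Positions within `window` of position i, excluding i itself."""
    return [j for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]

def skipgram_pairs(tokens, window):
    """Skip-gram: one (center, context) training pair per context word."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in context_positions(len(tokens), i, window)]

def cbow_examples(tokens, window):
    """CBOW: one (context words, center) training example per position."""
    return [([tokens[j] for j in context_positions(len(tokens), i, window)], tokens[i])
            for i in range(len(tokens))]

print(skipgram_pairs(sentence, window)[:3])
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the')]
print(cbow_examples(sentence, window)[2])
# (['the', 'quick', 'fox', 'jumps'], 'brown')
```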

Training methods: hierarchical softmax, negative sampling
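As a sketch of the negative-sampling side only (hierarchical softmax is not shown), the following performs SGD steps of skip-gram with negative sampling in NumPy. The matrix sizes, learning rate and fixed negative word ids are assumptions for illustration; word2vec itself draws negatives from the unigram distribution raised to the 3/4 power. For a center word c, a true context word o and negatives k, the loss is -log σ(u_o·v_c) - Σ_k log σ(-u_k·v_c).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(W_in, W_out, center, context, negatives, lr=0.1):
    """One SGD step of skip-gram with negative sampling (SGNS).

    Loss = -log sigmoid(u_o . v_c) - sum_k log sigmoid(-u_k . v_c), where
    v_c = W_in[center], u_o = W_out[context], u_k = W_out[negatives[k]].
    Updates W_in and W_out in place; returns the loss before the update.
    """
    v_c = W_in[center].copy()
    u_o = W_out[context].copy()
    u_neg = W_out[negatives].copy()          # shape (k, d)

    pos = sigmoid(u_o @ v_c)                 # pushed toward 1
    neg = sigmoid(u_neg @ v_c)               # pushed toward 0, shape (k,)
    loss = -np.log(pos) - np.sum(np.log(1.0 - neg))

    # Gradients of the loss with respect to each vector involved
    grad_v_c = (pos - 1.0) * u_o + neg @ u_neg
    grad_u_o = (pos - 1.0) * v_c
    grad_u_neg = np.outer(neg, v_c)

    W_in[center] -= lr * grad_v_c
    W_out[context] -= lr * grad_u_o
    # subtract.at accumulates correctly if a negative id appears more than once
    np.subtract.at(W_out, negatives, lr * grad_u_neg)
    return loss

# Assumed toy sizes: 20-word vocabulary, 8-dimensional vectors
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(20, 8))
W_out = rng.normal(scale=0.1, size=(20, 8))

# Repeating one (center=3, context=7) pair with fixed negatives lowers its loss
negatives = np.array([1, 11, 15])
losses = [sgns_step(W_in, W_out, 3, 7, negatives) for _ in range(100)]
print(losses[0] > losses[-1])  # True
```

Repeating a single pair is only a smoke test; actual training loops over every (center, context) pair in the corpus with fresh negatives each time.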

2. Word vector tools







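from gensim.models import KeyedVectors

# Load the pretrained fastText English vectors (word2vec text format)
model = KeyedVectors.load_word2vec_format("wiki.en.vec")

# gensim >= 4.0 exposes the vocabulary as key_to_index (older releases used model.vocab)
words = list(model.key_to_index)

print("word count: {}".format(len(words)))

print("Dimensions of word: {}".format(len(model[words[0]])))

demo_word = "car"

# similar_by_word returns (word, cosine similarity) pairs, top 10 by default
for similar_word in model.similar_by_word(demo_word):
    print("Word: {0}, Similarity: {1:.2f}".format(
        similar_word[0], similar_word[1]))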

word count: 2519370
Dimensions of word: 300
Word: cars, Similarity: 0.83
Word: automobile, Similarity: 0.72
Word: truck, Similarity: 0.71
Word: motorcar, Similarity: 0.70
Word: vehicle, Similarity: 0.70
Word: driver, Similarity: 0.69
Word: drivecar, Similarity: 0.69
Word: minivan, Similarity: 0.67
Word: roadster, Similarity: 0.67
Word: racecars, Similarity: 0.67



import os

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from tsne import bh_sne

def read_glove(glove_file):
    # word -> row index, and the list of row vectors in file order
    embeddings_index = {}
    embeddings_vector = []
    with open(glove_file, "rb") as f:
        word_idx = 0
        for line in f:
            values = line.decode("utf-8").split()
            word = values[0]
            vector = np.asarray(values[1:], dtype="float64")
            embeddings_index[word] = word_idx
            embeddings_vector.append(vector)
            word_idx += 1
    # row index -> word, used to turn result indices back into words
    inv_index = {v: k for k, v in embeddings_index.items()}
    # stack into a (vocab_size, dim) matrix and L2-normalize each row for cosine similarity
    glove_embeddings = np.vstack(embeddings_vector)
    glove_norms = np.linalg.norm(glove_embeddings, axis=-1, keepdims=True)
    glove_embeddings_normed = glove_embeddings / glove_norms

    return embeddings_index, glove_embeddings, glove_embeddings_normed, inv_index

def get_emb(word, embeddings_index, glove_embeddings):
    # raw vector for a word, or None if it is out of vocabulary
    idx = embeddings_index.get(word)
    if idx is None:
        return None
    return glove_embeddings[idx]

def get_normed_emb(word, embeddings_index, glove_embeddings_normed):
    # unit-length vector for a word, or None if it is out of vocabulary
    idx = embeddings_index.get(word)
    if idx is None:
        return None
    return glove_embeddings_normed[idx]

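def most_similar(words, inv_index, embeddings_index, glove_embeddings, glove_embeddings_normed, topn=10):
    # a list of words is summed into one query vector; a single word is looked up directly
    if isinstance(words, list):
        query_emb = 0
        for word in words:
            query_emb += get_emb(word, embeddings_index, glove_embeddings)
    else:
        query_emb = get_emb(words, embeddings_index, glove_embeddings)

    # normalize the query so the dot product with normalized rows equals cosine similarity
    query_emb = query_emb / np.linalg.norm(query_emb)

    cosin = np.dot(glove_embeddings_normed, query_emb)

    # indices of the topn highest similarities, in descending order
    idxs = np.argsort(cosin)[::-1][:topn]

    return [(inv_index[idx], cosin[idx]) for idx in idxs]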

def plot_tsne(glove_embeddings_normed, inv_index, perplexity, img_file_name, word_cnt=100):
    # project the first word_cnt normalized vectors to 2-D with Barnes-Hut t-SNE
    # alternative: word_emb_tsne = TSNE(perplexity=perplexity).fit_transform(glove_embeddings_normed[:word_cnt])
    word_emb_tsne = bh_sne(glove_embeddings_normed[:word_cnt], perplexity=perplexity)
    plt.figure(figsize=(40, 40))
    axis = plt.gca()
    plt.scatter(word_emb_tsne[:, 0], word_emb_tsne[:, 1], marker=".", s=1)
    # label each point with its word
    for idx in range(word_cnt):
        axis.annotate(inv_index[idx],
                      xy=(word_emb_tsne[idx, 0], word_emb_tsne[idx, 1]),
                      xytext=(0, 0), textcoords='offset points')
    plt.savefig(img_file_name)
    plt.close()

def main():
    glove_input_file = "glove.6B.100d.txt"
    embeddings_index, glove_embeddings, glove_embeddings_normed, inv_index = read_glove(glove_input_file)
    print(get_emb("computer", embeddings_index, glove_embeddings))
    print(most_similar("cpu", inv_index, embeddings_index, glove_embeddings, glove_embeddings_normed))
    print(most_similar(["river", "chinese"], inv_index, embeddings_index, glove_embeddings, glove_embeddings_normed))
    # plot tsne viz
    plot_tsne(glove_embeddings_normed, inv_index, 30.0, "tsne.png", word_cnt=1000)

if __name__ == "__main__":
    main()

3. Papers

  1. Neural Word Embeddings as Implicit Matrix Factorization
  2. Linguistic Regularities in Sparse and Explicit Word Representations
  3. Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings
  4. word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method
  5. Linking GloVe with word2vec
  6. Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective
  7. Hierarchical Probabilistic Neural Network Language Model
  8. Notes on Noise Contrastive Estimation and Negative Sampling
  9. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models
  10. Distributed Representations of Words and Phrases and their Compositionality
  11. Efficient Estimation of Word Representations in Vector Space
  12. GloVe: Global Vectors for Word Representation
  13. Neural probabilistic language models
  14. Natural language processing (almost) from scratch
  15. Learning word embeddings efficiently with noise contrastive estimation
  16. A scalable hierarchical distributed language model
  17. Three new graphical models for statistical language modelling
  18. Improving word representations via global context and multiple word prototypes
  19. A Primer on Neural Network Models for Natural Language Processing
  20. Bag of Tricks for Efficient Text Classification (A. Joulin et al., FAIR, 2016)
  21. Enriching Word Vectors with Subword Information (P. Bojanowski, E. Grave, A. Joulin, T. Mikolov)


wget http://nlp.stanford.edu/data/glove.6B.zip

wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.zh.zip

wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.en.zip


4. Applications of natural language processing


Natural Language Processing

Topic Classification

Topic modeling

Sentiment Analysis

Google Translate

Chatbots / dialogue systems

Natural language query understanding (Google Now, Apple Siri, Amazon Alexa)


Apr 04, 2017

Two minutes in, I knew the 2017 SAS Global Forum Technology Connection would be anything but typical or average. Maybe that’s because SAS’ Chief Technology Officer Oliver Schabenberger was running the show, and nothing he does is ever typical or average. His first surprise of the morning was his entrance. [...]

Impressive technology, surprising connections was published on SAS Voices by Marcie Montague

Apr 03, 2017

At Opening Session, SAS CEO Jim Goodnight and Alexa have a chat using the Amazon Echo and SAS Visual Analytics.

Unable to attend SAS Global Forum 2017, happening now in Orlando? We’ve got you covered! You can watch live-stream video from the conference and check back here for important news, starting with the highlights from last night’s Opening Session.

The location and record attendance made for a full house this year, and CEO Jim Goodnight said there couldn’t be a better setting to celebrate innovation than the world of Walt Disney. “Walt was a master innovator, combining art and science to create an entirely new way to make intelligent connections,” said Goodnight. “SAS is busy making another kind of intelligent connection – the kind made possible by data and analytics.”

It’s SAS’ mission to bring analytics everywhere and to make it ambient. That was exactly the motivation that drove SAS nearly four years ago, when it embarked on a massive undertaking known as SAS® Viya™. But SAS Viya, announced last year in Las Vegas, is more than just a fast, powerful, modernized analytics platform. Goodnight said it’s really the perfect marriage of science and art.

“Consider what would be possible if analytics could be brought into every moment and every place that data exists,” said Goodnight. “The opportunities are enormous, and like Walt Disney, it’s kind of fun to do the impossible.”

Driving an analytics economy

Executive Vice President and Chief Marketing Officer Randy Guard took the stage to update attendees on new releases available on SAS Viya and why SAS is so excited about it. He explained that SAS Viya is a response to changes in the analytics marketplace, which Guard called an analytics economy: one in which algorithms and techniques mature rapidly. “This is a place where disruption is normal, a place where you want to be the disruptor; you want to be the innovator,” said Guard. That’s exactly what you can achieve with SAS Viya.

As if SAS Viya didn’t leave enough of an impression, Guard took it one step further by inviting Goodnight back on stage to give users a preview of the newest innovation SAS has been cooking up. Using an Amazon Echo Dot and its voice assistant, Alexa, Goodnight put cognitive computing into action as he called up annual sales, forecasts and customer satisfaction reports in SAS® Visual Analytics.

Though the technology is still in its early stages of development, the demo was another reminder that when it comes to analytics, SAS never stops thinking about the next great thing.

AI: The illusion of intelligence

On his Segway, Executive Vice President and Chief Technology Officer Oliver Schabenberger talks AI at the SAS Global Forum Opening Session.

With his Segway Mini, Executive Vice President and Chief Technology Officer Oliver Schabenberger rolled on stage, fully trusting that his “smart legs” wouldn’t drive him off and into the audience. “I’ve accepted that algorithms and software have intelligence; I’ve accepted that they make decisions for us, but we still have choices,” said Schabenberger.

Diving into artificial intelligence, he explained that today’s algorithms operate with superhuman abilities: they are reliable, repeatable and work around the clock without fatigue, yet they don’t behave like humans. And while the “AI” label is becoming trendy, the systems that truly deserve the AI title have two things in common: they belong to the class of weak AI systems, and they tend to be based on deep learning.

So, why are those distinctions important? Schabenberger explained that a weak AI system is trained to do one task only – the system driving an autonomous vehicle cannot operate the lighting in your home.

“SAS is very much engaged in weak AI, building cognitive systems into our software,” he said. “We are embedding learning and gamification into solutions and you can apply deep learning to text, images and time series.” Those cognitive systems are built into SAS Viya. They are powerful and great when they work, but Schabenberger raised the question of whether they are truly intelligent.

Think about it. True intelligence requires some form of creativity, innovation and independent problem solving. The reality is that today’s algorithms and software, no matter how smart, are being used as decision support systems to augment our own capabilities and make us better.

But it’s uncomfortable to think about fully trusting technology to make decisions on our behalf. “We make decisions based on reason, we use gut feeling and make split-second judgment calls based on incomplete information,” said Schabenberger. “How well do we expect machines to perform [in our place] when we let them loose and how quickly do we expect them to learn on the job?”

It’s those kinds of questions that prove that all we can handle today is the illusion of intelligence. “We want to get tricked by the machine in a clever way,” said Schabenberger. “The rest is just hype.”

Creating tomorrow’s analytics leaders

With a room full of analytics leaders, Vice President of Sales Emily Baranello asked attendees to consider where the future leaders of analytics will come from. SAS’ answer: talent will be pulled from universities around the world that have partnered with SAS to create 200 types of programs that teach today’s students how to work in SAS software. SAS’ commitment to training future leaders is evident in SAS certifications, joint certificate programs and the progress of SAS® Analytics U toward nearly 1 million downloads.

“SAS talent is continuing to build in the marketplace,” said Baranello. “Our goal is to bring analytics everywhere and we will continue to partner with universities to ready those students to be your successful employees.”

Using data for good

More than just analytics and technology, SAS’ brand is a representation of people who make the world a better place. Knowing that, SAS announced the development of GatherIQ, a customized crowdsourcing app that will begin with two International Organization for Migration (IOM) projects. One project will specifically focus on global migration, using data to keep migrants safe as they search for a better life. With GatherIQ, changing the world might be as easy as opening an app.

There's much more to come, so stay tuned to SAS blogs this week for the latest updates from SAS Global Forum!

SAS Viya, AI star at SAS Global Forum Opening Session was published on SAS Users.

Apr 03, 2017

What is SAS Global Forum if it isn't a conference that celebrates the ways that individuals can make a difference with data and analytics? Indeed, one of my favorite tweets from last night's opening session said: If the keynote sessions were just video biographies about how data people matter I would [...]

Data-for-good takes center stage at SAS Global Forum was published on SAS Voices by Alison Bolen

Apr 03, 2017

Operators of transmission networks and wholesale electric markets – ISOs (Independent System Operators) and RTOs (Regional Transmission Organizations) – have undergone sweeping changes in recent years, and the pace won’t be letting up anytime soon. With opportunities ranging from the growth of renewables to newly data-rich operating environments, and challenges [...]

Promise and uncertainty in the ISO marketplace: Four things you should know was published on SAS Voices by Mike F. Smith