data scientist

July 25, 2018
 

I recently joined SAS in a brand-new role: Developer Advocate. My job is to help SAS customers who want to access the power of SAS from within other applications, or who want to build their own applications that leverage SAS analytics. For my first contribution, I decided to write about a quick task that would interest developers and that isn't already heavily documented: a novice's experience using R (and RStudio) with SAS Viya. This article chronicles my journey from the planning stages all the way to running commands from RStudio on data stored in SAS Viya. This is just the beginning; at the end, I discuss where I should go next.

Why use SAS Viya with R?

From the start, I asked myself, "What's the use case here? Why would anyone want to do this?" After a bit of research and discussion with my SAS colleagues, the answer became clear. R is a popular programming language used by data scientists, developers, and analysts, even within organizations that also use SAS. However, R has some well-known limitations when working with big data, and our SAS customers are often challenged to combine the work of a diverse set of tools into a well-governed analytics lifecycle. Combining developers' familiarity with R programming with the power and flexibility of SAS Viya for data storage, analytical processing, and governance seemed like a perfect exercise. For the purpose of this scenario, think of SAS Viya as the platform and the Cloud Analytic Services (CAS) server as the place where all the data is stored and processed.

How I got started with SAS Viya

I did not want to start by deploying my own SAS Viya environment. That is a non-trivial activity, and not something an analyst would tackle, so the major prerequisite here is access to an existing SAS Viya setup. Fortunately for me, here at SAS we have preconfigured SAS Viya environments available on a private cloud that we can use for demos and testing. So, SAS Viya is my server-side environment. Beyond that, a client is all I needed. I used a generic Windows machine and got busy loading some software.

What documentation did I use/follow?

I started with the official SAS documentation: SAS Scripting Wrapper for Analytics Transfer (SWAT) for R.

The Process

The first two things I installed were R and RStudio, which I found at these locations:

https://cran.r-project.org/
https://www.rstudio.com/products/rstudio/download/

The installs were uneventful, so I won't list all those steps here. Next, I installed a couple of prerequisite R packages and attempted to install the SAS Scripting Wrapper for Analytics Transfer (SWAT) package for R. Think of SWAT as the package that allows R and SAS to work together. At the R command line, I entered the following commands:

> install.packages('httr')
> install.packages('jsonlite')
> install.packages('https://github.com/sassoftware/R-swat/releases/download/v1.2.1/R-swat-1.2.1-linux64.tar.gz', repos=NULL, type='file')

When attempting the last command, I hit an error:

…
ERROR: dependency 'dplyr' is not available for package 'swat'
* removing 'C:/Program Files/R/R-3.5.1/library/swat'
In R CMD INSTALL
Warning message:
In install.packages("https://github.com/sassoftware/R-swat/releases/download/v1.2.1/R-swat-1.2.1-linux64.tar.gz",  :
installation of package 'C:/Users/sas/AppData/Local/Temp/2/RtmpEXUAuC/downloaded_packages/R-swat-1.2.1-linux64.tar.gz'
  had non-zero exit status

The install failed. Based on the error message, it turns out I had forgotten to install another R package:

> install.packages("dplyr")

(This dependency is documented in the R SWAT documentation, but I missed it. Since this could happen to anyone – right? – I decided to come clean here. Perhaps you'll learn from my misstep.)

After installing the dplyr package in the R session, I reran the swat install and was happy to hit a return code of zero. Success!
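To avoid repeating my dplyr misstep, you could check for all of the prerequisites before installing SWAT. The following is a minimal base-R sketch; the helper names `missing_pkgs()` and `install_missing()` are hypothetical ones of my own, not part of SWAT, and the package list is only the set named in this post, so check the R SWAT documentation for the authoritative list.

```r
# Return the package names from `pkgs` that are not installed yet.
# requireNamespace() returns FALSE (rather than erroring) when a package is missing.
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install only the packages that are missing, so rerunning this is cheap.
# Returns the names it tried to install, invisibly.
install_missing <- function(pkgs) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) > 0) install.packages(todo)
  invisible(todo)
}

# Prerequisites named in this post (dplyr included this time), to run
# before installing the SWAT tarball:
# install_missing(c("httr", "jsonlite", "dplyr"))
```

Running this before the `install.packages()` call for the SWAT tarball means dplyr is already in place when the dependency check runs.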

For brevity, I decided not to configure an authentication file, which means I have to pass user credentials whenever I make a connection. I will configure authinfo in a follow-up post.

Testing my RStudio->SAS Viya connection

From RStudio, I ran the following command to connect to the CAS server:

> library(swat)
> conn <- CAS("mycas.company.com", 8777, protocol='http', user='user', password='password')

Now that I succeeded in connecting my R client to the CAS server, I was ready to load data and start making API calls.

How did I decide on a use case?

I'm in the process of moving houses, so I decided to find a data set on property values in the area and do some basic analysis to see if I was getting a good deal. I did a quick Google search and downloaded a .csv file from a local government site. At this point, I was all set up, connected, and had data. All I needed now was to run some CAS actions from RStudio.

CAS actions are commands that you submit through RStudio to tell the CAS server to 'do' something. One or more objects are returned to the client -- for example, a collection of data frames. CAS actions are organized into action sets and are invoked via APIs. You can find

> citydata <- cas.read.csv(conn, "C:\\Users\\sas\\Downloads\\property.csv", sep=';')
NOTE: Cloud Analytic Services made the uploaded file available as table PROPERTY in caslib CASUSER(user).

What analysis did I perform?

I purposefully kept my analysis brief, as I just wanted to make sure that I could connect, run a few commands, and get results back.

My RStudio session, including all of the things I tried

Here is a brief series of CAS action commands that I ran from RStudio:

Get the mean value of a variable:

> cas.mean(citydata$TotalSaleValue)
          Column     Mean
1 TotalSaleValue 343806.5

Get the standard deviation of a variable:

> cas.sd(citydata$TotalSaleValue)
          Column      Std
1 TotalSaleValue 185992.9

Get boxplot data for a variable:

> cas.percentile.boxPlot(citydata$TotalSaleValue)
$`BoxPlot`
          Column     Q1     Q2     Q3     Mean WhiskerLo WhiskerHi Min     Max      Std    N
1 TotalSaleValue 239000 320000 418000 343806.5         0    685000   0 2318000 185992.9 5301

Get boxplot data for another variable:

> cas.percentile.boxPlot(citydata$TotalBldgSqFt)
$`BoxPlot`
         Column   Q1   Q2   Q3     Mean WhiskerLo WhiskerHi Min   Max      Std    N
1 TotalBldgSqFt 2522 2922 3492 3131.446      1072      4943 572 13801 1032.024 5301
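The WhiskerLo and WhiskerHi columns are consistent with the common Tukey convention, where whiskers extend to the most extreme observations lying within 1.5 times the interquartile range (IQR) of the quartiles. I'm assuming that convention here rather than quoting the CAS documentation; the base-R sketch below simply recomputes the fences from the quartiles in the output above. The helper name `tukey_fences()` is hypothetical.

```r
# Tukey fences: observations outside [Q1 - k*IQR, Q3 + k*IQR] are treated
# as outliers; whiskers end at the most extreme observations inside them.
tukey_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles copied from the cas.percentile.boxPlot output above
sale_fences <- tukey_fences(239000, 418000)  # TotalSaleValue
sqft_fences <- tukey_fences(2522, 3492)      # TotalBldgSqFt

print(sale_fences)  # lower -29500, upper 686500
print(sqft_fences)  # lower 1067, upper 4947
```

For TotalSaleValue, the reported WhiskerHi of 685,000 sits just inside the upper fence of 686,500, and because the lower fence is negative, WhiskerLo runs down to the minimum of 0. TotalBldgSqFt behaves the same way: its whiskers of 1,072 and 4,943 fall just inside fences of 1,067 and 4,947.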

Did I succeed?

I think so. Let's say the house I want is 3,000 square feet and costs $258,000. As you can see in the box plot data, I'm getting a good deal: the house size is above the median (Q2 = 2,922 square feet), while the price is below the median (Q2 = $320,000), between Q1 and Q2. Yes, this is not the most in-depth statistical analysis, but I'll get more into that in a future article.
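To check which quartile band a value falls in, base R's `findInterval()` can compare it against the three cut points (Q1, median, Q3) from the box plot output. This is a small sketch; the helper name `quartile_band()` is hypothetical, and the cut points are copied from the output above.

```r
# Which quartile band does x fall in, given the cut points c(Q1, Q2, Q3)?
# 1 = below Q1, 2 = Q1 to median, 3 = median to Q3, 4 = above Q3.
quartile_band <- function(x, quartiles) {
  findInterval(x, quartiles) + 1L
}

sqft_q  <- c(2522, 2922, 3492)        # TotalBldgSqFt Q1, Q2, Q3
price_q <- c(239000, 320000, 418000)  # TotalSaleValue Q1, Q2, Q3

quartile_band(3000, sqft_q)     # 3: above the median size
quartile_band(258000, price_q)  # 2: below the median price
```

A value exactly equal to a cut point is counted in the higher band, because `findInterval()` treats each interval as closed on the left.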

What's next?

This activity has really sparked my interest in learning more, and I will continue to expand my analysis, attempt more complex statistical procedures, and create graphs. A follow-up blog post is already in the works. If this article has piqued your interest in the subject, I'd like to ask you: What would you like to see next? Please comment, and I will turn my focus to those topics in a future post.

Using RStudio with SAS Viya was published on SAS Users.

April 5, 2017
 

Emma Warrillow, President of Data Insight Group, Inc., believes analysts add business value when they ask questions of the business, the data and the approach. “Don’t be an order taker,” she said.

Emma Warrillow at SAS Global Forum.

Warrillow held to her promise that attendees wouldn’t see a stitch of SAS programming code in her session Monday, April 3, at SAS Global Forum.

That's not to say she doesn't value programming skills and SAS Certifications. She does.

Why you need communication skills

But Warrillow believes that as technology takes on more of the heavy lifting from the analysis side, communication skills, interpretation skills and storytelling skills are quickly becoming the data analyst’s magic wand.

Warrillow likened it to the centuries-old question: If a tree falls in a forest, and no one is around to hear it, did it make a sound? “If you have a great analysis, but no one gets it or takes action, was it really a great analysis?” she asked.


To create real business value and be the unicorn – that rare breed of marketing technologist who understands both marketing and marketing technology – analysts have to understand the business and its goals and operations.

She offered several actionable tips to help make the transition, including:

1. Never just send the spreadsheet.

Or the PowerPoint or the email. “The recipient might ignore it, get frustrated or, worse yet, misinterpret it,” she said. “Instead, communicate what you’ve seen in the analysis.”

2. Be a POET.

Warrillow is a huge fan of the work of Laura Warren of Storylytics.ca, who recommends an acronym-based approach to data storytelling that makes sure every presentation offers:

  • Purpose: The purpose of this chart is to …
  • Observation: To illustrate that …
  • Explanation: What this means to us is …
  • Take-away or Transition: As a next step, we recommend …

3. Brand your work.

“Many of us suffer from a lack of profile in our organizations,” she said. “Take a lesson from public relations and brand yourselves. Just make sure you’re a brand people can trust. Have checks and balances in place to make sure your data is accurate.”

4. Don’t be an order taker.

Be consultative and remember that you are the expert when it comes to knowing how to structure the campaign modeling. It can be tough in some organizations, Warrillow admitted, but asking some questions and offering suggestions can be a great way to begin.

5. Tell the truth.

“Storytelling can be associated with big, tall tales,” she said. “You have to have stories that are compelling but also have truth and resonance.” One of her best resources is “The Four Truths of the Storyteller” by Peter Guber, which first appeared in the December 2007 issue of Harvard Business Review.

6. Go higher.

Knowledge and comprehension are important, “but we need to start moving further up the chain,” Warrillow said. She used Bloom’s Taxonomy to describe the importance of making data move at the speed of business – getting people to take action by moving into application, analysis, synthesis and evaluation phases.

7. Prepare for the future.

“Don’t become the person who says, ‘I’m this kind of analyst,’” she said. “We need to explore new environments, prepare ourselves with great skills. In the short term, we’re going to need more programming skills. Over time, however, we’re going to need interpretation, communication and storytelling skills.” She encouraged attendees to answer the SAS Global Forum challenge of becoming a #LifeLearner.

For more from Warrillow, read the post, Making data personal: big data made small.

7 tips for becoming a data science unicorn was published on SAS Users.

August 1, 2016
 

Would Taylor Swift date her suitors or not? Guess what? Data scientists may know the answer. But this time it was pupils who found the answer. Pupils? Yes, data science is for everyone, kids included. During Tech Week, a UK-wide event in July promoted by the Tech Partnership, organisations were […]

The post The Maths in the Dates – and Who Taylor Swift Will Date! appeared first on Generation SAS.

July 20, 2016
 

Gareth Hampson, a data scientist who graduated with an MSc in databases and web-based systems from Salford University, recently won a SAS prize for his excellent project using SAS® Enterprise Miner™. He has also been profoundly deaf since the age of 4 due to meningitis. We spoke to Gareth to find out more about […]

The post Disabilities don't mean you can't be an excellent data scientist appeared first on Generation SAS.

May 11, 2016
 

As the demand for analytical skills continues to grow and data scientist has been dubbed the sexiest job of the 21st century, more and more students are showing interest in the analytics and big data world. We asked one of our graduates to share her experiences working as […]

The post How one data scientist turns ideas into reality appeared first on Generation SAS.

April 20, 2016
 

Just last weekend, I was considering buying a new camera lens. I already had a few brands in mind, so I looked at their websites to learn more about their products. I was able to compare different brands and lenses and narrow my choice down to a specific 50mm lens from a major brand. I added the lens to my cart online, but wanted to get a closer look at it, so I chatted online with a representative to see whether any stores near me had the lens available. This digital channel was my first point of interaction with the brand, but what impact did that have on my buying experience? Would responsive design come into play? Would the brand proactively contact me about similar products? Or would it simply react to inquiries that I had as a consumer? Today’s consumers expect immediate, individualized messages. Would this brand deliver?

The fact of the matter is that a lot of brands don’t have the capabilities to modify messages, offers and interactions across channels, devices and points in time so that they are more relevant to the end consumer.

Enter SAS

SAS Customer Intelligence 360, launching this month to the marketplace, offers an all-encompassing view of customers no matter how they choose to engage with you across digital properties.

A complete customer view

SAS Customer Intelligence 360 can give you detailed insights from the digital channels customers interact with, so you can create the most effective and relevant actions. The solution rapidly transforms digital data into a complete 360-degree view of the customer, meeting each customer’s needs at the right time, place and in proper context. Multiple decision-making methods, such as predictive models and multivariate tests, help ensure that customers get the most relevant and personalized offers.

Data integration

Data is also easy to integrate with many offline customer channels through SAS Customer Intelligence 360 and its customer decision hub. Customer interactions are based on previous engagements on all other platforms. The data hub is able to convert all of this into customer-focused actions. With this data integration, the brand is able to gather my interactions and information from all available sources: not just the website, but the call center, mobile apps, social media and point of sale.

Offline customer data can be appended to digital data to further augment the view of me as a customer. These data sources, typically demographic or transactional in nature, give marketers valuable insight into a customer’s true needs in order to create more relevant offers, better targeted activities and more efficient use of marketing resources. This capability allows the brand to see me as more than just page clicks. They’ll see me as a father with young children, interested in photography and seeking to buy a 50mm lens to capture fleeting family moments.

Insights into future actions

You don’t need to be a data scientist to harness the power of predictive marketing; SAS Customer Intelligence 360 includes guided analytics to provide marketers a forward-looking view of customer journeys. This enables them to better understand business drivers and incorporate them into segmentation, optimization and other analytic techniques. Marketers can better forecast how customers will perform in the future. The solution acts as the data scientist – enabling marketers to become more efficient and effective in the analytical techniques they embed into marketing initiatives.

Web data collection

Each web page is embedded with a single line of HTML that automatically collects page information without expensive tagging. With this feature, the webpage configuration might change simultaneously with what I click on, the order and timing of my clicks, each keystroke, etc. Dynamic data collection offers me more relevant content as I navigate through the brand’s site. Any customer activities are recorded privately and securely over time so that once a customer is identified, the information is connected automatically.

Simply put, SAS Customer Intelligence 360 offers marketers the confidence to manage their digital customer journeys in a more personalized and profitable way. Marketers gain a complete view of their customers and transform this data using analytical insight into customer-centric knowledge and future actions. With this solution, brands can interact with customers on a personalized level and customers will be more satisfied with their entire relationship with a brand, not just a single transaction. Customer loyalty goes up and attrition goes down.

And as for me, I got the lens I was looking for, and was satisfied with the customer experience. Of course I have ideas on how to improve it on behalf of this brand, and SAS Customer Intelligence 360 fits into that picture.

tags: customer decision hub, customer journey, data hub, data scientist, Digital Intelligence, Predictive Marketing, Predictive Personalization, SAS Customer Intelligence 360

SAS Customer Intelligence 360: Digital discovery and engagement brought into focus was published on Customer Intelligence.

April 11, 2016
 

In a few short years, the need for people with analytics skills could significantly outpace supply. In fact, recent research from MGI and McKinsey's Business Technology Office says: By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million […]

The post Three tips for developing an analytics program that prepares students for big data careers appeared first on Generation SAS.

March 16, 2016
 

Numerous studies and statistics point to the fact that in just a few short years the need for people with analytics skills could significantly outpace supply. With so much talk around the analytics skills gap and the growing market for analytic talent, we wanted to highlight a variety of avenues […]

The post Preparing for Big Data Careers: Interview with Robert McGrath, University of New Hampshire appeared first on Generation SAS.

February 4, 2016
 

Numerous studies and statistics point to the fact that in just a few short years the need for people with analytics skills could significantly outpace supply. With so much talk around the analytics skills gap and the growing market for analytic talent, we wanted to highlight a variety of avenues […]

The post Preparing for Big Data Careers: Interview with Jennifer Priestley, Kennesaw State University appeared first on Generation SAS.