Dec 03, 2018

Data scientists need good skills in communication, data mining, data wrangling and more. Joyce Norris-Montanari explains.

The post The data scientist: Asking the right questions appeared first on The Data Roundtable.

Nov 16, 2018

In my first two posts of this blog series, we heard why two students chose to pursue a STEM field and what appealed to them about data science. We also heard how they put their knowledge to work on a real-world data science project. Today, we'll hear their advice to future [...]

Two students’ advice on data science, SAS and more was published on SAS Voices by Georgia Mariani

Nov 07, 2018

Phil Simon chimes in with some tips on how to set these folks loose.

The post Making life easier for your data scientists appeared first on The Data Roundtable.

Oct 11, 2018

We hear a lot about data science nowadays, but do you ever wonder how it’s being used to help solve real-world problems? In my first post of this blog series, we heard why two students chose to pursue a STEM field and what appealed to them about data science. Today, we'll hear [...]

How two students used data science to analyze ‘real-world’ problems was published on SAS Voices by Georgia Mariani

Sep 05, 2018

Are you curious? Do you have a passion for science, technology, engineering and math (STEM)? Do you enjoy robotics or statistics? Do you like to solve hard problems? If you answered yes to any of the above, you might have what it takes to be a data scientist. Recently, I [...]

Why should you learn data science? Two students’ perspectives was published on SAS Voices by Georgia Mariani

Jul 19, 2018

In a previous posting, SAS Customer Intelligence 360 was highlighted in the context of delivering relevant product, service, and content recommendations using automated machine learning within digital experiences. Shifting gears, SAS recognizes there are different user segments for our platform. This post will focus on building custom analytical recommendation models [...]

SAS Customer Intelligence 360: Factorization machines, visual analytics, and personalized marketing was published on Customer Intelligence Blog.

Mar 10, 2018

I bet many of you didn’t even know the term machine learning five years ago. But Gartner did. The Gartner Magic Quadrant for Data Science and Machine Learning Platforms, 2018 was just released, and SAS has been in the Leaders quadrant for five years straight. According to Gartner, “This Magic [...]

Gartner names data science and machine learning leaders was published on SAS Voices by David Tareen

Dec 19, 2017

If you’ve ever used Amazon or Netflix, you’ve experienced the value of recommendation systems firsthand. These sophisticated systems identify recommendations autonomously for individual users based on past purchases and searches, as well as other behaviors. By supporting an automated cross-selling approach, they empower brands to offer additional products or services [...]

Customer Intelligence 360: The digital shapeshifter of recommendation systems was published on Customer Intelligence Blog.

Sep 26, 2017

In Part 1 and Part 2 of this blog posting series, we discussed our current viewpoints on marketing attribution and conversion journey analysis in 2017 and the criteria for selecting the best measurement approach, and we introduced our vision for handling marketing attribution and conversion journey analysis. We would like to conclude this [...]

Algorithmic marketing attribution and conversion journey analysis [Part 3] was published on Customer Intelligence Blog.

Sep 21, 2017

A previous entry (http://sas-and-r.blogspot.com/2017/07/options-for-teaching-r-to-beginners.html) describes an approach to teaching graphics in R that also “get[s] students doing powerful things quickly”, as David Robinson suggested.

In this guest blog entry, Randall Pruim offers an alternative way based on a different formula interface. Here's Randall:

For a number of years, several of my colleagues and I have been teaching R to beginners using an approach that includes a combination of

- the `lattice` package for graphics,
- several functions from the `stats` package for modeling (e.g., `lm()`, `t.test()`), and
- the `mosaic` package for numerical summaries and for smoothing over edge cases and inconsistencies in the other two components.

Many data analysis operations can be executed by filling in four pieces of information (goal, y, x, and mydata) appropriately for the desired task. This allows students to become fluent quickly with a powerful, coherent toolkit for data analysis.
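As a minimal sketch, here is that template, in the form `goal(y ~ x, data = mydata)`, filled in for four different goals. It uses R's built-in `ToothGrowth` data (tooth length `len`, supplement type `supp`, and `dose`) purely as a stand-in dataset, and it sticks to `stats` functions and `lattice` so it runs without `mosaic`; `aggregate()` is an assumed substitute for the numerical summaries `mosaic` would normally provide.

```r
# Filling in goal(y ~ x, data = mydata) for different tasks.
# Assumption: ToothGrowth (built into R) is an illustrative stand-in dataset.
library(lattice)  # recommended package, bundled with standard R installations

# Numerical summary: mean tooth length per supplement type
# (aggregate() stands in here for mosaic's summary functions)
means <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)

# Hypothesis test: compare the two supplement groups
test <- t.test(len ~ supp, data = ToothGrowth)

# Model: regress tooth length on dose
model <- lm(len ~ dose, data = ToothGrowth)

# Graphics: side-by-side boxplots (a factor before ~ gives horizontal boxes)
boxes <- bwplot(supp ~ len, data = ToothGrowth)

means
test$p.value
coef(model)
```

Only the goal and the variable names change from call to call, which is the uniformity the template is meant to provide.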

Trouble in paradise

As the earlier post noted, the use of `lattice` has some drawbacks. While basic graphs like histograms, boxplots, scatterplots, and quantile-quantile plots are simple to make with `lattice`, it is challenging to combine these simple plots into more complex plots or to plot data from multiple data sources. Splitting data into subgroups and either overlaying with multiple colors or separating into sub-plots (facets) is easy, but the labeling of such plots is less convenient (and takes more space) than in the equivalent plots made with `ggplot2`. And in our experience, students generally find the look of `ggplot2` graphics more appealing.

On the other hand, introducing `ggplot2` into a first course is challenging. The syntax tends to be more verbose, so it takes up more of the limited space on projected images and course handouts. More importantly, the syntax is entirely unrelated to the syntax used for other aspects of the course. For those adopting a “Less Volume, More Creativity” approach, `ggplot2` is tough to justify.

ggformula: The third-and-a-half way
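A minimal sketch of the trade-off described above and of the `gf_` functions introduced in this section, assuming `ggformula` and `magrittr` are installed. It uses R's built-in `ToothGrowth` data with `dose` recoded as a factor, not the data from the previous post. The same side-by-side boxplot is written with `lattice`, with plain `ggplot2`, and with `ggformula`; the last call overlays a violin plot and a boxplot with `%>%`, lowering `alpha` so both layers stay visible. The `fill` and `alpha` values are arbitrary illustrative choices.

```r
library(lattice)
library(ggplot2)
library(ggformula)
library(magrittr)   # supplies %>%, the "then" operator

# Assumption: ToothGrowth is a stand-in dataset; dose becomes a grouping factor
tg <- transform(ToothGrowth, dose = factor(dose))

# lattice: compact formula syntax
p_lattice <- bwplot(len ~ dose, data = tg)

# ggplot2: more verbose, and unlike the formula syntax used elsewhere
p_ggplot2 <- ggplot(data = tg, mapping = aes(x = dose, y = len)) +
  geom_boxplot()

# ggformula: lattice-style formula, ggplot2 graphics; only the name changes
p_box    <- gf_boxplot(len ~ dose, data = tg)
p_violin <- gf_violin(len ~ dose, data = tg)

# Overlay: draw violins, "then" add boxplots on top. The boxplot layer
# inherits the formula and data from the violin layer.
p_both <- gf_violin(len ~ dose, data = tg, fill = "steelblue", alpha = 0.3) %>%
  gf_boxplot(alpha = 0.5)

p_both
```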

Danny Kaplan and I recently introduced `ggformula`, an R package that provides a formula interface to `ggplot2` graphics. Our hope is that this provides the best aspects of `lattice` (the formula interface and lighter syntax) and `ggplot2` (modularity, layering, and better visual aesthetics).

For simple plots, the only thing that changes is the name of the plotting function. Each of these functions begins with `gf`. Here are two examples, either of which could replace the side-by-side boxplots made with `lattice` in the previous post.

We can even overlay these two types of plots to see how they compare. To do so, we simply place what I call the "then" operator (`%>%`, also commonly called a pipe) between the two layers and adjust the transparency so we can see both where they overlap.

Comparing groups
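A minimal sketch of the group comparisons this section describes, assuming `ggformula` and `magrittr` are installed and using R's built-in `mtcars` data (fuel economy `mpg` against weight `wt`) purely as an illustration. Transmission type (`am`) and cylinder count (`cyl`) are recoded as labeled factors first. In every plot, `gf_lm()` is called without a formula, so it inherits the formula, data, and any color mapping from the `gf_point()` layer.

```r
library(ggformula)
library(magrittr)   # supplies %>%

# Assumption: mtcars is an illustrative dataset; recode groups as factors
cars <- transform(mtcars,
                  cyl = factor(cyl),
                  am  = factor(am, levels = c(0, 1),
                               labels = c("automatic", "manual")))

# Overlay: both transmission types in one panel, distinguished by color;
# gf_lm() inherits the color mapping and fits one line per group
p_color <- gf_point(mpg ~ wt, data = cars, color = ~ am) %>%
  gf_lm()

# Facets, first way: | in the formula, much as in lattice
p_bar <- gf_point(mpg ~ wt | am, data = cars) %>%
  gf_lm()

# Facets, second way: a separate facet layer, easier to customize
p_wrap <- gf_point(mpg ~ wt, data = cars) %>%
  gf_lm() %>%
  gf_facet_wrap(~ cyl, nrow = 1)

p_grid <- gf_point(mpg ~ wt, data = cars) %>%
  gf_lm() %>%
  gf_facet_grid(am ~ cyl)

p_wrap
```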

Groups can be compared either by overlaying multiple groups distinguishable by some attribute (e.g., color) or by creating multiple plots arranged in a grid rather than overlaying subgroups in the same space. The `ggformula` package provides two ways to create these facets. The first uses `|` very much like `lattice` does. Notice that the `gf_lm()` layer inherits information from the `gf_point()` layer in these plots, saving some typing when the information is the same in multiple layers.

The second way adds facets with `gf_facet_wrap()` or `gf_facet_grid()` and can be more convenient for complex plots or when customization of facets is desired.

Fitting into the tidyverse workflow
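A minimal sketch of data piped straight into a `gf_` function, assuming `ggformula` and `magrittr` are installed and using R's built-in `mtcars` data. Base R's `subset()` and `transform()` are assumed stand-ins for `dplyr` verbs such as `filter()` and `mutate()`, which would slot into the same pipeline; the point is that `%>%` carries through from data preparation to the final layer.

```r
library(ggformula)
library(magrittr)

# Assumption: subset()/transform() stand in for dplyr's filter()/mutate().
# One pipeline from raw data to finished plot, using %>% throughout.
p_pipe <- mtcars %>%
  subset(cyl > 4) %>%                      # keep the 6- and 8-cylinder cars
  transform(cyl = factor(cyl)) %>%          # treat cylinder count as a group
  gf_point(mpg ~ hp, color = ~ cyl) %>%     # the data frame arrives via the pipe
  gf_lm()                                   # inherits formula, data, and color

p_pipe
```

With plain `ggplot2`, the same pipeline would have to change from `%>%` to `+` once `ggplot()` is called.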

`ggformula` also fits into a tidyverse-style workflow (arguably better than `ggplot2` itself does). Data can be piped into the initial call to a `ggformula` function, and there is no need to switch between `%>%` and `+` when moving from data transformations to plot operations.

Summary

The “Less Volume, More Creativity” approach is based on a common formula template that has served well for several years, but the arrival of `ggformula` strengthens this approach by bringing a richer graphical system into reach for beginners without introducing new syntactical structures. The full range of `ggplot2` features and customizations remains available, and the `ggformula` package vignettes and tutorials describe these in more detail.
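A minimal sketch of that last point, assuming `ggformula` is installed: because `gf_` functions return ordinary `ggplot` objects, standard `ggplot2` labels, scales, and themes can be added with `+` after the formula-based layers. The data (`ToothGrowth`), labels, palette, and theme below are illustrative choices, not recommendations from the package authors.

```r
library(ggplot2)
library(ggformula)

# Assumption: ToothGrowth is a stand-in dataset; dose becomes a grouping factor
tg <- transform(ToothGrowth, dose = factor(dose))

# Formula-based layer first, then ordinary ggplot2 customizations via +
p_custom <- gf_boxplot(len ~ dose, data = tg, fill = ~ supp) +
  labs(x = "Dose (mg/day)", y = "Tooth length", fill = "Supplement") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()

p_custom
```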