This post is a nod to one of my favourite plays, The Importance of Being Earnest by Oscar Wilde. As the title ‘Data Scientist’ becomes more common, what can this century-old play teach us about the importance of titles and labelling?
For those that haven't read it, the story revolves around a man who goes by the name Earnest and has a reputation for being earnest (i.e. truthful and trustworthy). He is much loved by a lady for having the name Earnest - she has always wanted to marry a man with that name, believing men named Earnest are earnest (deep breath).
Well, it turns out his name isn't really Earnest (irony 1 - he's not actually earnest despite his name) and the lady considers dumping him, but by a comic twist of fate his name really is Earnest after all (irony 2 - he actually was earnest even though he didn’t think he was).
So what is the importance of being <insert a name or title>? As another wit once said "a rose by any other name would smell as sweet". But is that true? Our experiences have probably told us "No". Despite whatever skills we may have, a title comes with a reputation and expectations. Whether that’s someone named Earnest actually being earnest … or a Data Scientist being a magician with big data.
There have been many attempts at explaining what a data scientist is since the term was first coined in 2008 – Wikipedia, HBR, KDnuggets, Marketing Distillery – but the general definition is someone who has equally high skill levels in:
b. “Hacker” programming.
How many people actually satisfy this definition is a popular subject in discussion forums and papers, and it’s also interesting to ask from whose perspective the attributes are judged (what counts as good communication skills in a marketer differs from what counts as good in a programmer). But what everyone agrees on is that many Data Analysts, Data Miners, Statisticians, Econometricians and holders of the myriad other titles of the last 50 years all have these attributes, just in varying proportions.
I've met many analytics practitioners over the years from different parts of the world. Some quantitative analysts have changed their titles to Data Scientist to make themselves more attractive to employers as the Statistician and Data Miner titles go out of favour. Others, who are very close to the purist definition of a data scientist, don't call themselves one, adamantly sticking with the title they have had for many years.
On the other hand, in many cases, employers who advertise for Data Scientists are actually looking for:
- Quantitative analysts with innate curiosity to learn and innovate – a trait of most people from mathematics, sciences, engineering and economics disciplines.
- Candidates who meet some minimum criteria in the four attributes – many of which can be taught.
- Those who are strong in a subset of prioritized attributes to suit a function within a team.
Over the years, given the right drivers, these partially defined Data Scientists could become strictly defined Data Scientists. But in a collaborative team environment, you will likely find that staffing a whole team with such individuals is not important. Two realities apply here. First, far fewer organizations are looking for strictly defined Data Scientists than for partially defined ones, and those that are tend to be commercial research and development arms. Second, individuals who embody all the attributes of a Data Scientist in the “right” amounts are rare.
Therefore, for those looking to hire Data Scientists, my advice is:
- It’s much more important to start by understanding the functions and expectations of the team within your organisation.
- Then create roles that fit the needs of that team and “be much more specific about the type of worker you want to be or hire” (Tom Davenport, WSJ).
- Be realistic about the skills currently available in the market and the tertiary education programs on offer.
For those looking for a role as a Data Scientist:
- Start developing the attributes you are weakest in through classroom and self-paced training, because all-round skills are always sought after.
- Keep developing the attributes you already excel at, as the big data analytics market is constantly evolving.
- Stay curious about new techniques and worldwide trends.
So, the importance of being a Data Scientist is being more attractive to employers, but what employers are usually looking for is some flavour of one rather than someone who meets strictly defined criteria. Even though a prospect may not fit the purest definition of a Data Scientist, he or she may turn out to be just what an organization needs. Therefore, be sure of what you want to be and what you’re looking for, and don't judge prospects and opportunities based on a title… lest you end up dumping your Earnest before he becomes an Earnest!
Learn more. Stay curious.
It was just a couple of years ago that folks were skeptical about the term "data scientist". It seemed like a simple re-branding of an established job role that carried titles such as "business analyst", "data manager", or "reporting specialist".
But today, it seems that the definition of the "Data Scientist" job role has gelled into something new. At SAS Global Forum 2014, I heard multiple experts describe data science qualifications in a similar way, including these main skills:
- Ability to manage data. Know how to access it, whether it's in Excel, relational databases, or Hadoop -- or on the Web. Data acquisition and preparation still form the critical foundation for any data analysis.
- Knowledge of applied statistics. Perhaps not PhD-level stuff, but more than the basics of counts, sums, and averages. You need to know something about predictive analytics, forecasting, and the process of building and maintaining analytical models.
- Computer science, or at least some programming skills. Point-and-click tools can help keep you productive, but it's often necessary to drop into code to achieve the flexibility you need to acquire some data or apply an analysis that's not provided "out of the box".
- And finally -- and this is what makes a Data Scientist most relevant -- the ability to understand and communicate the needs of the business. You might be a data whiz and have metrics out the wazoo, but an effective data scientist must know which fields and metrics matter most to the organization he or she serves. And you must be able to ask the right questions of the stakeholders, and then communicate results that will lead to informed action.
I don't claim to be a data scientist -- I'm not strong enough in the statistical pillar -- but I do have my moments. For example, I consider my recent analysis of blog spam to be data-science-like. Even so, I'm not brave enough to change my business cards just yet.
At SAS Global Forum I talked to Wayne Thompson, Chief Data Scientist at SAS. (Yes, even SAS is capitalizing on the buzz by having a data science technologies team.) At the conference he introduced SAS In-Memory Statistics for Hadoop, a programming interface that's meant to empower data scientists.
Data science isn't all just "Wayne's world" -- there were plenty of other data science practitioners at the conference. For example, check out Lisa Arney's interview with Chuck Kincaid of Experis, talking about how to be a data scientist using SAS. (See his full paper here.) And don't miss SAS's Mary Osborne, who presented on Star Wars and the Art of Data Science. (Her paper reveals the unspoken fifth pillar of a data scientist: it's good to be part nerd.)
What do you think about the "new" field of data science? Have you changed your business card to include the "data scientist" title?