To succeed in any data-focused hackathon, you need a robust set of tools and skills – as well as a can-do attitude. Here's what you can expect from any hackathon:
Messy data. It might come from a variety of sources, and won't necessarily be organized for analytics or reporting. That's your job.
Nebulous problem set. Usually the goal of a hackathon is to generate insights, improve a situation, or optimize a process. But you don't know going into it which insights you need, which process is ripe for optimization, or which situations can be improved by using data. Hackathons are as much about discovering opportunities as they are about solving problems.
Team members with different viewpoints. This is a big strength of hackathons, and it can also present the biggest challenge. Team members bring different skills and ideas. To be successful, you need to be open to those ideas and to letting team members contribute in the way that best uses their skills. Think of yourselves as the Ocean's Eleven of data analytics.
In my experience, hackathons are often a great melting pot of different tools and technologies. Whatever tech biases you might have in your day job (Windows versus Linux, SAS versus Python, JSON versus CSV) – these melt away when your teammates show up ready to contribute to a common goal using the tools that they each know best.
My favorite hackathon tools
At the Analytics Experience 2018 Hackathon, attendees have the entire suite of SAS tools available: Base SAS, SAS Enterprise Guide, SAS Studio, SAS Enterprise Miner, and the entire SAS Viya framework -- including SAS Visual Analytics, SAS Visual Text Analytics, and SAS Data Mining and Machine Learning. As we say here in San Diego, it's the whole enchilada. As the facilitators presented the whirlwind tour of all of these goodies, I could see the attendees salivating. Or maybe that was just me.
When it comes to getting my hands dirty with unknown data, my favorite path begins with SAS Enterprise Guide. If you know me, this won't surprise you. Here's why I like it.
Import Data task: Import any data
Hackathon data almost always comes as CSV or Excel spreadsheets. The Import Data task can ingest CSV, fixed-width text, and Excel spreadsheets of any version. Of course most "hackers" worth their salt can write code to read these file types, but the Import Data task helps you to discover what's in the file almost instantly. You can review all of the field names and types, tweak them as you like, and click Finish to produce a data set. There's no faster method of turning raw data into a SAS data set that feeds the next step.
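For teammates who arrive with Python rather than SAS Enterprise Guide, the same "what's in this file?" step can be roughly sketched with the standard library. This is only an illustration of the idea, not how the Import Data task works; the sample data, the `infer_type` and `describe_csv` helpers, and the type names are all made up for this example.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    # Try the strictest type first, then fall back.
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "text"


def describe_csv(handle):
    """Return {field name: guessed type} for a CSV file-like object."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return {
        name: infer_type([(row[name] or "") for row in rows])
        for name in reader.fieldnames
    }


# Hypothetical sample standing in for a hackathon file.
sample = io.StringIO("region,year,amount\nWest,2017,1548.5\nEast,2016,\nNorth,2015,320\n")
field_types = describe_csv(sample)
print(field_types)
```

To point this at a real file, pass `open("yourfile.csv", newline="")` instead of the in-memory sample.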
Query Builder: Filter, compute, summarize, and join
The Query Builder in SAS Enterprise Guide is a one-stop shop for data management. Use this for quick filtering, data cleansing, simple recoding, and summarizing across groups. Later, when you have multiple data sources, the Query Builder provides simple methods to join these – merge on the fly.
Before heading into your next hackathon, it's worth exploring and practicing your skills with the Query Builder. It can do so much -- but some of the functions are a bit hidden. Limber up before you hack!
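If it helps to see the filter, summarize, and join steps spelled out, here is a minimal Python sketch of the same three operations. It is not what the Query Builder generates; the tables, column names, and threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical tables: detail rows and a small lookup table.
sales = [
    {"region_id": 1, "amount": 120.0},
    {"region_id": 1, "amount": 80.0},
    {"region_id": 2, "amount": 15.0},
    {"region_id": 2, "amount": 300.0},
]
regions = {1: "West", 2: "East"}

# Filter: keep rows at or above a threshold.
filtered = [row for row in sales if row["amount"] >= 50.0]

# Summarize: total amount per group.
totals = defaultdict(float)
for row in filtered:
    totals[row["region_id"]] += row["amount"]

# Join: attach the region name from the lookup table.
summary = [
    {"region": regions[region_id], "total": total}
    for region_id, total in sorted(totals.items())
]
print(summary)
```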
Characterize Data: Quick data characteristics, with ability to dive deeper
If you've never seen your data before, you'll appreciate this one-click method to report on variable types, frequencies, distinct values, and distributions. The Describe->Characterize Data task provides a good start.
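As a rough idea of what such a report contains, the sketch below computes distinct counts, the most frequent values, and simple numeric statistics per column in plain Python. The `characterize` function and the sample rows are hypothetical, and the real task reports more than this.

```python
from collections import Counter
import statistics


def characterize(rows):
    """Summarize each column: distinct count, top values, numeric stats."""
    report = {}
    for name in rows[0]:
        # Skip missing values (None) before summarizing.
        values = [row[name] for row in rows if row[name] is not None]
        info = {
            "distinct": len(set(values)),
            "most_common": Counter(values).most_common(2),
        }
        numeric = [v for v in values if isinstance(v, (int, float))]
        # Only report min/max/mean when every value is numeric.
        if numeric and len(numeric) == len(values):
            info["min"] = min(numeric)
            info["max"] = max(numeric)
            info["mean"] = statistics.mean(numeric)
        report[name] = info
    return report


# Hypothetical sample rows.
rows = [
    {"team": "A", "score": 10},
    {"team": "B", "score": 20},
    {"team": "A", "score": 30},
    {"team": "A", "score": None},
]
report = characterize(rows)
for column, info in report.items():
    print(column, info)
```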
Data tasks: Advanced data reworking: long to wide, wide to long
"Long" data is typically best for reporting, while "wide" data is more suited for analytics and modeling. The process of restructuring data from long to wide (or wide to long) is called transposing. SAS Enterprise Guide has special tasks called "Split Data" (for making wide tables) and "Stack Data" (for making long data). Each method has some special requirements for a successful transformation, so it's worth your time to practice with these tasks before you need them.
Have another favorite editor? You can use SAS Enterprise Guide to open your code in your default Windows editor too. That's a great option when you need to do super-fancy text manipulation. (We won't go into the "best programming editor" debate here, but I've got my defaults set up for Notepad++.)
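To make the long-versus-wide distinction in this section concrete, here is a small Python sketch of both directions. It illustrates the reshaping idea only and is not how the Split Data or Stack Data tasks are implemented; the function names and sample rows are hypothetical.

```python
def long_to_wide(rows, key, column, value):
    """Pivot long rows into one wide row per key value."""
    wide = {}
    for row in rows:
        # Create the wide row on first sight of a key, then add a column.
        wide.setdefault(row[key], {key: row[key]})[row[column]] = row[value]
    return list(wide.values())


def wide_to_long(rows, key, column, value):
    """Stack every non-key column of each wide row into its own long row."""
    long_rows = []
    for row in rows:
        for name, cell in row.items():
            if name != key:
                long_rows.append({key: row[key], column: name, value: cell})
    return long_rows


# Hypothetical long data: one row per (id, measure) pair.
long_rows = [
    {"id": 1, "measure": "height", "reading": 70},
    {"id": 1, "measure": "weight", "reading": 150},
    {"id": 2, "measure": "height", "reading": 65},
]
wide_rows = long_to_wide(long_rows, "id", "measure", "reading")
round_trip = wide_to_long(wide_rows, "id", "measure", "reading")
print(wide_rows)
```

Note that id 2 has no weight reading, so its wide row simply lacks that column. Deciding how to treat gaps like this is one of the requirements worth practicing.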
Export and share with others
The hackathon "units of sharing" are code (of course) and data. SAS Enterprise Guide provides several simple methods to share data in a way that just about any other tool can consume:
Export data as CSV (CSV is the lingua franca of data sharing)
Export data as Excel (if that's what your teammates are using)
Send to Excel -- actually my favorite way to generate ad-hoc Excel data, as it automates Microsoft Excel and pipes the data you're looking at directly into a new sheet.
Copy / paste with headers -- low-tech, but this gets you exactly the columns and fields that you want to share with another team member.
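For a teammate on the Python side, sharing "exactly the columns you want, with headers" as CSV might look like the sketch below. The `to_csv_text` helper and the sample rows are hypothetical; only the standard `csv` module is used.

```python
import csv
import io


def to_csv_text(rows, columns):
    """Write only the chosen columns, with a header row, as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop columns not being shared
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows; the "notes" column is deliberately left out of the export.
rows = [
    {"team": "A", "score": 10, "notes": "internal"},
    {"team": "B", "score": 20, "notes": "draft"},
]
csv_text = to_csv_text(rows, ["team", "score"])
print(csv_text)
```

Writing to a file instead works the same way: open it with `newline=""` and pass the file handle to `csv.DictWriter` in place of the buffer.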
When it comes to sharing code, you can use File->Export All Code to capture all SAS code from your project or process flow. However, I prefer to assemble my own "standalone" code piecemeal, so that I can make sure it's going to run the same for someone else as it does for me. To accomplish this, I create a new SAS program node and copy the code for each step that I want to share into it...one after another. Then I test by running that code in a new SAS session. Validating your code in this way helps to reduce friction when you're sharing your work with others.
Hacking your own personal growth
The obvious benefit of hackathons is that at the end of a short, intense period of work, you have new insights and solutions that you didn't have before – and might never have arrived at on your own. But the personal benefit comes in the people you meet and the techniques that you learn. I find that I'm able to approach my day job with fresh perspective and ideas – the creativity keeps flowing, and I'm energized to apply what I've learned in my business.