This morning I delivered a talk to visiting high school students at the SAS campus. The topic: using SAS to analyze Twitter content.
Being teenagers, these high school students are very familiar with Twitter. But this batch of students was also very familiar with SAS, as they have all taken SAS programming as a course within their schools. (In fact, at least one of the students had earned a SAS certification!)
The organizers of today's program had asked me to keep it very interactive, so I left it to the students to help select what topic we would search for and then analyze from Twitter.
Before we began, I warned them. "This is the Internet," I said. "Participants on Twitter don't always use words that are acceptable in polite conversation. If any bad words appear on the screen during our exercise, I'm going to behave just like I do when it happens in front of my young daughters at home: I'm going to pretend that it's not there."
(However, knowing that I was scheduled to present this morning, I did take precautions and asked Twitter to watch its mouth.)
The students offered several great suggestions of trending topics on Twitter; in the end we settled on #Thor (as in, god of thunder), which should provide interesting content thanks to an imminent theatrical release. (Hence, my "hammer time" title on the blog. Get it?)
And that's when I did something I've never done before. I showed SAS program code to high school students. And they understood it.
After a brief description of some of the key statements and constructs, we ran the program to retrieve 1000 tweets that were tagged #Thor. (Presumably most were about the movie, but who knows? Maybe Odin is out there promoting his brand.)
We ran a frequency chart (using the SGPLOT procedure) to show the distribution of tweets over the past 90 hours. Looks like there was a spike in activity around midnight EDT. One student guessed: A new trailer being promoted, perhaps?
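For readers who want to try something similar, here is a minimal sketch of that kind of chart. It is not the exact program from the session. It assumes a data set named WORK.TWEETS with a SAS datetime variable called CREATED_AT (both names are hypothetical), and the handful of rows below are made up purely so the step runs on its own.

```sas
/* Made-up stand-in for the real search results.                 */
/* WORK.TWEETS and CREATED_AT are hypothetical names.            */
data work.tweets;
  input created_at :datetime20.;
  format created_at datetime20.;
  datalines;
01JAN2020:22:15:00
01JAN2020:23:40:00
02JAN2020:00:05:00
02JAN2020:00:12:00
02JAN2020:00:48:00
02JAN2020:09:30:00
;
run;

/* Round each timestamp down to the start of its hour */
data work.tweets_hourly;
  set work.tweets;
  tweet_hour = intnx('hour', created_at, 0, 'b');
  format tweet_hour datetime13.;
run;

/* One bar per hour; bar height is the count of tweets in that hour */
proc sgplot data=work.tweets_hourly;
  vbar tweet_hour / stat=freq;
  xaxis label="Hour the tweet was posted";
  yaxis label="Number of tweets";
run;
```

Binning to the hour with INTNX before plotting keeps the chart readable; plotting the raw timestamps would give nearly one bar per tweet.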
Because Twitter content is unstructured text data, we can use what we know about Twitter "conventions" to parse out some extra information, such as how many of the tweets are retweets and who the original "tweeter" was. We used a DATA step and regular expressions to identify the retweets, and then ran a frequency analysis (PROC FREQ) to identify the accounts that had the most retweeted content.
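Here is a rough sketch of how that retweet analysis might look. It assumes the tweet text lives in a variable named TEXT and that a retweet begins with "RT @username", which is one common convention but not the only one. The data set names, variable names, account names, and sample tweets are all placeholders invented for illustration.

```sas
/* Made-up sample tweets; account names and text are placeholders */
data work.tweets;
  infile datalines truncover;
  input text $char140.;
  datalines;
RT @account_a: Hammer time! #Thor
Cannot wait for the movie #Thor
RT @account_b: New trailer is out #Thor
RT @account_a: Tickets on sale now #Thor
;
run;

/* Flag retweets and capture the account being retweeted */
data work.retweets;
  set work.tweets;
  length original_tweeter $ 32;
  retain rx;
  /* Compile the pattern once: "RT", whitespace, then @username */
  if _n_ = 1 then rx = prxparse('/^RT\s+@(\w+)/i');
  is_retweet = (prxmatch(rx, text) > 0);
  /* Capture buffer 1 holds the username without the @ sign */
  if is_retweet then original_tweeter = prxposn(rx, 1, text);
  drop rx;
run;

/* Rank accounts by how often their content was retweeted */
proc freq data=work.retweets order=freq;
  where is_retweet = 1;
  tables original_tweeter / nocum;
run;
```

The ORDER=FREQ option sorts the table so that the most retweeted accounts appear at the top, which is how an account like @Marvel would stand out in a real sample.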
We noticed that @Marvel dominated the conversation, at least for our small sample. I guess it makes sense that they have a vested interest, since they "own" the Thor character. And @DrPepper had quite a lot to say. We guessed that maybe the soft drink is a sponsor for the film and we'll see a few cans of Dr. Pepper on the big screen? (I checked later and it looks like, yes indeed, Thor is a Pepper - wouldn't you like to be a Pepper too?)
I couldn't have been more pleased with how this session went. Together, the students and I used our SAS programming skills and critical thinking to investigate a topic, and we learned things that we didn't know before. We gathered raw data that we had never seen before, and we turned it into information. And that is a life skill that will never go out of style.