Game of Thrones

April 20, 2019
 

Do you have a favorite television show? Or a favorite movie franchise that you follow? If you call yourself a "fan," just how much of a fan are you? Are you merely a spectator, or do you take your fanaticism to the next level by creating something new?

When it comes to fandom for franchises like Game of Thrones, the Marvel movies, or Stranger Things, there's a new kind of nerd in town. And this nerd brings data science skills. You've heard of the "second screen" experience for watching television, right? That's where fans watch a show (or sporting event or awards ceremony), but also keep up with Twitter or Facebook so they can commune with other fans of the show on social media. These fan-data-scientists bring a third screen: their favorite data workbench IDE.

I was recently lured into a rabbit hole of Game of Thrones data by a tweet. The Twitter user was reacting to a data visualization of character screen time during the show. The visualization was built in a different tool, but the person was wondering whether it could be done in SAS. I knew the answer was Yes...as long as we could get the data. That turned out to be the easiest part.

WARNING: While this blog post does not reveal any plot points from the show, the data does contain spoilers! No spoilers in what I'm showing here, but if you run my code examples there might be data points that you cannot "unsee." I was personally conflicted about this, since I'm a fan of the show but I'm not yet keeping up with the latest episodes. I had to avert my eyes for the most recent data.

Data is Coming

A GitHub user named Jeffrey Lancaster has shared a repository for all aspects of data around Game of Thrones. He also has similar repos for Stranger Things and the Marvel universe. Inside the Game of Thrones repo there's a JSON file with episode-level data for all episodes and seasons of the show. With a few lines of code, I was able to read the data directly from the repo into SAS:

filename eps temp;
 
/* Big thanks to this GoT data nerd for assembling this data */
proc http
 url="https://raw.githubusercontent.com/jeffreylancaster/game-of-thrones/master/data/episodes.json"
 out=eps
 method="GET";
run;
 
/* slurp this in with the JSON engine */
libname episode JSON fileref=eps;

Note that I've shared all of my code for my steps in my own GitHub repo (just trying to pay it forward). Everything should work in Base SAS, including in SAS University Edition.

The JSON library reads the data into a series of related tables that show all of the important things that can happen to characters within a scene. Game of Thrones fans know that death, sex, and marriage (in that order) make up the inflection points in the show.
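
To see which tables the JSON engine actually produced, a quick inventory helps. This is a minimal sketch that assumes the EPISODE libref from the step above is still assigned; ALLDATA is the flattened table that the JSON engine creates alongside the relational ones. (Spoiler caution applies if you print more rows.)

```sas
/* List every table the JSON engine created in the EPISODE library */
proc contents data=episode._all_ nods;
run;

/* ALLDATA is the engine's flattened view of the whole file; peek at a few rows */
proc print data=episode.alldata(obs=20);
run;
```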

Building the character-scene data

With a little bit of data prep using SQL, I was able to show the details of the on-screen time per character, per scene. These are the basis of the visualization I was trying to create.

/* Build details of scenes and characters who appear in them */
PROC SQL;
   CREATE TABLE WORK.character_scenes AS 
   SELECT t1.seasonNum, 
          t1.episodeNum,
          t2.ordinal_scenes as scene_id, 
          input(t2.sceneStart,time.) as time_start format=time., 
          input(t2.sceneEnd,time.) as time_end format=time., 
          (calculated time_end) - (calculated time_start) as duration format=time.,
          t3.name
      FROM EPISODE.EPISODES t1, 
           EPISODE.EPISODES_SCENES t2, 
           EPISODE.SCENES_CHARACTERS t3
      WHERE (t1.ordinal_episodes = t2.ordinal_episodes AND 
             t2.ordinal_scenes = t3.ordinal_scenes);
QUIT;

With a few more data prep steps (see my code on GitHub), I was able to summarize the screen time for scene locations:

You can see that The Crownlands dominate as a location. In the show that's a big region and a sort of headquarters for The Seven Kingdoms, and the show data actually includes "sub-locations" that can help us to break that down. Here's the makeup of that 18+ hours of time in The Crownlands:
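
The location summaries described here could be computed roughly as follows. This is only a sketch, not my exact prep code: it assumes that EPISODE.EPISODES_SCENES carries the columns location and subLocation (names taken from the JSON fields), and the output table names are made up for illustration.

```sas
/* Sketch: total screen time per location (column names are assumptions) */
proc sql;
   create table work.location_time as
   select location,
          sum(input(sceneEnd,time.) - input(sceneStart,time.))
             as total_time format=time10.
      from episode.episodes_scenes
      group by location
      order by total_time desc;

   /* Break one region down by its sub-locations */
   create table work.crownlands_time as
   select subLocation,
          sum(input(sceneEnd,time.) - input(sceneStart,time.))
             as total_time format=time10.
      from episode.episodes_scenes
      where location = "The Crownlands"
      group by subLocation
      order by total_time desc;
quit;
```

Either table can then feed a simple bar chart in PROC SGPLOT.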

Screen time for characters

My goal is to show how much screen time each of the major characters receives, and how that changes over time. I began by creating a series of charts with PROC SGPLOT. A single SGPLOT step with a BY statement produces one chart per episode, and they appear in a grid because I used ODS LAYOUT GRIDDED to arrange them.

Here's the code segment that creates these dozens of charts. Again, see my GitHub for the intermediate data prep work.

/* Create a gridded presentation of Episode graphs CUMULATIVE timings */
ods graphics / width=500 height=300 imagefmt=svg noborder;
ods layout gridded columns=3 advance=bygroup;
proc sgplot data=all_times noautolegend ;
  hbar name / response=cumulative 
    categoryorder=respdesc  
    colorresponse=total_screen_time dataskin=crisp
    datalabel=name datalabelpos=right datalabelattrs=(size=10pt)
    seglabel seglabelattrs=(weight=bold size=10pt color=white) ;
  by epLabel notsorted;
  format cumulative time.;
  label epLabel="Ep";
  where rank<=10;
  xaxis display=(nolabel)  grid ;
  yaxis display=none grid ;
run;
ods layout end;
ods html5 close;
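
The all_times data set comes from the intermediate prep that isn't shown here. As a rough idea of what that prep involves, here is a sketch that derives a running total per character from WORK.character_scenes. The table names ep_times and cumulative_sketch are hypothetical, and this is not necessarily how the version on GitHub does it (for one thing, it produces no row for episodes in which a character is absent).

```sas
/* Sketch: screen time per character, per episode */
proc sql;
   create table work.ep_times as
   select name, seasonNum, episodeNum,
          sum(duration) as ep_time format=time10.
      from work.character_scenes
      group by name, seasonNum, episodeNum
      order by name, seasonNum, episodeNum;
quit;

/* Running total per character across episodes */
data work.cumulative_sketch;
   set work.ep_times;
   by name;
   if first.name then cumulative = 0;  /* reset for each new character */
   cumulative + ep_time;               /* sum statement retains the total */
   format cumulative time10.;
run;
```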

Creating an animated timeline

The example shared on Twitter showed an animation of screen time, per character, over the complete series of episodes. So instead of a huge grid with many plots, I needed to produce a single file with a frame for each episode. In SAS we can produce an animated GIF or animated SVG (scalable vector graphics) file. The SVG is a much smaller file format, but you need a browser or a special viewer to "play" it. Still, that's the path I followed:

/* Create a single animated SVG file for all episodes */
options printerpath=svg animation=start animduration=1 
  svgfadein=.25 svgfadeout=.25 svgfademode=overlap
  nodate nonumber; 
 
/* change this file path to something that works for you */
ODS PRINTER file="c:\temp\got_cumulative.svg" style=daisy;
 
/* For SAS University Edition
ODS PRINTER file="/folders/myfolders/got_cumulative.svg" style=daisy;
*/
 
proc sgplot data=all_times noautolegend ;
  hbar name / response=cumulative 
    categoryorder=respdesc 
    colorresponse=total_screen_time dataskin=crisp
    datalabel=name datalabelpos=right datalabelattrs=(size=10pt)
    seglabel seglabelattrs=(weight=bold size=10pt color=white) ;
  by epLabel notsorted;
  format cumulative time.;
  label epLabel="Ep";
  where rank<=10;
  xaxis label="Cumulative screen time (HH:MM:SS)" grid ;
  yaxis display=none grid ;
run;
options animation=stop;
ods printer close;

Here's the result (hosted on my GitHub repo, but as a GIF for compatibility).
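
For reference, the GIF route differs mainly in the printer path and a couple of animation options. The following is a pared-down sketch rather than the exact code behind the hosted image: the output path is a placeholder, and the plot keeps only the essential HBAR options.

```sas
/* Animated GIF variant: same BY-group plot, different printer path */
options printerpath=gif animation=start animduration=1 animloop=yes
  noanimoverlay nodate nonumber;

/* placeholder path; change this to something that works for you */
ods printer file="c:\temp\got_cumulative.gif";
ods graphics / width=500 height=300 imagefmt=gif;

proc sgplot data=all_times noautolegend;
  hbar name / response=cumulative categoryorder=respdesc;
  by epLabel notsorted;      /* one animation frame per episode */
  format cumulative time.;
  where rank<=10;
run;

options animation=stop;
ods printer close;
```

The SVG fade options have no effect here, so they are left out.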

I code and I know things

Like the Game of Thrones characters, my visualization is imperfect in many ways. As I was just reviewing it I discovered a few data prep missteps that I should correct. I used some features of PROC SGPLOT that I've learned only a little about, and so others might suggest improvements. And my next mission should be to bring this data into SAS Visual Analytics, where the real "data viz maesters" who work with me can work their magic. I'm just hoping that I can stay ahead of the spoilers.

The post Deeper enjoyment of your favorite shows -- through data appeared first on The SAS Dummy.

July 27, 2016
 

First it was the patriarch of my favorite family. That shocking Red Wedding scene meant I could cross off several more characters I’d grown to love. When the season finale of Season 5 left me asking if we’d lost yet another one of my favorites, I wasn’t sure how much more I could take. Of course, I’m talking about the often surprising deaths of some of our favorite characters in HBO’s wildly popular series Game of Thrones. If you’re a fan of the show you know that no one, regardless of how important they are to the storyline, is safe from an untimely demise.

Though I’ve grown somewhat accustomed to saying goodbye, it sure would be nice to know the likelihood a particular character will live or die, just so I can prepare for the heartache in advance if need be. Thankfully, Taylor Larkin, a student at the University of Alabama, thinks survival data mining can help. He plans to show us how in an e-poster he'll present at this year’s Analytics Experience conference, September 12 – 14, at the Bellagio hotel in Las Vegas.

Using plot points from the books the TV series is based on, along with survival data mining using the Survival node in SAS® Enterprise Miner™ 13.1, Larkin has created an analysis that estimates the probability that popular Game of Thrones characters will survive over time. His research was inspired by the analysis and datasets created by Olin College Computer Science Professor Allen Downey and some of his students, who used Bayesian survival analysis to do something similar.

Analytics Experience conference presentations

Larkin’s sure-to-be-awesome presentation is one I’m really looking forward to seeing, but it’s also just one of more than 100 talks planned for the event. The Analytics Experience conference provides attendees an in-depth look at some of the latest research, top trends and new techniques being used in the field of analytics. This is the nineteenth consecutive year that SAS has hosted the event.

This year’s conference offers six keynote addresses and dozens of session talks. Some of the topics presenters will cover include customer intelligence, business intelligence, data management, Hadoop, fraud, cybersecurity, risk analytics, and the Internet of Things. E-Poster presentations, demos, training classes and table talks provide even more insight and allow attendees to explore other creative ways to use analytics.

A number of the talks will come from university students – I always find them to be a great addition to the conference’s content. Besides Larkin’s talk, student presenters will show you how to use analytics to do things like detect sarcasm on social media, build a restaurant recommender engine or defend Steph Curry, something the rest of the NBA couldn’t seem to do last season.

Though talks are still being added every day, a large portion of conference presentations are now available.

Keynote speakers include:

  • Jared Cohen, President of Jigsaw and Chief Advisor to the Executive Chairman of Alphabet.
  • Jim Goodnight, CEO of SAS.
  • Amber MacArthur, President of Konnekt.
  • Jeremiah Owyang, Founder of Crowd Companies.
  • Jake Porway, Founder and Executive Director of DataKind.
  • R. Ray Wang, Principal Analyst, Founder and Chairman of Constellation Research.

If you’re in the field of analytics, there isn’t a better conference to help advance your knowledge. It’s an event I look forward to every year and I hope to see you there.

For more information, visit the website or join the community dedicated to the event. You can also view a number of videos from last year’s event on YouTube.

P.S. If you do make it to Vegas in September, let’s plan to meet at Larkin’s e-poster. We’ll find out together how likely Tyrion is to make it safely through Season 6!

tags: analytics, analytics conference, analytics experience, Game of Thrones

Will your favorite Game of Thrones character survive next season? was published on SAS Users.