Visualization

1月 182021
 

Recommended soundtrack for this blog post: Netflix Trip by AJR.

This week's news confirms what I already knew: The Office was the most-streamed television show of 2020. According to reports that I've seen, the show was streamed for 57 billion minutes during this extraordinary year. I'm guessing that's in part because we've all been shut in and working from home; we crave our missing office interactions. We lived vicariously (and perhaps dysfunctionally) through watching Dunder Mifflin staff. But another major factor was the looming deadline of the departure of The Office from Netflix as of January 1, 2021. It was a well-publicized event, so Netflix viewers had to get their binge on while they could.

People in my house are fans of the show, and they account for nearly 6,000 of those 57 billion streaming minutes. I can be this precise (nerd alert!) because I'm in the habit of analyzing our Netflix activity by using SAS. In fact, I can tell you that since late 2017, we've streamed 576 episodes of The Office. We streamed 297 episodes in 2020. (Since the show has only 201 episodes we clearly we have a few repeats in there.)

I built a heatmap that shows the frequency and intensity of our streaming of this popular show. In this graph each row is a month, each square is a day. White squares are Office-free. A square with any red indicates at least one virtual visit with the Scranton crew; the darker the shade, the more episodes streamed during that day. You can see that Sept 15, 2020 was a particular big binge with 17 episodes. (Each episode is about 20-21 minutes, so it's definitely achievable.)

netflix trip through The Office

Heatmap of our household streaming of The Office

How to build the heatmap

To build this heatmap, I started with my Netflix viewing history (downloaded from my Netflix account as CSV files). I filtered to just "The Office (U.S.)" titles, and then merged with a complete "calendar" of dates between late 2017 and the start of 2021. Summarized and merged, the data looks something like this:


With all of the data summarized in this way such that there is only one observation per X and Y value, I can use the HEATMAPPARM statement in PROC SGPLOT to visualize it. (If I needed the procedure to summarize/bin the data for me, I would use the HEATMAP statement. Thanks to Rick Wicklin for this tip!)

proc sgplot data=ofc_viewing;
 title height=2.5 "The Office - a Netflix Journey";
 title2 height=2 "&episodes. episodes streamed on &days. days, over 3 years";
 label Episodes="Episodes per day";
 format monyear monyy7.;
 heatmapparm x=day y=monyear 
   colorresponse=episodes / x2axis
    outline
   colormodel=(white  CXfcae91 CXfb6a4a CXde2d26 CXa50f15) ;
 yaxis  minor reverse display=(nolabel) 
  values=(&allmon.)
  ;
 x2axis values=(1 to 31 by 1) 
   display=(nolabel)  ;
run;

You can see the full code -- with all of the data prep -- on my GitHub repository here. You may even run the code in your own SAS environment -- it will fetch my Netflix viewing data from another GitHub location where I've stashed it.

Distribution of Seasons (not "seasonal distribution")

If you examine the heatmap I produced, you can almost see our Office enthusiasm in three different bursts. These relate directly to our 3 children and the moments they discovered the show. First was early 2018 (middle child), then late 2019 (youngest child), then late 2020 (oldest child, now 22 years old, striving to catch up).

The Office ran for 9 seasons, and our kids have their favorite seasons and episodes -- hence the repeated viewings. I used PROC FREQ to show the distribution of episode views across the seasons:


Season 1 is remarkably low for two reasons. First and most importantly, it contains the fewest episodes. Second, many viewers agree that Season 1 is the "cringiest" content, and can be uncomfortable to watch. (This Reddit user leaned into the cringe with his data visualization of "that's what she said" jokes.)

From the data (and from listening to my kids), I know that Season 2 is a favorite. Of the 60 episodes we streamed at least 4 times, 19 of them were in Season 2.

More than streaming, it's an Office lifestyle

Office fandom goes beyond just watching the show. Our kids continue to embrace The Office in other mediums as well. We have t-shirts depicting the memes for "FALSE." and "Schrute Farms." We listen to The Office Ladies podcast, hosted by two stars of the show. In 2018 our daughter's Odyssey of the Mind team created a parody skit based on The Office (a weather-based office named Thunder Mifflin) -- and advanced to world finals.

Rarely does a day go by without some reference to an iconic phrase or life lesson that we gleaned from The Office. We're grateful for the shared experience, and we'll miss our friends from the Dunder Mifflin Paper Company.

The post Visualizing our Netflix Trip through <em>The Office</em> appeared first on The SAS Dummy.

11月 112020
 

Data visualization has never been more widespread and consumed by a global audience as it has been this year with the Coronavirus pandemic. If you are interested in the statistics behind many of the numbers you see displayed in data visualizations then please reference my colleague’s blog series:

One visualization that is commonly used to display metrics of Coronavirus is a bar line chart where the bars display the actual values and the line is a moving average metric.

The screenshot below shows the number of web visits per week and a 5 week moving average value. I’ve named it Visits (5 Week Moving Avg Bracket) since I am including both past and present data points into this calculation.

I will demonstrate how to easily generate this moving average metric to use in your SAS Visual Analytics reports.

Bar Line With Moving Average

Moving Average

Moving average is essentially a block of data points averaged together, creating a series of averages for the data. This is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles.

Traditionally, the moving average block takes the current data point and then moves forward in the series, which is often the case in financial applications. However, it is more common in science and engineering to take equal data points before and after the current position, creating what I am calling a moving average bracket.

SAS’ Visual Analytics calculation gives you the flexibility to define how to average the data points for your moving average: how many positions prior and/or after the current data point.

Steps to generate the Moving Average

From the Data pane, right-click on the measure you want to generate the moving average and select New calculation.

New Calculation

Next, in the Create Calculation window:

  • Name: Visits (5 Week Moving Avg Bracket) (a meaningful name of your choice)
  • Type: Moving average
  • Number of cells to average: 5 (or the number of your choice)

Click OK.

New Moving Average

Notice that you did not have the option to specify how many data points to include before and after the current position. To do that, we will right-click on the generated calculation and select Edit.

Edit Moving Average Scope

From the Edit Calculated Item window, make your starting and ending point position selections. The current position is denoted as zero. In my example, I want a 5 point moving average bracket, therefore I want to average the two positions before current, (-2), current, and then the two positions after current (2).

Aggregate Cells Average

Success. We now have generated our 5 point moving average bracket. You may be wondering, where do you specify the time frame? Granted I named this metric Visits (5 Week Moving Average Bracket) but I did not specify the week time metric anywhere. This is because, for this aggregated measure, the time duration is directly dependent on the data items you have assigned to the object.

In my example, for my web visit data, I want to look at the aggregation at the week level, therefore this Aggregate Cell measure will be grouped by week. I wanted to name my measure appropriately so that any user reading the visual knows the time frame that is being averaged.

If you are unsure of how you want to aggregate your time series, or you are allowing your report viewers the ability to change the role assignments through self-service reporting (see blog or YouTube for more information) then it would be best if you name your measure as 5 Point Moving Average Bracket and leave off the aggregation of the time series.

Bar Line Role Assignments

Let’s take a look at what the numbers are behind the scene. I’ve expanded the object to its Maximize mode so that I can see the summary table of the metrics that make up this object.

The data points will average until the full bracket size is met, then it will slide that bracket down the time series. Keep in mind that our bracket size is five points, two previous data points (-2), current data point (0), and two future data points (2).

Let’s look at week 01. Week 01 is our current, or zero, position. We do not have any prior data points so the only points that can be used from the bracket are positions 1 and 2. See the first highlighted yellow Visits (5 Week Moving Avg Bracket) and how it corresponds to the Visits.

Next, week 02 is able to include a -1 position into the bracket. The full moving average bracket isn’t met till week 03 where all of the data points: -2 through 2 are available. Then the moving average bracket slides along the rest of the time series.

Moving Average Bracket Details

Seeing the object’s summary data and the breakdown of how the moving average bracket is calculated for the data should drive home the fact that the time period aggregation is solely driven by the object’s role assignments.

In my second example, I am looking at retail data and this is aggregated to the day level.

By Day Example

I chose to demonstrate a 14 day moving average bracket.

14 Day Moving Average

Bar Line Chart

In both of my examples, I used the Dual axis bar-line chart object. Default Y axis behavior for bar charts is to start at zero, but line charts since they are intended to show trend can start at a value other than zero. Use the Options pane to select your desired line chart baseline. I choose to have both my bar and line charts fixed at zero. Explore the other Options to further customize your visual.

Bar Line Chart

Summary

Calculating the moving average is offered as a SAS Visual Analytics out-of-the-box Derived Item. See the SAS Documentation for more information about Derived Items.

Once you’ve specified the window of data points to average for the moving average you can go back and edit the starting and ending points around the current position. Remember that you will not be specifying a time frame for the aggregation but it will be dynamically determined based on the role assignments to the visual.

Learn more

If you are interested in other uses for the moving average operator Aggregate Cells take a look at this article: VA Report Example: Moving 30 Day Rolling Sum.

Also, check out these resources:

SAS Visual Analytics example: moving average was published on SAS Users.

4月 102020
 

First introduced in SAS Visual Analytics 8.3, common filters are filters that can be shared between objects in your reports.

Common filter benefits include:

  • Easy to assign the same filter conditions to other report objects.
  • When you edit a common filter, it is updated everywhere that the common filter is used.
  • A common filter is available for the entire report, across pages.

Common filter limitations include:

  • Objects must share the same data source as the common filter definition.
  • Common filters are not available across multiple reports.

While the benefits speak for themselves, common filters also expedite designing reports and exploring data by quickly reusing the same filter conditions with only a few mouse clicks.

Let’s look at some examples of using common filters. In the screenshot below, I defined a filter to return data for the last 30 days. I converted that filter to a common filter and now the Last 30 Days common filter can be applied to any object in this report that uses the same data source. Then I applied it to the bottom treemap object.



In my second example, I parameterized a text input control to capture a user-defined value. Then I defined a parameter-driven filter for the bar chart object. Next, I converted that filter to a common filter named Text Input Contains Filter and applied it to the list table object below.



In my third example, I parameterized two drop-down list controls to capture date boundaries. Then I defined a parameter-driven filter for the waterfall chart object. Next, I converted that filter to a common filter named FromToWeekFilter and applied it to the key value object.



Pro tip: Embrace meaningful filter names. Recall that these common filters are associated with the data source for which they are defined. Being able to quickly identify a common filter definition by its name will save you additional mouse clicks in the long run.

Example 1: Last 30 Days Filter

Let’s pick up where we need to define the filters. Therefore, both objects have been added to the report page with roles assigned.

Next, select the line chart object and open the Filters pane. Click + New filter then select Advanced filter.

Now we will use the built-in filter conditions offered in SAS Visual Analytics. First, select the date/datetime data item, Day. Second, scroll down in the available conditions until you see Last 30 days, then double click on the condition to add it to the expression editor. Third, give the filter condition a meaningful name and as a final step, be sure the number of returned observations is expected. Click OK.

Now we need to convert this filter to a common filter. With the line chart object still selected, on the Filters pane, use the Last 30 Day overflow menu and select Change to common filter.



You should now see the Last 30 Days listed as a Common Filter in the Data pane.

Lastly, we can apply this common filter to the treemap object. Select the treemap object, then open the Filters pane. Next click + New Filter and select the common filter Last 30 Days.



It’s good practice to title your objects to reflect any filters that may be defined for them. Especially if there are no prompts, i.e. control objects, driving the subset of data. This is so your report consumers quickly understand that they are only seeing, in this case, the last 30 days of data.

Example 2: Text Input Contains Filter

This next example will define a parameter-driven common filter. If you are not familiar with using parameters in SAS Visual Analytics, start with this article and refer to the additional materials at the end.

See the screenshot below for more information on the data item role assignments for each object. The most important role that will drive the parameter-driven common filter is the parameter. The parameter, TextInputParameter, is a character parameter assigned to the text input control object and it is the only role assignment. If there is nothing entered in this prompt, i.e. control object, then the filter will return all of the data. We will define a contains expression to only return data rows that contain the entered text.

With the bar chart object selected, open the Filters pane and click + New filter and select Advanced filter.



Now we need to build our parameter-driven filter expression. In this filter we will be checking if the Product Description data item Contains the entered text stored in the TextInputParameter. I wrapped each string expression in an UpCase function so that mixed case is ignored. I could have just as easily used the LowerCase function to get the same result.

And finally, remember to give your filter a meaningful name. The returned observations number is not reflective of an applied filter since I do not have any text entered in the control object. If you had text entered there, then your returned observation number may show a subset of matched rows.



Now we need to convert this object-level filter to a common filter. With the bar chart still the active object, open the Filters pane and use the Text Input Contains Filter overflow menu and select Change to common filter.

You should now see the Text Input Contains Filter in the Data pane.



To add this common filter to the lower list table object, select the list table object and open the Filters pane. Click on the + New filter and select the Text Input Contains Filter.



Now let’s test the filter. In the screenshot below, I’ve typed the word red. Recall that our filter expression is to return rows where Product Description contains the entered text. Notice that even though I do not have the data item Product Description assigned as a data role in the bar chart object, SAS Visual Analytics is still able to apply the filter appropriately. The list table object does have a role assigned for Product Description so that filter application is easier to identify.



I used the contains operator instead of the equals to return a partial match to the Product Description to help identify trends across multiple Products. In these next examples, I entered the text (F) and (M) to return the rows where the Product Description indicates gender-specific products.

Example 3: From – To Week Filter

This last example will also define a parameter-driven common filter. If you need a more step-by-step guide to similar examples, refer to this YouTube video tutorial:

See the screenshot below for more information on the data item role assignments for each object. This example differs from the last in that these control objects, the drop-down lists, use both Category and Parameter Role assignments. The Year-Week data item will provide the available values and the selections will be stored in the parameters FromWeekParameter and ToWeekParameter. I will then create a common filter for an inclusive between of the selected year week values to apply to both the key value object and the waterfall chart.



With the waterfall chart object selected, open the Filters pane and click + New filter and select Advanced filter.



Now we need to build our parameter-driven filter expression. In this filter, I will subset the Year-Week date values which are inclusively between the FromWeekParameter and ToWeekParameter boundaries. Follow these steps:

  1. Select the date/datetime data item, Year-Week.
  2. Scroll down in the available conditions till you see Year-Week BetweenInclusive(‘x’,’y’), then double click on the condition to add it to the expression editor.
  3. Drag the FromWeekParameter and ToWeekParameter from the parameter data items list to the expression.
  4. Give the filter condition a meaningful name. Click OK.



Test the filter by adjusting the values in the drop-down list controls.



Now we need to convert this object-level filter to a common filter. With the waterfall chart still the active object, open the Filters pane and use the FromToWeekFilter overflow menu and select Change to common filter.

You should now see the FromToWeekFilter in the Data pane.



To add this common filter to the key value object; select the key value object and open the Filters pane. Click on the + New filter and select the FromToWeekFilter.

Conclusion

Remember that once a common filter is defined, it can be used for any object in the report that uses the same data source. In these examples, I applied the common filter on the same report page but you can use a common filter on any page within the report. Recall that all of the common filters are listed and available when I click to apply a new filter to an object.

Hopefully these examples have shown you new ways to explore data faster!

Additional materials for using parameters in SAS Visual Analytics:

Using common filters in SAS Visual Analytics was published on SAS Users.

11月 152019
 

Designing interactive reports can be a fun and unique challenge. As user interface experience designers can attest, there are several aspects that go into developing a successful and effective self-service tool. Granted I’m not designing the actual software, but reports require a similar approach to be sure that visualizations are clear and that users can get to the answers they are looking for. Enter prompts.

Reports prompt users to better understand trends, how their data points compare to the whole, and to narrow the scope of data. Being able to pick the placement of these prompts quickly and easily will open the possibilities of your report layouts! I’m specifically speaking about Report and Page level prompts. Traditionally, these global prompt controls were only able to be placed at the top; see the yellow highlighted areas below.

Let’s take a look at an example report with the traditional Report and Page prompt layout. The Report prompts are extremely easy to pick out, since they sit above the pages, but the Page prompts can sometimes blend in with other prompts contained in the report body.

Introduced in the SAS Visual Analytics 8.4 release is the ability to control the layout position of these prompts. Using my example report, let’s change the placement of these prompts. In Edit mode, open the Options pane and use the top level drop-down to select the report name. This will activate the report level, and the report level Options will display. Next, under the Report Controls subgroup, move the placement radio button to the west cardinal point.

Depending on the type of control objects you are using in your report, you may not like this layout yet. For instance, you can see here that my date slider is taking up too much space.

When you activate the slide control, use the Options pane to alter the Slider Direction and Layout. You can even use the Style option to change the font size. You can see that after these modifications, the Report prompt space can be configured to your liking.

Next, let’s change the placement for the Page prompts, for demonstration purposes. From the Options pane, use the top drop-down to select the page name. This will activate the page level, and the page level Options will display. Next, under the Page Controls subgroup, move the placement radio button to the west cardinal position.

You can see that the direction of the button bar control was automatically changed to vertical. Now we can clearly see which prompts belong to the page level.

If I switch to view mode, and adjust the browser size, you can get a better feel for the Report and Page prompt layout changes.

But as with many things, just because you can, doesn’t mean you should. This is where the report designer’s creativity and style can really take flight. Here is the same report, but with my preferred styling.

Notice that I kept the Report prompts along the top but moved the Page prompts to the left of the report. I also added two containers and configured a gray border for each container to better separate the objects. This helps the user quickly see that the drop-down will filter the word cloud is only. I also used the yellow highlighting through styling and a display rule to emphasize the selected continent. The bar chart is fed from an aggregated data source which is why the report prompt is not filtering out the other continents.

Feel free to send me your latest report design ideas!

Additional material related to Report and Page prompts:

New control prompt placement option in SAS Visual Analytics was published on SAS Users.

8月 312018
 

If you're good at games like Wheel of Fortune, Scrabble, or Words with Friends, you've probably figured out that certain letters appear more often than others. But do you have a cool way to figure out which letters appear most & least frequently? How about using a computer to plot [...]

The post Which keyboard keys do you use most frequently? appeared first on SAS Learning Post.

5月 242018
 

Poverty. It's a bit difficult to define who lives in poverty - I guess it's a relative thing, and depends on the standard of living of the people around you. Today we're going to take a look at the child poverty rates in several 'rich' countries (such as the United [...]

The post Comparing child poverty in 26 rich countries appeared first on SAS Learning Post.

4月 042018
 

Being the Graph Guy, I wanted to know about all the "data visualization" presentations at the upcoming SAS Global Forum 2018 conference. I tried going through the official interface to search for such sessions, but it was difficult (impossible?) to know that I had found them all. Therefore I created [...]

The post 100+ presentations about data visualization at SAS Global Forum! appeared first on SAS Learning Post.

10月 032017
 

"Coming to America" - it's been the name of a funny movie and a dramatic song (can you name the actor & singer without cheating?!?) It's also been a dream for many, and an action for some. People have been coming to America both legally and illegally for many years, but there's one special category [...]

The post Trends in U.S. refugee admissions appeared first on SAS Learning Post.

9月 212017
 

How do the North American amusement parks compare in popularity? If this question was to come up during a lunch discussion, I bet someone would pull out their smartphone and go to Wikipedia for the answer. But is Wikipedia the definitive answer - how can we tell if Wikipedia is wrong? [...]

The post Amusement park attendance (could Wikipedia be wrong?!?) appeared first on SAS Learning Post.

7月 112017
 

Carbon Dioxide ... CO2. Humans breathe out 2.3 pounds of it per day. It's also produced when we burn organic materials & fossil fuels (such as coal, oil, and natural gas). Plants use it for photosynthesis, which in turn produces oxygen. It is also a greenhouse gas, which many claim [...]

The post U.S. CO2 emissions are on the decline! appeared first on SAS Learning Post.