12月 102018

When I was growing up, there were two kinds of Sundays: regular Sundays and George Sundays. George was the proprietor of a local Italian restaurant in my hometown and hosted the extended LaRusso clan for Sunday lunch every few weeks. His restaurant, appropriately named George’s, owns some of my favorite childhood memories – and some of my worst.

Every couple of months, my aunts, uncles, a baker’s dozen of cousins, and my immediate family members would take over George’s backroom and see if we could challenge the city’s noise ordinance. George would do nothing to discourage us, appearing every so often to fire balls of uncooked dough at us or ply us with more caffeine-laced sugary drinks, despite instructions to the contrary from our parents.

Invariably, though, an otherwise pleasant afternoon took a turn for the worse as we were leaving the restaurant. That was when my parents, thinking they were doing us a favor, would let us choose one item off George’s famous “candy wall.” You see, George didn’t stock just one or two different kinds of candy, he had dozens. Every different kind of chocolate bar, brand of gum, and flavor of jelly beans beckoned from George’s Candy Wall. For a 6 or 7-year-old kid, it was just too much. All these choices literally paralyzed me. Ten minutes of indecisiveness and several ultimatums later my parents would usher me out of the restaurant, usually empty-handed and crying. Even on the rare occasions when I did settle on something, I spent the rest of the afternoon lamenting my decision, thinking I left behind something that I would have enjoyed more.

When it comes to the multitude of great support and learning resources we offer new users of SAS, I often wonder if it can feel like you’re staring at George’s Candy Wall as well. While support.sas.com remains the holy grail of SAS customer support, there are so many good choices, it can sometimes be hard to know where to start. That’s why we’ve put together a new resource to make things easier for new SAS users: the SAS Starter Kit.

Need help navigating SAS Support Resources? Here’s your guide

SAS Support ResourcesThe SAS Starter Kit is the perfect place for SAS newbies to start, outlining the five essential steps to help you learn the basics, grow your skills and connect with other users from around the world.

Step 1 invites you to create a SAS profile. A profile provides you access to things like free, on-demand training, software downloads and access to our SAS Communities, where you can ask questions, get answers and connect with SAS experts from nearly every industry and around the world. You can

Step 2 is your SAS Resource Cheat Sheet. SAS Cares is your one stop listing of all the SAS resources you’ll ever need. Add it to your web favorites or print it out and add a little color to your cube. Keep this one close; it provides quick, one-click access to some of SAS’ most helpful resources.

Step 3 is designed to expand your SAS knowledge. This step introduces you to a full menu of free tutorials to binge watch, a number of free e-courses for a deeper dive and a number of other learning resources from e-books to webinars and more.

Step 4 is the perfect resource if you’re completely new to SAS or just trying something new. Our New SAS User Community is a great place to get coding help, share ideas and best practices, or just lurk! Our SAS Communities have more than 200,000 members ready to help get you unstuck or share what they know.

Finally, Step 5 introduces you to product-specific resources to help develop your skills with your specific tools. Here you’ll find the latest product news, code samples, and step-by-step instructional resources to guide you through common tasks using your product of choice.

I hope you find the SAS Starter Kit a sweet addition to your SAS toolkit.

Five essential steps to getting started with SAS

Navigating the Candy Wall of SAS Support Resources was published on SAS Users.

12月 102018
The best way to spread Christmas cheer
is singing loud for all to hear!
-Buddy in Elf

In the Christmas movie Elf (2003), Jovie (played by Zooey Deschanel) must "spread Christmas cheer" to help Santa. She chooses to sing "Santa Claus is coming to town," and soon all of New York City is singing along.

The best sing-along songs are short and have lyrics that repeat. Jovie's choice, "Santa Claus is coming to town," satisfies both criteria. The musical structure of the song is simple:

  • Verse 1: You better watch out / You better not cry / Better not pout / I'm telling you why
  • Tag line: Santa Claus is coming to town
  • Verse 2: He's making a list / And checking it twice; / Gonna find out / Who's naughty and nice
  • Tag line repeats
  • Bridge: He sees you when you're sleeping / He knows when you're awake / He knows if you've been bad or good / So be good for goodness sake! / O!
  • Verse 1 repeats
  • Partial tags and final tag: Santa Claus is coming / Santa Claus is coming / Santa Claus is coming to town

There is a fun way to visualize repetition in song lyrics. For a song that has N words, you can define the repetition matrix to be the N x N matrix where the (i,j)th cell has the value 1 if the i_th word is the same as the j_th word. Otherwise, the (i,j)th cell equals 0. You can visualize the matrix by using a two-color heat map. Colin Morris has a web site devoted to these visualizations.

The following image visualizes the lyrics of "Santa Claus is coming to town." I have added some vertical and horizontal lines to divide the lyrics into seven sections: the verses (V1 and V2), the tag line (S), and the bridge (B).

The image shows the structure of the repetition in the song lyrics:

  • The first verse contains the repetition of the words 'you', 'better', and 'not'.
  • The second verse repeats only the word 'out' from Verse 1.
  • The bridge repeats the word 'you', which appeared three times in Verse 1. It also repeats several words ('when', 'knows', 'good', ...) within the bridge.
  • The tag line "Santa Claus is coming [to town]" is repeated a total of five times.

Now that you understand what a repetition matrix looks like and how to interpret it, let's visualize a few other classic Christmas songs that contain repetitive lyrics! To help "spread Christmas cheer," I'll use shades of red and green to visualize the lyrics, rather than the boring white and black colors.

The Twelve Days of Christmas

If you make a list of Christmas songs that have repetition, chances are "The Twelve Days of Christmas" will be at the top of the list. The song is formulaic: each new verse adds a few new words before repeating the words from the previous verse. As a result, the repetition matrix is almost boring in its regularity. Here is the visualization of the classic song (click to enlarge):

Little Drummer Boy

Another highly repetitive Christmas song is "The Little Drummer Boy," which features an onomatopoeic phrase (Pa rum pum pum pum) that alternates with the other lyrics. A visualization of the classic song is shown below:

Silver Bells

In addition to repeating the title, "Silver Bells" repeats several phrases. Most notably, the phrase "Soon it will be Christmas Day" is repeated multiple times at the end of the song. Because only certain phrases are repeated, the visualization has a pleasing structure that complements the song's lyrical qualities:

Silent Night

To contrast the hustle, bustle, and commercialism of Christmas, I enjoy hearing songs that are musically simple. One of my favorites is "Silent Night." Each verse is distinct, yet each begins with "Silent night, holy night!" and ends by repeating a phrase. The resulting visualization is devoid of clutter. It is visually empty and matches the lyrical imagery, "all is calm, all is bright."

Your turn!

You can download the SAS program that creates these images. The program also computes visualizations of some contemporary songs such as "Last Christmas" by Wham!, "Someday at Christmas" (Stevie Wonder version), "Rockin' Around the Christmas Tree" (Brenda Lee version), and "Happy XMas (War Is Over)" by John Lennon and Yoko Ono. If you have access to SAS, you can even add your own favorite lyrics to the program! If you don't have access to SAS, Colin Morris's website enables you to paste in the lyrics and see the visualization.

In a little-known "deleted scene" from Elf, Buddy says that the second-best way to spread Christmas cheer is posting images for all to share! So post a comment and share your favorite visualization of a Christmas song!

Happy holidays to all my readers. I am grateful for you. Merry Christmas to all, and to all a good night!

The post Visualize Christmas songs appeared first on The DO Loop.

12月 072018

Once again, I have chosen to take a traditional Christmas song or carol and create a fun technology-related version of it to share with you. This is the fifth year and the eighth song, so I hope you enjoy your 2018 holiday song. Grandma got over run by a neural [...]

Grandma got over run by a neural network was published on SAS Voices by David Pope

12月 072018

There is one equation every retail store, call center, traffic, airport or hospital manager should know by heart.  No, it’s not E = mc².  The one I had in mind is this: W = 1 / (μ – λ) It may not look like much, but it can mean the [...]

Analytics you can use: Manage your queue for better customer service was published on SAS Voices by Leo Sadovy

12月 072018

There is one equation every retail store, call center, traffic, airport or hospital manager should know by heart.  No, it’s not E = mc².  The one I had in mind is this: W = 1 / (μ – λ) It may not look like much, but it can mean the [...]

Analytics you can use: Manage your queue for better customer service was published on SAS Voices by Leo Sadovy

12月 062018

A typical day brings countless business decisions that affect everything from profitability to customer experience. What is a reasonable price point? Which audience segments should I personalize offers for? When should I recommend specific content earlier in a customer journey? Daily decisions like these can alter the trajectory of a [...]

SAS Customer Intelligence 360: Decision management, machine learning, and digital marketing was published on Customer Intelligence Blog.

12月 052018

Recently a SAS programmer wanted to obtain a table of counts that was based on a histogram. I showed him how you can use the OUTHIST= option on the HISTOGRAM statement in PROC UNIVARIATE to obtain that information. For example, the following call to PROC UNIVARIATE creates a histogram for the MPG_City variable in the Sashelp.Cars data set. The histogram has 11 bins. The OUTHIST= option writes the counts for each bin to a SAS data set:

proc univariate data=Sashelp.Cars noprint;
   var MPG_City;
   histogram MPG_City / barlabel=count outhist=MidPtOut;
proc print data=MidPtOut label;
   label _MIDPT_ = "Midpoint" _COUNT_="Frequency";
   var _MIDPT_ _COUNT_;

Endpoints versus midpoints

As I've previously discussed, PROC UNIVARIATE supports two options for specifying the locations of bins. The MIDPOINTS option specifies that "nice" numbers (for example, multiples of 2, 5, or 10) are used for the midpoints of the bins; the ENDPOINTS option specifies that nice numbers are used for the endpoints of the bins; By default, midpoints are used, as shown in the previous section. The following call to PROC UNIVARIATE uses the ENDPOINTS option and writes the new bin counts to a data set. The histogram is not shown.

proc univariate data=Sashelp.Cars noprint;
   var MPG_City;
   histogram MPG_City / barlabel=count endpoints outhist=EndPtOut;
proc print data=EndPtOut;
   label _MINPT_ = "Left Endpoint" _COUNT_="Frequency";
   var _MINPT_ _COUNT_;

Tabulating counts in the SAS/IML language

If you want to "manually" count the number of observations in each bin, you have a few choices. If you already know the bin width and anchor position for the bins, then you can use a DATA step array to accumulate the counts. You can also use PROC FORMAT to define a format to bin the observations and use PROC FREQ to tabulate the counts.

The harder problem is when you do not have a prior set of "nice" values to use as the endpoints of bins. It is usually not satisfactory to use the minimum and maximum data values as endpoints of the binning intervals because that might result in intervals whose endpoints are long decimal values such as [3.4546667 4.0108333].

Fortunately, the SAS/IML language provides the GSCALE subroutine, which computes "nice" values from a vector of data and the number of bins. The GSCALE routine returns a three-element vector. The first element is the minimum value of the leftmost interval, the second element is the maximum value of the rightmost interval, and the third element is the bin width. For example, the following SAS/IML statements compute nice intervals for the data in the MPG_City variable:

proc iml;
use Sashelp.Cars;
   read all var "MPG_City" into X;
/* GSCALE subroutine computes "nice" tick values: s[1]<=min(x); s[2]>=max(x) */
call gscale(s, x, 10);  /* ask for about 10 intervals */
print s[rowname={"Start" "Stop" "Increment"}];

The output from the GSCALE subroutine suggests that a good set of intervals to use for binning the data are [10, 15), [15, 20), ..., [55, 60]. These are the same endpoints that are generated by using the ENDPOINTS option in PROC UNIVARIATE. (Actually, the procedure uses half-open intervals for all bins, so it adds the extra interval [60, 65) to the histogram.)

I've previously shown how to use the BIN and TABULATE functions in SAS/IML to count the observations in a set of bins. The following statements use the values from the GSCALE routine to form evenly spaced cutpoints for the binning:

cutPoints = do(s[1], s[2], s[3]);    /* use "nice" cutpoints from GSCALE */
*cutPoints = do(s[1], s[2]+s[3], s[3]);  /* ALTERNATIVE: add additional cutpoint to match UNIVARIATE */
b = bin(x, cutPoints);               /* find bin for each obs */
call tabulate(bins, freq, b);        /* count how many obs in each bin */
binLabels = char(cutPoints[bins]);   /* use left endpoint as labels for bins */
print freq[colname = binLabels label="Count"];

Except for the last interval, the counts are the same as for the ENDPOINTS option in PROC UNIVARIATE. It is a matter of personal preference whether you want to treat the last interval as a closed interval or whether you want all intervals to be half open. If you want to exactly match PROC UNIVARIATE, you can modify the definition of the cutPoints variable, as indicated in the program comments.

Notice that the TABULATE routine only reports the bins that have nonzero counts. If you prefer to obtain counts for ALL bins—even bins with zero counts—you can use the TabulateLevels module, which I described in a previous blog post.

In summary, you can use PROC UNIVARIATE or SAS/IML to create a tabular representation of a histogram. Both procedures provide a way to obtain "nice" values for the bin endpoints. If you already know the endpoints for the bins, you can use other techniques in SAS to produce the table.

The post When is a histogram not a histogram? When it's a table! appeared first on The DO Loop.