Certain North Carolina counties have been in the news lately for suspected election fraud involving absentee ballots in the 2018 election. Let's analyze the voter registration and absentee ballot data, to see if we can detect anything suspicious! In order to definitively determine whether fraud & illegal activity occurred, investigators [...]
This article describes best practices and techniques that every data analyst should know before bootstrapping in SAS.
The bootstrap method is a powerful statistical technique, but it can be a challenge to implement it efficiently.
An inefficient bootstrap program can take hours to run, whereas a well-written program can give you an answer in an instant.
If you prefer "instants" to "hours," this article is for you! I’ve compiled dozens of resources that explain how to compute bootstrap statistics in SAS.
Overview: What is the bootstrap method?
Recall that a bootstrap analysis enables you to investigate the sampling variability of a statistic without making any distributional assumptions about the population. For example, if you compute the skewness of a univariate sample, you get an estimate for the skewness of the population. You might want to know the range of skewness values that you might observe from a second sample (of the same size) from the population. If the range is large, the original estimate is imprecise. If the range is small, the original estimate is precise. Bootstrapping enables you to estimate the range by using only the observed data.
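In general, the basic bootstrap method consists of four steps:
1. Compute a statistic for the original data.
2. Generate B bootstrap samples by resampling from the data with replacement.
3. Compute the statistic on each bootstrap sample. The B statistics form the bootstrap distribution.
4. Use the bootstrap distribution to obtain estimates for the bias and standard error of the statistic and confidence intervals for parameters.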
The links in the previous list provide examples of best practices for bootstrapping in SAS. In particular, do not fall into the trap of using a macro loop to "resample, analyze, and append." You will eventually get the correct bootstrap estimates, but you might wait a long time to get them!
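The four steps can be sketched in a few lines. The following is a minimal Python illustration, not SAS code and not the code from the linked articles, that uses the skewness example from the overview. It follows the same structure as the efficient SAS approach: generate all B resamples first, then compute the statistic once per replicate (the role that PROC SURVEYSELECT plus BY-group processing plays in SAS). The function names, the moment-based skewness formula, and the defaults (B = 5000, a 95% percentile interval) are assumptions chosen for illustration.

```python
import random
import statistics


def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:  # degenerate resample: every value identical
        return 0.0
    return m3 / m2 ** 1.5


def bootstrap(data, stat, n_boot=5000, alpha=0.05, seed=12345):
    """Return (estimate, bias, standard error, percentile CI) for stat."""
    rng = random.Random(seed)
    n = len(data)
    # Step 1: compute the statistic on the original data
    estimate = stat(data)
    # Step 2: generate all B bootstrap samples (draws with replacement, size n)
    samples = [rng.choices(data, k=n) for _ in range(n_boot)]
    # Step 3: compute the statistic for each sample -> bootstrap distribution
    boot = sorted(stat(s) for s in samples)
    # Step 4: summarize the bootstrap distribution
    bias = statistics.fmean(boot) - estimate
    std_err = statistics.stdev(boot)
    k = int(alpha / 2 * n_boot)  # number of values cut from each tail
    ci = (boot[k], boot[n_boot - 1 - k])
    return estimate, bias, std_err, ci
```

As a sanity check, bootstrapping the mean of the values 1 through 10 gives a standard error of roughly 0.91, which is the standard deviation computed with divisor n (about 2.87) divided by √10.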
The remainder of this article is organized by the three ways to perform bootstrapping in SAS:
Programming: You can write a SAS DATA step program or a SAS/IML program that resamples from the data and analyzes each (re)sample. The programming approach gives you complete control over all aspects of the bootstrap analysis.
Macros: You can use the %BOOT and %BOOTCI macros that are supplied by SAS. The macros handle a wide variety of common bootstrap analyses.
Procedures: You can use bootstrap options that are built into several SAS procedures. The procedure internally implements the bootstrap method for a particular set of statistics.
Programming the basic bootstrap in SAS
The articles in this section describe how to program the bootstrap method in SAS for basic univariate analyses, for regression analyses, and for related resampling techniques such as the jackknife and permutation tests. This section also links to articles that describe how to generate bootstrap samples in SAS.
Examples of basic bootstrap analyses in SAS
The basic bootstrap in SAS:
SAS enables you to resample the data by using PROC SURVEYSELECT. When you combine the resampled data with BY-group processing, you can perform a very efficient bootstrap analysis in SAS, including estimates of standard errors and percentile-based confidence intervals.
The smooth bootstrap: As originally conceived, a bootstrap sample contains only copies of the observed data values. However, there are situations in which "jittering" the data (adding a small amount of random noise to each resampled value) provides a better approximation of the sampling distribution.
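One way to implement the smooth bootstrap is to resample as usual and then add random noise to every resampled value. The Python sketch below makes two assumptions that the article does not specify: the noise is Gaussian, and the default bandwidth is the rule of thumb h = s·n^(−1/5), where s is the sample standard deviation. Other kernels and bandwidth rules are common. Setting `bandwidth=0.0` reduces the function to the ordinary bootstrap, which makes the effect of jittering easy to see for a statistic such as the median.

```python
import random
import statistics


def smooth_bootstrap(data, stat, n_boot=2000, bandwidth=None, seed=2018):
    """Smoothed bootstrap: resample with replacement, then jitter each value.

    Assumption: Gaussian jitter with bandwidth h = sd * n**(-1/5) by default
    (a simple rule of thumb chosen for illustration).
    """
    rng = random.Random(seed)
    n = len(data)
    if bandwidth is None:
        h = statistics.stdev(data) * n ** (-0.2)
    else:
        h = bandwidth
    dist = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # ordinary bootstrap draw
        jittered = [x + rng.gauss(0.0, h) for x in resample]  # add noise
        dist.append(stat(jittered))
    return dist
```

With a small integer-valued sample, the ordinary bootstrap distribution of the median takes only a handful of distinct values, whereas the jittered version is essentially continuous.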
Examples of bootstrapping for regression statistics
When you bootstrap regression statistics, you have two choices for generating the bootstrap samples:
You can resample the observations (cases) to obtain bootstrap samples of the responses and the explanatory variables.
Alternatively, you can bootstrap regression parameters by fitting a model and resampling from the residuals to obtain new responses.
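The two schemes differ only in what gets resampled. The Python sketch below is illustrative (simple linear regression, hypothetical function names, not the SAS programs from the linked articles). Case resampling draws (x, y) pairs with replacement. Residual resampling keeps the x values fixed and adds resampled residuals to the fitted values. Both return a bootstrap distribution of the slope.

```python
import random
import statistics


def ols(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def case_bootstrap(x, y, n_boot=2000, seed=1):
    """Resample observations (cases): each draw keeps an (x, y) pair together."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # degenerate resample: slope is undefined
            continue
        slopes.append(ols(xs, [y[i] for i in idx])[1])
    return slopes


def residual_bootstrap(x, y, n_boot=2000, seed=1):
    """Fit once, then build new responses from fitted values + resampled residuals."""
    rng = random.Random(seed)
    b0, b1 = ols(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        new_resid = rng.choices(resid, k=len(x))
        y_star = [fi + e for fi, e in zip(fitted, new_resid)]
        slopes.append(ols(x, y_star)[1])  # x values stay fixed
    return slopes
```

Residual resampling assumes that the model is correct and that the errors are exchangeable (roughly constant variance), whereas case resampling makes weaker assumptions about the errors.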
Jackknife and permutation tests in SAS
The jackknife method: The jackknife is an alternative nonparametric method for obtaining standard errors for statistics. It is deterministic because it uses leave-one-out samples rather than random samples.
Permutation tests: A permutation test is a resampling technique that is closely related to the bootstrap. You permute the observations between two groups to test whether the groups are significantly different.
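Both techniques are short to write down. The Python sketch below uses simple, illustrative assumptions and hypothetical function names. The jackknife standard error uses the usual formula SE = sqrt((n − 1)/n · Σ(θᵢ − θ̄)²), where θᵢ is the statistic computed without observation i and θ̄ is the mean of the θᵢ. The permutation test is a two-sided Monte Carlo test for a difference in means; it uses a random subset of permutations rather than enumerating all of them.

```python
import random
import statistics


def jackknife_se(data, stat):
    """Jackknife standard error from the n leave-one-out estimates."""
    n = len(data)
    # deterministic: one estimate per omitted observation, no random numbers
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = statistics.fmean(loo)
    return ((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo)) ** 0.5


def permutation_test(a, b, n_perm=5000, seed=1):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            count += 1
    # count the observed assignment as one of the permutations
    return (count + 1) / (n_perm + 1)
```

For the sample mean, the jackknife standard error equals s/√n exactly, which is a convenient check on an implementation.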
Generate bootstrap samples
An important part of a bootstrap analysis is generating multiple bootstrap samples from the data. In SAS, there are many ways to obtain the bootstrap samples:
Many SAS procedures not only compute statistics but also provide standard errors or confidence intervals that enable you to infer whether an estimate is precise. Many confidence intervals are based on distributional assumptions about the population. ("If the errors are normally distributed, then....") However, the following SAS procedures provide an easy way to obtain a distribution-free confidence interval by using the bootstrap. See the SAS/STAT documentation for the syntax for each procedure.
PROC CAUSALMED introduced the BOOTSTRAP statement in SAS/STAT 14.3 (SAS 9.4M5). The statement enables you to compute bootstrap estimates of standard errors and confidence intervals for various effects and percentages of total effects.
PROC MULTTEST supports the BOOTSTRAP and PERMUTATION options, which enable you to compute estimates of p-values that make no distributional assumptions.
PROC NLIN supports the BOOTSTRAP statement, which computes bootstrap confidence intervals for parameters and bootstrap estimates of the covariance of the parameter estimates.
Resampling techniques such as bootstrap methods and permutation tests are widely used by modern data analysts. But the way you implement these techniques can make the difference between getting results in a few seconds and waiting a few hours.
This article summarizes and consolidates many previous articles that demonstrate how to perform an efficient bootstrap analysis in SAS. Bootstrapping enables you to investigate the sampling variability of a statistic without making any distributional assumptions. In particular, the bootstrap is often used to estimate standard errors and confidence intervals for parameters.
In her role as Product Manager for SAS Platform Technologies (including the SAS Add-In for Microsoft Office), my colleague Amy Peters hears this question often. With many organizations adopting Microsoft Office 365 -- the "cloud" version of Office -- what does this mean for other processes that integrate with Microsoft Office applications?
Microsoft has used different names for these similar offerings: Office 2016, Office 365, Microsoft 365, Office Online. The bottom line is that most users of a "365" package in the cloud also have access to the Microsoft Office tools on their Windows desktop. They can use the full version of Excel, PowerPoint, Word, etc., and they also have access to these same tools via a web browser. At SAS, we recently experienced this transition ourselves. Have the Office applications on our desktops vanished? No, they have not. While more of our data is now on the cloud (looking at you, OneDrive), it's not really changing how we work, especially when creating/maintaining content. (Like many organizations, we already had one foot in this world by using Microsoft SharePoint for collaboration.)
Collaboration on the web. Full control on the desktop
Let's look at an example of how I use SAS with Microsoft Office. First, I create a report in SAS Visual Analytics. Then I open Excel on my desktop and use the SAS Add-In for Microsoft Office to embed the shared report into my spreadsheet. Want to see what that looks like in action? Check out this Tech Talk video with SAS developer Tim Beese.
Now suppose that I share this content in Microsoft OneDrive, and my colleague views it in Excel in a web browser. Yep, the content is still there. The difference is that the content is not dynamic like it is on my desktop. So what do you do when you want to edit that spreadsheet while it is displayed in the browser? You select Open in Excel and the document opens on your desktop. Voila! The content is dynamic and you have all the functionality the SAS Add-In for Microsoft Office provides.
How is Microsoft Office 365 changing your workflow?
Today, most users expect "Office Online" applications in their browsers to be primarily for viewing and basic editing. Will this change? Probably. We're researching how to provide more of the SAS Add-In for Microsoft Office functionality in a browser app. If you or your colleagues need this browser-based functionality (for example, you want to do something specific in Excel with your SAS content), we want to hear from you. And do you plan to move completely to browser-based Office apps? Currently, you can't create SAS content from a browser-based Office app. If that's a pressing need, we would like to know. For now, we're not hearing of use cases in which some form of the desktop app isn't still in the picture.
SAS integration with these everyday productivity tools, like Microsoft Office, is important to us. Don't forget about these SAS programming methods to create and read your Microsoft Office content:
At the data level, with PROC IMPORT, PROC EXPORT and LIBNAME XLSX to read and write Microsoft Excel files.
We got our first 'big' snow of the season here at the SAS headquarters in Cary, NC ... therefore I thought this would be a great time to dig into some snow data! Follow along and pick up some tips & tricks as I plot our snow data - and [...]
I promised in my previous post on automated segment comparisons that I would reveal more about how SAS measures differences between segment profiles. To recap, we wanted to have a method that would determine: If two segments are different in a meaningful way. By how much? What descriptive attributes best [...]
When I was growing up, there were two kinds of Sundays: regular Sundays and George Sundays. George was the proprietor of a local Italian restaurant in my hometown and hosted the extended LaRusso clan for Sunday lunch every few weeks. His restaurant, appropriately named George’s, owns some of my favorite childhood memories – and some of my worst.
Every couple of months, my aunts, uncles, a baker’s dozen of cousins, and my immediate family members would take over George’s backroom and see if we could challenge the city’s noise ordinance. George would do nothing to discourage us, appearing every so often to fire balls of uncooked dough at us or ply us with more caffeine-laced sugary drinks, despite instructions to the contrary from our parents.
Invariably, though, an otherwise pleasant afternoon took a turn for the worse as we were leaving the restaurant. That was when my parents, thinking they were doing us a favor, would let us choose one item off George’s famous “candy wall.” You see, George didn’t stock just one or two different kinds of candy; he had dozens. Every different kind of chocolate bar, brand of gum, and flavor of jelly beans beckoned from George’s Candy Wall. For a 6- or 7-year-old kid, it was just too much. All these choices literally paralyzed me. Ten minutes of indecisiveness and several ultimatums later my parents would usher me out of the restaurant, usually empty-handed and crying. Even on the rare occasions when I did settle on something, I spent the rest of the afternoon lamenting my decision, thinking I left behind something that I would have enjoyed more.
When it comes to the multitude of great support and learning resources we offer new users of SAS, I often wonder if it can feel like you’re staring at George’s Candy Wall as well. While support.sas.com remains the holy grail of SAS customer support, there are so many good choices that it can sometimes be hard to know where to start. That’s why we’ve put together a new resource to make things easier for new SAS users: the SAS Starter Kit.
Need help navigating SAS Support Resources? Here’s your guide
The SAS Starter Kit is the perfect place for SAS newbies to start, outlining the five essential steps to help you learn the basics, grow your skills and connect with other users from around the world.
Step 1 invites you to create a SAS profile. A profile provides you access to things like free, on-demand training, software downloads and access to our SAS Communities, where you can ask questions, get answers and connect with SAS experts from nearly every industry and around the world. You can
Step 2 is your SAS Resource Cheat Sheet. SAS Cares is your one-stop listing of all the SAS resources you’ll ever need. Add it to your web favorites or print it out and add a little color to your cube. Keep this one close; it provides quick, one-click access to some of SAS’ most helpful resources.
Step 3 is designed to expand your SAS knowledge. This step introduces you to a full menu of free tutorials to binge watch, several free e-courses for a deeper dive, and other learning resources ranging from e-books to webinars.
Step 4 is the perfect resource if you’re completely new to SAS or just trying something new. Our New SAS User Community is a great place to get coding help, share ideas and best practices, or just lurk! Our SAS Communities have more than 200,000 members ready to help get you unstuck or share what they know.
Finally, Step 5 introduces you to product-specific resources to help develop your skills with your specific tools. Here you’ll find the latest product news, code samples, and step-by-step instructional resources to guide you through common tasks using your product of choice.
The best way to spread Christmas cheer
is singing loud for all to hear!
-Buddy in Elf
In the Christmas movie Elf (2003), Jovie (played by Zooey Deschanel) must "spread Christmas cheer" to help Santa. She chooses to sing "Santa Claus is coming to town," and soon all of New York City is singing along.
The best sing-along songs are short and have lyrics that repeat. Jovie's choice, "Santa Claus is coming to town," satisfies both criteria. The musical structure of the song is simple:
Verse 1: You better watch out /
You better not cry /
Better not pout /
I'm telling you why
Tag line: Santa Claus is coming to town
Verse 2: He's making a list /
And checking it twice; /
Gonna find out /
Who's naughty and nice
Tag line repeats
Bridge: He sees you when you're sleeping /
He knows when you're awake /
He knows if you've been bad or good /
So be good for goodness sake! /
Verse 1 repeats
Partial tags and final tag: Santa Claus is coming /
Santa Claus is coming /
Santa Claus is coming to town
The following image visualizes the lyrics of "Santa Claus is coming to town." I have added some vertical and horizontal lines to divide the lyrics into seven sections: the verses (V1 and V2), the tag line (S), and the bridge (B).
The image shows the structure of the repetition in the song lyrics:
The first verse contains the repetition of the words 'you', 'better', and 'not'.
The second verse repeats only the word 'out' from Verse 1.
The bridge repeats the word 'you', which appeared three times in Verse 1. It also repeats several words ('when', 'knows', 'good', ...) within the bridge.
The tag line "Santa Claus is coming [to town]" is repeated a total of five times.
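A repetition matrix is straightforward to compute: number the words of the lyrics 1, 2, ..., n and set cell (i, j) to 1 when word i and word j are the same. The Python sketch below is an illustration only; it assumes that words are compared after lowercasing and removing punctuation (apostrophes are kept), and it replaces the colored image with a plain text rendering. For Verse 1 it reproduces the repetitions described above: 'you' and 'better' each appear three times, and 'not' appears twice.

```python
import re


def tokenize(lyrics):
    """Lowercase words with punctuation stripped (apostrophes kept)."""
    return re.findall(r"[a-z']+", lyrics.lower())


def repetition_matrix(words):
    """M[i][j] = 1 when the i-th and j-th words are the same word."""
    return [[1 if wi == wj else 0 for wj in words] for wi in words]


def render(matrix):
    """Text rendering: '#' marks a repeated word pair, '.' marks no match."""
    return ["".join("#" if v else "." for v in row) for row in matrix]


verse1 = "You better watch out / You better not cry / Better not pout / I'm telling you why"
for line in render(repetition_matrix(tokenize(verse1))):
    print(line)
```

The matrix is symmetric and always has ones on its diagonal (every word matches itself), so the interesting structure is in the off-diagonal cells.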
Now that you understand what a repetition matrix looks like and how to interpret it, let's visualize a few other classic Christmas songs that contain repetitive lyrics! To help "spread Christmas cheer," I'll use shades of red and green to visualize the lyrics, rather than the boring white and black colors.
The Twelve Days of Christmas
If you make a list of Christmas songs that have repetition, chances are "The Twelve Days of Christmas" will be at the top of the list. The song is formulaic: each new verse adds a few new words before repeating the words from the previous verse. As a result, the repetition matrix is almost boring in its regularity. Here is the visualization of the classic song (click to enlarge):
Little Drummer Boy
Another highly repetitive Christmas song is "The Little Drummer Boy," which features an onomatopoeic phrase (Pa rum pum pum pum) that alternates with the other lyrics. A visualization of the classic song is shown below:
Silver Bells
In addition to repeating the title, "Silver Bells" repeats several phrases. Most notably, the phrase "Soon it will be Christmas Day" is repeated multiple times at the end of the song. Because only certain phrases are repeated, the visualization has a pleasing structure that complements the song's lyrical qualities:
Silent Night
To contrast the hustle, bustle, and commercialism of Christmas, I enjoy hearing songs that are musically simple. One of my favorites is "Silent Night." Each verse is distinct, yet each begins with "Silent night, holy night!" and ends by repeating a phrase. The resulting visualization is devoid of clutter. It is visually empty and matches the lyrical imagery, "all is calm, all is bright."
In a little-known "deleted scene" from Elf, Buddy says that the second-best way to spread Christmas cheer is posting images for all to share! So post a comment and share your favorite visualization of a Christmas song!
Happy holidays to all my readers. I am grateful for you. Merry Christmas to all, and to all a good night!