December 14, 2018
 

Certain North Carolina counties have been in the news lately for suspected election fraud involving absentee ballots in the 2018 election. Let's analyze the voter registration and absentee ballot data to see if we can detect anything suspicious! In order to definitively determine whether fraud and illegal activity occurred, investigators [...]

The post Looking for indications of fraud, in North Carolina's absentee ballots appeared first on SAS Learning Post.

December 12, 2018
 

This article describes best practices and techniques that every data analyst should know before bootstrapping in SAS. The bootstrap method is a powerful statistical technique, but it can be a challenge to implement it efficiently. An inefficient bootstrap program can take hours to run, whereas a well-written program can give you an answer in an instant. If you prefer "instants" to "hours," this article is for you! I’ve compiled dozens of resources that explain how to compute bootstrap statistics in SAS.

Overview: What is the bootstrap method?

Recall that a bootstrap analysis enables you to investigate the sampling variability of a statistic without making any distributional assumptions about the population. For example, if you compute the skewness of a univariate sample, you get an estimate for the skewness of the population. You might want to know the range of skewness values that you might observe from a second sample (of the same size) from the population. If the range is large, the original estimate is imprecise. If the range is small, the original estimate is precise. Bootstrapping enables you to estimate the range by using only the observed data.

In general, the basic bootstrap method consists of four steps:

  1. Compute a statistic for the original data.
  2. Use the DATA step or PROC SURVEYSELECT to resample (with replacement) B times from the data. The resampling process should respect the null hypothesis or reflect the original sampling scheme. For efficiency, you should put all B random bootstrap samples into a single data set.
  3. Use BY-group processing to compute the statistic of interest on each bootstrap sample. The BY-group approach is much faster than using macro loops. The union of the statistics is the bootstrap distribution, which approximates the sampling distribution of the statistic under the null hypothesis. Don't forget to turn off ODS when you run BY-group processing!
  4. Use the bootstrap distribution to obtain estimates for the bias and standard error of the statistic and confidence intervals for parameters.
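The four steps above can be sketched in a few lines of code. The following is a minimal, language-neutral illustration written in Python rather than SAS; it is not the SAS implementation that the linked articles describe. The function names (`skewness`, `bootstrap`), the simulated data, the seed values, and the choice of B=1000 are all assumptions made for the example. It uses a simple moment-based skewness and a 95% percentile interval.

```python
import random
import statistics

def skewness(x):
    """Moment-based sample skewness (one of several common definitions)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def bootstrap(data, stat, B=1000, seed=12345):
    """Basic bootstrap: returns estimate, bias, standard error, 95% percentile CI."""
    rng = random.Random(seed)
    n = len(data)
    # Step 1: compute the statistic for the original data
    estimate = stat(data)
    # Step 2: draw all B resamples (with replacement) up front
    samples = [rng.choices(data, k=n) for _ in range(B)]
    # Step 3: compute the statistic on each resample (the bootstrap distribution)
    boot = [stat(s) for s in samples]
    # Step 4: summarize the bootstrap distribution
    cuts = statistics.quantiles(boot, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
    return {
        "estimate": estimate,
        "bias": statistics.fmean(boot) - estimate,
        "std_err": statistics.stdev(boot),
        "ci": (cuts[0], cuts[-1]),
    }

# Hypothetical data: 50 draws from a skewed (exponential) distribution
demo_rng = random.Random(1)
data = [demo_rng.expovariate(1.0) for _ in range(50)]
result = bootstrap(data, skewness)
print(result)
```

Generating every resample before analyzing any of them mirrors the advice in step 2: the analysis then becomes a single pass over B groups instead of a "resample, analyze, and append" loop.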

The links in the previous list provide examples of best practices for bootstrapping in SAS. In particular, do not fall into the trap of using a macro loop to "resample, analyze, and append." You will eventually get the correct bootstrap estimates, but you might wait a long time to get them!

The remainder of this article is organized by the three ways to perform bootstrapping in SAS:

  • Programming: You can write a SAS DATA step program or a SAS/IML program that resamples from the data and analyzes each (re)sample. The programming approach gives you complete control over all aspects of the bootstrap analysis.
  • Macros: You can use the %BOOT and %BOOTCI macros that are supplied by SAS. The macros handle a wide variety of common bootstrap analyses.
  • Procedures: You can use bootstrap options that are built into several SAS procedures. The procedure internally implements the bootstrap method for a particular set of statistics.

Programming the basic bootstrap in SAS

The articles in this section describe how to program the bootstrap method in SAS for basic univariate analyses, for regression analyses, and for related resampling techniques such as the jackknife and permutation tests. This section also links to articles that describe how to generate bootstrap samples in SAS.

Examples of basic bootstrap analyses in SAS

  • The basic bootstrap in SAS: SAS enables you to resample the data by using PROC SURVEYSELECT. When coupled with BY-group processing, you can perform a very efficient bootstrap analysis in SAS, including the estimate of standard errors and percentile-based confidence intervals.
  • The basic bootstrap in SAS/IML: The SAS/IML language provides a compact language for bootstrapping, as shown in this basic bootstrap example.
  • The smooth bootstrap: As originally conceived, a bootstrap sample contains replicates of the data. However, there are situations when "jittering" the data provides a better approximation of the sampling distribution.
  • Bias-corrected and accelerated (BCa) confidence interval: For highly skewed data, the percentile-based confidence intervals are less efficient than the BCa confidence interval.
  • Bootstrap the difference of means between two groups: This example shows how to bootstrap a statistic in a two-sample t test.
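To make the "jittering" idea behind the smooth bootstrap concrete, here is a minimal sketch in Python (not SAS). It resamples with replacement and then adds a small random perturbation to each resampled value. The function name `smooth_bootstrap_sample`, the normal kernel, and the rule-of-thumb bandwidth are assumptions chosen for illustration; the linked article may use a different kernel or bandwidth.

```python
import random
import statistics

def smooth_bootstrap_sample(data, rng, h=None):
    """Draw one smooth bootstrap sample: resample, then jitter each value.

    h is the standard deviation of the normal jitter. If omitted, a
    rule-of-thumb bandwidth (an assumption for this sketch) is used.
    """
    n = len(data)
    if h is None:
        h = 1.06 * statistics.stdev(data) * n ** (-0.2)
    return [rng.choice(data) + rng.gauss(0.0, h) for _ in range(n)]

rng = random.Random(2018)
data = [1.2, 1.9, 2.3, 2.8, 3.1, 3.5, 4.0, 4.4, 5.2, 6.7]   # hypothetical data
sample = smooth_bootstrap_sample(data, rng)
print([round(v, 2) for v in sample])
```

Unlike an ordinary bootstrap sample, the jittered sample usually contains values that do not appear in the original data.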

Examples of bootstrapping for regression statistics

When you bootstrap regression statistics, you have two choices for generating the bootstrap samples:

  • Case resampling: You can resample the observations (cases) to obtain bootstrap samples of the responses and the explanatory variables.
  • Residual resampling: Alternatively, you can bootstrap regression parameters by fitting a model and resampling from the residuals to obtain new responses.
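The difference between the two schemes is easier to see in code. The sketch below is in Python (not SAS) and uses a simple one-variable least squares fit. The helper names (`fit_line`, `case_resample_slopes`, `residual_resample_slopes`), the simulated data, and B=500 are hypothetical choices for the example.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def case_resample_slopes(x, y, B, rng):
    """Case resampling: resample (x, y) pairs together."""
    n = len(x)
    slopes = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_line([x[i] for i in idx], [y[i] for i in idx])[1])
    return slopes

def residual_resample_slopes(x, y, B, rng):
    """Residual resampling: keep x fixed, add resampled residuals to the fit."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    slopes = []
    for _ in range(B):
        y_star = [f + rng.choice(resid) for f in fitted]
        slopes.append(fit_line(x, y_star)[1])
    return slopes

# Hypothetical data: y = 2 + 0.5*x + noise
rng = random.Random(42)
x = [float(i) for i in range(30)]
y = [2.0 + 0.5 * a + rng.gauss(0.0, 1.0) for a in x]

slope_hat = fit_line(x, y)[1]
case_slopes = case_resample_slopes(x, y, 500, rng)
resid_slopes = residual_resample_slopes(x, y, 500, rng)
print(slope_hat, sum(case_slopes) / 500, sum(resid_slopes) / 500)
```

In case resampling the explanatory values change from sample to sample, whereas residual resampling treats them as fixed and relies on the fitted model being adequate.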

Jackknife and permutation tests in SAS

  • The jackknife method: The jackknife is an alternative nonparametric method for obtaining standard errors for statistics. It is deterministic because it uses leave-one-out samples rather than random samples.
  • Permutation tests: A permutation test is a resampling technique that is closely related to the bootstrap. You permute the observations between two groups to test whether the groups are significantly different.
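Both techniques can be sketched briefly. The following Python code (not SAS) computes a jackknife standard error from leave-one-out samples and runs a two-sample permutation test for a difference of means. The function names, the small data sets, the number of permutations (R=999), and the seed are assumptions for illustration.

```python
import math
import random
from statistics import fmean, stdev

def jackknife_se(data, stat):
    """Jackknife standard error from the n leave-one-out statistics."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    center = fmean(loo)
    return math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in loo))

def permutation_test(a, b, R=999, seed=2018):
    """Two-sided Monte Carlo permutation test for a difference of means."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(R):
        rng.shuffle(pooled)                      # permute group labels
        d = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if d >= observed:
            count += 1
    return (count + 1) / (R + 1)                 # include the observed split

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]            # hypothetical sample
print(jackknife_se(data, fmean))                 # equals stdev/sqrt(n) for the mean

group_a = [1, 2, 3, 4, 5, 6, 7, 8]               # hypothetical groups
group_b = [11, 12, 13, 14, 15, 16, 17, 18]
p_value = permutation_test(group_a, group_b)
print(p_value)
```

The jackknife needs no random numbers, so repeated runs always give the same answer. The permutation p-value depends on the seed but stabilizes as R grows.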

Generate bootstrap samples

An important part of bootstrapping is generating multiple bootstrap samples from the data. In SAS, there are many ways to obtain the bootstrap samples:

  • Sample with replacement: The most common resampling technique is to randomly sample with replacement from the data. You can use the SAS DATA step, the SURVEYSELECT procedure, or the SAMPLE function in SAS/IML.
  • Samples in random order: It is sometimes useful to generate random samples in which the order of the observations is randomly permuted.
  • Balanced bootstrap resampling: Instead of random samples, some experts advocate a resampling algorithm in which each observation appears exactly B times in the union of the B bootstrap samples.
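The three sampling schemes in the list can be compared side by side. This Python sketch (not SAS) works with observation indices 0 to n-1; the function names, the values of n and B, and the seed are hypothetical. The balanced scheme shown here concatenates B copies of the indices, permutes the result, and cuts it into B samples of size n, which is one common way to implement the idea.

```python
import random
from collections import Counter

def sample_with_replacement(n, rng):
    """Ordinary bootstrap sample: n indices drawn with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def random_order(n, rng):
    """Every index exactly once, in random order (a permutation)."""
    return rng.sample(range(n), n)

def balanced_bootstrap(n, B, rng):
    """B samples of size n in which each index appears exactly B times overall."""
    pool = list(range(n)) * B
    rng.shuffle(pool)
    return [pool[k * n:(k + 1) * n] for k in range(B)]

rng = random.Random(7)
n, B = 5, 4
print(sample_with_replacement(n, rng))
print(random_order(n, rng))
samples = balanced_bootstrap(n, B, rng)
for s in samples:
    print(s)

# Count how often each observation appears across all balanced samples
counts = Counter(i for s in samples for i in s)
print(counts)
```

An individual balanced sample can still contain duplicates; the balance holds only across the union of all B samples.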

Bootstrap macros in SAS

The SAS-supplied macros %BOOT, %JACK, and %BOOTCI can perform basic bootstrap analyses and jackknife analyses. However, they require familiarity with writing and using SAS macros. If you are interested, I wrote an example that shows how to use the %BOOT and %BOOTCI macros for bootstrapping. The documentation also provides several examples.

SAS procedures that support bootstrapping

Many SAS procedures not only compute statistics but also provide standard errors or confidence intervals that enable you to infer whether an estimate is precise. Many confidence intervals are based on distributional assumptions about the population. ("If the errors are normally distributed, then....") However, the following SAS procedures provide an easy way to obtain a distribution-free confidence interval by using the bootstrap. See the SAS/STAT documentation for the syntax for each procedure.

  • PROC CAUSALMED introduced the BOOTSTRAP statement in SAS/STAT 14.3 (SAS 9.4M5). The statement enables you to compute bootstrap estimates of standard errors and confidence intervals for various effects and percentages of total effects.
  • PROC MULTTEST supports the BOOTSTRAP and PERMUTATION options, which enable you to compute estimates of p-values that make no distributional assumptions.
  • PROC NLIN supports the BOOTSTRAP statement, which computes bootstrap confidence intervals for parameters and bootstrap estimates of the covariance of the parameter estimates.
  • PROC QUANTREG supports the CI=RESAMPLING option to construct confidence intervals for regression quantiles.
  • The SURVEYMEANS, SURVEYREG, SURVEYLOGISTIC, SURVEYPHREG, SURVEYIMPUTE, and SURVEYFREQ procedures introduced the VARMETHOD=BOOTSTRAP option in SAS 9.4M5. The option enables you to compute bootstrap estimates of variance. With the exception of SURVEYIMPUTE, these procedures also support jackknife estimates. The jackknife is similar to the bootstrap but uses a leave-one-out deterministic scheme rather than random resampling.
  • PROC TTEST introduced the BOOTSTRAP statement in SAS/STAT 14.3. The statement enables you to compute bootstrap standard error, bias estimates, and confidence limits for means and standard deviations in t tests. In SAS/STAT 15.1 (SAS 9.4M6), the TTEST procedure provides extensive graphics that visualize the bootstrap distribution.

Summary

Resampling techniques such as bootstrap methods and permutation tests are widely used by modern data analysts. But how you implement these techniques can make a huge difference between getting the results in a few seconds versus a few hours. This article summarizes and consolidates many previous articles that demonstrate how to perform an efficient bootstrap analysis in SAS. Bootstrapping enables you to investigate the sampling variability of a statistic without making any distributional assumptions. In particular, the bootstrap is often used to estimate standard errors and confidence intervals for parameters.

Further Reading

The post The essential guide to bootstrapping in SAS appeared first on The DO Loop.

December 12, 2018
 

In her role as Product Manager for SAS Platform Technologies (including the SAS Add-In for Microsoft Office), my colleague Amy Peters hears this question often. With many organizations adopting Microsoft Office 365 -- the "cloud" version of Office -- what does this mean for other processes that integrate with Microsoft Office applications?

Microsoft has used different names for these similar offerings: Office 2016, Office 365, Microsoft 365, Office Online. The bottom line is that most users of a "365" package in the cloud also have access to the Microsoft Office tools on their Windows desktop. They can use the full version of Excel, PowerPoint, Word, etc., and they also have access to these same tools via a web browser. At SAS, we recently experienced this transition ourselves. Have the Office applications on our desktops vanished? No, they have not. While more of our data is now on the cloud (looking at you, OneDrive), it's not really changing how we work, especially when creating/maintaining content. (Like many organizations, we already had one foot in this world by using Microsoft SharePoint for collaboration.)

Collaboration on the web. Full control on the desktop

Let's look at an example of how I use SAS with Microsoft Office. First, I create a report in SAS Visual Analytics. Then I open Excel on my desktop and use the SAS Add-In for Microsoft Office to embed the shared report into my spreadsheet. Want to see what that looks like in action? Check out this video Tech Talk with SAS developer Tim Beese.

Now suppose that I share this content in Microsoft OneDrive, and my colleague views it in Excel in a web browser. Yep, the content is still there. The difference is that the content is not dynamic like it is on my desktop. So what do you do when you want to edit that spreadsheet displaying in the browser? You select Open in Excel and the document opens on your desktop. Voila! The content is dynamic and you have all the functionality the SAS Add-In for Microsoft Office provides.

How is Microsoft Office 365 changing your workflow?

Today, the expectation of most users working with "Office Online" applications in their browsers is that it's primarily for viewing and basic editing. Will this change? Probably. We're researching how to provide more of the SAS Add-In for Microsoft Office function in a browser app. If you or your colleagues need this browser-based function -- you want to do something specific in Excel with your SAS content -- we want to hear from you. And do you have a plan to move completely to browser-based Office apps? Currently you can't create SAS content from a browser-based Office app. If that's a pressing need, we would like to know. For now, we're not hearing of use cases where some form of the desktop app isn't still in the picture.

SAS integration with these everyday productivity tools, like Microsoft Office, is important to us. Don't forget about these SAS programming methods to create and read your Microsoft Office content:

How are you using Microsoft Office 365 with SAS? How do you think this workflow will change for you in the next year or two? Leave a comment -- we would love to hear from you.

The post Does SAS support Microsoft Office 365? appeared first on The SAS Dummy.

December 12, 2018
 

I promised in my previous post on automated segment comparisons that I would reveal more about how SAS measures differences between segment profiles. To recap, we wanted to have a method that would determine: If two segments are different in a meaningful way. By how much? What descriptive attributes best [...]

Segment comparisons – How to measure the difference was published on Customer Intelligence Blog.

December 10, 2018
 

When I was growing up, there were two kinds of Sundays: regular Sundays and George Sundays. George was the proprietor of a local Italian restaurant in my hometown and hosted the extended LaRusso clan for Sunday lunch every few weeks. His restaurant, appropriately named George’s, owns some of my favorite childhood memories – and some of my worst.

Every couple of months, my aunts, uncles, a baker’s dozen of cousins, and my immediate family members would take over George’s backroom and see if we could challenge the city’s noise ordinance. George would do nothing to discourage us, appearing every so often to fire balls of uncooked dough at us or ply us with more caffeine-laced sugary drinks, despite instructions to the contrary from our parents.

Invariably, though, an otherwise pleasant afternoon took a turn for the worse as we were leaving the restaurant. That was when my parents, thinking they were doing us a favor, would let us choose one item off George’s famous “candy wall.” You see, George didn’t stock just one or two different kinds of candy; he had dozens. Every different kind of chocolate bar, brand of gum, and flavor of jelly beans beckoned from George’s Candy Wall. For a 6- or 7-year-old kid, it was just too much. All these choices literally paralyzed me. Ten minutes of indecisiveness and several ultimatums later, my parents would usher me out of the restaurant, usually empty-handed and crying. Even on the rare occasions when I did settle on something, I spent the rest of the afternoon lamenting my decision, thinking I left behind something that I would have enjoyed more.

When it comes to the multitude of great support and learning resources we offer new users of SAS, I often wonder if it can feel like you’re staring at George’s Candy Wall as well. While support.sas.com remains the holy grail of SAS customer support, there are so many good choices, it can sometimes be hard to know where to start. That’s why we’ve put together a new resource to make things easier for new SAS users: the SAS Starter Kit.

Need help navigating SAS Support Resources? Here’s your guide

The SAS Starter Kit is the perfect place for SAS newbies to start, outlining the five essential steps to help you learn the basics, grow your skills and connect with other users from around the world.

Step 1 invites you to create a SAS profile. A profile provides you access to things like free, on-demand training, software downloads and access to our SAS Communities, where you can ask questions, get answers and connect with SAS experts from nearly every industry and around the world. You can

Step 2 is your SAS Resource Cheat Sheet. SAS Cares is your one stop listing of all the SAS resources you’ll ever need. Add it to your web favorites or print it out and add a little color to your cube. Keep this one close; it provides quick, one-click access to some of SAS’ most helpful resources.

Step 3 is designed to expand your SAS knowledge. This step introduces you to a full menu of free tutorials to binge watch, a number of free e-courses for a deeper dive and a number of other learning resources from e-books to webinars and more.

Step 4 is the perfect resource if you’re completely new to SAS or just trying something new. Our New SAS User Community is a great place to get coding help, share ideas and best practices, or just lurk! Our SAS Communities have more than 200,000 members ready to help get you unstuck or share what they know.

Finally, Step 5 introduces you to product-specific resources to help develop your skills with your specific tools. Here you’ll find the latest product news, code samples, and step-by-step instructional resources to guide you through common tasks using your product of choice.

I hope you find the SAS Starter Kit a sweet addition to your SAS toolkit.

Five essential steps to getting started with SAS

Navigating the Candy Wall of SAS Support Resources was published on SAS Users.

December 10, 2018
 
The best way to spread Christmas cheer
is singing loud for all to hear!
-Buddy in Elf

In the Christmas movie Elf (2003), Jovie (played by Zooey Deschanel) must "spread Christmas cheer" to help Santa. She chooses to sing "Santa Claus is coming to town," and soon all of New York City is singing along.

The best sing-along songs are short and have lyrics that repeat. Jovie's choice, "Santa Claus is coming to town," satisfies both criteria. The musical structure of the song is simple:

  • Verse 1: You better watch out / You better not cry / Better not pout / I'm telling you why
  • Tag line: Santa Claus is coming to town
  • Verse 2: He's making a list / And checking it twice; / Gonna find out / Who's naughty and nice
  • Tag line repeats
  • Bridge: He sees you when you're sleeping / He knows when you're awake / He knows if you've been bad or good / So be good for goodness sake! / O!
  • Verse 1 repeats
  • Partial tags and final tag: Santa Claus is coming / Santa Claus is coming / Santa Claus is coming to town

There is a fun way to visualize repetition in song lyrics. For a song that has N words, you can define the repetition matrix to be the N x N matrix where the (i,j)th cell has the value 1 if the i_th word is the same as the j_th word. Otherwise, the (i,j)th cell equals 0. You can visualize the matrix by using a two-color heat map. Colin Morris has a web site devoted to these visualizations.
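The definition of the repetition matrix translates directly into code. The sketch below is in Python, not the SAS program that the article offers for download. The function name `repetition_matrix` and the tokenization rule (lowercase, keep letters and apostrophes) are assumptions for the example; it prints a text rendering instead of a heat map.

```python
import re

def repetition_matrix(lyrics):
    """Return (words, M) where M[i][j] = 1 if word i equals word j, else 0."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    n = len(words)
    M = [[1 if words[i] == words[j] else 0 for j in range(n)] for i in range(n)]
    return words, M

verse1 = "You better watch out / You better not cry / Better not pout / I'm telling you why"
words, M = repetition_matrix(verse1)

# Text rendering: '#' marks a repeated (or identical) word, '.' marks a mismatch
for row in M:
    print("".join("#" if cell else "." for cell in row))
```

The matrix is symmetric with ones on the diagonal, so every off-diagonal mark indicates a word that occurs more than once.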

The following image visualizes the lyrics of "Santa Claus is coming to town." I have added some vertical and horizontal lines to divide the lyrics into seven sections: the verses (V1 and V2), the tag line (S), and the bridge (B).

The image shows the structure of the repetition in the song lyrics:

  • The first verse contains the repetition of the words 'you', 'better', and 'not'.
  • The second verse repeats only the word 'out' from Verse 1.
  • The bridge repeats the word 'you', which appeared three times in Verse 1. It also repeats several words ('when', 'knows', 'good', ...) within the bridge.
  • The tag line "Santa Claus is coming [to town]" is repeated a total of five times.

Now that you understand what a repetition matrix looks like and how to interpret it, let's visualize a few other classic Christmas songs that contain repetitive lyrics! To help "spread Christmas cheer," I'll use shades of red and green to visualize the lyrics, rather than the boring white and black colors.

The Twelve Days of Christmas

If you make a list of Christmas songs that have repetition, chances are "The Twelve Days of Christmas" will be at the top of the list. The song is formulaic: each new verse adds a few new words before repeating the words from the previous verse. As a result, the repetition matrix is almost boring in its regularity. Here is the visualization of the classic song (click to enlarge):

Little Drummer Boy

Another highly repetitive Christmas song is "The Little Drummer Boy," which features an onomatopoeic phrase (Pa rum pum pum pum) that alternates with the other lyrics. A visualization of the classic song is shown below:

Silver Bells

In addition to repeating the title, "Silver Bells" repeats several phrases. Most notably, the phrase "Soon it will be Christmas Day" is repeated multiple times at the end of the song. Because only certain phrases are repeated, the visualization has a pleasing structure that complements the song's lyrical qualities:

Silent Night

To contrast the hustle, bustle, and commercialism of Christmas, I enjoy hearing songs that are musically simple. One of my favorites is "Silent Night." Each verse is distinct, yet each begins with "Silent night, holy night!" and ends by repeating a phrase. The resulting visualization is devoid of clutter. It is visually empty and matches the lyrical imagery, "all is calm, all is bright."

Your turn!

You can download the SAS program that creates these images. The program also computes visualizations of some contemporary songs such as "Last Christmas" by Wham!, "Someday at Christmas" (Stevie Wonder version), "Rockin' Around the Christmas Tree" (Brenda Lee version), and "Happy XMas (War Is Over)" by John Lennon and Yoko Ono. If you have access to SAS, you can even add your own favorite lyrics to the program! If you don't have access to SAS, Colin Morris's website enables you to paste in the lyrics and see the visualization.

In a little-known "deleted scene" from Elf, Buddy says that the second-best way to spread Christmas cheer is posting images for all to share! So post a comment and share your favorite visualization of a Christmas song!

Happy holidays to all my readers. I am grateful for you. Merry Christmas to all, and to all a good night!

The post Visualize Christmas songs appeared first on The DO Loop.