When making a new piece of code, I like to use the smallest font I can read. This lets me fit more text on the screen at once. When presenting code to others, especially in a classroom setting, I like to make the font large enough to see from the back of the room. Here’s how I change font size in SAS in our three programming interfaces.
Ever since the Moneyball book & movie came out, athletes have been scrambling to use data and analytics to gain a competitive advantage. One of my favorite sports is boat racing - the ones you paddle. Follow along as I lead you through some maps and graphs I created for [...]
Bootstrap resampling is a powerful way to estimate the standard error for a statistic without making any parametric assumptions about its sampling distribution. The bootstrap method is often implemented by using a sequence of calls to resample from the data, compute a statistic on each sample, and analyze the bootstrap distribution. An example is provided in the article "Compute a bootstrap confidence interval in SAS." This process can be lengthy, and in Base SAS it requires reading and writing a large amount of data. In SAS/STAT 14.3 (SAS 9.4M5), the TTEST procedure supports the BOOTSTRAP statement, which automatically performs a bootstrap analysis of one-sample and two-sample t tests. The BOOTSTRAP statement also applies to two-sample paired tests.
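The three manual steps (resample, compute, analyze) can be sketched in a few lines. The following is a minimal illustration in Python rather than SAS, using only the standard library. The data values and the helper name `bootstrap_se` are hypothetical and are not taken from the article.

```python
import random
import statistics

def bootstrap_se(data, stat, n_samples=2000, seed=123):
    """Estimate the standard error of stat(data) by bootstrap resampling."""
    rng = random.Random(seed)  # private generator so the run is reproducible
    n = len(data)
    boot_stats = []
    for _ in range(n_samples):
        # Step 1: draw n observations with replacement
        resample = rng.choices(data, k=n)
        # Step 2: compute the statistic on the resample
        boot_stats.append(stat(resample))
    # Step 3: summarize the bootstrap distribution
    return statistics.stdev(boot_stats), boot_stats

# Hypothetical MPG values, for illustration only
mpg = [18, 20, 21, 21, 22, 24, 24, 26, 28, 35]
se, boot = bootstrap_se(mpg, statistics.mean)
print(f"bootstrap SE of the mean: {se:.3f}")
```

Every resample has to be generated and processed, which is why a hand-written implementation moves a lot of data when the number of resamples is large.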
The difference of means between two groups
The BOOTSTRAP statement makes it easy to obtain bootstrap estimates of bias and standard error for a statistic and confidence intervals (CIs) for the underlying parameter. The BOOTSTRAP statement supports several estimates for the confidence intervals, including normal-based intervals, t-based intervals, percentile intervals, and bias-adjusted intervals. This section shows how to obtain bootstrap estimates for a two-sample t test. The statistic of interest is the difference between the means of two groups.
The following SAS DATA step subsets the Sashelp.Cars data to create a data set that contains only two types of vehicles: sedans and SUVs. A call to PROC UNIVARIATE displays a comparative histogram that shows the distributions of the MPG_City variable for each group. The MPG_City variable measures the fuel efficiency (in miles per gallon) for each vehicle during typical city driving.
/* create data set that has two categories: 'Sedan' and 'SUV' */
data Sample;
   set Sashelp.Cars(keep=Type MPG_City);
   if Type in ('Sedan' 'SUV');
run;

proc univariate data=Sample;
   class Type;
   histogram MPG_City;
   inset N Mean Std Skew Kurtosis / position=NE;
   ods select histogram;
run;
Bootstrap estimates for a two-sample t test
Suppose that you want to test whether the mean MPG of the "SUV" group is significantly different from the mean of the "Sedan" group. The groups appear to have different variances, so you would probably choose the Satterthwaite version of the t test, which accommodates different variances. You can use PROC TTEST to run a two-sample t test for these data, but in looking at the distributions of the groups, you might be concerned that the normality assumptions for the t test are not satisfied by these data. Notice that the distribution of the MPG_City variable for the "Sedan" group has high skewness (1.3) and moderately high kurtosis (1.9). Although the t test is somewhat robust to the normality assumption, you might want to use the bootstrap method to estimate the standard error and confidence interval for the difference of means between the two groups.
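For readers who want to see what the Satterthwaite version computes, here is a minimal sketch in Python (not SAS) of the unequal-variance t statistic and its approximate degrees of freedom. The two small samples and the function name `welch_t` are hypothetical. They are not the Sashelp.Cars data, so the numbers will not match the PROC TTEST output.

```python
import math
import statistics

def welch_t(x, y):
    """Unequal-variance t statistic with Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    a = statistics.variance(x) / n1   # squared standard error of mean(x)
    b = statistics.variance(y) / n2   # squared standard error of mean(y)
    diff = statistics.mean(x) - statistics.mean(y)
    t = diff / math.sqrt(a + b)
    # Satterthwaite approximation to the degrees of freedom
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff, t, df

# Hypothetical MPG values, for illustration only
suv = [13, 15, 16, 16, 17, 18, 19, 21]
sedan = [17, 18, 20, 21, 21, 22, 24, 26, 29, 36]

diff, t, df = welch_t(suv, sedan)
print(f"difference = {diff:.3f}, t = {t:.3f}, df = {df:.2f}")
```

The approximate degrees of freedom always fall between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2, which is how the method accounts for unequal variances.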
If you are using SAS/STAT 14.3, you can compute bootstrap estimates for a t test by using the BOOTSTRAP statement, as follows:
title "Bootstrap Estimates with Percentile CI";
proc ttest data=Sample;
   class Type;
   var MPG_City;
   bootstrap / seed=123 nsamples=10000 bootci=percentile; /* or BOOTCI=BC */
run;
The BOOTSTRAP statement supports three options:
- The SEED= option initializes the internal random number generator for the TTEST procedure.
- The NSAMPLES= option specifies the number of bootstrap resamples to be drawn from the data.
- The BOOTCI= option specifies the estimate for the confidence interval for the parameter. This example uses the PERCENTILE method, which uses the α/2 and 1 – α/2 quantiles of the bootstrap distribution as the endpoints of the confidence interval. A more sophisticated second-order method is the bias-corrected interval, which you can specify by using the BOOTCI=BC option. For educational purposes, you might want to compare these nonparametric estimates with more traditional estimates such as t-based confidence intervals (BOOTCI=TBOOTSE).
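To make the percentile method concrete, the following Python sketch (not SAS) builds a bootstrap distribution for a difference of means and reads off the α/2 and 1 – α/2 quantiles. It rests on several assumptions: the data are hypothetical, each group is resampled independently, and the quantile uses linear interpolation, which is only one of several common definitions. The TTEST documentation describes what the procedure actually does.

```python
import math
import random
import statistics

def quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def percentile_ci(x, y, alpha=0.05, n_samples=5000, seed=123):
    """Percentile bootstrap CI for mean(x) - mean(y)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_samples):
        bx = rng.choices(x, k=len(x))   # resample group 1 with replacement
        by = rng.choices(y, k=len(y))   # resample group 2 with replacement
        diffs.append(statistics.mean(bx) - statistics.mean(by))
    diffs.sort()
    return quantile(diffs, alpha / 2), quantile(diffs, 1 - alpha / 2)

# Hypothetical MPG values, for illustration only
suv = [13, 15, 16, 16, 17, 18, 19, 21]
sedan = [17, 18, 20, 21, 21, 22, 24, 26, 29, 36]

lower, upper = percentile_ci(suv, sedan)
print(f"95% percentile CI: [{lower:.2f}, {upper:.2f}]")
```

Changing `seed` or `n_samples` changes the endpoints slightly, in the same way that changing SEED= or NSAMPLES= changes the procedure's bootstrap estimates.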
The TTEST procedure produces several tables and graphs, but I have highlighted a few statistics in two tables. The top table is the "ConfLimits" table, which is based on the data and shows the traditional statistics for the t test. The estimate for the difference in means between the "SUV" and "Sedan" groups is -4.98 and is highlighted in blue. The traditional (parametric) estimate for a 95% confidence interval is highlighted in red. The interval is [-5.87, -4.10], which does not contain 0, so you can conclude that the group means are significantly different at the 0.05 significance level.
The lower table is the "Bootstrap" table, which is based on the bootstrap resamples. The TTEST documentation explains the resampling process and the computation of the bootstrap statistics. The top row of the table shows estimates for the difference of means. The bootstrap estimate for the standard error is 0.45. The estimate of bias (the average bootstrap statistic minus the sample statistic) is -0.01, which is small. The percentile estimate for the confidence interval is [-5.87, -4.14], which is similar to the parametric interval estimate in the top table. (For comparison, the bias-adjusted CI is also similar: [-5.85, -4.12].) Every bootstrap estimate in this table will change if you change the SEED= or NSAMPLES= options because the values are based on the bootstrap samples.
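The bias and standard error in a table like this are simple summaries of the bootstrap statistics. The Python sketch below (not SAS) uses the usual convention that the bias estimate is the mean of the bootstrap statistics minus the statistic computed on the original sample. The data and the names `bootstrap_summary` and `mean_diff` are hypothetical, so the values will not match the output discussed above.

```python
import random
import statistics

def mean_diff(x, y):
    """Statistic of interest: difference between the group means."""
    return statistics.mean(x) - statistics.mean(y)

def bootstrap_summary(boot_stats, sample_stat):
    """Summarize a bootstrap distribution: estimate, bias, standard error."""
    return {
        "estimate": sample_stat,
        # usual convention: mean of bootstrap statistics minus sample statistic
        "bias": statistics.mean(boot_stats) - sample_stat,
        "std_err": statistics.stdev(boot_stats),
    }

# Hypothetical MPG values, for illustration only
suv = [13, 15, 16, 16, 17, 18, 19, 21]
sedan = [17, 18, 20, 21, 21, 22, 24, 26, 29, 36]

rng = random.Random(123)
boot = [
    mean_diff(rng.choices(suv, k=len(suv)), rng.choices(sedan, k=len(sedan)))
    for _ in range(5000)
]
summary = bootstrap_summary(boot, mean_diff(suv, sedan))
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A bias estimate that is small relative to the standard error, as in the article's example, suggests that the sample statistic needs little bias adjustment.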
Although the difference of means is the statistic that is bootstrapped most often, you can see from the lower table that the BOOTSTRAP statement also estimates the standard error, bias, and confidence interval for the standard deviation of the difference. Although this article focuses on the two-sample t test, the BOOTSTRAP statement also applies to one-sample t tests.
In summary, the BOOTSTRAP statement in PROC TTEST in SAS/STAT 14.3 makes it easy to obtain bootstrap estimates for statistics in one-sample or two-sample t tests (and paired t tests). By using the BOOTSTRAP statement, the manual three-step bootstrap process (resample, compute statistics, and summarize) is reduced to a single statement. The TTEST procedure handles the details for you.
Many things in nature can be seen as chain reactions. When one action occurs, others follow suit. For example, atmospheric greenhouse gas levels are increasing, which leads to a warming of the oceans. As the oceans warm, weather and climate patterns across the globe are impacted because the amount of [...]
4 ways to visualize climate changes in the oceans and the Arctic was published on SAS Voices by Mary Osborne
I recently read an interesting article that claims "a single cremation emits as much carbon dioxide as a 1,000-mile car trip." This got me wondering about cremation data, and I ended up on the Wikipedia page about cremation rates. They had a map of the US cremation rates by state ... but the more [...]
My 2018 SAS Global Forum paper was about "how to use the random-number generators (RNGs) in SAS." You can read the paper for details, but I recently recorded a short video that summarizes the main ideas in the paper. In particular, the video gives an overview of the new RNGs in SAS, which include the following:
- MTHYBRID, MT2002, and MT64: Variants of the Mersenne twister RNG. The MTHYBRID method is the default RNG in SAS, beginning with SAS 9.4M3.
- PCG: A 64-bit permuted congruential generator
- RDRAND: A hardware-based RNG that generates random numbers from thermal noise in the chip
- TF2 and TF4: Counter-based Threefry RNGs
If your browser does not support embedded video, you can go directly to the video on YouTube.
The following references provide more information about the random number generators in SAS:
- How to use the new random-number generators in SAS
- Independent streams of random numbers in SAS (How to use the CALL STREAM routine.)
- Sarle, W. and Wicklin, R., 2018, "Tips and Techniques for Using the Random-Number Generators in SAS," Proceedings of the SAS Global Forum Conference.
American readers may know that the World Cup is a big deal in other countries. But it's hard to grasp just how ubiquitous and important it is when there's no other event like it in the United States. The best way I can describe it is if the series finale [...]
With all the technology advancements and innovative trends driving Industry 4.0 right now, you might expect geeky topics like the Internet of Things (IoT) or artificial intelligence (AI) to be the hottest topics of discussion among industry leaders. Instead, many leaders are still more focused on workplace culture. And here’s [...]
Workplace culture still No. 1 challenge for manufacturers was published on SAS Voices by Roger Thomas
In SAS Visual Analytics 8.2 on SAS Viya 3.3, there are a number of new data features available. Some of these features are completely new, and some are features from the 7.x release that had not yet been included in the 8.1 release. I’ll cover a few of these new features in this post.
First of all, the Data pane interface has changed to enable users to access actions via fewer and better organized menus.
Data item properties can also be displayed for viewing or editing with a single click.
The new Change data source action displays a Repair report window if report data items are not in the new data source. The window enables you to replace the missing data items with replacement data items from the new data source before continuing with the change.
Speaking of mapping, in SAS Visual Analytics 8.2, linked selections and filters can automatically be added to objects, and the objects may use different data sources. In that case, you can manually map data sources from the Data pane. The + icon enables you to add additional pairs of mappings.
When you create a new Geography data item in SAS Visual Analytics 8.2, in addition to using Predefined names and codes or your own custom latitude and longitude data items, you can now also use custom polygon shapes to display your own custom regions. Once you select Custom polygon shapes, you specify, in additional dialogs, the characteristics of your polygon provider. You can use a CAS table or an Esri Feature Service.
For more information on custom polygons, see my previous blog here.
If you need to use an Esri shapefile for your polygon data, there are macros available in VA 8.2 to convert the data to a SAS data set and to load the data into CAS.
- %SHPCNTNT displays the contents of the shapefile.
- %SHPIMPRT converts the shapefile into a SAS data set and loads it into CAS.
The Custom Sort feature is also back in SAS Visual Analytics 8.2. Just right-click the data item, select Custom sort, and then select and order your data values.
For creating a new derived data item, there are several new calculations available for measures:
And speaking of creating calculated data items, you’ll want to check out three useful new operators that are available in SAS Visual Analytics 8.2:
What do the New York Mets, the Orlando Magic and the Boston Bruins all have in common? They all use SAS analytics to gain deeper insights into athlete recruitment, retention, performance, safety and more. And after seeing the success teams like these have had using analytics, collegiate sports are turning [...]