June 18, 2018
 

Bootstrap resampling is a powerful way to estimate the standard error for a statistic without making any parametric assumptions about its sampling distribution. The bootstrap method is often implemented by using a sequence of calls to resample from the data, compute a statistic on each sample, and analyze the bootstrap distribution. An example is provided in the article "Compute a bootstrap confidence interval in SAS." This process can be lengthy, and in Base SAS it requires reading and writing a large amount of data. In SAS/STAT 14.3 (SAS 9.4M5), the TTEST procedure supports the BOOTSTRAP statement, which automatically performs a bootstrap analysis of one-sample and two-sample t tests. The BOOTSTRAP statement also applies to paired t tests.

The difference of means between two groups

The BOOTSTRAP statement makes it easy to obtain bootstrap estimates of bias and standard error for a statistic and confidence intervals (CIs) for the underlying parameter. The BOOTSTRAP statement supports several estimates for the confidence intervals, including normal-based intervals, t-based intervals, percentile intervals, and bias-adjusted intervals. This section shows how to obtain bootstrap estimates for a two-sample t test. The statistic of interest is the difference between the means of two groups.

The following SAS DATA step subsets the Sashelp.Cars data to create a data set that contains only two types of vehicles: sedans and SUVs. A call to PROC UNIVARIATE displays a comparative histogram that shows the distributions of the MPG_City variable for each group. The MPG_City variable measures the fuel efficiency (in miles per gallon) for each vehicle during typical city driving.

/* create data set that has two categories: 'Sedan' and 'SUV' */
data Sample;
set Sashelp.Cars(keep=Type MPG_City);
if Type in ('Sedan' 'SUV');
run;
 
proc univariate data=Sample;
   class Type;
   histogram MPG_City;
   inset N Mean Std Skew Kurtosis / position=NE;
   ods select histogram;
run;

Bootstrap estimates for a two-sample t test

Suppose that you want to test whether the mean MPG of the "SUV" group is significantly different from the mean of the "Sedan" group. The groups appear to have different variances, so you would probably choose the Satterthwaite version of the t test, which accommodates unequal variances. You can use PROC TTEST to run a two-sample t test for these data, but when you look at the distributions of the groups, you might be concerned that the data do not satisfy the normality assumption of the t test. Notice that the distribution of the MPG_City variable for the "Sedan" group has high skewness (1.3) and moderately high kurtosis (1.9). Although the t test is somewhat robust to departures from normality, you might want to use the bootstrap method to estimate the standard error and confidence interval for the difference of means between the two groups.

If you are using SAS/STAT 14.3, you can compute bootstrap estimates for a t test by using the BOOTSTRAP statement, as follows:

title "Bootstrap Estimates with Percentile CI";
proc ttest data=Sample;
   class Type;
   var MPG_City;
   bootstrap / seed=123 nsamples=10000 bootci=percentile;  /* or BOOTCI=BC */
run;

This example uses three options in the BOOTSTRAP statement:

  • The SEED= option initializes the internal random number generator for the TTEST procedure.
  • The NSAMPLES= option specifies the number of bootstrap resamples to be drawn from the data.
  • The BOOTCI= option specifies the method for estimating the confidence interval for the parameter. This example uses the PERCENTILE method, which uses the α/2 and 1 – α/2 quantiles of the bootstrap distribution as the endpoints of the confidence interval. A more sophisticated second-order method is the bias-corrected interval, which you can specify by using the BOOTCI=BC option. For educational purposes, you might want to compare these nonparametric estimates with more traditional estimates such as t-based confidence intervals (BOOTCI=TBOOTSE).
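For the comparison suggested above, a minimal sketch reruns the same analysis once for each BOOTCI= method that this article mentions. The macro name BootCompare is hypothetical. Every call uses the same SEED= and NSAMPLES= values, with the intent that only the interval method changes between runs; check the TTEST documentation if you need to rely on identical resamples.

```sas
/* Hypothetical helper macro: rerun the bootstrap analysis with a different BOOTCI= method */
%macro BootCompare(ci);
   title "Bootstrap Estimates with BOOTCI=&ci";
   proc ttest data=Sample;
      class Type;
      var MPG_City;
      /* same SEED= and NSAMPLES= in every call; only the CI method is varied */
      bootstrap / seed=123 nsamples=10000 bootci=&ci;
   run;
%mend;

%BootCompare(PERCENTILE);  /* percentile interval */
%BootCompare(BC);          /* bias-corrected interval */
%BootCompare(TBOOTSE);     /* t-based interval */
```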

The TTEST procedure produces several tables and graphs, but I have highlighted a few statistics in two tables. The top table is the "ConfLimits" table, which is based on the data and shows the traditional statistics for the t test. The estimate for the difference in means between the "SUV" and "Sedan" groups is -4.98 and is highlighted in blue. The traditional (parametric) estimate for a 95% confidence interval is highlighted in red. The interval is [-5.87, -4.10], which does not contain 0; therefore, you can conclude that the group means are significantly different at the 0.05 significance level.

The lower table is the "Bootstrap" table, which is based on the bootstrap resamples. The TTEST documentation explains the resampling process and the computation of the bootstrap statistics. The top row of the table shows estimates for the difference of means. The bootstrap estimate for the standard error is 0.45. The estimate of bias (which subtracts the average bootstrap statistic from the sample statistic) is -0.01, which is small. The percentile estimate for the confidence interval is [-5.87, -4.14], which is similar to the parametric interval estimate in the top table. (For comparison, the bias-adjusted CI is also similar: [-5.85, -4.12].) Every cell in this table will change if you change the SEED= or NSAMPLES= options because the values in this table are based on the bootstrap samples.

Although the difference of means is the statistic that is most often bootstrapped, the lower table shows that the BOOTSTRAP statement also estimates the standard error, bias, and confidence interval for the standard deviation of the difference. This article focuses on the two-sample t test, but the BOOTSTRAP statement also applies to one-sample t tests.
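A minimal sketch of the one-sample case follows. It assumes that you want to test whether the mean MPG_City for sedans differs from a reference value of 20 mpg; the H0= value is hypothetical and chosen only for illustration.

```sas
/* One-sample t test with bootstrap estimates for the Sedan group.
   H0=20 is a hypothetical reference value chosen only for illustration. */
title "One-Sample Bootstrap Estimates: Sedans";
proc ttest data=Sample(where=(Type="Sedan")) h0=20;
   var MPG_City;
   bootstrap / seed=123 nsamples=10000 bootci=percentile;
run;
```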

In summary, the BOOTSTRAP statement in PROC TTEST in SAS/STAT 14.3 makes it easy to obtain bootstrap estimates for statistics in one-sample or two-sample t tests (and paired t tests). By using the BOOTSTRAP statement, the manual three-step bootstrap process (resample, compute statistics, and summarize) is reduced to a zero-step process. The TTEST procedure handles the details for you.
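To see what the BOOTSTRAP statement automates, the following sketch performs the three manual steps for the difference of means: PROC SURVEYSELECT resamples with replacement, PROC MEANS computes the group means for each resample, and PROC UNIVARIATE summarizes the bootstrap distribution. The data set names (SampleSorted, BootSamp, BootMeans, BootWide, BootDiff, BootCI) are hypothetical. Resampling within each group (a stratified bootstrap that keeps the group sizes fixed) is a design choice of this sketch, not necessarily what PROC TTEST does internally. Because the random resamples differ, expect results that are close to, but not identical to, the BOOTSTRAP table.

```sas
/* Step 0: SURVEYSELECT requires the data to be sorted by the STRATA variable */
proc sort data=Sample out=SampleSorted;
   by Type;
run;

/* Step 1: resample with replacement within each group; SAMPRATE=1 keeps each group's size */
proc surveyselect data=SampleSorted out=BootSamp noprint seed=123
                  method=urs samprate=1 outhits reps=10000;
   strata Type;
run;

/* Step 2: mean of MPG_City for each group in each bootstrap sample */
proc means data=BootSamp noprint nway;
   class Replicate Type;
   var MPG_City;
   output out=BootMeans(keep=Replicate Type Mean) mean=Mean;
run;

/* One row per replicate, with variables SUV and Sedan that hold the group means */
proc transpose data=BootMeans out=BootWide;
   by Replicate;
   id Type;
   var Mean;
run;

/* Difference in the same orientation (SUV minus Sedan) as the TTEST estimate */
data BootDiff;
   set BootWide;
   Diff = SUV - Sedan;
   keep Replicate Diff;
run;

/* Step 3: bootstrap SE (std dev of Diff) and 95% percentile interval */
proc univariate data=BootDiff noprint;
   var Diff;
   output out=BootCI mean=BootMean std=BootSE pctlpts=2.5 97.5 pctlpre=CI_;
run;

proc print data=BootCI noobs;
run;
```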

The post The BOOTSTRAP statement for t tests in SAS appeared first on The DO Loop.

June 15, 2018
 

Many things in nature can be seen as chain reactions. When one action occurs, others follow suit. For example, atmospheric greenhouse gas levels are increasing, which leads to a warming of the oceans. As the oceans warm, weather and climate patterns across the globe are impacted because the amount of [...]

4 ways to visualize climate changes in the oceans and the Arctic was published on SAS Voices by Mary Osborne

June 15, 2018
 

I recently read an interesting article that claims "a single cremation emits as much carbon dioxide as a 1,000-mile car trip." This got me wondering about cremation data, and I ended up on the Wikipedia page about cremation rates. They had a map of the US cremation rates by state ... but the more [...]

The post Cremation rates in the US, by state appeared first on SAS Learning Post.

June 15, 2018
 

My 2018 SAS Global Forum paper was about "how to use the random-number generators (RNGs) in SAS." You can read the paper for details, but I recently recorded a short video that summarizes the main ideas in the paper. In particular, the video gives an overview of the new RNGs in SAS, which include the following:

  • MTHYBRID, MT2002, and MT64: Variants of the Mersenne twister RNG. The MTHYBRID method is the default RNG in SAS, beginning with SAS 9.4M3.
  • PCG: A 64-bit permuted congruential generator
  • RDRAND: A hardware-based RNG that generates random numbers from thermal noise in the chip
  • TF2 and TF4: Counter-based Threefry RNGs
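As a minimal sketch, assuming SAS 9.4M5 or later (where CALL STREAMINIT accepts the name of the generator as an optional first argument), the following DATA step selects the PCG generator and draws five uniform variates. The data set name RandDemo and the seed 12345 are arbitrary.

```sas
data RandDemo;
   call streaminit('PCG', 12345);   /* select the PCG generator and set its seed */
   do i = 1 to 5;
      u = rand('Uniform');          /* uniform variate from the PCG stream */
      output;
   end;
run;

proc print data=RandDemo noobs;
run;
```

Replacing 'PCG' with another name from the list above (for example, 'MTHybrid' or 'TF2') selects a different generator while the rest of the program stays the same.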

If your browser does not support embedded video, you can go directly to the video on YouTube.

The following references provide more information about the random number generators in SAS:

The post Video: New random number generators in SAS appeared first on The DO Loop.

June 14, 2018
 

With all the technology advancements and innovative trends driving Industry 4.0 right now, you might expect geeky topics like the Internet of Things (IoT) or artificial intelligence (AI) to be the hottest topics of discussion among industry leaders. Instead, many leaders are still more focused on workplace culture. And here’s [...]

Workplace culture still No. 1 challenge for manufacturers was published on SAS Voices by Roger Thomas

June 14, 2018
 

In SAS Visual Analytics 8.2 on SAS Viya 3.3, there are a number of new data features available. Some of these features are completely new, and some are features from the 7.x release that had not yet been included in the 8.1 release.  I’ll cover a few of these new features in this post.

First of all, the Data pane interface has changed to enable users to access actions via fewer and better organized menus.

Data item properties can also be displayed for viewing or editing with a single click.

The new Change data source action displays a Repair report window if report data items are not in the new data source.  The window enables you to replace the missing data items with replacement data items from the new data source before continuing with the change.

Speaking of mapping, in SAS Visual Analytics 8.2, linked selections and filters can be added automatically to objects, and the objects can use different data sources. In that case, you can manually map the data sources from the Data pane. The + icon enables you to add additional pairs of mappings.

When you create a new Geography data item in SAS Visual Analytics 8.2, in addition to using Predefined names and codes or your own custom latitude and longitude data items, you can now also use custom polygon shapes to display your own custom regions. Once you select Custom polygon shapes, you specify, in additional dialogs, the characteristics of your polygon provider.  You can use a CAS table or an Esri Feature Service.

For more information about custom polygons, see my previous blog post.

If you need to use an Esri shapefile for your polygon data, there are macros available in VA 8.2 to convert the data to a SAS data set and to load the data into CAS:

  • %SHPCNTNT displays the contents of the shapefile.
  • %SHPIMPRT converts the shapefile into a SAS data set and loads it into CAS.

The Custom Sort feature is also back in SAS Visual Analytics 8.2. Just right-click the data item, select Custom sort, and then select and order your data values.

For creating a new derived data item, there are several new calculations available for measures:

And speaking of creating calculated data items, you’ll want to check out three useful new operators that are available in SAS Visual Analytics 8.2:

A look at the new data pane and data item features in SAS Visual Analytics 8.2 was published on SAS Users.

June 13, 2018
 

What do the New York Mets, the Orlando Magic and the Boston Bruins all have in common? They all use SAS analytics to gain deeper insights into athlete recruitment, retention, performance, safety and more. And after seeing the success teams like these have had using analytics, collegiate sports are turning [...]

The key to success in college sports? Analytics. was published on SAS Voices by Georgia Mariani

June 13, 2018
 

If you use PROC SGPLOT to create ODS graphics, "ATTRS" are everywhere. ATTRS is an abbreviation of "attributes." Most options that change the attributes of a graphical element end with the ATTRS suffix. For example, the MARKERATTRS option modifies attributes of markers, the LINEATTRS option modifies attributes of lines, and the FILLATTRS option modifies attributes of the filled area of a bar or other region. These options are easy to remember and use.

However, there are three "ATTRS" that you might find more confusing: CYCLEATTRS, ATTRPRIORITY, and STYLEATTRS. These options determine the colors, line patterns, and marker symbols that are used to represent groups in your data. They interact with each other when you use a GROUP= option to specify groups and also use multiple statements to overlay several graph types such as scatter plots and series plots.

This article summarizes the ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS keywords and provides an example that shows how they interact with each other. The end of this article presents a list of references for further reading.

What are ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS?

In PROC SGPLOT, many statements support the GROUP= option. In order to distinguish one group from another, the groups are assigned different attributes such as colors, line patterns, marker symbols, and so on. The colors, patterns, and other attributes are defined by the current ODS style (or by the STYLEATTRS statement or the DATTRMAP= data set), but the ATTRPRIORITY and CYCLEATTRS keywords determine how the colors, patterns, and symbols combine for each group. The following describes the syntax and purpose of each keyword:

  • The ATTRPRIORITY option is specified on the ODS GRAPHICS statement. If you specify ATTRPRIORITY=COLOR, then groups are represented by changing the color of markers and lines, but not line patterns or marker symbols. If you specify ATTRPRIORITY=NONE, groups are represented by changing the color, line pattern, and marker symbol for each group. The ATTRPRIORITY=NONE option is essential for "no color" styles such as the Journal style.
  • The CYCLEATTRS (or NOCYCLEATTRS) option is specified on the PROC SGPLOT statement and affects whether attributes change when you overlay multiple graphs by using two or more statements. Notice that the word "CYCLE" begins with an "S" sound, as does the word "statements." Say the following sentence out loud, emphasizing the common "S" sounds: "CYCLEATTRS affects Statements."
  • The STYLEATTRS statement in PROC SGPLOT overrides the colors, patterns, and symbols in the current style. You can use the STYLEATTRS statement to set the combination of attributes that appear for each group.

Before looking at the interaction between these keywords, recall how the ATTRPRIORITY= option affects a graph that has only one statement. The following calls to PROC SGPLOT are identical except for the ATTRPRIORITY= option.

title "Iris Data: AttrPriority=Color";
ods graphics / AttrPriority=Color;
proc sgplot data=Sashelp.Iris;
   reg x=PetalWidth y=SepalLength / group=Species markerattrs=(size=10) lineattrs=(thickness=4);
run;
 
title "Iris Data: AttrPriority=None";
ods graphics / AttrPriority=None;
proc sgplot data=Sashelp.Iris;
   reg x=PetalWidth y=SepalLength / group=Species markerattrs=(size=10) lineattrs=(thickness=4);
run;
Figure: ATTRPRIORITY=COLOR (left) and ATTRPRIORITY=NONE (right) in PROC SGPLOT

The plot on the left shows the result of using ATTRPRIORITY=COLOR with the HtmlBlue style. The three groups are represented by using the first three colors in the style (in this case, blue, red, and green), but the line patterns and marker symbols do not change between groups. In contrast, the plot on the right is the result of using ATTRPRIORITY=NONE. The three groups are represented by using the first three colors in the style and also the first three line patterns (solid, dashed, and dash-dot) and the first three marker symbols (circle, plus sign, and "X"). The results might depend on the current ODS style.

The interaction between ATTRPRIORITY and CYCLEATTRS

The ATTRPRIORITY and CYCLEATTRS options interact when you use two or more statements to overlay graphs. In his short course on "Advanced ODS Graphics Examples in SAS," Warren Kuhfeld presents a slide that shows how these options interact. The following example creates a panel of graphs that is inspired by Warren's slide. Each graph in the example shows the same data (the closing stock price for two companies in 2001–2003) displayed for each of the four combinations of the ATTRPRIORITY and CYCLEATTRS options. Although the LOESS statement can display the markers, I used a separate SCATTER statement because I wanted to illustrate how the CYCLEATTRS option affects the attributes.

data Stocks;
set Sashelp.Stocks;
where '01JAN2001'd <= Date <= '31OCT2003'd and Stock in ("IBM", "Microsoft");
run;
 
/* The code for each call is the same except for the options. Wrap in a macro for brevity. */
%macro GraphIt(priority, cycle);
title "AttrPriority=&priority.; &cycle.";
ods graphics / AttrPriority=&priority.;
proc sgplot data=Stocks &cycle.;
   loess   x=Date y=Close / group=Stock lineattrs=(thickness=4) NoMarkers smooth=0.75;
   scatter x=Date y=Close / group=Stock markerattrs=(size=10);
   yaxis label="Stock Price";
run;
%mend;
 
ods graphics / width=400px height=300px;
ods layout gridded columns=2 advance=table column_gutter=0 row_gutter=0;
%GraphIt(Color, NoCycleAttrs);
%GraphIt(None,  NoCycleAttrs);
%GraphIt(Color, CycleAttrs);
%GraphIt(None,  CycleAttrs);
ods layout end;
Figure: Interaction between the ATTRPRIORITY= and CYCLEATTRS options in PROC SGPLOT

The panel for the example shows the interaction between the ATTRPRIORITY= and CYCLEATTRS options.

  • The top-left graph shows the graph for ATTRPRIORITY=COLOR and NOCYCLEATTRS. Only colors are used to distinguish the groups (IBM and Microsoft). The second statement (SCATTER) uses the same attributes as the first statement (LOESS).
  • The top-right graph shows the graph for ATTRPRIORITY=NONE and NOCYCLEATTRS. Colors, line patterns, and symbols are used to distinguish the groups. The second statement (SCATTER) uses the same attributes as the first statement (LOESS).
  • The bottom-left graph shows the graph for ATTRPRIORITY=COLOR and CYCLEATTRS. Only colors are used to distinguish the groups (IBM and Microsoft). The first statement (LOESS) uses the first two colors in the style (blue and red) and the second statement (SCATTER) uses the third and fourth colors in the style (green and brown). Because the markers are not filled, it might be difficult to see that the marker colors are green and brown.
  • The bottom-right graph shows the graph for ATTRPRIORITY=NONE and CYCLEATTRS. Colors, line patterns, and symbols are used to distinguish the groups. The first statement (LOESS) uses the first two attributes in the style (blue-solid and red-dashed) whereas the second statement (SCATTER) uses the third and fourth attributes (green-X and brown-triangle).

Depending on your needs, either graph in the first row would be appropriate for color graphs. The lower-right plot is most suitable for ODS styles that create monochrome graphs.

The STYLEATTRS statement

After you understand the interaction between the ATTRPRIORITY= and CYCLEATTRS options, it is straightforward to use the STYLEATTRS statement. The statement merely overrides the group attributes for the current style. If you specify fewer attributes than the number of groups, the attributes are cyclically reused. For example, the following STYLEATTRS statement specifies that all lines should be solid but specifies different colors and symbols for the groups. Two colors and two symbols are specified (one for each group), and because the NOCYCLEATTRS option is set, the second statement (SCATTER) reuses the same colors and symbols as the first statement (LOESS):

title "AttrPriority=None; NoCycleAttrs; StyleAttrs";
ods graphics / AttrPriority=None;
proc sgplot data=Stocks NoCycleAttrs;
   styleattrs datalinepatterns=(solid)
              datacontrastcolors=(SteelBlue DarkGreen)
              datasymbols=(CircleFilled TriangleFilled);
   loess   x=Date y=Close / group=Stock lineattrs=(thickness=4) NoMarkers smooth=0.75;
   scatter x=Date y=Close / group=Stock markerattrs=(size=10);
   yaxis label="Stock Price";
run;
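The STYLEATTRS statement assigns attributes according to the order of the groups. If you want each group value to keep the same attributes regardless of that order, the DATTRMAP= data set mentioned earlier binds attributes to specific values. The following is a minimal sketch; the data set name MyAttrMap and the ID value StockID are hypothetical, whereas ID, Value, LineColor, MarkerColor, and MarkerSymbol are the reserved column names of a discrete attribute map.

```sas
/* Hypothetical discrete attribute map that binds attributes to specific values of Stock */
data MyAttrMap;
   length ID $8 Value $10 LineColor MarkerColor $12 MarkerSymbol $16;
   ID = "StockID";
   Value = "IBM";       LineColor = "SteelBlue"; MarkerColor = "SteelBlue";
   MarkerSymbol = "CircleFilled";   output;
   Value = "Microsoft"; LineColor = "DarkGreen"; MarkerColor = "DarkGreen";
   MarkerSymbol = "TriangleFilled"; output;
run;

title "Attributes Assigned by an Attribute Map";
proc sgplot data=Stocks dattrmap=MyAttrMap;
   /* ATTRID= tells each statement which map (by ID) to use for its GROUP= values */
   loess   x=Date y=Close / group=Stock attrid=StockID lineattrs=(thickness=4) NoMarkers smooth=0.75;
   scatter x=Date y=Close / group=Stock attrid=StockID markerattrs=(size=10);
   yaxis label="Stock Price";
run;
```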

Reset the default attribute priority

Here's another trick I learned from Warren's course: You can use the RESET= option to reset the default value of the ATTRPRIORITY option for the current style:

ods graphics / reset=attrpriority; /* reset the default value for the current style */

Summary and further reading

This article provides an example and summarizes the behavior of the ATTRPRIORITY= option (on the ODS GRAPHICS statement) and the CYCLEATTRS option (on the PROC SGPLOT statement). These options interact when you have several statements in PROC SGPLOT that each use the GROUP= option. You can use the STYLEATTRS statement to override the default attributes for the current ODS style.

Much more can be said about these options and how they interact. In addition to the documentation links in the article, I recommend the following:

The post Attrs, attrs, everywhere: The interaction between ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS in ODS graphics appeared first on The DO Loop.