6月 062018
 

The SURVEYSELECT procedure in SAS 9.4M5 supports the OUTRANDOM option, which causes the selected items in a simple random sample to be randomly permuted after they are selected. This article describes several statistical tasks that benefit from this option, including simulating card games, randomly permuting observations in a DATA step, assigning a random ID to patients in a clinical study, and generating bootstrap samples. In each case, the new OUTRANDOM option reduces the number of statements that you need to write. The OUTRANDOM option can also be specified by using OUTORDER=RANDOM.

Sample data with PROC SURVEYSELECT

Often when you draw a random sample (with or without replacement) from a population, the order in which the items were selected is not important. For example, if you have 10 patients in a clinical trial and want to randomly assign five patients to the control group, the control group does not depend on the order in which the patients were selected. Similarly, in simulation studies, many statistics (means, proportions, standard deviations,...) depend only on the sample, not on the order in which the sample was generated. For these reasons, and for efficiency, the SURVEYSELECT procedure in SAS uses a "one-pass" algorithm to select observations in the same order that they appear in the "population" data set.

However, sometimes you might require the output data set from PROC SURVEYSELECT to be in a random order. For example, in a poker simulation, you might want the output of PROC SURVEYSELECT to represent a random shuffling of the 52 cards in a deck. To be specific, the following DATA step generates a deck of 52 cards in order: Aces first, then 2s, and so on up to jacks, queens, and kings. If you use PROC SURVEYSELECT and METHOD=SRS to select 10 cards at random (without replacement), you obtain the following subset:

data CardDeck;
length Face $2 Suit $8;
do Face = 'A','2','3','4','5','6','7','8','9','10','J','Q','K';
   do Suit = 'Clubs', 'Diamonds', 'Hearts', 'Spades';
      CardNumber + 1;
      output;
   end;
end;
run;
 
/* Deal 10 cards. Order is determined by input data */
proc surveyselect data=CardDeck out=Deal noprint
     seed=1234 method=SRS /* sample w/o replacement */
     sampsize=10;         /* number of observations in sample */
run;
 
proc print data=Deal; run;
Random sample without replacement from a card deck. The sample is in the same order as the data.

Notice that the call to PROC SURVEYSELECT did not use the OUTRANDOM option. Consequently, the cards are in the same order as they appear in the input data set. This sample is adequate if you want to simulate dealing hands and estimate probabilities of pairs, straights, flushes, and so on. However, if your simulation requires the cards to be in a random order (for example, you want the first five observations to represent the first player's cards), then clearly this sample is inadequate and needs an additional random permutation of the observations. That is exactly what the OUTRANDOM option provides, as shown by the following call to PROC SURVEYSELECT:

/* Deal 10 cards in random order */
proc surveyselect data=CardDeck out=Deal2 noprint
     seed=1234 method=SRS /* sample w/o replacement */
     sampsize=10          /* number of observations in sample */
     OUTRANDOM;           /* SAS/STAT 14.3: permute order */
run;
 
proc print data=Deal2; run;
Random sample without replacement from a card deck. The sample is then permuted into a random order.

You can use this sample when the output needs to be in a random order. For example, in a poker simulation, you can now assign the first five cards to the first player and the second five cards to a second player.

Permute the observations in a data set

A second application of the OUTRANDOM option is to permute the rows of a SAS data set. If you sample without replacement and request all observations (SAMPRATE=1), you obtain a copy of the original data in random order. For example, the students in the Sashelp.Class data set are listed in alphabetical order by their name. The following statements use the OUTRANDOM option to rearrange the students in a random order:

/* randomly permute order of observations */
proc surveyselect data=Sashelp.Class out=RandOrder noprint
     seed=123 method=SRS /* sample w/o replacement */
     samprate=1          /* proportion of observations in sample */
     OUTRANDOM;          /* SAS/STAT 14.3: permute order */
run;
 
proc print data=RandOrder; run;
Use PROC SURVEYSELECT to produce a random ordering of a SAS data set

There are many other ways to permute the rows of a data set, such as adding a uniform random variable to the data and then sorting. The two methods are equivalent, but the code for the SURVEYSELECT procedure is shorter to write.

Assign unique random IDs to patients in a clinical trial

Another application of the OUTRANDOM option is to assign a unique random ID to participants in an experimental trial. For example, suppose that four-digit integers are used for an ID variable. Some clinical trials assign an ID number sequentially to each patient in the study, but I recently learned from a SAS discussion forum that some companies assign random ID values to subjects. One way to assign random IDs is to sample randomly without replacement from the set of all ID values. The following DATA step generates all four-digit IDs, selects 19 of them in random order, and then merges those IDs with the participants in the study:

data AllIDs;
do ID = 1000 to 9999;  /* create set of four-digit ID values */
   output;
end;
run;
 
/* randomly select 19 unique IDs */
proc surveyselect data=AllIDs out=ClassIDs noprint
     seed=12345 method=SRS  /* sample w/o replacement */
     sampsize=19            /* number of observations in sample */
     OUTRANDOM;             /* SAS/STAT 14.3: permute order */
run;
 
data Class;
   merge ClassIDs Sashelp.Class;   /* merge ID variable and subjects */
run;
 
proc print data=Class; var ID Name Sex Age; run;

Random order for other sampling methods

The OUTRANDOM option also works for other sampling schemes, such as sampling with replacement (METHOD=URS, commonly used for bootstrap sampling) or stratified sampling. If you use the REPS= option to generate multiple samples, each sample is randomly ordered.

It is worth mentioning that the SAMPLE function in SAS/IML also can to perform a post-selection sort. Suppose that X is any vector that contains N elements. Then the syntax SAMPLE(X, k, "NoReplace") generates a random sample of k elements from the set of N. The documentation states that "the elements ... might appear in the same order as in X." This is likely to happen when k is almost equal to N. If you need the sample in random order, you can use the syntax SAMPLE(X, k, "WOR") which adds a random sort after the sample is selected, just like PROC SURVEYSELECT does when you use the OUTRANDOM option.

The post Sample and obtain the results in random order appeared first on The DO Loop.

6月 062018
 

You've probably seen in the news that a volcano erupted in Guatemala recently. But do you really know much about this volcano, or even where it's located? Hopefully this blog post will get you up to speed on your volcanology! Pictures of a recent volcano erupting in Hawaii show slow-moving [...]

The post Where is that volcano in Guatemala? appeared first on SAS Learning Post.

6月 042018
 

Years ago, I wrote an article about how to create a Top 10 table and bar chart. The program can be trivially modified to create a "Top N" table and plot, such as Top 5, Top 20, or even Top 100. Not long after the article was written, the developer of PROC FREQ (who had read my blog post) implemented the MAXLEVELS= option on the TABLES statement in PROC FREQ in SAS 9.4. The MAXLEVELS= option automatically creates these tables and plots for you! The option applies only to one-way tables, not to cross tabulations.

A Top 10 plot and bar chart

Suppose you want to see the Top 10 manufacturers of vehicles in the Sashelp.Cars data set. The following call to PROC FREQ uses the MAXLEVELS=10 option to create a Top 10 table and a bar chart of the 10 manufacturers who appear most often in the data:

%let TopN = 10;
proc freq data=sashelp.cars ORDER=FREQ;
  tables make / maxlevels=&TopN Plots=FreqPlot;
run;
Top 10 table in SAS, produced by PROC FREQ
Top 10 bar chart in SAS, produced by PROC FREQ

A few comments about the MAXLEVELS= option:

  • The option affects only the display of the table and chart. The statistics (cumulative counts, percentages, chi-square tests,...) are all based on the full data and all categories. Similarly, if you use the OUT= option to write the counts to an output data set, the data set will contain the counts for all categories, not just the top categories.
  • The "OneWayFreqs" table contains a note at the bottom to remind you that you are looking at a partial table. You can also see that the "Cumulative Percent" column does not end at 100.
  • ThePLOTS=FREQPLOT option contains suboptions that you can use to control aspects of the plot. For example, if you want a horizontal bar chart that shows percentages instead of counts, use the option plots=FreqPlot(orient=horizontal scale=percent).

Sometimes a small change can make a big difference. Although it is not difficult to follow the steps in the original article to create a Top 10 table and chart, the MAXLEVELS= option is even easier.

A Top 10 plot with an "Others" category

For completeness, the following statements are reproduced from an earlier article that shows how to merge the low-frequency categories into a new category with the value "Others." The resulting bar chart shows the most frequent categories and lumps the others into an "Others" category.

%let TopN = 10;
%let VarName = Make;
proc freq data=sashelp.cars ORDER=FREQ noprint;   /* no print. Create output data set of all counts */
  tables &VarName / out=TopOut;
run;
 
data Other;      /* keep the values for the Top categories. Use "Others" for the smaller categories */
set TopOut;
label topCat = "Top Categories or 'Other'";
topCat = &VarName;       /* name of original categorical var */
if _n_ > &TopN then
    topCat = "Other";    /* merge smaller categories */
run;
 
proc freq data=Other ORDER=data;   /* order by data and use WEIGHT statement for counts */
  tables TopCat / plots=FreqPlot(scale=percent);
  weight Count;                    
run;
Bar chart of Top 10 categories and an 'Others' category that shows the count of the smaller categories

In the second PROC FREQ analysis, the smaller categories are merged into a new category called "Others." The "Others" category is suitable for tabular and graphical reporting, but you should think carefully before running statistical tests. As this example shows, the distribution of the new categorical variable might be quite different from the distribution of the original variable.

The post An easy way to make a "Top 10" table and bar chart in SAS appeared first on The DO Loop.

6月 022018
 

The SAS PlatformFor software users and SAS administrators, the question often becomes how to streamline their approach into the easiest to use system that most effectively completes the task at hand. At SAS Global Forum 2018, the topic of a “Big Red Button” was an idea that got audience members asking – is there a way to have just a few clicks complete all the stages of the software administration lifecycle? In this article, we review Sergey Iglov’s SAS Global Forum paper A ‘Big Red Button’ for SAS Administrators: Myth or Reality?” to get a better understanding of what this could look like, and how it could change administrators’ jobs for the better. Iglov is a director at SASIT Limited.

What is a “Big Red Button?”

With the many different ways the SAS Platform can be utilized, there is a question as to whether there is a single process that can control “infrastructure provisioning, software installation and configuration, maintenance, and decommissioning.” It has been believed that each of these steps has a different process; however, as Iglov concluded, there may be a way to integrate these steps together with the “Big Red Button.”

This mystery “button” that Iglov talked about would allow administrators to easily add or delete parts of the system and automate changes throughout; thus, the entire program could adapt to the administrator’s needs with a simple click.

Software as a System –SAS Viya and cloud based technologies

Right now, SAS Viya is compatible with the automation of software deployment processes through a centralized management. Right now, SAS Viya is compatible with a centralized automated deployment process. Through insights easily created and shared on the cloud, SAS Viya stands out, as users can access a centrally hosted control panel instead of needing individual installations.

Using CloudFormation by Amazon Web Services

At this point, the “Big Red Button” points toward systems such as CloudFormation. CloudFormation allows users of Amazon Web Services to lay out the infrastructure needed for their product visually, and easily make changes that will affect the software. As Iglov said, “Once a template is deployed using CloudFormation it can be used as a stack to simplify resources management. For example, when a stack is deleted all related resources are deleted automatically as well.”

Conclusion

Connecting to SAS Viya, CloudFormation can install and configure the system, and make changes. This would help SAS administrators adapt the product to their needs, in order to derive intelligence from data. While the future potential to use a one-click button is out there for many different platforms, using cloud based software and programs such as CloudFormation enable users to go through each step of SAS Platform’s administration lifecycle efficiently and effectively.

Additional Resources

SAS Viya Brochure
Sergey Iglov: "A 'Big Red Button' for SAS administrators: Myth or Reality?"

Additional SAS Global Forum 2018 talks of interest for SAS Administrators

A Programming Approach to Implementing SAS® Metadata-Bound Libraries for SAS® Data Set Encryption Deepali Rai, SAS Institute Inc.

Command-Line Administration in SAS® Viya®
Danny Hamrick, SAS

External Databases: Tools for the SAS® Administrator
Mathieu Gaouette, Prospective MG inc.

SAS® Environment Manager – A SAS® Viya® Administrator’s Swiss Army Knife
Michelle Ryals, Trevor Nightingale, SAS Institute Inc.

Troubleshooting your SAS® Grid Environment
Jason Hawkins, Amadeus Software Limited

Multi-Factor Authentication with SAS® and Symantec VIP
Jody Steadman, Mike Roda, SAS Institute Inc.

OpenID Connect Opens the Door to SAS® Viya® APIs
Mike Roda, SAS Institute Inc.

Understanding Security for SAS® Visual Analytics 8.2 on SAS® Viya®
Antonio Gianni, Faisal Qamar, SAS Institute Inc.

Latest and Greatest: Best Practices for Migrating to SAS® 9.4
Alec Fernandez, Leigh Fernandez, SAS Institute Inc.

Planning for Migration from SAS® 9.4 to SAS® Viya®
Don B. Hayes, DLL Consulting Inc.; Spencer Hayes, Cached Consulting LLC; Michael Shealy, Cached Consulting LLC; Rebecca Hayes, Green Peach Consulting Inc.

SAS® Viya®: Architect for High Availability Now and Users Will Thank You Later
Jerry Read, SAS Institute Inc.

Taming Change: Bulk Upgrading SAS® 9.4 Environments to a New Maintenance Release
Javor Evstatiev, Andrey Turlov

Is there a “Big Red Button” to use The SAS Platform? was published on SAS Users.

6月 012018
 

With talk of tariffs in the news lately, I'm sure everyone is curious about possible ways to bypass them. One possible loophole is using the foreign-trade zones (FTZs) in the US. What are FTZs, and where are they located? Read along and find out!... The Foreign-Trade Zone act was passed in [...]

The post Can you bypass tariffs with a foreign-trade zone? appeared first on SAS Learning Post.