proc freq

11月 012017
 

A ghoulish Halloween Boo to all my readers! Hope my costume freaks you out, but even if it doesn't, I’m positive PROC FREQ will in a few amazing ways! Today’s Programming 2: Data Manipulation Techniques class asked about the power of PROC FREQ. Since I stopped to explain some of it's benefits to [...]

The post 3 freaky ways PROC FREQ can work in your favor appeared first on SAS Learning Post.

5月 202016
 

ProblemSolversPROC FREQ is often the first choice when you want to generate basic frequency counts, but it is the last choice when it is compared to other statistical reporting procedures. People sometimes consider PROC FREQ last because they think they have little or no control over the appearance of the output. For example, PROC FREQ does not allow style options within the syntax, which the REPORT and TABULATE procedures do allow. Also, you cannot control the formats or headings with statements in the procedure step.

Sometimes, a simple frequency (via a one-way table) is all you want, and you don’t want to have to create an output data set just to add a format. A one-way table in PROC FREQ is unique also in that it includes a cumulative count and percent. These calculations cannot be done in other procedures without additional code or steps. However, there is a simple way to make some basic modifications to your table. By adding a PROC TEMPLATE step to modify either the Base.Freq.OneWayFreqs or the Base.Freq.OneWayList table template, you can change the formats of the statistics, move the label of the variable, change the labels of the statistics, and suppress the Frequency Missing row that appears below the table. These changes apply to all output destinations, including the traditional listing output.  You can also use PROC TEMPLATE to make small modifications to the nodes that are generated by one-way tables in the table of contents for non-listing ODS destinations.

Customize the Formats and Labels for Statistics

If you want to modify the format for the statistics Frequency, Cumulative Frequency, Percent, and Cumulative Percent, you can use the FORMAT= option in PROC TEMPLATE. This option accepts SAS formats and user-defined formats. If you want to change the statistic heading, you can use the HEADER= option.

The following example uses both the FORMAT= and the HEADER= statements to format the statistics values and to change the Frequency heading to Count. This example also changes the Percent heading to Pct.

proc format;
     picture pctfmt (round) other='009.9%';
run;

proc template; 
     edit Base.Freq.OneWayList;
        edit frequency;  
           header='Count'; 
           format=comma12.;
        end;  

        edit cumfrequency;
           format=comma12.; 
        end;
edit percent;
header='Pct';
format=pctfmt.;
end;
   
edit cumpercent;
format=pctfmt.; 
end; 
end;
run;                                                                                  

data class;
     set sashelp.class;
     wgt=1000;
run;
 
proc freq data=class;
     tables age;
     weight wgt; 
run;

This code generates the following table:

FREQ procedure's one-way tables01

Move the Variable Label

If a variable has a label, it is centered, by default, at the top of the table, and the variable name appears above the column of values.

FREQ procedure's one-way tables02

If you want to use the label instead of the name above the column, you can edit the HEADER= value in PROC TEMPLATE, as shown in the following example:

proc template;
   edit Base.Freq.OneWayList;             
        edit h1; 
                /* Set text to a blank instead of VARLABEL. */ 
             text " ";
        end;                                                
                                                                                                  
        edit FVariable;                                           
           just=varjust;
           style=rowheader;
           id;
           generic;
           header=varlabel;
        end;
     end;
  run;

proc freq data=sashelp.class;
   label age='Age in Years';
   tables age;
run;

This code generates the following table, which replaces the default location of the label with blank text and moves the label so that it is above the column of the variable's values, as shown in this example:

FREQ procedure's one-way tables03

Suppress the Frequency Missing= Row

If a variable has missing values and you do not include the MISSING option in the TABLES statement of PROC FREQ, the output includes the frequency of missing values (Frequency Missing=) below the table with the number of missing values.

The following table shows the default output:

FREQ procedure's one-way tables04

To suppress this line without including missing values in the table output, you can modify the table template, as follows:

data test;
input c1 $;
cards;
a
b
.
;
run;

proc template;
     edit Base.Freq.OneWayFreqs; 

   /* This is the default setting:                     */ 
   /*  define f2;                                      */
   /*     text "Frequency Missing =" fMissing -12.99; */
   /*     print = pfoot2;                              */
   /*  end;                                            */
        edit f2; 
           text " ";
        end;
     end;
run;

proc freq data=test;
   table c1;
run;

The table that is generated by this code does not contain the Frequency Missing= row at the bottom of the table.

FREQ procedure's one-way tables05

Customize the Table of Contents

For a one-way table, the default table of contents looks like this:

FREQ procedure's one-way tables06

To control the first node, The Freq Procedure, you can use the ODS PROCLABEL statement.  You can change the text of this node, but you cannot eliminate the node. You can remove the One-Way Frequencies node using PROC TEMPLATE with the CONTENTS= statement. The following example changes the text of the The Freq Procedure node, and it eliminates the One-Way Frequencies node.

proc template;
   edit Base.Freq.OneWayFreqs;
      contents=off;
   end;
run;

ods listing close;
ods pdf file='test.pdf';

ods proclabel='My One-Way Table';
proc freq data=sashelp.class;
tables age;
run;

ods pdf close;
ods listing;

Here is the modified table of contents:

FREQ procedure's one-way tables07

(For additional customization of the table of contents, you need to consider using the DOCUMENT procedure.  The SAS Global Forum 2011 paper Let's Give 'Em Something to TOC about: Transforming the Table of Contents of Your PDF File, by Bari Lawhorn, contains useful information and examples that illustrate how to use PROC DOCUMENT, PROC TEMPLATE, and other SAS statements and options to customize your TOC.)

To restore the default table template, run the following code:

proc template;
delete Base.Freq.OneWayFreqs;
delete Base.Freq.OneWayList;
run;

The examples above show how you actually can customize PROC FREQ one-way tables, with a little help from PROC TEMPLATE. PROC TEMPLATE offers several methods that make it easy to customize one-way, frequency-count tables that enhance table format and appearance.
tags: Problem Solvers, PROC FREQ, SAS Programmers

Seeing the FREQ procedure's one-way tables in a new light was published on SAS Users.

10月 192013
 
This week's SAS tip is from Art Carpenter and his latest book Carpenter's Guide to Innovative SAS Techniques. A SAS user since 1977, Art is an integral and highly respected member of the user community. He's also the author of several outstanding SAS books--and you can explore them here. The following excerpt is from SAS [...]
8月 272013
 

I planed to extend nonWinferiority testing to one of my statistical notes,  Equivalence Testing and TOST (Two One-Sided Test) since noninferiority test is simply half part of the equivalence test. Today I’m glad to find an even better explanation from

SAS Usage Note 48616: Design and analysis of noninferiority studies

You will love it if you are interested in this subject (bookmark this link since it looks like unfinished somehow). Few comments:

  • It’s better not to use a two-side test for noninferiority testing. In PROC TTEST, you can add option like “side=U” or “side=L”.
  • If you do like a two-side test for noninferiority testing, it’s OK but DO remember assign an alpha value of 0.1 since the default value is 0.05 (if you’re lost by this statement, read this post again). It’s pretty tricky and I saw lots of mistakes due to ignoring of this point.
  • In note 48616, it is said only test of mean(s) is supported by PROC TTEST. Actually, PROC TTEST can also test ratios (a log-transformation will be applied) by a  “dist=lognormal” option.
  • To make decision in noninferiority, superiority, and equivalence testings, lower or/and upper limits are used to compare with the margin(s) rather than the P-value. They are of course equivalent but the former approach is much more accessible and meaningful for non-statistician researchers.
4月 132013
 
This week's SAS tip is from Ron Cody and his popular book SAS Statistics by Example. Ron is the author of many bestselling SAS books. If you like this week's excerpt, visit his author page for additional bonus content from his books! The following excerpt is from SAS Press author Ron [...]
10月 082012
 
In some settings it may be necessary to recode a categorical variable with character values into a variable with numeric values. For example, the matching macro we discussed in example 7.35 will only match on numeric variables. One way to convert character variables to numeric values is to determine which values exist, then write a possibly long series of conditional tests to assign numbers to the values. Surely there's a better way?

SAS
In SAS, Rick Wicklin offers an IML solution and links to a macro with the same function. But if you're not an IML coder, and you don't want to investigate a macro solution, it's simple enough to do with data steps. We'll begin by making some fake data.
data test;
do i = 1 to 100;
cat = "meow";
if i gt 30 then cat = "Purr";
if i gt 70 then cat = "Hiss";
output;
end;
run;
To make the new variable, we'll just sort (section 1.5.6) the data on the categorical variable we want to convert, then use the set ds; by x; syntax to keep track of when a new value is encountered in the data. It's hard to believe that we've never demonstrated this useful syntax before-- perhaps we just can't find it today. The set ds; by x; syntax makes new temporary variables first.x and last.x that are equal to 1 for the first and last observations of each new level of x, respectively, and 0 otherwise. When we find a new value, we'll increase a counter by 1; the counter is our new numeric-valued variable.
proc sort data = test; by cat; run;

data catize;
set test;
by cat;
retain catnum 0;
if first.cat then catnum = catnum + 1;
run;

/* check the result */
proc freq data = catize;
tables cat * catnum;
run;
The table also shows the recoding values.
                             Table of cat by catnum

cat catnum

Frequency|
Percent |
Row Pct |
Col Pct | 1| 2| 3| Total
---------+--------+--------+--------+
Hiss | 30 | 0 | 0 | 30
| 30.00 | 0.00 | 0.00 | 30.00
| 100.00 | 0.00 | 0.00 |
| 100.00 | 0.00 | 0.00 |
---------+--------+--------+--------+
Purr | 0 | 40 | 0 | 40
| 0.00 | 40.00 | 0.00 | 40.00
| 0.00 | 100.00 | 0.00 |
| 0.00 | 100.00 | 0.00 |
---------+--------+--------+--------+
meow | 0 | 0 | 30 | 30
| 0.00 | 0.00 | 30.00 | 30.00
| 0.00 | 0.00 | 100.00 |
| 0.00 | 0.00 | 100.00 |
---------+--------+--------+--------+
Total 30 40 30 100
30.00 40.00 30.00 100.00


R
We begin by making the data. To convert to numbers, we use the labels option to the factor() function, feeding it the sequences of numbers between 1 and however many different values there are. Note that we find this using the factor() function again. There's probably a better way of doing this, but it's a little bit amusing to code it this way. Then we have numbers, but they're store as a factor. We can get them out with a call to as.numeric().
cat = c(rep("meow",30),rep("Hiss",30), rep("Purr", 40))
catn1 = factor(cat, labels=(1:length(levels(factor(cat)))))
catn = as.numeric(catn1)
table(catn,cat)

cat
catn Hiss meow Purr
1 30 0 0
2 0 30 0
3 0 0 40

There's a warning in the documentation for factor() that the values are assigned in location-specific fashion, so the table should be used to establish how the codes were assigned.  For the record, the use cases for this kind of recoding in R may be more strained than the SAS example given above.

An unrelated note about aggregators:We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.
9月 242012
 

A guy notices a bunch of targets scattered over a barn wall, and in the center of each, in the "bulls-eye," is a bullet hole. "Wow," he says to the farmer, "that’s pretty good shooting. How’d you do it?" "Oh," says the farmer, "it was easy. I painted the targets after I shot the holes." – An Old Joke

Note on how to get the difference between independent binomial proportions using SAS PROC FREQ (tested in SAS 9.3M1).  Also, benchmarks took from Prof. Newcombe ’s paper:

Interval estimation for the difference between independent proportions: comparison of eleven methods.

Newcombe_diff

Key findings (SAS output):

CI8

1. SAS Proc Freq offers 8 ( including 4 of total 11 in Prof. Newcombe ’s paper) methods to compute confidence intervals for difference between independent binomial proportions:

    • Wald #1
    • Wald (Corrected) #2
    • Exact
    • Exact (FM Score)
    • Newcombe Score #10
    • Newcombe Score (Corrected) #11
    • Farrington-Manning
    • Hauck-Anderson

Note that #1 method is the most popular one in textbook, #10 might  be the most wildly used method in industry.

2. There is a big discrepancy in the outputs in the  so called “exact” method among  SAS Proc Freq and Prof. Newcombe ’s paper. Seems they used different methods (in the same “exact” families) under same name. Needs further investigation.

The SAS codes:

/*
2*2 table

        response0 response1
group1      56      14
group2      48      32
*/

data ci;
    input grp res count;
datalines;
1 0 56
1 1 14
2 0 48
2 1 32
;

/*
methods:
Exact
Farrington-Manning
Hauck-Anderson
Newcombe Score
Wald
*/
ods select  PdiffCLs;
ods output PdiffCLs=ci1;
proc freq data=ci order=data;
    tables grp*res /riskdiff (CL=(EXACT FM HA WILSON WALD));
    weight count;
    exact riskdiff;
run;

/*
methods:
Exact (FM Score)
*/
ods select  PdiffCLs;
ods output PdiffCLs=ci2;
proc freq data=ci order=data;
    tables grp*res /riskdiff (CL=(EXACT));
    weight count;
    exact riskdiff(fmscore);
run;

/*
methods:
Newcombe Score (Corrected)
Wald (Corrected)
*/
ods select  PdiffCLs;
ods output PdiffCLs=ci3;
proc freq data=ci order=data;
    tables grp*res /riskdiff (CORRECT CL=(WILSON WALD));
    weight count;
    exact riskdiff;
run;

/*put them all*/
data CI_all;
    set ci3 ci2 ci1;
    CI=’(‘||compress(put(LowerCL,8.4))||’, ‘||compress(put(UpperCL,8.4))||’)’;
run;

proc sql;
    select
    case Type
        when "Wald" then "a"
        when "Wald (Corrected)" then "b"
        when "Exact" then "c"
        when "Exact (FM Score)" then "d"
        when "Newcombe Score" then "e"
        when "Newcombe Score (Corrected)" then "f"
        when "Farrington-Manning" then "g"
        when "Hauck-Anderson" then "h"
        else "i"
    end as order,
        Type "method",
        CI "95% Confidence Limits"
    from CI_all
    order by order
        ;
quit;

———————————-

You may be also interested in this post on calculating confidence intervals for single proportion:

Statistical Notes (3): Confidence Intervals for Binomial Proportion Using SAS

using the following 11 methods:

      • 1.  Simple asymptotic, Without CC | Wald
      • 2.  Simple asymptotic, With CC      
      • 3.  Score method, Without CC | Wilson       
      • 4.  Score method, With CC    
      • 5. Binomial-based, ‘Exact’ | Clopper-Pearson              
      • 6.  Binomial-based, Mid-p      
      • 7.  Likelihood-based            
      • 8.  Jeffreys    
      • 9.  Agresti-Coull,z^2/2 successes          
      • 10. Agresti-Coull,2 successes and 2 fail           
      • 11. Logit                                               
9月 212012
 

A latest updated post on Freakonomics, Beware the Weasel Word “Statistical” in Statistical Significance!,  seemed to attempt to challenge frequentist statistics by Bayesian. I have no research on Bayesian and won’t jump to the debates. I’d rather to use this case to apply the Dragon’s Teeth and Fleas logic of hypothesis testing (at least I think frequentist is pretty intuitive).

A story goes with this post, that Bob won 20 times out of total 30 rounds with Alice in a children card game, “War”. War itself is a totally random game, so everyone is supposed to have 50% chance to win in a fair game. Since Bob won 20 out of 30, the question raised: is he cheating?

To answer this question, just follow the Dragon’s Teeth and Fleas logic of hypothesis testing:

  • First, assume they are dragon’s teeth; in this case, we assume it is a fair game that everyone has probability of 0.5 to win.
  • Under this assumption, we know the the number of winning follows binomial distribution with probability of 0.5. Now we should check Bob’s 20 winning in 30 in this distribution: is this output extreme, or is it a flea?
  • We then get the p-Value calculated by P(X>=20) which means the chance of winning 20 times and more in 30 games (as extreme or even more extreme ), is 0.0494. Here we only think about winning, so it is a one-sided test. Since this p-Value is less than the default significance level of 0.05 (IT IS A FLEA!), we reject the null and think this is more likely not a fair game (Bob is cheating in this statistical point of view).

Bob

That’s all the story. Then this post goes, “Although the procedure is mathematically well defined, it is nonsense”. The following is the first reason listed in the post:

To calculate the probability of Bob’s winning at least 20 games, you use not only the data that you saw (that Bob won 20 games out of 30) but also the imaginary data of Bob winning 21 out of 30 games, or 22 out of 30 games, all the way up to Bob winning all 30 games. The idea of imaginary data is oxymoronic.

In my opinion, the so called “imaginary data” are just the data filling out the binomial distribution (under the assumption of true null hypothesis) we got before and they are not oxymoronic at all (sorry I use “data” as plural). 

What do you think? I can’t go far away from here due to my limit knowledge of Bayesian.

————-appendix: SAS Codes and Outputs——————

data a;
    input output count;
datalines;
0 20
1 10
;

proc freq data=a;
    tables output /binomialc (p=0.5) alpha=0.05;
    exact binomial;
    weight count;
run;

Bob_SAS

9月 152012
 

A guy notices a bunch of targets scattered over a barn wall, and in the center of each, in the "bulls-eye," is a bullet hole. "Wow," he says to the farmer, "that’s pretty good shooting. How’d you do it?" "Oh," says the farmer, "it was easy. I painted the targets after I shot the holes." – An Old Joke

Last year I dumped piece of SAS codes to compute confidence intervals for single proportion using 11 methods including 7 presented by one of Prof. Newcombe‘s  most wildly cited papers, Two-sided confidence intervals for the single proportion: comparison of seven methods (r is the number of event in total cases of n):

newcombe

To get such results using my macro, just submit the following SAS codes:

filename CI url ‘https://raw.github.com/Jiangtang/Programming-SAS/master/CI_Single_Proportion.sas’;
%include CI;
%CI_Single_Proportion(r=81,n=263);

And the outputs(plus 4 additional methods):

Jiangtang

In SAS 9.2 and 9.3, 6 of the 11 intervals can be calculated by PROC FREQ. First, method 1, 3, 9 ,8, 5 via a binomial option:

data test;
input grp outcome $ count;
datalines;
1 f 81
1 u 182
;

ods select BinomialCLs;
proc freq data=test;
tables outcome / binomial(all);
weight Count;
run;

The outputs:

SASFreqCI

And you can get another Wald interval with a continuity correction (method 2) via binomialC option:

ods select BinomialCLs;
proc freq data=test;
tables outcome / binomialc(all);
weight Count;
run;

get:

SASFreqCI_CC

R Notes

These two R packages contain a bunch of CI calculations:

Binom

PropCIS

For SAS macro %CI_Single_Proportion, I borrowed a lot from this R implementation:

http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Software/prop.CI.r

Additional Notes

If interested, a nice paper on binomial intervals with ties, the paper Confidence intervals for a binomial proportionin the presence of ties,  and the program.