Kathryn McLawhorn

3月 132021
 

As a programmer, you might be accustomed to how SAS® stores date, time, and datetime values as 8-byte floating-point numeric values. A SAS date is the number of days since January 1, 1960; a SAS time is the number of seconds since midnight; and a SAS datetime is the number of seconds since midnight, January 1, 1960. Are you aware that date and time values are handled differently if you are programming in DS2? Because DS2 can process databases, it has access to ANSI data types, which have greater precision. The ANSI data types that are relevant for processing dates and times include DOUBLE, TIME, DATE, and TIMESTAMP. However, these data types are not comparable to SAS date and time values, so they are not automatically converted.

Dates and times in DS2

Declaring DS2 DATE, TIME, and TIMESTAMP constants requires a specific structure. The syntax is shown here:

  • DATE 'yyyy-mm-dd'
  • TIME 'hh:mm:ss[.fraction]'
  • TIMESTAMP 'yyyy-mm-dd hh:mm:ss[.fraction]'

Here is an example:

proc ds2;
data _null_;
method run();
dcl date ds2dt;
dcl time ds2tm;
dcl timestamp ds2dtm;
ds2dt = date '2017-01-31';
ds2tm = time '20:44:59';
ds2dtm = timestamp '2017-02-07 07:00:00.7569';
put ds2dt=;
put ds2tm=;
put ds2dtm=;
end;
enddata;
run;
quit;

Results:

ds2dt=2017-01-31
ds2tm=20:44:59
ds2dtm=2017-02-07 07:00:00.756900000

Using functions to help with conversion

When date, time, or timestamp values are read into DS2 from a database, they are processed in their native data types. Variables in a SAS data set that are formatted with a date, time, or datetime format and read into DS2 must be converted to the equivalent ANSI DATE, TIME, or TIMESTAMP data types. To successfully process ANSI values in DS2 using SAS interval functions, such as INTCK or INTNX, you must first explicitly convert them to the appropriate SAS double-precision numeric value. The following functions can assist with the conversion between ANSI and SAS:

  • TO_DOUBLE—converts any ANSI date, time, or timestamp value to the appropriate SAS numeric date, time, or datetime value
  • TO_DATE—converts an unformatted SAS date to an ANSI date
  • TO_TIME—converts an unformatted SAS time to an ANSI time
  • TO_TIMESTAMP—converts an unformatted SAS datetime to an ANSI timestamp

Example 1: Convert an unformatted SAS date, time, or datetime value in DS2

You can convert unformatted SAS date, time, or datetime values in DS2 by using the TO_DATE, TO_TIME, and TO_TIMESTAMP functions. The HAVING FORMAT option in the DECLARE (DCL) statement assigns a format to the new variables.

data dates;
sas_time=time();
sas_date=today();
sas_datetime=datetime();
run;
 
proc ds2;
data _null_;
dcl date ds2dt having format YYMMDD10.;
dcl time ds2tm having format TIME18.5;
dcl timestamp ds2dtm having format DATETIME28.5;
method run();
set dates;
ds2dt=to_date(sas_date);
ds2tm=to_time(sas_time);
ds2dtm=to_timestamp(sas_datetime);
put 'SAS: ' sas_date sas_time sas_datetime;
put 'ANSI: ' ds2dt ds2tm ds2dtm;
end;
enddata;
run;
quit;

Results:

SAS: 22326 30565.0650000572 1928996965.065
ANSI: 2021-02-15 8:29:25.06500 15FEB2021:08:29:25.06500

Example 2: Convert a formatted SAS date, time, or datetime value in DS2

If you have applied formats to the SAS variables, you must first convert the variables by using the TO_DOUBLE function. If you do not do this conversion in advance and then try to use a SAS function, such as INTNX, you see messages like the following in the log:

ERROR: Compilation error.
ERROR: Line 402: Invalid conversion for date or time type.
WARNING: Implicit conversion of time(4) type to double type. Statement 402.
WARNING: Implicit conversion of int type to double type. Statement 402.

Here is the syntax that you can use for conversion:

data dates;
sas_time=time();
sas_date=today();
sas_datetime=datetime();
format sas_time time. sas_date mmddyy10. sas_datetime datetime.;
run;
 
proc ds2;
data _null_;
dcl double SAS_Date_New having format mmddyy10.;
dcl double SAS_Time_New having format time.;
dcl double SAS_Datetime_New having format datetime.;
method run();
set work.dates;
SAS_Date_New=INTNX('week',to_double(sas_date),26);
SAS_Time_New=INTNX('hour',to_double(sas_time),3);
SAS_Datetime_New=INTNX('dtweek',to_double(sas_datetime),2);
put SAS_Date_New=;
put SAS_Time_New=;
put SAS_Datetime_New=;
end;
enddata;
run;
quit;

Results:

SAS_Date_New=08/15/2021
SAS_Time_New=11:00:00
SAS_Datetime_New=15AUG21:00:00:00

Example 3: Convert an ANSI date, time, or timestamp value to a SAS date in DS2

If you want to convert an ANSI value into a SAS value, use the TO_DOUBLE function. In the example below, the TO_DOUBLE function converts the ANSI timestamp value to a SAS datetime. After you have a SAS datetime, you can then use the SAS functions DATEPART and TIMEPART to extract the date and time values.

proc ds2;
data _null_;
method run();
dcl timestamp ds2dtm;
dcl double sas_date having format mmddyy10.;
dcl double sas_time having format time8.2;
dcl double sas_datetime having format datetime18.2;
ds2dtm = timestamp '2020-02-01 01:23:45.7569';
sas_datetime=to_double(ds2dtm);
sas_date=datepart(sas_datetime);
sas_time=timepart(sas_datetime);
put ds2dtm=;
put sas_date=;
put sas_time=;
put sas_datetime= ;
end;
enddata;
run;
quit;

Results:

ds2dtm=2020-02-01 01:23:45.756900000
sas_date=02/01/2020
sas_time= 1:23:46
sas_datetime=01FEB20:01:23:45.7

Working with both ANSI and SAS date, time, and datetime values in a DS2 program is easily managed with the TO_DOUBLE, TO_DATE, TO_TIME, and TO_DATETIME functions. These functions make the conversion between ANSI and SAS values seamless and efficient.

Thank you for reading and thanks to Mark Jordan for his expertise and help in reviewing this blog before publication.

Additional references

Tips for working with date, time, and datetime values in DS2 was published on SAS Users.

5月 052020
 

When you route REPORT procedure output to any of the ODS destinations, you might want to apply style attributes to a column based on multiple variable values. To do that, you need to use CALL DEFINE statements in a COMPUTE block. This structure could then require complex logic and many CALL DEFINE statements to ensure that all combinations of the variable values and styles are applied appropriately. However, the STYLE/MERGE and STYLE/REPLACE attributes in the CALL DEFINE statement can simplify this process. They are useful when you have two or more COMPUTE blocks with CALL DEFINE statements and the CALL DEFINE statements refer to the same cell in a table.

Using the STYLE/MERGE attribute

The STYLE/MERGE attribute combines any new styles that are specified by the STYLE= argument with any existing styles that are already applied to values in the row or column. In this example, style attributes are applied to the Sex column. In the first COMPUTE block, if the value of Sex is F, the background color of the cell is yellow. In the second COMPUTE block for Age, if the value of Age is greater than 14, the color of the text in the Sex column is red. When STYLE/MERGE is used, that means that a yellow background that also has red text is used for any cell in the Sex column where the value is F and the corresponding Age is also greater than 14.

 
proc report data=sashelp.class; 
column sex age height weight;
define sex--weight / display;
 
compute sex;
if sex = 'F' then
 call define('sex', "style", "style=[background=yellow]"); 
endcomp;
 
compute age;
if age > 14 then
 call define('sex', "style/merge", "style=[color=red]"); 
endcomp;
run;

Here is the resulting output:

Using the STYLE/REPLACE attribute

The STYLE/REPLACE attribute replaces any existing styles for a row or column with the new styles that are specified by the STYLE= argument. In this example, style attributes are applied to the Sex column again. In the first COMPUTE block, if the value of Sex is F, the background color of the cell is yellow. In the second COMPUTE block for Age, if the value of Age is greater than 14, the color of the text in the Sex column is red. When STYLE/REPLACE is used, that means that red text only, without any background color, is used for any cell in the Sex column where the value is F and the corresponding Age is also greater than 14. The red-text style replaces the yellow background.

 
proc report data=sashelp.class; 
column sex age height weight;
define sex--weight / display;
 
compute sex;
if sex = 'F' then
 call define('sex', "style", "style=[background=yellow]"); 
endcomp;
 
compute age;
if age > 14 then	
 call define('sex', "style/replace", "style=[color=red]"); 
endcomp;
run;

Here is the resulting output:

The STYLE/MERGE and STYLE/REPLACE attributes are supported only in the CALL DEFINE statement in a COMPUTE block in PROC REPORT. These useful tools can simplify complex code and enable you to customize your PROC REPORT output with meaningful style choices.

Additional References

PROC REPORT: CALL DEFINE

Sample Note 43758: How to merge styles from multiple CALL DEFINE statements with PROC REPORT

Using STYLE/MERGE and STYLE/REPLACE in PROC REPORT was published on SAS Users.

7月 192019
 
The RANK procedure (PROC RANK) is useful for ranking numeric variables in a data set across observations. You often see PROC RANK used to rank data into quartiles, deciles, or percentiles. This action requires that you use the GROUPS= option in the PROC RANK statement.

This blog answers three questions related to using PROC RANK with groups and ties. Note that question two actually provide an alternative for using the DATA step when PROC RANK cannot provide what you need.

  1. What does PROC RANK do behind the code when you use the GROUPS= option in the PROC RANK statement?
  2. What do you do if you want equal groups, regardless of tied values?
  3. What do you do if you want groups to start at 1 rather than 0?

What does PROC RANK do when you use the GROUPS= option?

Most of us just accept the ranked values that are calculated by PROC RANK. But have you ever tried to figure out the calculation that happens behind the code? When you use the GROUPS= option in the PROC RANK statement, the values are assigned to groups that range from 0 to the number-of-groups minus 1, based on tied values. If you have many tied values, you might not obtain the number of groups that you request because observations with the same value are assigned to the same group.

The formula for calculating group values is as follows:

FLOOR(rank*k/(n+1))

In this formula:

  • rank is the data value's rank order
  • k is the value of the GROUPS= option
  • n is the number of nonmissing values

Consider the following example. If you want to see the original value as well as the ranking, use both the VAR and RANKS statements in your PROC RANK step. The RANKS variable contains the rank value. If you use a BY statement, the raw rankings start over in each BY group.

Example 1

data;
input x;
cards;
1
1
1
1
1
1
8
8
;
run;
proc rank data=test groups=5 out=test_rank 
 ties=mean /* low and high */;
var x;
ranks rank_x;
run;
 
proc print data=test_rank;
run;

Output

x rank_x
1 1
1 1
1 1
1 1
1 1
1 1
8 4
8 4

 
The following table shows the variable values, the raw ranks, the ties values, and the resulting ranks:

X RAW_RANK TIES=MEAN TIES=LOW TIES=HIGH RANK_MEAN RANK_LOW RANK_HIGH
1 1 3.5 1 6 1 0 3
1 2 3.5 1 6 1 0 3
1 3 3.5 1 6 1 0 3
1 4 3.5 1 6 1 0 3
1 5 3.5 1 6 1 0 3
1 6 3.5 1 6 1 0 3
8 7 7.5 7 8 4 3 4
8 8 7.5 7 8 4 3 4

 
Using the formula that is shown previously, k=5 and n=8. Since TIES=MEAN, you sum the raw ranks of the tied values of X and divide by the number of observations. For X=1, the rank is (1+2+3+4+5+6)=21/6=3.5. For X=2, the rank is (7+8)=15/2=7.5. Similarly, if you use TIES=LOW, for X=1, the rank is 1; for X=2, the rank is 7. Finally, if you use TIES=HIGH, for X=1, the rank is 6; for X=2, the rank is 8. When you insert those values into the formula for the first observation, you obtain the following results:

TIES=MEAN: Floor(3.5*5/9)=Floor(1.9)=1
TIES=LOW: Floor(1*5/9)=Floor(0.5)=0
TIES=HIGH: Floor(6*5/9)=Floor(3.3)=3

What do you do if you want equal groups, regardless of tied values?

Suppose that you want to create groups that have the same number of observations in each one, regardless of tied values. PROC RANK cannot do this. However, you can use the DATA step to accomplish this task.

You need to sort the data set by the ranking variable and then use the same formula in the DATA step, as shown below.

Example 2

proc sort data=test;
by x;
run;
 
data ranks;
set test nobs=numobs;
group=floor(_n_*5/(numobs+1));
run;
 
proc print data=ranks;
run;

Output

x group
1 0
1 1
1 1
1 2
1 2
1 3
8 3
8 4

 

What do you do if you want the groups to start at 1 rather than 0?

When you use the GROUPS= option, the values that are assigned to the groups start at 0. There is no way to indicate for PROC RANK to start the groups at 1. However, once you have the data set with the ranked values, you can add 1 using DATA step logic, as shown in this example:

Example 3

data test;
input y; 
cards;
11
22
10
15
25
;
run;
 
proc sort data=test; 
by y;
run;
 
proc rank data=test out=test_rank1 groups=3;
var y;
ranks rank_y;
run;
 
data test_rank1; 
set test_rank1; rank_y+1;
run;
 
proc print data=test_rank1;
run;

Output

y rank_y
10 1
11 2
15 2
22 3
25 3

 

Conclusion

PROC RANK has many statistical applications, such as helping you understand how your data is distributed. Hopefully, this blog has provided you with a better understanding of how ranks are determined when tied values are present in the data and when you want to assign those ranks to groups.

How the RANK procedure calculates ranks with groups and ties was published on SAS Users.

1月 182019
 
Interested in learning about what's new in a software release? What if you want to know whether anything has changed in a SAS product? Or whether there are steps that you need to take before you upgrade to a new software release?
 
The online SAS product documentation for SAS® 9.4 and SAS® Viya® contains new sections that provide these answers. The following sections are usually listed on the left-hand side of the table of contents for the online Help document:
 
“What’s New in Base SAS: Details”
“What’s New in SAS 9.4 and SAS Viya”
“SAS Guide to Software Updates and Product Changes”
Note: To make the product-change information easier to find, this section was retitled for SAS® 9.4M6 and SAS® Viya® 3.4. For documentation about previous releases of SAS 9.4, this section is called “SAS Guide to Software Updates.” The information about product changes is included in a subsection called “Product Details and Requirements.” Although the title is different in newer documentation, the content remains the same.

What's available in each section?

• “What’s New” contains information about new features. For example, in SAS 9.4M6, “What’s New” discusses a new ODS destination for Word and a new procedure called PROC SGPIE.
 
• The “SAS Guide to Software Updates and Product Changes” includes the following subsections:
      o A section on software updates, which is for customers who are upgrading to a new release. (FYI: A software update is any modification that SAS provides for existing software. An upgrade is a new release of SAS. A maintenance release is a collection of updates that are applied to the currently installed software.)
      o Another subsection discusses product details and requirements. In it, you will find information about values or settings that have changed from one release to the next. For example, for SAS 9.4M6, the default style for HTML5 output changed from HTMLBlue to HTMLEncore. Another example is for SAS® 9.4M0, when the YEARCUTOFF= system option changed from 1920 to 1926.

Other links to these resources

In the “What’s New” documentation, there is a link in each section to the corresponding product topic in “SAS Guide to Software Updates and Product Changes.”

For example, when you scroll through this SAS Studio page, you see references both to various “What’s New” pages for different versions of SAS Studio and to “SAS Guide to Software Updates and Product Changes.”

In “What's New in Base SAS: Details,” you can search by the software and maintenance release to find new features. Beginning with SAS 9.4, new features for maintenance releases are introduced using the SAS 9.4Mx notation. For example, in the search box on the page, you can enter 9.4M6.

Final thoughts

With these new online Help sections, you can find information quickly about new features of the current SAS release, as well as what has changed from the previous release. As always, we welcome your feedback and suggestions for improving the documentation.

Special thanks

Elizabeth Downes and Marie Dexter in SAS Documentation Development were very willing to make the requested wording changes in the documentation. They also contributed to the content of this article. Thanks to both for their time and effort!

Navigating SAS documentation to find out about new, modified, and updated features was published on SAS Users.

7月 192018
 

Suppose that you want to know the value of a character variable that has the highest frequency count or even the top three highest values. To determine that value, you need to create an output data set and sort the data by the descending Count or _FREQ_ variable. Then you need to print the top n observations using the OBS= option, based on the number of values that you want to see. You can do this easily using any of a variety of procedures that calculate a frequency count (for example, the FREQ Procedure or the MEANS Procedure).

This blog provides two detailed examples: one calculates the top n values for a single variable and one calculates the top n values for all character variables in a data set.

Print the top n observations of a single variable

The following example prints the three values of the Make variable in the Sashelp.Cars data set that have the highest frequency count. By default, PROC FREQ prints a variable called Count in the output data set. The output data set is sorted by this variable in descending order, and the number of observations that you want to keep is printed by using the OBS= data set option.

proc freq data=sashelp.cars noprint;
tables make / out=counts(drop=percent);
run;
 
proc sort data=counts;
by descending count;
run;
 
proc print data=counts(obs=3);
run;

Print the top n observations of all character variables in a data set

Suppose that you want to know the top three values for all the character variables in a data set. The process shown in the previous section is not efficient when you have many variables. Suppose you also want to store this information in a data set. You can use macro logic to handle both tasks. The following code uses PROC FREQ to create an output data set for each variable. Further manipulation is done in a DATA step so that all the data sets can be combined. A detailed explanation follows the example code:

%macro top_frequency(lib=,dsn=);
 
/* count character variables in the data set */
proc sql noprint;
select name into :charlist separated by ' '
from dictionary.columns
where libname=%upcase("&lib") and memname=%upcase("&dsn")and type='char';
quit;
 
%put &charlist;
%let cnt=%sysfunc(countw(&charlist,%str( )));
%put &cnt;
 
%do i=1 %to &cnt;
 
/* Loop through each character variable in */
/* FREQ and create a separate output  */           
/* data set.                               */
proc freq data=&lib..&dsn noprint;
tables %scan(&charlist,&i) / missing out=out&i(drop=percent 
 rename=(%scan(&charlist,&i)=value));
run;
 
data out&i;
length varname value $100;
set out&i;
varname="%scan(&charlist,&i)";
run;
 
proc sort data=out&i;
by varname descending count;
run;
 
%end;
 
data combine;
set %do i=1 %to &cnt;
out&i(obs=3) /* Keeps top 3 for each variable. */
%end;;
run;
 
proc print data=combine;
run;
 
%mend top_frequency;
 
options mprint mlogic symbolgen;
%top_frequency(lib=SASHELP,dsn=CARS);

I begin my macro definition with two keyword parameters that enable me to substitute the desired library and data set name in my macro invocation:

%macro top_frequency(lib=,dsn=);

The SQL procedure step selects all the character variables in the data set and stores them in a space-delimited macro variable called &CHARLIST. Another macro variable called &CNT counts how many words (or, variable names) are in this list.

proc sql noprint;
select name into :charlist separated by ' '
from dictionary.columns
where libname=%upcase("&lib") and memname=%upcase("&dsn") and type='char';
quit;
 
%put &charlist;
%let cnt=%sysfunc(countw(&charlist,%str( )));
%put &cnt;

The %DO loop iterates through each variable in the list and generates output data from PROC FREQ by using the OUT= option. The output data set contains two variables: the variable from the TABLES request with the unique values of that variable and the Count variable with the frequency counts. The variable name is renamed to Value so that all the data sets can be combined in a later step. In a subsequent DATA step, a new variable, called Varname, is created that contains the variable name as a character string. Finally, the data set is sorted by the descending frequency count.

%do i=1 %to &cnt;
 
/* Loop through each character variable in PROC FREQ */ 
/* and create a separate output data set.            */
proc freq data=&lib..&dsn noprint;
tables %scan(&charlist,&i) / missing
out=out&i(drop=percent 
 rename=(%scan(&charlist,&i)=value));
run;
 
data out&i;
length varname value $100;
set out&i;
varname="%scan(&charlist,&i)";
run;
 
proc sort data=out&i;
by varname descending count;
run;
 
%end;

The final DATA step combines all the data sets into one using another macro %DO loop in the SET statement. The %END statement requires two semicolons: one ends the SET statement and one ends the %END statement. Three observations of each data set are printed by using the OBS= option.

data combine;
set %do i=1 %to &cnt;
 out&i(obs=3) /* Keeps top 3 for each variable. */
%end;;
run;

Knowing your data is essential in any programming application. The ability to quickly view the top values of any or all variables in a data set can be useful for identifying top sales, targeting specific demographic segments, trying to understand the prevalence of certain illnesses or diseases, and so on. As explained in this blog, a variety of Base SAS procedures along with the SAS macro facility make it easy to accomplish such tasks.

Learn more

These resources show different ways to create "top N" reports in SAS:

Keeping the top frequency count (n) for each character variable in a SAS data set was published on SAS Users.

7月 212017
 

n% of observations from a data setSAS® offers several ways that you can find the top n% and bottom n% of data values based on a numeric variable. The RANK procedure with the GROUPS= option is one method. Another method is The UNIVARIATE procedure with the PCTLPTS= option. Because there are several ways to perform this task, you can choose the procedure that you are most familiar with. In this blog post, I use the SUMMARY procedure to generate the percentile values and macro logic to dynamically choose the desired percentile statistics. After the percentiles are generated, I subset the data set based on those values. This blog post provides two detailed examples: one calculates percentiles for a single variable and one calculates percentiles within a grouping variable.

Calculate Percentiles of a Single Variable

Calculating percentiles of a single variable includes the following steps. Within the macro, a PROC SUMMARY step calculates the percentiles. The subsequent DATA step uses CALL SYMPUTX to create macro variables for the percentile values, and the final DATA step uses those macro variables to subset the data. Here is the code, which is explained in detail below:

/* Create sample data */
data test;                   
   do i=1 to 10000;                                                     
      x=ranuni(i)*12345;                                         
      output;                                                         
   end; 
   drop i; 
run;     
 
proc sort data=test;
   by x;
run; 
 
%macro generate_percentiles(ptile1,ptile2); 
/* Output desired percentile values */                         
proc summary data=test;                                               
   var x;                                                       
   output out=test1 &ptile1= &ptile2= / autoname;                               
run;                                                                 
 
/* Create macro variables for the percentile values */     
data _null_;                                                         
   set test1;                                                         
   call symputx("&ptile1", x_&ptile1);                                     
   call symputx("&ptile2", x_&ptile2);                                     
run;    
%put &&&ptile1;
%put &&&ptile2; 
 
data test2;                                                             
   set test;                                                           
/* Use a WHERE statement to subset the data */                         
   where x le &&&ptile1 or x ge &&&ptile2;                                       
run;  
 
proc print;
run; 
 
%mend;
 
options mprint mlogic symbolgen;
%generate_percentiles(p1,p99)
%generate_percentiles(p25,p75)

After creating and sorting the sample data, I begin my macro definition with two parameters that enable me to substitute the desired percentiles in my macro invocation:

%macro generate_percentiles(ptile1,ptile2);

The PROC SUMMARY step writes the desired percentiles for variable X to the Test1 data set. The AUTONAME option names the percentile statistics in the following format, <varname>_<percentile> (for example, x_p25).

proc summary data=test;                                               
   var x;                                                       
   output out=test1 &amp;ptile1= &amp;ptile2= / autoname;                               
run;

Next, I want to store the values of the percentile statistics in macro variables so that I can use them in later processing. I use CALL SYMPUTX to do this, which gives the macro variables the same name as the statistic. To see the resulting values in the log, I use

data _null_;                                                         
   set test1;                                                         
   call symputx("&amp;ptile1", x_&amp;ptile1);                                     
   call symputx("&amp;ptile2", x_&amp;ptile2);                                     
run;    
%put &amp;&amp;&amp;ptile1;
%put &amp;&amp;&amp;ptile2;

The SAS log shows the following:

MLOGIC(GENERATE_PERCENTILES):  %PUT &amp;&amp;&amp;ptile1
SYMBOLGEN:  &amp;&amp; resolves to &amp;.
SYMBOLGEN:  Macro variable PTILE1 resolves to p1
SYMBOLGEN:  Macro variable P1 resolves to 123.22158288
123.22158288
MLOGIC(GENERATE_PERCENTILES):  %PUT &amp;&amp;&amp;ptile2
SYMBOLGEN:  &amp;&amp; resolves to &amp;.
SYMBOLGEN:  Macro variable PTILE2 resolves to p99
SYMBOLGEN:  Macro variable P99 resolves to 12232.136483
12232.136483

I use these macro variables in a WHERE statement within a DATA step to subset the data set based on the percentile values:

data test2;                                                             
   set test;                                                           
/* Use a WHERE statement to subset the data */                         
   where x le &amp;&amp;&amp;ptile1 or x ge &amp;&amp;&amp;ptile2;                                       
run;

Finally, the macro invocations below pass in the desired percentile statistics:

%generate_percentiles(p1,p99)
%generate_percentiles(p25,p75)

The percentile statistics that are available with PROC SUMMARY are included in the documentation for the

/* Create sample data */
data test; 
 do group='a','b'; 
   do i=1 to 10000;                                                     
      x=ranuni(i)*12345;                                         
      output;                                                         
   end; 
 end;
   drop i; 
run;     
 
proc sort data=test;
   by group x;
run; 
 
%macro generate_percentiles(ptile1,ptile2); 
/* Output desired percentile values by group */                         
proc summary data=test; 
   by group; 
   var x;                                                       
   output out=test1 &amp;ptile1= &amp;ptile2= / autoname;                               
run;                                                                 
 
/* Create macro variables for each value of the BY variable */
/* Create macro variables for the percentile values for each BY group */ 
/* Create a macro variable that is the count of the unique
values of the BY variable */ 
data _null_;   
  retain count 0; 
   set test1;   
   by group;
   if first.group then do;
    count+1;
    call symputx('val'||left(count),group); 
    call symputx("&amp;ptile1"||'_'||left(count), x_&amp;ptile1);                                     
    call symputx("&amp;ptile2"||'_'||left(count), x_&amp;ptile2);  
   end; 
  call symput('last',left(count));
run;    
%put _user_;
 
/* Loops through each value of the BY variable */ 
%do i=1 %to &amp;last;
 
data test&amp;i;                                                             
   set test;  
   where group="&amp;&amp;val&amp;i"; 
/* Use an IF statement to subset the data */
   if x le &amp;&amp;&amp;ptile1._&amp;i or x ge &amp;&amp;&amp;ptile2._&amp;i;                                       
run;  
 
proc print;
run; 
 
%end;
%mend;
 
options mprint mlogic symbolgen;
%generate_percentiles(p1,p99)

Calculating percentiles has many applications, including ranking data, finding outliers, and subsetting data. Using a procedure in Base SAS® that enables you to request percentile statistics along with the power of the macro language, you can dynamically generate desired values that can be used for further processing and analysis.

Selecting the top n% and bottom n% of observations from a data set was published on SAS Users.

5月 202016
 

ProblemSolversPROC FREQ is often the first choice when you want to generate basic frequency counts, but it is the last choice when it is compared to other statistical reporting procedures. People sometimes consider PROC FREQ last because they think they have little or no control over the appearance of the output. For example, PROC FREQ does not allow style options within the syntax, which the REPORT and TABULATE procedures do allow. Also, you cannot control the formats or headings with statements in the procedure step.

Sometimes, a simple frequency (via a one-way table) is all you want, and you don’t want to have to create an output data set just to add a format. A one-way table in PROC FREQ is unique also in that it includes a cumulative count and percent. These calculations cannot be done in other procedures without additional code or steps. However, there is a simple way to make some basic modifications to your table. By adding a PROC TEMPLATE step to modify either the Base.Freq.OneWayFreqs or the Base.Freq.OneWayList table template, you can change the formats of the statistics, move the label of the variable, change the labels of the statistics, and suppress the Frequency Missing row that appears below the table. These changes apply to all output destinations, including the traditional listing output.  You can also use PROC TEMPLATE to make small modifications to the nodes that are generated by one-way tables in the table of contents for non-listing ODS destinations.

Customize the Formats and Labels for Statistics

If you want to modify the format for the statistics Frequency, Cumulative Frequency, Percent, and Cumulative Percent, you can use the FORMAT= option in PROC TEMPLATE. This option accepts SAS formats and user-defined formats. If you want to change the statistic heading, you can use the HEADER= option.

The following example uses both the FORMAT= and the HEADER= statements to format the statistics values and to change the Frequency heading to Count. This example also changes the Percent heading to Pct.

proc format;
     picture pctfmt (round) other='009.9%';
run;

proc template; 
     edit Base.Freq.OneWayList;
        edit frequency;  
           header='Count'; 
           format=comma12.;
        end;  

        edit cumfrequency;
           format=comma12.; 
        end;
edit percent;
header='Pct';
format=pctfmt.;
end;
   
edit cumpercent;
format=pctfmt.; 
end; 
end;
run;                                                                                  

data class;
     set sashelp.class;
     wgt=1000;
run;
 
proc freq data=class;
     tables age;
     weight wgt; 
run;

This code generates the following table:

FREQ procedure's one-way tables01

Move the Variable Label

If a variable has a label, it is centered, by default, at the top of the table, and the variable name appears above the column of values.

FREQ procedure's one-way tables02

If you want to use the label instead of the name above the column, you can edit the HEADER= value in PROC TEMPLATE, as shown in the following example:

proc template;
   edit Base.Freq.OneWayList;             
        edit h1; 
                /* Set text to a blank instead of VARLABEL. */ 
             text " ";
        end;                                                
                                                                                                  
        edit FVariable;                                           
           just=varjust;
           style=rowheader;
           id;
           generic;
           header=varlabel;
        end;
     end;
  run;

proc freq data=sashelp.class;
   label age='Age in Years';
   tables age;
run;

This code generates the following table, which replaces the default location of the label with blank text and moves the label so that it is above the column of the variable's values, as shown in this example:

FREQ procedure's one-way tables03

Suppress the Frequency Missing= Row

If a variable has missing values and you do not include the MISSING option in the TABLES statement of PROC FREQ, the output includes the frequency of missing values (Frequency Missing=) below the table with the number of missing values.

The following table shows the default output:

FREQ procedure's one-way tables04

To suppress this line without including missing values in the table output, you can modify the table template, as follows:

data test;
input c1 $;
cards;
a
b
.
;
run;

proc template;
     edit Base.Freq.OneWayFreqs; 

   /* This is the default setting:                     */ 
   /*  define f2;                                      */
   /*     text "Frequency Missing =" fMissing -12.99; */
   /*     print = pfoot2;                              */
   /*  end;                                            */
        edit f2; 
           text " ";
        end;
     end;
run;

proc freq data=test;
   table c1;
run;

The table that is generated by this code does not contain the Frequency Missing= row at the bottom of the table.

FREQ procedure's one-way tables05

Customize the Table of Contents

For a one-way table, the default table of contents looks like this:

FREQ procedure's one-way tables06

To control the first node, The Freq Procedure, you can use the ODS PROCLABEL statement.  You can change the text of this node, but you cannot eliminate the node. You can remove the One-Way Frequencies node using PROC TEMPLATE with the CONTENTS= statement. The following example changes the text of the The Freq Procedure node, and it eliminates the One-Way Frequencies node.

proc template;
   edit Base.Freq.OneWayFreqs;
      contents=off;
   end;
run;

ods listing close;
ods pdf file='test.pdf';

ods proclabel='My One-Way Table';
proc freq data=sashelp.class;
tables age;
run;

ods pdf close;
ods listing;

Here is the modified table of contents:

FREQ procedure's one-way tables07

(For additional customization of the table of contents, you need to consider using the DOCUMENT procedure.  The SAS Global Forum 2011 paper Let's Give 'Em Something to TOC about: Transforming the Table of Contents of Your PDF File, by Bari Lawhorn, contains useful information and examples that illustrate how to use PROC DOCUMENT, PROC TEMPLATE, and other SAS statements and options to customize your TOC.)

To restore the default table template, run the following code:

proc template;
delete Base.Freq.OneWayFreqs;
delete Base.Freq.OneWayList;
run;

The examples above show how you actually can customize PROC FREQ one-way tables, with a little help from PROC TEMPLATE. PROC TEMPLATE offers several methods that make it easy to customize one-way, frequency-count tables that enhance table format and appearance.
tags: Problem Solvers, PROC FREQ, SAS Programmers

Seeing the FREQ procedure's one-way tables in a new light was published on SAS Users.

7月 172015
 

ProblemSolversCustomizing the output data set created using the OUTPUT statement

When you request statistics on the PROC MEANS statement, the default printed output creates a nice table with the analysis variable names in the left-most column and the statistics forming the additional columns.  Even if you create an output data set with no statistics listed on the OUTPUT statement, the default statistics, N, MIN, MAX, MEAN, and STD, are output in a nice table format as values of the _STAT_ variable and the analysis variables form the other columns.  Once you start requesting specific statistics on the OUTPUT statement, the format of the data set changes.  A wide output data set is the result with only one observation and a variable for each analysis variable-statistic combination.  If you use a BY or CLASS statement, you will get multiple observations for each unique value of the BY or CLASS variable combinations, but the analysis variable structure is the same.  The structure of this data set can be hard to view.  It would be nice if the output data set could maintain the format of the printed output or the default output data set.  You can do that with a few simple code modifications using PROC TRANSPOSE and DATA Step logic.

If you want the output data set to look like the default printed output, use the steps below.

/* create a sample data set */
data test;
input x1 x2 x3;
cards;
1 2 3
4 5 6
7 8 9
;
run;
 
/* Request statistics on the OUTPUT statement. */
/* Use the AUTONAME option to create variable names in the form: */
/* variable_statistic */
 
proc means noprint;
var _numeric_;
output out=new(drop=_type_ _freq_) mean= median= std= q1= q3= / autoname;
run;
 
/* Transpose the data set so that each statistic becomes an observation. */
proc transpose data=new out=out;
run;
 
/* Create new variables that contain the statistic name and the */
/* original variable name. */
 
data out1;
set out;
varname=scan(_name_,1,'_');
stat=scan(_name_,2,'_');
drop _name_;
run;
 
proc sort data=out1;
by varname;
run;
 
/* Transpose the data to get one observation for each  */
/* original variable name and one variable for each    */
/* statistic.  This mimics the default printed output. */
 
proc transpose data=out1 out=out2(drop=_name_);
by varname;
id stat;
var col1;
run;
 
proc print data=out2;
title 'Looks like default printed output';
run;

 

Looks like default printed output

varname Mean Median StdDev Q1 Q3
x1 4 4 3 1 7
x2 5 5 3 2 8
x3 6 6 3 3 9

If you want the output data set to look like the default output data set, use the steps below in place of the last three steps above.

proc sort data=out1;
by stat;
run;
 
/* Transpose the data to get one observation for each */
/* statistic name and one variable for each           */
/* original variable.  This mimics the default        */
/* output data set.                                   */
 
proc transpose data=out1 out=out3(drop=_name_);
by stat;
id varname;
var col1;
run;
 
proc print data=out3;
title 'Looks like default output data set';
run;

Looks like default output data set

stat x1 x2 x3
Mean 4 5 6
Median 4 5 6
Q1 1 2 3
Q3 7 8 9
StdDev 3 3 3

 

If your variable names contain underscores (_) in the name, replace the DATA OUT1 step above with the following DATA OUT1 step:

/* Depending on your variable name, you may need to increase */
/* the length $11 assignment below.                          */
 
data out1;
set out;
length varname $11;
lenvar=length(_name_);
lenstat=index(reverse(trim(_name_)),'_')-1;
varname=substr(_name_,1,(lenvar-lenstat)-1);
stat=scan(_name_,-1,'_');
drop _name_ lenvar lenstat;
run;

Customizing the output using PROC TEMPLATE

For the default printed output from PROC MEANS, in the past we said that the only option for formatting the statistics was with the MAXDEC= option which controls the number of decimal places.  We are no longer limited to this level of formatting since we can use PROC TEMPLATE.  We can format the statistics in the printed output by modifying the Base.Summary table template.

/* create sample data */
data test;
input amount ;
cards;
1000
12345
8500
;
run;
 
/* Modify the base.summary table template using */
/* EDIT statements. */
ods path(prepend) work.templat(update);
proc template;
 edit base.summary;
  edit mean;
   format=dollar12.2;
  end;
  edit sum;
   format=dollar12.;
  end;
 end;
run;
 
proc means data=test n mean sum;
var amount;
run;
 
/* restore default template */
proc template;
 delete base.summary;
run;

Custom format for printed output

 

The MEANS Procedure
Analysis Variable : amount
N Mean Sum
3 $7,281.67 $21,845

Customizing the output data using STACKODSOUTPUT

In SAS 9.3, a new option was added that only affects the output data set created using the ODS OUTPUT statement.  This option is STACKODSOUTPUT (alias: STACKODS) and it is placed on the PROC MEANS statement.  This option allows the data set to resemble the default printed output.

/* Use the STACKODSOUTPUT option on the PROC MEANS statement   */
/* and use the ODS OUTPUT statement to create output data set. */
 
ods output summary=with_stackods(drop=_control_);
proc means data=sashelp.class stackodsoutput sum mean std nway;
 class sex;
 var height weight;
run;
 
proc print data=with_stackods noobs;
 title 'Output data set with STACKODSOUTPUT';
run;

Output data set with STACKODSOUTPUT

Sex NObs Variable Sum Mean StdDev
F 9 Height 545.300000 60.588889 5.018328
F 9 Weight 811.000000 90.111111 19.383914
M 10 Height 639.100000 63.910000 4.937937
M 10 Weight 1089.500000 108.950000 22.727186

If you are not satisfied with the output from PROC MEANS, either the default printed output or the output data set, you are just a few steps away from reshaping it into the format you desire.

tags: Problem Solvers

Customizing output from PROC MEANS was published on SAS Users.

5月 092014
 

It’s an understatement to say there are many Base SAS procedures!

Some procedures may be used for basic report writing. Other procedures may be used to perform statistical analysis. Some have similar functions. Others are unique in the output that they can produce. Which procedure you choose generally depends on the type of output you are trying to generate—with perhaps a bit of personal preference sprinkled into the mix

I often get calls from SAS users who are trying sort through the options and thought a blog post illustrating a few alternatives might help you choose the procedure that’s the best fit for your needs.  Here are a few common choices for calculating frequency, percentages and a few other simple statistics, but you can certainly use other Base SAS procedures or DATA step processing to perform these calculations. I’ve also included a few notes on customizing calculations and output.

It’s helpful to note that Base procedures have specific keywords to refer to statistics. For future reference, you might want to bookmark this table of common procedures and the simple statistics.

Calculating frequency

If you need to generate basic frequency (N) or sum reports, you can use a number of Base procedures.

PROC PRINT allows you to get a frequency count within a BY group and across the entire data set. In addition, PROC PRINT can create summarized values of numeric variables, also within a BY group and for the entire data set.

proc sort data=sashelp.class out=class;                   
by sex;                                                
run;                         
 
proc print data=class noobs                               
     sumlabel='Subtotal' grandtotal_label='Grand Total';   
by sex;                                                
var name age height weight;                            
sum height weight;                                     
run;

PROC REPORT allows for more customized grouping and display of variable values, and it supports the computation of new variables within COMPUTE blocks.

proc report data=sashelp.class nowd;
column sex age height weight bmi;
define sex / group;
define age / group;
define height / sum;
define weight / sum;
define bmi / computed format=8.2;
 
compute bmi;
bmi=(weight.sum/(height.sum)**2)*703;
endcomp;
run;

PROC FREQ is another procedure that outputs basic frequency counts.  This procedure will group like variable values together and return the frequency count for the grouping.  PROC FREQ also has the ability to create an output data set.

proc freq data=sashelp.class;
tables age*sex / out=new outpct;
run;
 
proc print data=new;
run;

If you want to get the distinct count of a variable’s values, you can use PROC FREQ with the NLEVELS option.

proc freq data=sashelp.class nlevels;
tables age;
run;

Calculating percentages

PROC FREQ, by default, outputs percentages for multi-way tables, representing overall, row, and column percents.

proc freq data=sashelp.class;
tables age*sex;
run;

PROC TABULATE outputs comparable percentages using the following statistic keywords:  PCTN, ROWPCTN, and COLPCTN.

proc tabulate data=sashelp.class;
class age sex;
table age*(n pctn rowpctn colpctn) all*(n rowpctn), sex all;
run;

PROC TABULATE has the added ability to generate more advanced denominator definitions.  You will find the SAS Global Forum 2013 paper Tips for Generating Percentages Using the SAS® TABULATE Procedure helpful.

PROC REPORT uses the PCTN statistic to generate a column percentage. Other custom percentages and be computed in PROC REPORT using COMPUTE blocks.

proc report data=sashelp.class nowd;
column age sex,(n pctn);
define age / group;
define sex / across;
define pctn / format=percent8.2 'Col %';
run;

Calculating other statistics

SAS Base procedures MEANS, SUMMARY, REPORT and TABULATE can calculate many statistics as highlighted in the table of common procedures and the simple statistics.

PROC TABULATE and PROC REPORT have a report-friendly tabular structure.

proc tabulate data=sashelp.class;
class age;
var height weight;
table age, (height weight)*(sum mean min max);
run;

PROC SUMMARY or PROC MEANS are recommended if you need to create an output data set for your requested statistics. These two procedures are essentially the same except for a few defaults:

  • PROC SUMMARY does not create printed output by default, but PROC MEANS does.
  • Another difference is if you omit the VAR statement, PROC SUMMARY creates a simple frequency count of observations, but PROC MEANS analyzes all numeric variables that are not listed on other statements.
proc summary data=sashelp.class;
class age;
var height weight;
output out=stats sum= mean= min= max= / autoname;
run;
 
proc print data=stats;
run;

Customizing calculations, summaries and output

If your output needs to include customized summaries using IF/THEN logic, then PROC REPORT is the procedure to choose with its’ COMPUTE blocks and LINE statements.  The SAS Samples below illustrate how to:

Finally, any of these procedures can be customized and output to any destination, including Excel, RTF, PDF and HTML, using the Output Delivery System (ODS).  Here is an example to Demonstrate the use of banding in PROC TABULATE.

tags: base sas, Problem Solvers, SAS Programmers, statistics
5月 302012
 

I am part of a 'virtual team from SAS Technical Support' who provide the suggestions for the Tips & Techiques section for the SAS Tech Report each month. A couple of days ago, Bill Gibson, Chief Technology Officer from SAS Australia, sent us an email saying that like him, many seasoned SAS programmers concatenate values with the combination of || and TRIM and LEFT functions. He thought it would be a good idea to create a SAS Usage Note that spotlights the faster and more elegant concatenation approaches available in SAS 9.0 and beyond.

Prior to SAS 9.0, in order to concatenate text strings in SAS, a combination of the TRIM and LEFT functions and the concatenation operator (||) had to be used. A separator, if desired, would have to be added using a quoted string.

Beginning in SAS 9.0, the CAT family of functions was introduced. These functions simplify concatenating text strings by automatically using certain combinations of these functions and the concatenation operator. The new functions are listed below.

CAT: concatenates character strings without removing leading or trailing blanks

CATS: concatenates character strings and removes leading and trailing blanks

CATT: concatenates characters strings and removes trailing blanks

CATX: concatenates character strings, removes leading and trailing blanks, and inserts separators

The table below shows a comparison of each CAT function to using the concatenation operator.

Function Equivalent Code
CAT(OF X1-X4) X1||X2||X3||X4
CATS(OF X1-X4) TRIM(LEFT(X1))||TRIM(LEFT(X2))||TRIM(LEFT(X3))||TRIM(LEFT(X4))
CATT(OF X1-X4) TRIM(X1)||TRIM(X2)||TRIM(X3)||TRIM(X4)
CATX(SP, OF X1-X4) TRIM(LEFT(X1))||SP||TRIM(LEFT(X2))||SP||TRIM(LEFT(X3))||SP||TRIM(LEFT(X4))

To see an example of the new functions, run the following code:

data test;
  input (x1-x4) ($);
  x5=' 5';
  length new1 $40 new2-new4 $10 ;
  new1=cat(of x1-x5);
  new2=cats(of x1-x5);
  new3=catt(x1,x2,x3,x4,x5);
  new4=catx(',', of x1-x5);
  keep new:;
datalines;
1 2 3 4
5 6 . 8
;
run;

proc print;
run;

There are several differences in the CAT family of functions compared to the concatenation operator for concatenating character strings. If a variable has not been previously assigned a length, the variable created with the CAT function will be assigned a length of 200. Using the concatenation operator, the length assigned is the sum of the lengths of the values being concatenated. Another difference is in how missing values are treated. When using the CATX function with a separator, missing values are ignored. When using the concatenation operator and defining a separator, if any variable has a missing value, multiple separators will appear consecutively. The sample code below illustrates this.
data test;
x1='1';
x2=' ';
x3='6';
newx=catx(',', of x1-x3);
newx1=trim(left(x1))||','||trim(left(x2))||','||trim(left(x3));
put newx=;
put newx1=;
run;


These code examples are provided as is, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.

tags: concatenation, tips & techniques