3月 222022

In my previous blog, you saw how to create a Beale cipher. In this blog, you will see a program that can decode a Beale cipher. As a reminder, here is a list of numbers that you can use as a substitute for a letter when creating your cipher.

Now, suppose you want to send the following message: "Come to safe house at ten tonight." One possible cipher for this message is:

65 12 81 84 55 46 3 73 88 71 80 11 7
20 57 94 35 84 82 22 29 33 44 16 31 10
67 48 73 60

The first step to decode this cipher is the same as the first step in the program to create the cipher: Make a list of possible numbers to represent each letter. I'll repeat it here:

*Create the list of letters and numbers;
Data Decipher;
   length Letter $ 1; 
   infile 'c:\Books\Blogs\Declare.txt'; 
   input Letter : $upcase1. @@; 
   N + 1; 
title "Listing of Data Set Decipher";
title2 "First Five Observations";
proc print data=Decipher(obs=5) noobs;

This is the program that created the list of numbers corresponding to each letter. The next step in the program to create a Beale cipher was to sort by Letter. This time you want it in number order. Because it is already in order by the variable N, you don't have to sort it. Here are the first five observations in data set Decipher:

The next step is to read the message and make a SAS data set.

*Make a SAS data set from the Message text;
data Message;
   infile 'c:\books\Blogs\Cipher\Message.txt';
   input NN @@;
title "First 5 Observations from Data Set Message";
proc print data=Message(obs=5) noobs;

Here is the listing:

The final step is to create a temporary array (long enough to hold all the numbers). Each element in this array will contain the letter corresponding to the position in the array. The DATA step below first loads the temporary array elements with the appropriate letters and then reads each number from the file Message.txt (that contains the secret code). The temporary array is acting as a lookup table to find the letter corresponding to the number. I have annotated the program so that you can see exactly what is going on.

data Final;
   length Letter $ 1 String $ 200;
   array Letters[1000] $ _temporary_; ❶
   set Decipher (keep=Letter) end=Last_Obs; ❷
      Letters[N] = Letter; ❸
      if Last_Obs then do i = 1 to N_Message; ❹
      set Message Nobs=N_Message; ❺
      Letter = Letters[NN]; ❻
      String = catx(' ',String,Letter); ❼
      if i = N_Message then output; ❽
   keep String;
title "Decoded message";
proc print data=Final noobs; ❾

❶ Create a temporary array. Each element in the temporary array (Letters) is a letter corresponding the element number. For example, Letters[1] is 'W', Letters[2] is 'I', and so forth.

❷ Bring in the observations in data set Decipher. Each observation in this data set contains the first letter of each word in the document. The END= option lets you know when you have read the last observation in the Decipher data set.

❸ Load up the temporary array based on the values of N and Letter

❹ Once the temporary array is loaded, read in the observations in data set Message. Notice that the variable N_Message was set to the number of observations in data set Message at compile time by using the SET option NOBS=.

❺ Bring in the observations from data set Message.

❻ Decipher the number (NN) to determine the letter it represents.

❼ Use the CATX function to add all the letters to the variable String.

❽ After all the numbers from the file Message.txt have been processed, it is time to output an observation containing the variable String.

❾ Use PROC PRINT to print out the message.

Here is the output:

I showed this program to my friend Mark Jordan (aka SAS Jedi), and he came up with a solution that uses formats to do the table lookup. It is probably an easier and more elegant program than mine (his programs usually are), and I am including his program here.

The first step is once again to create the cipher. Make a list of possible numbers to represent each letter.  This time, though, we’ll create the Decipher data set so that it can be used to build a SAS format.

*Create the list of letter and numbers;
data Decipher;
   retain Fmtname 'Decipher' Type 'N'; ❶
   length LABEL $ 1;  ❷
   infile 'c:\Books\Blogs\Declare.txt'; 
   input Label : $upcase1. @@; 
   N + 1; 
   Start=N; ❸
   output; ❹
   drop N;
title "Listing of Data Set Decipher";
title2 "First 5 Observations";
proc print data=Decipher(obs=5) noobs; ❺

❶ FMTNAME and TYPE are required to be the same value for each observation. We accomplish that with a RETAIN statement.

❷ LABEL and START are the other two required variables for a PROC FORMAT control data set.

❸ Set Start to N.

❹ Write one row for each value we want to decode.

❺ Print the first 5 observations of the Decipher data set.

Here is the listing:

The next step is to create a format from the Decipher data set:

* Make a format from the Decipher data set;
proc format cntlin=Decipher fmtlib;

The FMTLIB option produces a report documenting the format. Here is a sample:

The final step is to use the format on each number in the message text to decode it. The first DATA step below reads each number from the file Message.txt (that contains the secret code) to create the Message data set. The second DATA step reads the Message data set and applies the format to each numeric value using the PUT function. This produces the letter corresponding to the number. I have annotated the program so that you can see exactly what is going on.

*Make a SAS data set from the Message text;
data Message;
   infile 'c:\Books\Blogs\Cipher\Message.txt';
   input NN @@;
data Final;
   length String $200;
   retain string;
   keep String;
   set Message end=last; ❶
   String = catx(' ',String,put(NN,decipher.)); ❷
   if last then output;
title "Decoded message";
proc print data=Final noobs; ❸

❶ Bring in the observations from the Message data set.
❷ Use the PUT function to produce the correct letter, and the CATX function to combine the letters into the variable String.
❸ Use PROC PRINT to print out the message.

I hope you enjoy both of these programs. Please add a comment to the blog with your preference. I think I'll vote for Mark's program!

Fun with Ciphers (Part 2) was published on SAS Users.

9月 072020

Locale-specific SAS® format catalogs make reporting in multiple languages more dynamic. It is easy to generate reports in different languages when you use both the LOCALE option in the FORMAT procedure and the LOCALE= system option to create these catalogs. If you are not familiar with the LOCALE= system option, see the "Resources" section below for more information.

This blog post, inspired by my work on this topic with a SAS customer, focuses on how to create and use locale-specific informats to read in numeric values from a Microsoft Excel file and then transform them into SAS character values. I incorporated this step into a macro that transforms ones and zeroes from the Excel file into meaningful information for multilingual readers.

Getting started: Creating the informats

The first step is to submit the LOCALE= system option with the value fr_FR. For the example in this article, I chose the values fr_FR and en_US for French and English from this table of LOCALE= values. (That is because I know how to say “yes” and “no” in both English and French — I need to travel more!)

   options locale=fr_fr;

The following code uses both the INVALUE statement and the LOCALE option in PROC FORMAT to create an informat that is named $PT_SURVEY:

   proc format locale library=work;
      invalue $pt_survey 1='oui' 0='non'; run;

Now, toggle the LOCALE= system option and create a second informat using labels in a different language (in this example, it is English):
options locale=en_us;

   proc format locale library=work;
      invalue $pt_survey 1='yes' 0='no';

In the screenshot below, which shows the output from the DATASETS procedure, you can see that PROC FORMAT created two format catalogs using the specified locale values, which are preceded by underscore characters. If the format catalogs already exist, PROC FORMAT simply adds the $PT_SURVEY informat entry type to them.

   proc datasets memtype=catalog; 

Before you use these informats for a report, you must tell SAS where the informats are located. To do so, specify /LOCALE after the libref name within the FMTSEARCH= system option. If you do not add the /LOCALE specification, you see an error message stating either that the $PT_SURVEY informat does not exist or that it cannot be found. In the next two OPTIONS statements, SAS searches for the locale-specific informat in the FORMATS_FR_FR catalog, which PROC FORMAT created in the WORK library:

   options locale=fr_fr;
   options fmtsearch=(work/locale);

If you toggle the LOCALE= system option to have the en_US locale value, SAS then searches for the informat in the other catalog that was created, which is the FORMATS_EN_US catalog.

Creating the Excel file for this example

For this example, you can create an Excel file by using the ODS EXCEL destination from the REPORT procedure output. Although you can create the Excel file in various ways, the reason that I chose the ODS EXCEL statement was to show you some options that can be helpful in this scenario and are also useful at other times.
Use the ODS EXCEL destination to create a file from PROC REPORT. I specify the TAGATTR= style attribute using “TYPE:NUMBER” for the Q_1 variable:

   %let  path=%sysfunc(getoption(WORK));
   filename temp "&path\surveys.xlsx"; 
   ods excel file=temp;
   data one;
      infile datalines truncover;
      input ptID Q_1;
   111 0
   112 1
   proc report data=one;
      define ptID / display style(column)={tagattr="type:String"};
      define Q_1 / style(column)={tagattr="type:Number"};
   ods excel close;

Now you have a file that looks like this screenshot when it is opened in Excel. Note that the data value for the Q_1 column is numeric:

The IMPORT procedure uses the DBSASTYPE= data set option to convert the numeric Excel data into SAS character values. Then I can apply the locale-specific character informat to a character variable.

As you will see below, in the macro, I use DBMS=EXCEL in PROC IMPORT to read the Excel file because my SAS and Microsoft Office versions are both 64-bit. (You might have to use the PCFILES LIBNAME Engine to connect to Excel through the SAS PC Files Server if you are not set up this way.)

Using the informats in a macro to create the multilingual reports

The final step is to run the macro with parameters to produce the two reports in French and English, using the locale-specific catalogs. When the macro is called, depending on the parameter value for the macro variable LOCALE, the LOCALE= system option changes, and the $PT_SURVEY informat from the locale-specific catalog is applied. These two tabular reports are produced:

Here is the full code for the example:

   %let  path=%sysfunc(getoption(WORK));
   filename temp "&path\surveys.xlsx";
   ods excel file=temp;
   data one;
      infile datalines truncover;
      input ptID Q_1;
   111 0
   112 1
   proc report data=one;
      define ptID / display style(column)={tagattr="type:String"};
      define Q_1 / style(column)={tagattr="type:Number"};
   ods excel close;
   options locale=fr_fr;
   proc format locale library=work;
      invalue $pt_survey 1='oui' 0='non';
   options locale=en_us;
   proc format locale library=work;
      invalue $pt_survey 1='yes' 0='no';
   /* Set the FMTSEARCH option */
   options fmtsearch=(work/locale);
   /* Compile the macro */
   %macro survey(locale,out);
      /* Set the LOCALE system option */
      options locale=&locale;
      /* Import the Excel file  */
      filename survey "&path\surveys.xlsx";
      proc import dbms=excel datafile=survey out=work.&out replace;
      data work.&out;
         set work.&out;
         /* Create a new variable for the report whose values are assigned by specifying the locale-specific informat in the INPUT function */
         newvar=input(Q_1, $pt_survey.);
         label newvar='Q_1';
      options missing='0';
      /*  Create the tabular report */
      proc tabulate data=&out;
         class ptID newvar;
         table ptID='Patient ID', newvar*n=' '/box="&locale";
   %mend survey;
   /* Call the macros */

For a different example that does not involve an informat, you can create a format in a locale-specific catalog to print a data set in both English and Romanian. See Example 19: Creating a Locale-Specific Format Catalog in the Base SAS® 9.4 Procedures Guide.


For more information about the LOCALE option:

For more information about reading and writing Excel files:

For more information about creating macros and using the macro facility in SAS:

Using locale-specific format catalogs to create reports in multiple languages was published on SAS Users.

12月 042017

During my 35 years of using SAS® software, I have found the CNTLIN and CNTLOUT options in the FORMAT procedure to be among the most useful features that I routinely suggest to other SAS users. The CNTLIN option enables you to create user-defined formats from a SAS data set (input control data set). The CNTLOUT option enables you to create a SAS data set (output control data set) containing format details from an entry in a SAS format catalog.

In this blog post, I provide a few examples demonstrating how to use the CNTLIN option. I also mention how to use the CNTLOUT option to store your format information in case you need to move to a new operating environment.

You can store all the format details from a SAS format catalog in a CNTLOUT data set and later restore them in a format catalog in your new operating environment using the CNTLIN option. For details, see SAS Usage Note 22194: “How to use the CNTLOUT= and CNTLIN= options in PROC FORMAT to move formats from one platform to another.”

A data set for the CNTLIN option contains variables that give specific information about ranges and values. At a minimum, the data set must contain the following variables:

FMTNAME specifies a character variable whose value is the format or informat name.
START specifies a variable that gives the range's starting value.
LABEL specifies a variable whose value is associated with a format or an informat.

For details about input and output control data sets, see the “FORMAT Procedure” section of Base SAS® 9.4 Procedures Guide, Seventh Edition.

Create a Numeric Format

The following simple example using the CNTLIN option creates a numeric format named respf:

 data test;                                         
    input response desc $20.;                       
 1  Strongly Disagree                               
 2  Disagree                                        
 3  Neutral                                         
 4  Agree                                           
 5  Stongly Agree                                   
 data crfmt;                                        
    set test;                                       
 proc format library=work cntlin=crfmt fmtlib;      
    select respf;                                   

Controlling Your Formats

Reveal Data Set Variables

To see the other variables that are included in data sets created by the CNTLIN and CNTLOUT options, use CNTLOUT to create a data set for the respf format created above:

 proc format library=work cntlout=outfmt;       
    select respf;                               
 proc print data=outfmt;                        

Add Additional Ranges

To add another range to the respf format, you can use DATA step processing with the data set created by the CNTLOUT option. Then, re-create the format using the CNTLIN option:

data infmt;                                               
    set outfmt end=last;                                   
    if last then do;                                       
       HLO='O';  /* indicates a special other range  */      
 proc format library=work cntlin=infmt fmtlib;             
    select respf;                                          

Convert a State Name to Its Postal Abbreviation

One use for the CNTLIN option is to create a format that converts a state name to its 2-letter postal abbreviation. For example, this option can convert 'North Carolina' to 'NC'.  Because SAS does not have a function or format to convert state names to postal abbreviations, this is an excellent use of the CNTLIN option.

We can use data from the SASHELP.ZIPCODE data set to create a user-defined format using the CNTLIN option, as shown below:

proc sql noprint;                               
    create table crfmt as                        
    select distinct statename as start,          
           statecode as label,                   
           '$mystate' as fmtname                 
    from sashelp.zipcode;                        
 proc format library=work cntlin=crfmt fmtlib;   
    select $mystate;                             

Identify State Capitals

In a similar manner, we can use the MAPS.USCITY data set to create a user-defined format that identifies state capitals from the 2-letter state abbreviation. See the sample code and partial results below:

proc sql noprint;                                 
   create table crfmt as                          
   select distinct statecode as start,            
          city as label,                          
          '$mycity' as fmtname                    
   from maps.uscity                               
   where capital='Y';                             
proc format library=work cntlin=crfmt fmtlib;     
   select $mycity;                                

Use External Data Sources

You can gather information from external data sources and read that information into a data set created by the CNTLIN option to create user-defined formats.

The following example uses ICD10 medical diagnosis codes. I downloaded a list of ICD10 codes and their descriptions into a Microsoft Excel file from the Center for Medicare & Medicaid Services website. Then, I created a user-defined format from the first 25 records:
Note: You can also download the codes as a text file.

/* This code reads in the Excel file.   */                                                                  
proc import out==myicd10                                              
   datafile= "C:\Section111ValidICD10-2017.xlsx"   
   dbms=excelcs replace;                                                
   range="'Valid ICD10 2017 & NF Exclude$'";                            
data crfmt;                                         
   set myicd10 (obs=25);                         
title1 'ICD10 Format';                                                      
title3 'FMTLIB results only display the first 40 characters of the label';  
proc format library=work cntlin=crfmt fmtlib;       
   select $myicd;                                   

A more complicated example that uses other data set variables created by the CNTLIN option is included in the linked sample program in Sample 47312: “Create a user-defined format containing decile ranges from PROC UNIVARIATE results.”

If you can think of a scenario in which the CNTLIN format would be helpful, give it a try. If you have questions, you can ask via  SAS Communities or contact us in SAS Technical Support.

Controlling your formats was published on SAS Users.

12月 202016

Joining tables with PROC FORMAT

The title of this post borrows from Stanley Kubrick’s 1964 comedy “Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb.” It stars the great Peter Sellers as the titular character as well as George C. Scott and Slim Pickens. The black and white film is strange and brilliant just like Kubrick was. Similarly, as I was experiencing the issue I outline below and was told of this solution, I thought two things. The first was “what a strange but brilliant solution” and the second one was “I’ll try anything as long as it works.”   Thus, a title was born. But enough about that. Why are we here?


You want to add a couple of columns of information to your already large dataset, but each time you try to join the tables you run out of memory!  For example, you want to append latitude and longitude values from Table B to an existing list of customer phone numbers in Table A.

You’ve tried this and got nowhere fast:

proc sort data = demo.tablea;
by npa nxx;
proc sort data = demo.tableb;
by npa nxx;
data demo.aunionb;
merge demo.tablea (in=a) demo.tableb (in=b);
by npa nxx;
if a;

And then you tried this and also got nowhere (albeit a little slower):

proc sql;
   	create table demo.aunionb as 
   	select *,
	from demo.tablea a
left join demo.tableb b on (a.npa = b.npa) and (a.nxx = b.nxx);

Solution - Joining tables with PROC FORMAT


Here’s how:

First, take Table B and create character equivalents of the fields required in your join (assuming they aren’t characters already). In this example, NPA and NXX are the two fields that you are joining on. They will be your key once you concatenate them.  Next, create character equivalents of the fields that you want appended.

data work.tableb (keep = npa_nxx--nxx_c); 
set demo.tableb; 
npa_c = compress(put(npa, best10.));
nxx_c = compress(put(nxx, best10.));
npa_nxx = catx('_',npa_c, nxx_c);
lat_c = compress(put(latitude, best14.3)); 
long_c = compress(put(longitude, best14.3)); 

Next, make sure that you have only unique values of your key. Use PROC SORT with OPT=noduprecs turned on.

Now, create a table that will be used as the input into PROC FORMAT. In this example, you are creating a table that will contain the formats for the latitude column.

proc sort data = work.tableb noduprecs;
by npa_nxx;
data demo.tableb_lat_fmt(keep=fmtname type start label); 
retain fmtname 'lat_f' type 'C'; 
set work.tableb; 
if npa_nxx = '._.' then start = 'Other  ';
else start = npa_nxx; 
label = lat_c; 
proc sort data = demo.tableb_fmt;
by start;

This step creates a table that includes the format name (lat_f), the format type (C), the key field (start) and its corresponding latitude value (label).  Sort this table by the ‘start’ column and then repeat this step for every column you wish to append, with each column getting its own unique format and table.

Now run PROC FORMAT using the CNTLIN option pointing to the tables that you just created in order to create your format.

proc format cntlin=demo.tableb_lat_fmt; 
proc format cntlin=demo.tableb_long_fmt; 

Now all you have to do is run your data step to create the resultant dataset with the appended values.

data demo.aunionb (drop = npa_nxx); 
set demo.tablea; 
npa_nxx = catx('_',compress(put(npa,best10.)),compress(put(nxx, best10.)));
latitude = input(put(npa_nxx, $lat_f.), BEST.); 
longitude = input(put(npa_nxx, $long_f.), BEST.);

This step creates 3 columns: npa_nxx, latitude, and longitude. Npa_nxx is the key built from the NPA  and NXX values. Latitude and longitude are then populated with the formatted value of npa_nxx, which in this case is the character equivalent of the original latitude or longitude. It also formats the value back into a numeric field.

The result is a clever way to add columns to a dataset, much like a VLOOKUP function works in Microsoft Excel, without the hassle of running out of memory space.


  1. The author realizes there are other, more boring ways of tackling this issue like indexing and using WHERE statements, but where’s the fun in that?
  2. This solution may not be right for you. See your doctor if you experience any of the following symptoms:  memory loss, headache, frustration, Cartesian rage, cranial-keyboard embedment or memory loss.
tags: Global Technology Practice, Joining tables, PROC FORMAT, SAS Programmers, tips & techniques

Dr. Strangeformat or: How I Learned to Stop Joining Tables and Love the PROC was published on SAS Users.

11月 152016

Rick Wicklin showed us how to visualize the ages of US Presidents at the time of their inaugurations. That's a pretty relevant thing to do, as the age of the incoming president can indirectly influence aspects of the president's term, thanks to health and generational factors.

As part of his post, Rick supplied the complete data set for US Presidents and their birthdays. He challenged his readers to create their own interesting visualizations, and that's what I'm going to do here. I'm going to show you the distribution of US Presidents by their astrological signs.

Now, you might think that "your sign" is not as relevant of a factor as Age, and I certainly hope that you're correct about that. But past presidents have sought the advice of astrologers, and zodiac signs can influence the counsel such astrologers might offer. (Famously, Richard Nixon took advice from celebrity psychic Jeane Dixon. First Lady Nancy Reagan also sought her advice, and we know that Mrs. Reagan in turn influenced President Reagan.)

Like any good analyst, I mostly reused existing work to produce my results. First, I used the DATA step that Rick provided to create the data set of presidents and birthdays. Next, I reused my own work to create a SAS format that displays a zodiac sign for each date. And finally, I wrote write a tiny bit of PROC FREQ code to create my table and frequency plot.

data signs;
 /* So this column appears first */
 retain President;
 length sign 8;
 /* SIGN. format created earlier with PROC FORMAT */
 format sign sign.;
 set presidents (keep=President BirthDate InaugurationDate);
 /* convert birthday to our normalized SIGN date */
 sign = mdy(month(birthdate),day(birthdate),2000);
ods graphics on;
proc freq data=signs order=freq;
tables sign / plots=freqplot;

To keep things a bit fresh, I did all of this work in SAS University Edition using the Jupyter Notebook interface. Here's a glimpse of what it looks like:

And here's the distribution you've all been waiting to see. When he takes office, Donald Trump will join George H. W. Bush and JFK in the Gemini column.

I've shared the Jupyter Notebook file as a public gist on GitHub. You can download and import into your own instance if you have SAS and Jupyter Notebook working together. (Having trouble rendering the notebook file? Try looking at it through the nbviewer service. That usually works.)

tags: Jupyter, proc format, SAS University Edition, zodiac

The post Zodiac signs of US Presidents appeared first on The SAS Dummy.

6月 112014

Now, the Star-Belly Sneetches
Had bellies with stars.
The Plain-Belly Sneetches
Had none upon thars.
- from "The Sneetches", by Dr. Seuss

Recently a user on the SAS-L mailing list had this challenge: "I would like to display stars in a table (created by PROC REPORT) based on variable values. For example, if value=3, then display five stars with 3 in yellow, and the other two in grey."

In her original post, the user had sketched out an approach that used a custom SAS format with uppercase and lowercase Xs representing the desired "star" states. It turns out that this was very close to a working approach. All she needed to do was apply the same technique that I used to add Harvey Balls to a SAS report. Like the Fix-It-Up Chappie, I'm here to offer the solution.

In my Harvey Balls example, I used ODS ESCAPECHAR and Unicode character codes to add extended characters to my SAS format labels. There is a Unicode character for a star (HEX 2605), so the same approach can work. In addition, the ESCAPECHAR sequence can set off style instructions, such as a color definition. This allows you to control the character color "mid-stream". For example, this sequence specifies a red star followed by a gray star:

 ~{style [color=red] ~{unicode '2605'x}} ~{style [color=ygr] ~{unicode '2605'x}}

As it happens, I keep some movie rating data that I downloaded from my Netflix account history (a very rich source of trivial SAS examples). When I apply a custom "star" format to my data, I can produce a report like this:
Here's my SAS format and the PROC PRINT code that references it. The format and report look perfect in HTML, PDF and RTF output:

/* ODS EXCAPECHAR needed to set style/unicode cues */
ods escapechar='~';
/* Captured these in macro variables for readability and */
/* easy maintenance                                      */
%let graystar = ~{style [color=ygr] ~{unicode '2605'x}};
%let redstar =  ~{style [color=red] ~{unicode '2605'x}};
/* using a 4-star system where 1 means "zero stars" */
/* and 5 means the full "4 stars"                   */
proc format lib=work;
value stars
 1 = "&graystar.&graystar.&graystar.&graystar."
 2 = "&redstar.&graystar.&graystar.&graystar." 
 3 = "&redstar.&redstar.&graystar.&graystar." 
 4 = "&redstar.&redstar.&redstar.&graystar." 
 5 = "&redstar.&redstar.&redstar.&redstar." 
title "Movie titles with STARS on THARS";
proc print data=work.movies noobs;
  format rating stars.;

If you want to try it yourself, you can download my program with test data:

>> Complete SAS program with test data

This program should work in SAS Display Manager, SAS Enterprise Guide (select HTML output), and even the SAS University Edition (SAS Studio).

tags: ods escapechar, proc format
11月 022012
This week's SAS tip is from Ron Cody and his book Cody's Data Cleaning Techniques Using SAS, Second Edition. Ron is the popular author of several bestselling SAS books and has been a SAS user since 1977. Visit Ron's author page for lots of bonus content, including a free chapter from his [...]
3月 212012
learn proc format:

data test;
input start $ end $ label;
nextag nextag 1
#sina #sina 2
#1000 #1000 3

data fmt0;
retain fmtname'$testfmt' ;
set test end=last;
if last then do;

proc print data=fmt0;

proc format cntlin=fmt0;
select $testfmt;

data test;
input test $20.;

data test_result;
set test;
id=put(test, $testfmt.)+0;

proc print data=test_result;

The output is:

Obs test id

1 nextag 1
2 #sina 2
3 &ok 0
4 #1000 3
5 wokong 0

 Posted by at 2:04 下午
3月 162012
The question is: after calculate the percentage, we want to count from how many obs are in 0-1%, how many in 1%-2%, ..., how many in 99%-100%. For this simple example, we can use round or floor or ceil to get the result. But here shows how to use proc format to get it.

After we get cumulative percentage, we format the percentage with put function. Take care in the proc format don't forget to add hlo, otherwise it will gives error since the first start number is missing.

data test;
do i=1 to 1000;

data test;
set test;

proc sql;
create table test as
select x, y, y/max(y) as pct
from test;

data test2;
do i=1 to 100;

data a;
set test2;
if _n_=1 then hlo='L';

proc format cntlin=a;
select fmttestf;

data final;
set test;
rank=put(pct, fmttestf.)+0;

proc print data=final;

proc sql;
select rank, count(1) as cnt
from final
group by rank
order by rank;

 Posted by at 2:12 下午
9月 142011

Rick Wicklin and I are engaged in an arms race of birthday-related blog posts.  To recap:

Now I have no choice but to respond again.  This isn't my fault. I didn't start this.

Today, I'm going to take the data that Rick supplied and attempt to answer the deep scientific question, "What is our (zodiac) sign?"

You might as well ask the ever-reliable desktop scientist, the Magic 8-Ball: "Reply hazy, try again."  As you can see from the PROC FREQ output below, there doesn't appear to be a clear dominant horoscope that might be influencing our collective fate.

Attributes of Virgo (meticulous and reliable) and Taurus (warm-hearted and loving) are definitely reflected in our corporate culture, but I'm not sure that these signs are concentrated enough within our population to affect that.  Perhaps the most useful thing that came out of this exercise is my user-written SAS format that equates birthdates to signs of the zodiac.  Here's the complete program, which you ought to be able to run as-is from SAS 9.2 or later, or in SAS Enterprise Guide.

filename bdays url 
  /* behind a corporate firewall? don't forget the PROXY= option here */
/* SAS format for zodiac signs in a given year */
proc format lib=work;
	value sign
	'21Mar2000'd - '19Apr2000'd = 'Aries'
	'20Apr2000'd - '20May2000'd = 'Taurus'
	'21May2000'd - '20Jun2000'd = 'Gemini'
	'21Jun2000'd - '22Jul2000'd = 'Cancer'
	'23Jul2000'd - '22Aug2000'd = 'Leo'
	'23Aug2000'd - '22Sep2000'd = 'Virgo'
	'23Sep2000'd - '22Oct2000'd = 'Libra'
	'23Oct2000'd - '21Nov2000'd = 'Scorpio'
	'22Nov2000'd - '21Dec2000'd = 'Sagittarious'
	/* split Capricorn to make two valid ranges */
	/* that don't span the calendar boundary */
	'22Dec2000'd - '31Dec2000'd = 'Capricorn'
	'01Jan2000'd - '19Jan2000'd = 'Capricorn'
	'20Jan2000'd - '18Feb2000'd = 'Aquarius'
	'19Feb2000'd - '20Mar2000'd = 'Pisces'
	other = 'Unknown';
data bdays;
	infile bdays dsd firstobs=2;
	input mon day;
	length birthdate 8 birthsign 8;
	format birthdate date5.;
	format birthsign sign.;
	label birthsign="Zodiac sign";
	/* make sure we pick a leap year, so 29Feb is valid */
	birthdate = mdy(mon,day,2000);
	birthsign = birthdate;
ods graphics on / height=400 width=800;
title "What's our sign?";
ods noproctitle;
proc freq data=bdays order=data;
	tables birthsign /plots=freqplot(scale=percent);
/* clear the filename */
filename bdays;

I'm sure that the question has occurred to you: what about my Facebook friends?  How are they distributed among the stars?  Of those that report their birthdays, here is how they fall:

As you can see, there really isn't a dominant sign among them.  However, Aquarius might be a bit underrepresented.  That's a shame, because I could probably use more honest and loyal people among my friends.

tags: facebook, magic 8-ball, proc format, proc freq, zodiac