Just for Fun

September 30, 2011

Birds migrate south in the fall. Squirrels gather nuts. Humans also have behavioral rituals in the autumn. I change the batteries in my smoke detectors, I switch my clocks back to standard time, and I turn the mattress on my bed. The first two are relatively easy. There's even a mnemonic for changing the clocks to and from daylight saving time: "spring forward, fall back."

Manhandling a heavy, queen-size mattress can be a challenge, but it is important so that your mattress wears evenly. There are four positions that a mattress can be in, and it is important to use all four positions over a two-year cycle. I use a simple mnemonic to ensure that I alternate between all four positions: "spring spin, fall flip."

How did I develop this mnemonic? By using matrices, of course!

Mattress positions

There are three ways to turn a mattress:

  • The rotation (R): This is the easiest maneuver, and it requires only one person. It doesn't flip the mattress, but spins it on the box spring so that the head becomes the foot. (When an airplane rotates along this axis, the movement is called "yaw.")
  • The horizontal flip (H): This is the easiest way to flip a mattress from one side to the other. It requires that you turn the mattress along its long axis. (When an airplane rotates along this long axis, the movement is called "roll.")
  • The vertical flip (V): This is the hardest way to flip a mattress, and might be impossible if you have low ceilings! It requires that you turn the mattress along its short axis. (When an airplane rotates along this short axis, the movement is called "pitch.")

Fortunately, it turns out that you NEVER need to use the vertical flip! You can get even wear on your mattress by alternating the two easier moves. In the spring, use the rotation; in the fall, use the horizontal flip. "Spring spin, fall flip." Repeat this every year and your mattress will mathe-magically alternate between all four possible positions.

The matrices of mattresses

Rotations can be represented by the actions of matrices. Set up a coordinate system at the center of your mattress so that the x axis points across the width of the bed, the y axis points toward the head of the bed, and the z axis points toward the ceiling. Then the rotation (R) sends a vector (x, y, z) to the new vector (-x, -y, z). Similarly, the horizontal flip (H) sends (x, y, z) to (-x, y, -z), and the vertical flip (V) sends (x, y, z) to (x, -y, -z). This means that the three rotations can be represented by the following matrices:

proc iml;
/* R: rotation about the z axis = rotate 180 degrees in the (x,y)-plane */
R = {-1  0  0,
      0 -1  0,
      0  0  1};
/* H: rotation about the y axis = rotate 180 degrees in the (x,z)-plane */
H = {-1  0  0,
      0  1  0,
      0  0 -1};
/* V: rotation about the x axis = rotate 180 degrees in the (y,z)-plane */
V = { 1  0  0,
      0 -1  0,
      0  0 -1};

These matrices have a few interesting properties. The first is that they are their own inverses: H*H = V*V = R*R = I. The second is that the product of any two equals the third. In particular, R*H=V. This is great, because it means that you never need to attempt the difficult vertical flip: you can use the product of the two simpler rotations to achieve the same result.
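If you don't have SAS/IML handy, the same properties are easy to check elsewhere. The following is a minimal Python sketch, not part of my SAS program: the helper names matmul and positions_visited are made up for this illustration, and the matrices are copied from the definitions above. It checks that R*H=V and that the sequence R, H, R, H visits four distinct orientations before returning to the start.

```python
# Pure-Python check of the mattress matrices (no third-party libraries).
R = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]   # spin: 180 degrees about the z axis
H = [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]   # horizontal flip: 180 degrees about the y axis
V = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]   # vertical flip: 180 degrees about the x axis
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # identity: the starting orientation


def matmul(a, b):
    """Multiply two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def positions_visited(moves):
    """Apply the moves in order, starting from the identity.

    Returns the list of orientations, including the starting one.
    """
    current = I3
    seen = [current]
    for move in moves:
        current = matmul(move, current)
        seen.append(current)
    return seen


# Two years of "spring spin, fall flip"
two_years = positions_visited([R, H, R, H])

print("R*H equals V:", matmul(R, H) == V)
print("Back at the start after two years:", two_years[-1] == I3)
```

Because all three matrices are diagonal, they commute, so the order of the spin and the flip within a year does not change which orientation you end up in.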

A previous mattress article

When I started writing this article, I searched the Internet for an illustration that I could use. I discovered a wonderful article that contains some excellent illustrations and much of the same content. The author is my fellow Cornellian, Steve Strogatz, and he wrote it for the New York Times. (By the way, if you haven't read Strogatz's book, Sync, I highly recommend it. He also gave an entertaining 20-minute TED talk on the subject of "sync.")

The Strogatz article illustrates the mattress rotations with the following colorful diagram, which shows the four possible positions of a mattress and how each rotation takes you from one position to another:

Suppose that your mattress is initially in the position in the upper left corner. Let's see what happens if you follow "spring spin, fall flip" for two consecutive years. That is, you rotate the mattress according to the sequence R, followed by H, followed by R, followed by H:

  1. After the first "spring spin," the mattress position is as shown in the lower right of the diagram.
  2. A "fall flip" moves the mattress into the position shown in the lower left.
  3. After a second "spring spin," the mattress position is as shown in the upper right.
  4. A second "fall flip" returns the mattress to its original position.

These matrix representations are not the only ones possible. You can also represent the rotations by using permutation matrices, which I've discussed in a previous article on gift-giving among family members. A permutation matrix acts on the vector (1,2,3,4) and maps the four positions of the mattress to each other. One permutation matrix is H = {0 1 0 0, 1 0 0 0, 0 0 0 1, 0 0 1 0}; I'll leave the construction of the R and V permutation matrices as an exercise.

tags: Just for Fun, Matrix Computations
September 29, 2011

I previously wrote about an intriguing math puzzle that involves 5-digit numbers with certain properties. This post presents my solution in the SAS/IML language.

It is easy to generate all 5-digit perfect squares, but the remainder of the problem involves looking at the digits of the squares. For this reason, I converted the set of all 5-digit squares into an n x 5 array. I used the PUTN function to convert the numbers to strings, and then used the SUBSTR function to extract the first, second, third, fourth, and fifth digits into columns. (I then used the NUM function to change the character array back into a numeric matrix, but this is not necessary.)

The solution enables me to highlight three new functions in SAS 9.3:
  • The ELEMENT function enables you to find which elements in one set are contained in another set. I use this function to get rid of all 5-digit perfect squares that contain the digits {6,7,8,9,0}.
  • The ALLCOMB function generates all combinations of n elements taken k at a time. After I reduced the problem to a set of nine 5-digit numbers, I used the ALLCOMB function to look at all triplets of the candidate numbers.
  • The TABULATE subroutine computes the frequency distribution of elements in a vector or matrix. I used this subroutine to check the frequency of the digits in the triplets of numbers.

Here is my commented solution:

proc iml;
/* generate all 5-digit squares */
f0 = ceil(sqrt(10000));
f1 = floor(sqrt(99999));
AllSquares = T(f0:f1)##2;
/* convert to (n x 5) character array */
c = putn(AllSquares,"5.0");
m = j(nrow(c), 5," ");
do i = 1 to 5;   m[,i] = substr(c,i,1);  end;
m = num(m); /* convert to (n x 5) numerical matrix */
/* The digits are clearly 1,2,3,4,5, since there 
   are 15 digits and each appears a distinct number of times.
   Get rid of any rows that contain other digits. */
bad = (6:9) || 0;
b = element(m, bad);   /* ELEMENT: SAS/IML 9.3 function */
hasBad = b[ ,+];       /* sum across columns */
g = m[loc(hasBad=0),]; /* only nine perfect squares left! */
/* Look at all 3-way combinations */
k = allcomb(nrow(g),3);/* ALLCOMB: SAS/IML 9.3 function */
SolnNumber=0;          /* how many solutions found? */
do i = 1 to nrow(k);
   soln = g[k[i,], ]; /* 3x5 matrix */
   /* The frequencies of the digits should be 1,2,3,4,5
      and the freq of a digit cannot equal the digit */
   call tabulate(levels, freq, soln); /* TABULATE: SAS/IML 9.3 function */
   if ncol(unique(freq))=5 then do;   /* are there five freqs? */
      if ^any(freq=(1:5)) then do;    /* any freq same as digit? */
         SolnNumber = SolnNumber+1;
         print "********" SolnNumber "********",
               soln[r={"Square1" "Square2" "Square3"}], freq;
      end;
   end;
end;

At first, I didn't understand the last clue, so I printed out all seven triplets of numbers that satisfied the first five conditions. When I looked at the output, I finally made sense of the last clue, which is "If you knew which digit I have used just once, you could deduce my three squares with certainty." This told me to look closely at the FREQ vectors in the output. Of the seven solutions, one has frequency vector {3 5 4 2 1}, which means that the 1 digit appears three times, the 2 digit appears five times, and so on to the 5 digit, which appears once. In all of the other solutions, it is the 3 digit that appears once. Therefore, there is a unique solution in which the 5 digit appears only one time. The solution is as follows:

The SAS/IML language gave me some powerful tools that I used to solve the math puzzle. I'm particularly pleased that I only used two loops to solve this problem. I was able to vectorize the other computations.
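If you want to cross-check the logic without SAS 9.3, here is a rough Python sketch of the same search. It is not a translation of my SAS/IML program: the names (candidates, digit_freq, solutions, unique) are invented for this illustration, and the last step is only one way to read clue 6, namely keeping the triplet whose once-used digit is not shared by any other triplet.

```python
from collections import Counter
from itertools import combinations

# All 5-digit perfect squares: 100^2 = 10000 up to 316^2 = 99856
squares = [k * k for k in range(100, 317)]

# Clue 4 implies the digits are 1-5, so drop squares containing 0, 6, 7, 8, or 9
candidates = [s for s in squares if set(str(s)) <= set("12345")]


def digit_freq(triple):
    """Return how often the digits 1..5 appear among the 15 digits of a triple."""
    counts = Counter("".join(str(s) for s in triple))
    return tuple(counts.get(d, 0) for d in "12345")


solutions = []
for triple in combinations(candidates, 3):
    freq = digit_freq(triple)
    distinct = sorted(freq) == [1, 2, 3, 4, 5]                    # clues 2-4
    no_match = all(f != d for d, f in enumerate(freq, start=1))   # clue 5
    if distinct and no_match:
        solutions.append((triple, freq))

# Clue 6: which digit is used exactly once in each potential solution?
once = Counter(freq.index(1) + 1 for _, freq in solutions)
unique = [(t, f) for t, f in solutions if once[f.index(1) + 1] == 1]

print(len(candidates), "candidate squares;", len(solutions), "potential solutions")
for triple, freq in unique:
    print("squares:", triple, "frequencies of digits 1-5:", freq)
```

The combinations function plays the role of ALLCOMB, and Counter plays the role of TABULATE.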

Can you improve my solution? Use the comments to post (or link to) your program that solves the problem. The original post on the SAS Discussion Forum includes other ways to solve the problem in SAS.

tags: Just for Fun, Statistical Programming
September 27, 2011

I was intrigued by a math puzzle posted to the SAS Discussion Forum (from New Scientist magazine). The problem is repeated below, but I have added numbers in brackets because I am going to comment on each clue:

[1] I have written down three different 5-digit perfect squares, which [2] between them use five different digits. [3] Each of the five digits is used a different number of times, [4] the five numbers of times being the same as the five digits of the perfect squares. [5] No digit is used its own number of times. [6] If you knew which digit I have used just once, you could deduce my three squares with certainty.

What are the three perfect squares?

I solved the problem by using the SAS/IML language. If you want to try solving the problem yourself without further hints, don't read any further!

Deciphering the clues

I found the clues difficult to understand, so I'll give an example for each clue:

  1. You are looking for three five-digit squares, such as 12321 = 111^2.
  2. There are five distinct digits among the 15 digits in the three numbers. For example, the numbers can't be (10000, 10201, 10404), because there are only four unique digits in these numbers: 0, 1, 2, and 4.
  3. The frequencies of the five unique digits are distinct. Because there are 15 digits in the three numbers, we immediately know that the distribution of frequencies is 1, 2, 3, 4 and 5. So, for example, the numbers can't be (12321, 12544, 13225) because although there are five 2s and four 1s, no digit appears three times.
  4. The five frequencies are the same as the five digits. Therefore, the five digits are 1, 2, 3, 4, and 5. For example, 57121 cannot be one of the numbers, because it contains a 7 among its digits.
  5. The phrase "no digit is used its own number of times" baffled me. I finally figured out that it means that the digit 1 appears more than one time, the digit 2 appears either once or more than twice, the digit 3 does not appear exactly three times, and so forth.
  6. At first I didn't realize that the last sentence is also a clue. The person who posted the puzzle said he found seven solutions, but the puzzle implies that there is a unique solution. The last sentence means that if you look at which digit is used exactly once in each of the seven potential solutions, only one solution has a once-used digit that no other solution shares. For example, the solution is not (12321, 44521, 55225) because the 3 digit appears once, but there is also another (potential) solution for which the 3 digit appears once.

When I wrote the SAS/IML program that solves the problem, I was pleased to discover that I had used three functions that are new to SAS/IML 9.3! Since the entire program is fewer than 25 lines long, that's a pretty good ratio of new functions to total lines. The three new functions I used are as follows:

  • The ELEMENT function enables you to find which elements in one set are contained in another set. I use this function to get rid of all 5-digit perfect squares that contain the digits {6,7,8,9,0}.
  • The ALLCOMB function generates all combinations of n elements taken k at a time. After I reduced the problem to a set of nine 5-digit numbers, I used the ALLCOMB function to look at all triplets of the candidate numbers.
  • The TABULATE subroutine computes the frequency distribution of elements in a vector or matrix. I used this subroutine to apply rules 4 and 5.

You can use this post to ask questions or to clarify the directions. I will post my solution on Thursday. You can post your own solution as a comment to Thursday's post. DATA step and PROC SQL solutions are also welcome. Good luck!

tags: Just for Fun
July 19, 2011
Yesterday, Jiangtang Hu did a frequency analysis of my blog posts and noticed that there are some holidays on which I post to my blog and others on which I do not.

The explanation is simple: I post on Mondays, Wednesdays, and Fridays, provided that SAS Institute (World Headquarters) is not closed. Because I am absent-minded, I wrote a little SAS/IML program that tells me when I am supposed to post. Notice the use of the HOLIDAY function in Base SAS in order to figure out the dates of certain holidays.

proc iml;
date = today(); /** get today's date **/
/** is today M, W, or F? **/
MWF = any( putn(date, "DOWName3.") = {"Mon" "Wed" "Fri"} );

/** Is SAS closed today?**/
/** Find SAS holidays that might occur Mon-Fri **/
year = year(date);
Float = holiday({Labor Memorial NewYear USIndependence}, year);
/** Thanksgiving or day after **/
TDay = holiday("Thanksgiving", year) + (0:1); 
/** Christmas or week after **/
WinterBreak = holiday("Christmas", year) + (0:7);
SASHolidays = Float || TDay || WinterBreak;
SASClosed = any(date = SASHolidays);

/** should Rick post to his blog today? **/
if MWF & ^SASClosed then 
   print "Rick posts a blog today.";
else
   print "Rick is working on a post for some other day.";

Readers are also welcome to run this program to determine whether a post is scheduled or not.

Kidding. Just kidding.

June 13, 2011
My primary purpose in writing The DO Loop blog is to share what I know about statistical programming in general and about SAS programming in particular. But I also write the blog for various personal reasons, including the enjoyment of writing.

The other day I encountered a concept on Ajay Ohri's Decision Stats blog that made me reflect on what I, personally, gain from blogging. The concept is the Johari window, which is a self-assessment technique in cognitive psychology. The following image is taken from Peter Dorrington's article about "unknown unknowns and risk":

By looking at the Johari window, I realized that blogging helps me to become more aware of what I know and what I don't know. The Johari window framework also summarizes the process that I use to write this blog. Long-time readers know that I post three articles a week according to the following schedule:

  • On Mondays, I post Getting Started articles. These articles correspond to the upper right quadrant of my Johari window. They represent topics that I know that I know. I exploit this knowledge to get out a quick article that requires minimal effort.
  • On Wednesdays, I post articles on a variety of topics in statistical programming such as sampling and simulation and efficient SAS programming. These articles often correspond to the lower left quadrant of my Johari window. They represent topics that I am trying to learn. Usually, I am not an expert on these topics, so I risk making a fool of myself. However, blogging gives me an opportunity to share what little I know and it motivates me to get it right. I often experiment with several approaches before I feature one in my blog.
  • On Fridays, I like to post articles about data analysis. These articles correspond to the upper left quadrant of my Johari window. They are often inspired by reading other blogs or by having a robust curiosity about topics such as "What Topics Appear in The Far Side Cartoons?" Even after I explore the data and blog about it, I am aware that there is more that could be said.

What about the lower right quadrant? That comes into play when I am searching for something to blog about. In the past 15 years, I've written and saved thousands of SAS and SAS/IML programs, demos, examples, test programs, presentations, and papers. These are scattered in dozens of directories on my computer. Sometimes I'll stumble upon a program I wrote ten years ago and think, "That's pretty clever; I should write a blog about this!" My challenge is to find these gems that I have forgotten about—to rediscover and expose what I once knew.

My goal is to become the best statistical programmer and data analyst that I can be, and to help other SAS programmers do the same. Blogging helps by making me keenly aware of what I know and what I don't know.

April 13, 2011
I have recently returned from five days at SAS Global Forum in Las Vegas. On the way there, I finally had time to read a classic statistical paper: Bayer and Diaconis (1992) describe how many shuffles are needed to randomize a deck of cards. Their famous result that it takes seven shuffles to randomize a 52-card deck is known as "the bane of bridge players" because the result motivated many bridge clubs to switch from hand shuffling to computer-generated shuffling. Casual bridge players also blame this result for "slowing down the game" while the cards are shuffled more times than seems intuitively necessary.

[Image: A riffle shuffle. Source: Flickr (http://www.flickr.com/photos/latitudes/66424863/in/set-1442169/). Author: Todd Klassy (http://www.flickr.com/photos/latitudes/). Permission: CC-by 2.0]

In the second paragraph of the paper, Bayer and Diaconis introduce a "mathematically precise model of shuffling," which is known as the Gilbert-Shannon-Reeds (GSR) model. This model is known to be a "good description of the way real people shuffle real cards." (See Diaconis (1988) and the references at the end of this article.) This article describes how to implement the GSR shuffling model in SAS/IML software.

The Riffle Shuffle

Computationally, you can shuffle a deck by generating a permutation of the set 1:n, but that is not how real cards are shuffled.

The riffle (or "dovetail") shuffle is the most common shuffling algorithm. A deck of n cards is split into two parts and the two stacks are interleaved. The GSR algorithm simulates this physical process.

The GSR model splits the deck into two pieces according to the binomial distribution. Each piece has roughly n/2 cards. Then cards are dropped from the two stacks according to the number of cards remaining in each stack. Specifically, if there are NL cards in the left stack and NR cards in the right stack, then the probability of the next card dropping from the right stack is NR / (NR + NL).

The following SAS/IML module is a straightforward implementation of the GSR algorithm:

proc iml;
/** Gilbert-Shannon-Reeds statistical 
    model of a riffle shuffle. Described in 
    Bayer and Diaconis (1992) **/
start GSRShuffle(deck);
   n = nrow(deck);
   /** cut into two stacks **/
   nL = rand("Binomial", 0.5, n);
   nR = n - nL;

   /** left stack, right stack **/
   L = deck[1:nL]; R = deck[nL+1:n];
   j = 1; k = 1; /** card counters **/
   shuffle = j(n,1); /** allocate **/
   do i = 1 to n;
      c = rand("Bernoulli", nR/(nL+nR)); 
      if c=0 then do;  /** drop from left **/
        shuffle[i] = L[j];
        nL = nL-1;  j=j+1; /** update left **/
      end;
      else do;  /** drop from right **/
        shuffle[i] = R[k];
        nR = nR-1;  k=k+1; /** update right **/
      end;
   end;
   return(shuffle);
finish;
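
For readers who want to experiment outside of SAS, the cut-and-drop rule described above can also be sketched in a few lines of Python. This is only an illustration under my own naming (gsr_shuffle and rising_sequences are hypothetical helpers, not from the paper), and it draws the binomial cut by summing n coin flips rather than calling a binomial generator.

```python
import random


def gsr_shuffle(deck, rng=random):
    """One riffle shuffle under the Gilbert-Shannon-Reeds model (sketch)."""
    n = len(deck)
    # Cut the deck: the size of the left stack is Binomial(n, 1/2)
    n_left = sum(rng.random() < 0.5 for _ in range(n))
    left, right = deck[:n_left], deck[n_left:]
    i = j = 0          # next card to drop from the left / right stack
    shuffled = []
    while i < len(left) or j < len(right):
        rem_left = len(left) - i
        rem_right = len(right) - j
        # Drop from the right stack with probability NR / (NL + NR)
        if rng.random() < rem_right / (rem_left + rem_right):
            shuffled.append(right[j])
            j += 1
        else:
            shuffled.append(left[i])
            i += 1
    return shuffled


def rising_sequences(perm):
    """Count the rising sequences of a permutation of 1..n."""
    position = {card: idx for idx, card in enumerate(perm)}
    return 1 + sum(position[c + 1] < position[c] for c in range(1, len(perm)))


rng = random.Random(1234)          # arbitrary seed for reproducibility
deck = list(range(1, 21))          # 20 cards in their original order
once = gsr_shuffle(deck, rng)
print(once, "rising sequences:", rising_sequences(once))
```

A single riffle of an ordered deck interleaves two increasing runs, so it can have at most two rising sequences. That makes a convenient sanity check on any implementation.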

Testing the GSR Algorithm

You can test the algorithm by starting with a deck of cards in a known order and observing how the cards are mixed by consecutive riffle shuffles. The following statements riffle the cards seven times and store the results of each shuffle. To save space, a 20-card deck is used. The original order of the cards is denoted 1, 2, 3, ..., 20.

call randseed(1234);
n = 20; /** n=52 for a standard deck **/
deck = T(1:n);

s = j(n, 8); /** allocate for 8 decks **/
s[,1] = deck; /** original order **/
do i = 1 to 7;
   s[,i+1] = GSRShuffle( s[,i] );
end;
names = "s0":"s7";
print s[colname=names];

    s0  s1  s2  s3  s4  s5  s6  s7

     1   1   1   1   1   1  18   2
     2   2  14  14  14  11   1  13
     3  12   2   2   4  14  11  12
     4   3   6   6   2   4  14  18
     5  13  12   3  12   2  19   1
     6   4  15  16  15  12   7  11
     7   5   7  13  17  15   3  14
     8  14   8   4  10  17   4  15
     9   6   9  12   6  10  16  19
    10  15   3  15  18   6   2   8
    11   7  16  17   7  18  13   7
    12   8  13  10   3  19  12   3
    13   9   4  18  16   7  15   9
    14  16  17   7  13   3   8   4
    15  17  10   8   8  16   9  17
    16  10  18   5   5  13  17  20
    17  18   5  11  11   8  20  16
    18  11  11  19  19   9  10  10
    19  19  19   9   9  20   5   5
    20  20  20  20  20   5   6   6

It is interesting to study the evolution of mixing the cards:

  • For the first shuffle, the original deck is cut into two stacks (1:11 and 12:20) and riffled to form the second column. Notice that usually one or two cards are dropped from each stack, although at one point three cards (7, 8, 9) are dropped from the left stack.
  • The second cut occurs at card 14, which is the eighth card in the second column. After the second riffle, notice that the cards 7, 8, and 9 are still consecutive, and the first and last cards are still in their original locations. Six cards (30%) are still in their initial positions in the deck.
  • The pair 7 and 8 is not separated until the fourth shuffle.
  • The last card (20) does not move from the bottom of the deck until the fifth shuffle.
  • The first card (1) does not move from the top of the deck until the sixth shuffle.

The Efficiency of the GSR Algorithm

The GSRShuffle module that I've implemented here is not very efficient. As I've said before, the SAS/IML language is a vector language, so statements that operate on a few long vectors run much faster than equivalent statements that involve many scalar quantities.

This implementation of the shuffling algorithm is not vectorized. Unfortunately, because the probability of a card dropping from the left stack changes at every iteration, there is no way to call the RANDGEN function once and have it return all n numbers required to simulate a single riffle shuffle.

Or is there? Perhaps there is an equivalent algorithm that can be vectorized? Next week I'll present a more efficient version of the GSR algorithm that does not require an explicit loop over the number of cards in the deck.


D. Bayer and P. Diaconis (1992), "Trailing the Dovetail Shuffle to Its Lair," Annals of Applied Probability 2(2), 294-313.
P. Diaconis (1988), Group Representations in Probability and Statistics. IMS, Hayward, CA.
E. Gilbert (1955) "Theory of Shuffling," Technical memorandum. Bell Laboratories.
J. Reeds (1981), Unpublished manuscript.
April 8, 2011
At the beginning of 2011, I heard about the Dow Piano, which was created by CNNMoney.com. The Dow Piano visualizes the performance of the Dow Jones industrial average in 2010 with a line plot, but also adds an auditory component. As Bård Edlund, Art Director at CNNMoney.com, said,
The daily trading of the Dow Jones industrial average determines the songwriting here, translating the ups and downs of 2010's market into musical notes. Using a five-note scale spanning three octaves, pitch is determined by each day's closing level.

When I first saw the Dow Piano, I immediately thought, "I can do that in SAS/IML Studio by using the SOUND call in the SAS/IML language!"

To be fair, I can't fully duplicate the Dow Piano in SAS/IML software. The SOUND call in SAS is very simple: it only provides pitch and duration. The music for the Dow Piano is more complex:

  • It uses piano notes, which involves sound characteristics such as attack and decay.
  • It uses dynamics to represent trading volume: high-volume days are represented by loud notes, low-volume days by soft notes.
  • It overlays a percussion track so that you hear a percussive beat on days that the stock market is closed.
Nevertheless, I think the SAS/IML version captures the main idea by allowing you to listen to the performance of the Dow Jones industrial average. (Get it? The performance?)

Will this kind of visualization (auditorization? sonification?) catch on? Time will tell, but for these data, the sound doesn't add any new information; it merely uses the sense of sound to repeat information that is already in the line plot. In fact, the sound presents less information than the line plot: there are more than 250 closing values of the Dow Jones industrial average, but the Dow Piano collapses these values into 15 notes (three octaves of a five-note scale).
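
To make the "collapsing" concrete, here is a small Python sketch of one way to map closing values onto 15 notes. Everything specific in it is my assumption, not CNNMoney's method or my SAS/IML demo: the equal-width binning, the major pentatonic intervals, the 220 Hz base pitch, and the made-up closing values are placeholders for illustration only.

```python
# Semitone offsets of a major pentatonic scale within one octave (an assumption)
PENTATONIC_STEPS = (0, 2, 4, 7, 9)


def closing_to_note(close, low, high, notes=15):
    """Map a closing value to a note index 0..notes-1 by equal-width binning."""
    if high <= low:
        raise ValueError("high must exceed low")
    frac = (close - low) / (high - low)
    return min(notes - 1, max(0, int(frac * notes)))


def note_frequency(index, base_hz=220.0):
    """Convert a note index (0..14) to a frequency in Hz, five notes per octave."""
    octave, degree = divmod(index, 5)
    semitones = 12 * octave + PENTATONIC_STEPS[degree]
    return base_hz * 2 ** (semitones / 12)


# Made-up closing values, for illustration only (not real Dow data)
closes = [10000.0, 10250.0, 10900.0, 11400.0, 12000.0]
low, high = min(closes), max(closes)
for c in closes:
    idx = closing_to_note(c, low, high)
    print(c, "-> note", idx, "at", round(note_frequency(idx), 1), "Hz")
```

Each frequency, together with a duration, is the kind of pitch-and-duration pair that a simple sound call needs. However many days you feed in, at most 15 distinct pitches come out.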

But enough analysis! Listen to the Way of the Dow. The demo starts at 0:45, after I introduce the problem.

April 1, 2011
Today, SAS, the leader in business analytics, announces significant changes to two popular SAS blogs, The DO Loop (written by Rick Wicklin) and The SAS Dummy (previously written by Chris Hemedinger).

The two blogs will be merged into a single blog, called The SAS Smarty: A Blog for an Elite Few. The blog, which will be written by Wicklin, will debut next week.

Ima Phül, director of SAS Brand Preservation, says the change is needed.

"Like many companies, SAS continuously monitors the way its brand is perceived by customers," says Phül. "By using SAS Social Media Analytics and SAS Web Analytics, my team was able to identify several problems in the way that the SAS brand is perceived by customers."

Particularly troubling was the high correlation between the terms "SAS" and "dummy," and the terms "SAS" and "dogfood".

"Thanks to SAS Social Media Analytics, Chris's blog was determined to be the root cause of these undesirable relationships," continues Phül.

Wicklin says that re-aligning the SAS Dummy blog with corporate directions will not be hard. "Expect less humor and more math," he says. "I've got some great articles planned on computing multi-dimensional integrals and using orthogonal regression polynomials."

The archives of the SAS Dummy will continue to be hosted on blogs.sas.com, but the title of the discontinued blog will be changed to an unpronounceable symbol:

Alison Bolin, Editor of Blogs and Social Content at SAS, thanks Chris for his years of blogging. "Chris was instrumental in the success of our SAS blogging efforts. We are happy to announce that Chris will be leading a new R&D effort from our Siberian office, and we wish him well in his new endeavors."

These changes are effective as of today, April 1, 2011, also known as April Fools' Day.

February 14, 2011
If you tell my wife that she's married to a statistical geek, she'll nod knowingly. She is used to hearing sweet words of affection such as
You are more beautiful than Euler's identity.
My love for you is like the exponential function: increasing, unbounded, and transcendental.
But those are ordinary, everyday sentiments. For Valentine's Day, I want to do something really special, such as formulating a parametric expression whose image is heart-shaped. If you haven't gotten anything sweet for your special someone, you too can use SAS/IML Studio to create the following image!

Modify the program to create your own personal message. My personal message to my wife is "I love you."

/** Valentine's Day heart. Rick Wicklin: blogs.sas.com/iml **/
/** Parametrize in polar coordinates h(t) = (r(t), theta(t)) **/
Pi = constant("Pi");
t = do(0, 2*Pi, 0.01*Pi/4)`;
r = 2 - 2*sin(t) + sin(t)#sqrt(abs(cos(t))) / (sin(t)+1.4);
/** Convert to Euclidean coordinates for plotting **/
x = r#cos(t);
y = r#sin(t);

/** Use SAS/IML Studio to produce the image **/
declare DataObject dobj;
dobj = DataObject.Create("Valentine", {"x" "y"}, x||y );
dobj.AddVar("const", j(nrow(t),1));
dobj.SetMarkerFillColor(OBS_ALL, RED);

declare PolygonPlot p;
p = PolygonPlot.Create(dobj, "x", "y", "const", true);
"r = 2 - 2*sin(t) + sin(t)*sqrt(|cos(t)|) / (sin(t)+1.4)");
February 10, 2011
I enjoyed the Dataists' data-driven blog on the best numbers to choose in a Super Bowl betting pool. It reminded me of my recent investigation of which initials are most common.

Because the Dataists' blog featured an R function that converts Arabic numerals into Roman numerals, the blog post also reminded me that SAS has a built-in format that does the same thing. The ROMANw. format makes it easy to convert everyday whole numbers into their Roman equivalents.

For example, here are a few numbers and their Roman counterparts:

data RomanNumerals;
input x @@;
n = x;
format n ROMAN8.;
datalines;
1 2 3 4 5 9 30 40 45 50 1999 2011
;
proc print noobs; run;
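
If you are curious about what such a conversion does under the hood, here is a minimal Python sketch of the usual subtractive-notation algorithm. It is not how the ROMANw. format is implemented (I don't know its internals); the function name to_roman is my own, and I am assuming that the format produces the same standard numerals for the values above.

```python
# Value/symbol pairs in descending order, including the subtractive forms
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Convert a whole number in 1..3999 to a standard Roman numeral."""
    if not 0 < n < 4000:
        raise ValueError("n must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        count, n = divmod(n, value)   # how many times does this symbol fit?
        parts.append(symbol * count)
    return "".join(parts)


for x in [1, 2, 3, 4, 5, 9, 30, 40, 45, 50, 1999, 2011]:
    print(x, to_roman(x))
```

The greedy loop works because the table lists the subtractive pairs (such as 900 and 4) alongside the plain symbols, so each value is consumed from largest to smallest.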

I've always wondered why SAS has this obscure format, but now I know of at least one useful application: using it to troll through Wikipedia to extract box scores from Super Bowl I through XLV.