Just for Fun

6月 182012

To celebrate special occasions like Father's Day, I like to relax with a cup of coffee and read the newspaper. When I looked at the weather page, I was astonished by the seeming uniformity of temperatures across the contiguous US. The weather map in my newspaper was almost entirely yellow and orange, which told me that the midday temperatures were predicted to have extremely small variation. It looked like the range of temperatures might vary by less than 30 degrees Fahrenheit (17 degrees Celsius), with most temperatures in the 70's and 80's.

To me this seemed like an extremely small variation, so that afternoon I downloaded the actual midday temperatures from Weather.com, which are shown below:

The Weather.com map uses a different color scale than my local paper. Nevertheless I quickly estimated that most of the temperatures were between 62 F and 92 F, with just a few hot spots in southern Texas, Arizona, and Nevada. The statistician in me wondered, what is the record for the "most uniform" temperatures at midday in the contiguous US? I'll bet that someone can automate the downloading of temperatures for standard US locations and answer the question. I'd love to see a graph of "temperature variation vs. date" for, say, the last ten or twenty years.

I don't know of a source for temperature data, but I figured that I might as well do SOME statistics to celebrate Father's Day weekend! I enlisted my youngest child to assist me in typing in the temperatures. (Yes, it was a father/daughter bonding moment.) I ran the following univariate analysis of the temperatures:

data Temps06162012;
input Temperature @@;
66 72 72 73 86 81 68 67 70 64
64 67 63 73 71 75 81 75 94 96 77 96 93
67 71 72 70 74 69 86 80 83 78 86
63 69 71 71 79 78 82 83 85 84 89 90 88 89 92 94
79 66 79 74 72 75 88 83 88 82 91 88 89 90 86 86 83 84
79 81 85 86 80 84 84 77 85 74 79 83 84 85
62 63 81 61 78 78 81 80 74 78 81 83 84 88 87
proc univariate data=Temps06162012;
   var Temperature;
   histogram Temperature;

The histogram and the accompanying tabular output told me that the mean temperature was a comfortable 79 degrees F, and that the standard deviations of temperatures was a paltry 8.56 degrees F. It was, indeed, a beautiful weekend for celebrating Father's Day.

Can anyone find a date for which the temperatures exhibited less variation? I'd love to see the analysis.

tags: Data Analysis, Just for Fun
6月 132012

A collegue who works with time series sent me the following code snippet. He said that the calculation was overflowing and wanted to know if this was a bug in SAS:

data A(drop=m);
call streaminit(12345);
m = 2;
x = 0;
do i = 1 to 5000;
   x = m*x + rand("Normal");
NOTE: A number has become too large at line 6 column 11.
      The number is >1.80E+308 or <-1.80E+308.

This behavior is expected, and is definitely not a bug. The behavior follows from a basic analysis of the computation within the DO loop. Take a moment to figure out why the magnitude of x becomes large in this program.

A simple discrete dynamical system

Before I became a statistician, I studied dynamical systems, which is the study of the long-term qualitative behavior of a differential equation or of an iterative map. A lot can be learned about the behavior of iterates of x by just ignoring the random term. (You can justify ignoring the random term by noting that rand("Normal") has zero mean.)

The iterative mapping x → mx is linear, so it has simple dynamical properties:

  • If m < 1, all initial values for x approach 0 under iteration because the sequence x, mx, m2x, ..., mnx approaches zero for all values of x.
  • If m > 1, all nonzero values of x diverge under iteration.

A discrete dynamical system with a stochastic component

Adding a random term with mean zero does not change the mathematics of this problem very much. However, now the dynamics are described in terms of "expected behavior," because the system is no longer deterministic. For the iterative mapping x → mx + ε, ε ∼ N(0,1), you can expect the following long-term behavior for the iterates:

  • the iterates to bounce around a neighborhood of zero when m < 1
  • the iterates diverge for m > 1

The following graphs show typical behavior:

data A;
call streaminit(12345);
phi = 0.99; x = 0;
do i = 1 to 5000;
   x = phi*x + rand("Normal"); output;
phi = 1.01; x = 0;
do i = 1 to 5000;
   x = phi*x + rand("Normal"); output;
proc sgpanel data=A;
panelby phi / uniscale=column rows=2;
refline 0 / axis=y;
scatter x=i y=x;

You might wonder about the case m = 1. There is a theorem that says that a random walk x → x + ε, ε ∼ N(0,1), will eventually get arbitrarily far from it's initial position. Therefore the case m = 1 would eventually diverge in finite precision arithmetic...but you might have to wait a REALLY long time!

tags: Just for Fun, Numerical Analysis, Statistical Thinking
3月 082012

In the United States, this upcoming weekend is when we turn our clocks forward one hour as we adopt daylight saving time. (Some people will also flip their mattresses this weekend!)

Daylight saving time (DST) in the US begins on the second Sunday in March and ends on the first Sunday in November. The US Naval Observatory has a table that shows when daylight saving time begins and ends. But how can you create such a table and how can you find the onset of DST for future years? By writing a SAS program, of course!

Base SAS has a useful little function called the INTNX function that enables you to find a date that is a certain number of time intervals after an earlier date. For example, you can use the INTNX function to find the date that is 7 weeks after today's date. There is even a SAS Knowledge Base sample that shows how to use the INTNX function to compute the beginning and end of daylight saving time. The following program simplifies the sample:

/* compute the beginning and end of daylight saving time for several years */
data DST;
format dst_beg dst_end DATE5.;
do year=2012 to 2022;
   fdo_mar = mdy( 3,1,year); /* first day of March */
   fdo_nov = mdy(11,1,year); /* first day of November */
   /* DST begins on the second Sunday in March */
   incr  =(weekday(fdo_mar)^=1) + 1;   /* 1 if fdo_mar is Sunday; 2 otherwise */
   dst_beg=intnx("week",fdo_mar,incr); /* 1 or 2 weeks after 01MAR */
   /* DST ends on the first Sunday in November */
   incr  =(weekday(fdo_nov)^=1);       /* 1 if fdo_nov is Sunday; 0 otherwise */
   dst_end=intnx("week",fdo_nov,incr); /* 0 or 1 weeks after 01NOV */
proc print noobs; 
var year dst_beg dst_end;

The tricky part about this code is the definition of the incr variable. If the first day of March is a Sunday, incr is set to 1. If the first day of March is Monday through Saturday, incr is set to 2. In either case, the "week" interval causes the INTNX function to return the first day (Sunday) of the second week in March, which is the day that DST begins. The calculation of the first Sunday in November is handled similarly.

There are some people who oppose flipping the clocks back and forth twice a year, but the rest of us can run our SAS program and then "spring forward and fall back" on the correct day. Or is it "spring back and fall forward"? Those of us who can't remember can insert that information into the program as well!

tags: Just for Fun, SAS Programming
12月 162011

A few colleagues and I were exchanging short snippets of SAS code that create Christmas trees and other holiday items by using the SAS DATA step to arrange ASCII characters. For example, the following DATA step (contributed by Udo Sglavo) creates a Christmas tree with ornaments and lights:

data _null_;
put @11 '@';
do i=0 to 9;
  put @(11-i) b;
put @10 '| |';
         | |

If you delete some unnecessary spaces, you can produce the Christmas tree in 90 characters. As a result, the DATA step is short enough to post on Twitter! So tweet the following message to all of your SAS friends!

#SAS XMas Tree http://bit.ly/uQELwe  
DATA;put@9'@';do i=0 to 8;b=substr(repeat('*0',i),1,2*i+1);put@(9-i)b;end;
put@8'| |';RUN;

Can you do better? One of my colleagues create a 76-character DATA step that produces a plain Christmas (with a star (*) on top) made entirely of the '0' character. Use the comments to post your favorite SAS code that spreads holiday cheer!

tags: Just for Fun, SAS Programming
12月 022011

Recently the "SAS Sample of the Day" was a Knowledge Base article with an impressively long title:

Sample 42165: Using a stored process to eliminate duplicate values caused by multiple group memberships when creating a group-based, identity-driven filter in SAS® Information Map Studio

"Wow," I thought. "This is the longest title on a SAS Sample that I have ever seen!"

This got me wondering whether anyone has run statistics on the SAS Knowledge Base. It would be interesting, I thought, to see a distribution of the length of the titles, to see words that appear most frequently in titles, and so forth.

I enlisted the aid of my friend Chris Hemedinger who is no dummy at reading data into SAS. A few minutes later, Chris had assembled a SAS data set that contained the titles of roughly 2,250 SAS Samples.

The length of titles

The first statistic I looked at was the length of the title, which you can compute by using the LENGTH function. A quick call to PROC UNIVARIATE and—presto!—the analysis is complete:

proc univariate data=SampleTitles;
   var TitleLength;
   histogram TitleLength;

The table of basic statistical measures shows that the median title length is about 50 characters long, with 50% of titles falling into the range 39–67 characters. Statistically speaking, a "typical" SAS Sample has 50 characters, such as this one: "Calculating rolling sums and averages using arrays." A histogram of the title lengths indicates that the distribution has a long tail:

The shortest title is the pithy "Heat Maps," which contains only nine characters. The longest title is the mouth-filling behemoth mentioned at the beginning of this article, which tips the scales at an impressive 173 characters and crushes the nearest competitor, which has a mere 149 characters.

Frequency of words that appear most often in SAS Samples

The next task was to investigate the frequency of words in the titles. Which words appear most often? The visual result of this investigation is a Wordle word cloud, shown at the beginning of this article. (In the word cloud, capitalization matters, so Using and using both appear.) As you might have expected, SAS and PROC are used frequently, as are action words such as use/using and create/creating. Nouns such as data, variable, example, and documentation also appear frequently.

You can do a frequency analysis of the words in the titles by using the COUNTW, SCAN, and SUBSTR functions to decompose the titles into words. The following SAS code excludes certain simple words (such as "a," "the," and "to") and runs PROC FREQ to perform a frequency analysis on the words that remain. The UPCASE function is used to combine words that differ only in capitalization:

data words;
keep Word;
set SampleTitles;
length Word $20;
count = countw(title);
do i = 1 to count;
   Word = scan(title, i);
   if substr(Word,1,3)="SAS" then Word="SAS"; /* get rid of (R) symbol */
   if upcase(Word) NOT IN ("A" "THE" "TO" "WITH" "FOR" "IN" "OF"
           "AND" "FROM" "AN" "ON" "THAT" "OR" "WHEN" 
           "1" "2" "3" "4" "5" "6" "7" "8" "9")
      & Word NOT IN ("by" "By") then do;
      Word = upcase(Word);
proc freq data=words order=freq noprint;
tables Word / out=FreqOut(where=(count>=50));
ods graphics / height=1200 width=750;
proc sgplot data=FreqOut;
dot Word / response=count categoryorder=respdesc;
xaxis values=(0 to 650 by 50) grid fitpolicy=rotate;

As is often the case, the distribution of frequencies decreases quickly and then has a long tail. The graph shows the frequency counts of terms that appear in titles more than 50 times.

tags: Data Analysis, Just for Fun, Statistical Graphics
11月 182011

Halloween night was rainy, so many fewer kids knocked on the door than usual. Consequently, I'm left with a big bucket of undistributed candy.

One evening as I treated myself to a mouthful of tooth decay, I chanced to open a package of Wonka® Bottle Caps. The package contained three little soda-flavored candies, and I was surprised to find that all three were the same flavor: grape. "Hey," I exclaimed to my youngest child, "they're all the same color! What do you think are the odds of that? You want to help me check some other packages?"

"Mom! Daddy's making me work on his blog," she complained.

After assuring her that she could devour the data when I finished my analysis, she helped me open the remaining packages of Bottle Caps and tabulate the flavors of the candies within.

The flavors of Bottle Caps candies are cola, root beer, grape, cherry, and orange. As we unwrapped the sugary data, my daughter and I hypothesized that the flavors of Bottle Caps were evenly distributed.

Although the hypothesis is the obvious one to make, it is not necessarily true for other candies. There have been dozens (hundreds?) of studies about the distribution of colors in M&M® candies. In fact, many math and statistics classes count the colors of M&M candies and compare the empirical counts to official percentages that are supplied by the manufacturer. The great thing about this classic project is that the colors of M&Ms are not uniformly distributed. Last time I checked, the official proportions were 30% brown, 20% yellow, 20% red, 10% orange, 10% green, and 10% blue.

So what about the Bottle Caps? There were 101 candies in 34 packages (one package contained only two candies). If our hypothesis is correct, the expected number of each flavor is 20.2 candies. We counted 23 Cherry, 15 Cola, 21 Grape, 25 Orange, and 17 Root Beer.

I entered the data into SAS for each package. (If you'd like to do further analysis, you can download the data.) Each package is a sample from a multinomial distribution, and I want to test whether the five flavors each have the same proportion, which is 1/5. I used the FREQ procedure to analyze the distribution of flavors and to run a chi-square test for equal proportions:

proc freq data=BottleCaps;
label Flavor = "Flavor";
weight Count;
tables Flavor / nocum chisq plots=DeviationPlot;

The chi-square test gives a large p-value, so there is no statistical indication that the proportions of flavors of Bottle Caps are not uniform. All of the deviations can be attributed to random sampling.

The FREQ procedure can automatically produce plots as part of the analysis. For these data, I asked for a bar chart that shows the relative deviations of the observed frequencies (O) from the expected frequencies (E). The relative deviation is part of the chi-square computation, and is computed as (O-E)/E.

Although the Bottle Caps that I observed had fewer cola and root beer flavors than expected, the chi-square test shows that these deviations are not significant. Anyway, my favorite Bottle Caps are cherry and orange, so I think the sampling gods smiled on me during this experiment.

Conclusion? Looks like Wonka makes the same number of each flavor. Now back to my pile of sugary goodness. I wonder how many candies are in a typical box of Nerds...

tags: Data Analysis, Just for Fun
11月 152011

One aspect of blogging that I enjoy is getting feedback from readers. Usually I get statistical or programming questions, but every so often I receive a comment from someone who stumbled across a blog post by way of an internet search. This morning I received the following delightful comment on my 2010 post, "Automating the Great Christmas Gift Exchange":

This is exactly what I need!!! We are a family of 10, including parents, siblings, and siblings' spouses. Spouses don't give to each other and of course you don't give to yourself! For the life of me, I can't figure out how to do it. I'm NOT a statistical programmer and need help! I'll send you some cookies this Christmas if you can create my family's gift exchange grid!!

So, at Debbie's request, I will revisit the Great Christmas Gift Exchange.

See the original article for the details, but the main idea is to define a feasible matrix: an N x N matrix of zeros and ones, where N is the number of people participating. In Debbie's case, N=10. The matrix has a 1 in the (i,j) position if person i is permitted to give to person j. In Debbie's case, the matrix is all ones except for 2x2 blocks of zeros along the diagonal:

proc iml;
/* algorithm to generate random gift exchange  */
/* Person_1 Spouse_1   ...  Person_5  Spouse_5 */
names  = {P1 Sp1 P2 Sp2 P3 Sp3 P4 Sp4 P5 Sp5};
N = 10;
Valid = {0 0 1 1 1 1 1 1 1 1,
         0 0 1 1 1 1 1 1 1 1,
         1 1 0 0 1 1 1 1 1 1,
         1 1 0 0 1 1 1 1 1 1,
         1 1 1 1 0 0 1 1 1 1,
         1 1 1 1 0 0 1 1 1 1,
         1 1 1 1 1 1 0 0 1 1,
         1 1 1 1 1 1 0 0 1 1,
         1 1 1 1 1 1 1 1 0 0,
         1 1 1 1 1 1 1 1 0 0

One way to generate gift exchanges is to generate random permutations. If the permutation is feasible (which you can determine by comparing it with the feasible matrix), then keep the permutation, otherwise generate another random permutation. This is implemented in the following program (download the program):

/* helper module: create a permutation matrix from a permutation, v */
start GetPermutationMatrix(v);
   n = ncol(v);
   P = j(n, n, 0);
   do i = 1 to n;
      P[i, v[i]] = 1;
   return (P);
NumYears = 10;
v = j(NumYears, N, .); /* to store valid permutations */
/* create random permutation of 1:N */
call randseed(12252011); /* set random seed; I used 12/25/2011 */
do Year = 1 to NumYears;
   s = 0;
   do while(s<N);
      perm = ranperm( N );
      P = GetPermutationMatrix( perm );
      s = sum(Valid#P); /** check if permutation is valid **/
   v[Year, ] = perm;   /* got a valid gift exchange; save it */
years = char(1:N);
Exchange = shape( names[v], N );
print Exchange[r=years c=names];

The above matrix gives valid gift exchanges for the next ten years for Debbie's family. Each column is a name; each row is a year. The names (column headers) stand for "Person" and "Spouse." Debbie might assign P1=Dad, Sp1=Mom, P2=Brother1, Sp1=Sister-in-law1, ..., P5=Debbie, Sp5=Husband. For Year 1, Dad give to P3, Mom gives to Husband, Brother1 gives to Debbie, Sister-in-law1 gives to Dad, ..., Debbie gives to Brother1, and Husband gives to Sister-in-Law1.

So that I don't get overwhelmed with cookie offers, here are possible ways to exchange gifts if your family has eight people (four pairs of spouses), six people (three pairs of spouses), or four people (two pairs of spouses). Notice that for two pairs of spouses, there are only four possible arrangements. You can also use an alternative algorithm for the case of three pairs of spouses, as shown at the end of my previous blog post.

As for the cookies, Debbie, this one's on me. If you feel compelled to show appreciation, please bring your home-baked cookies to a homeless shelter in your community or donate canned goods to a local food pantry.

tags: Just for Fun, Sampling and Simulation
11月 082011

What do you call an interview on Twitter? A Tw-interview? A Twitter-view?

Regardless of what you call it, I'm going to be involved in a "live chat" on Twitter this coming Thursday, 10NOV2011, 1:30–2:00pm ET. The hashtag is #saspress.

Shelly Goodin (@SASPublishing) and SAS Press author recruiter Shelley Sessoms (@SSessoms) will start by asking me (@RickWicklin) questions about my book, Statistical Programming with SAS/IML Software. During our conversation, you can jump in anytime and make a comment or ask a question. That's right, you (yes, YOU!) will have a chance to pepper me with questions. Ask about my book. Ask about my blog. Ask about programming in SAS. Ask me about publishing with SAS Press. But ask something interesting. Otherwise, it will be a long 30 minutes.

And to anticipate your objections before you can utter them, here is my list of the Top Five Reasons People Give for NOT Attending the Live Chat:

5. I don't have a Twitter account.
Sign up on Twitter. After you have an account, type @RickWicklin into the search field and click Search. My profile will appear. Click Follow.
4. I don't know what a hash tag is or how to "follow" one.
A hashtag is just a string (like #SAS, #saspress, or #statistics) that begins with the hash (or pound) sign. You can read more about Twitter hashtags. The nice thing about a hashtag is that you can search for all tweets that include it: Type the hashtag into the search field on Twitter and—voila!—all recent tweets that use the hashtag appear. During the Live Chat, search for #saspress every few minutes to follow the conversation. (Or use software such as TweetDeck to automatically track hashtags.)
3. I don't have the time.
Then sign on, ask your question, and leave. This is your chance to ask me technical questions like, "What is the chance that a random matrix is singular?"
2. I don't want to be bored.
Same here. I don't plan to spend a lot of time discussing where I grew up or whether I wear boxers or briefs. I prefer to answer questions about SAS, programming, blogging, statistics, matrix linear algebra, and so forth. If things get boring, I'll break out my arsenal of math and stat jokes. DON'T MAKE ME DO THAT!
1. I don't have any questions to ask you.
Review some of my recent blog posts for ideas.

Remember, you have to ask questions, or I'll resort to corny math jokes like "Why did the mathematician cross the Mobius strip?" (To get to the other...oh, never mind.)

See you there!
tags: Just for Fun
9月 302011

Birds migrate south in the fall. Squirrels gather nuts. Humans also have behavioral rituals in the autumn. I change the batteries in my smoke detectors, I switch my clocks back to daylight standard time, and I turn the mattress on my bed. The first two are relatively easy. There's even a mnemonic for changing the clocks to and from daylight savings time: "spring forward, fall back."

Man-handling a heavy, queen-size mattress can be a challenge, but it is important so that your mattress wears evenly. There are four positions that a mattress can be in, and it is important to use all four positions over a two-year cycle. I use a simple mnemonic to ensure that I alternate between all four positions: "spring spin, fall flip."

How did I develop this mnemonic? By using matrices, of course!

Mattress positions

There are three ways to turn a mattress:

  • The rotation (R): This is the easiest maneuver, and it requires only one person. It doesn't flip the mattress, but spins it on the box spring so that the head becomes the foot. (When an airplane rotates along this axis, the movement is called "yaw.")
  • The horizontal flip (H): This is the easiest way to flip a mattress from one side to the other. It requires that you turn the mattress along its long axis. (When an airplane rotates along this long axis, the movement is called "roll.")
  • The vertical flip (V): This is the hardest way to flip a mattress, and might be impossible if you have low ceilings! It requires that you turn the mattress along its short axis. (When an airplane rotates along this short axis, the movement is called "pitch.")

Fortunately, it turns out that you NEVER need to use the vertical flip! You can get even wear on your mattress by alternating the two easier moves. In the spring, use the rotation; in the fall, use the horizontal flip. "Spring spin, fall flip." Repeat this every year and your mattress will mathe-magically alternate between all four possible positions.

The matrices of mattresses

Rotations can be represented by the actions of matrices. Set up a coordinate axis in the middle of your mattress so that the y axis points towards the head of the bed and the z axis points toward the ceiling. Then the rotation (R) sends a vector (x, y, z) to the new vector (-x, -y, z). Similarly, the horizontal flip (H) send (x, y, z) to (-x, y, -z), and the vertical flip (V) send (x, y, z) to (x, -y, -z). This means that the three rotations can be represented by the following matrices:

proc iml;
/* R: rotation about the z axis = rotate 180 degrees in the (x,y)-plane */
R = {-1  0  0,
      0 -1  0,
      0  0  1};
/* H: rotation about the y axis = rotate 180 degrees in the (x,z)-plane */
H = {-1  0  0,
      0  1  0,
      0  0 -1};
/* V: rotation about the x axis = rotate 180 degrees in the (y,z)-plane */
V = { 1  0  0,
      0 -1  0,
      0  0 -1};

These matrices have a few interesting properties. The first is that they are their own inverses: H*H = V*V = R*R = I. The second is that you can use any two to obtain the third. In particular, R*H=V. This is great, because it means that you don't ever need to attempt the difficult vertical flip, you can use the product of simpler rotations to achieve the same result.

A previous mattress article

When I started writing this article, I searched the Internet for an illustration that I could use. I discovered a wonderful article that contains some excellent illustrations and much of the same content. The author is my fellow Cornellian, Steve Strogatz, and he wrote it for the New York Times . (By the way, if you haven't read Strogatz's book, Synch, I highly recommend it. He also gave an entertaining 20-minute TED talk on the subject of "synch.")

The Strogatz article illustrates the mattress rotations with the following colorful diagram, which shows the four possible positions of a mattress and how each rotation takes you from one position to another:

Suppose that your mattress is initially in the position in the upper left corner. Let's see what happens if you follow "spring spin, fall flip" for two consecutive years. That is, you rotate the mattress according to the sequence R, followed by H, followed by R, followed by H:

  1. After the first "spring spin," the mattress position is as shown in the lower right of the diagram.
  2. A "fall flip" moves the mattress into the position showed in the lower left.
  3. After a second "spring spin," the mattress position is as shown in the upper right.
  4. A second "fall flip" returns the mattress to its original position.

These matrix representations are not the only ones possible. You can also represent the rotations by using permutation matrices, which I've discussed in a previous article on gift-giving among family members. A permutation matrix acts on the vector (1,2,3,4) and maps the four positions of the mattress to each other. One permutation matrix is H = {0 1 0 0, 1 0 0 0, 0 0 0 1, 0 0 1 0}; I'll leave the construction of the R and V permutation matrices as an exercise.

tags: Just for Fun, Matrix Computations
9月 292011

I previously wrote about an intriguing math puzzle that involves 5-digit numbers with certain properties. This post presents my solution in the SAS/IML language.

It is easy to generate all 5-digit perfect squares, but the remainder of the problem involves looking at the digits of the squares. For this reason, I converted the set of all 5-digit numbers into an n x 5 array. I used the PUTN function to convert the numbers to strings, and then used the SUBSTR function to extract the first, second, third, fourth, and fifth digits into columns. (I then used the NUM function to change the character array back into a numeric matrix, but this is not necessary.)

The solution enables me to highlight three new functions in SAS 9.3:
  • The ELEMENT function enables you to find which elements in one set are contained in another set. I use this function to get rid of all 5-digit perfect squares that contain the digits {6,7,8,9,0}.
  • The ALLCOMB function generates all combinations of n elements taken k at a time. After I reduced the problem to a set of nine 5-digit numbers, I used the ALLCOMB function to look at all triplets of the candidate numbers.
  • The TABULATE subroutine computes the frequency distribution of elements in a vector or matrix. I used this subroutine to check the frequency of the digits in the triplets of numbers.

Here is my commented solution:

proc iml;
/* generate all 5-digit squares */
f0 = ceil(sqrt(10000));
f1 = floor(sqrt(99999));
AllSquares = T(f0:f1)##2;
/* convert to (n x 5) character array */
c = putn(AllSquares,"5.0");
m = j(nrow(c), 5," ");
do i = 1 to 5;   m[,i] = substr(c,i,1);  end;
m = num(m); /* convert to (n x 5) numerical matrix */
/* The numbers are clearly 1,2,3,4,5, since there 
   are 15 digits and each appears a unique number of times.
   Get rid of any rows that don't have these digits. */
bad = (6:9) || 0;
b = element(m, bad);   /* ELEMENT: SAS/IML 9.3 function */
hasBad = b[ ,+];       /* sum across columns */
g = m[loc(hasBad=0),]; /* only nine perfect squares left! */
/* Look at all 3-way combinations */
k = allcomb(nrow(g),3);/* ALLCOMB: SAS/IML 9.3 function */
SolnNumber=0;          /* how many solutions found? */
do i = 1 to nrow(k);
   soln = g[k[i,], ]; /* 3x5 matrix */
   /* The frequencies of the digits should be 1,2,3,4,5
      and the freq of a digit cannot equal the digit */
   call tabulate(levels, freq, soln); /* /* TABULATE: SAS/IML 9.3 function */
   if ncol(unique(freq))=5 then do;   /* are there five freqs? */
      if ^any(freq=(1:5)) then do;    /* any freq same as digit? */
         SolnNumber = SolnNumber+1;
         print "********" SolnNumber "********",
         soln[r={"Square1" "Square2" "Square3"}], freq;

At first, I didn't understand the last clue, so I printed out all seven triplets of numbers that satisfied the first five conditions. When I looked at the output, I finally made sense of the last clue, which is "If you knew which digit I have used just once, you could deduce my three squares with certainty." This told me to look closely at the FREQ vectors in the output. Of the seven solutions, one has frequency vector {3 5 4 2 1}, which means that the 1 digit appears three times, the 2 digit appears five times, and so on to the 5 digit, which appears once. In all of the other solutions, it is the 3 digit that appears once. Therefore, there is a unique solution in which the 5 digit appears only one time. The solution is as follows:

The SAS/IML language gave me some powerful tools that I used to solve the math puzzle. I'm particularly pleased that I only used two loops to solve this problem. I was able to vectorize the other computations.

Can you improve my solution? Use the comment to post (or link to) your program that solves the problem. The original post on the SAS Discussion Forum includes other ways to solve the problem in SAS.

tags: Just for Fun, Statistical Programming