5月 222019

The eigenvalues of a matrix are not easy to compute. It is remarkable, therefore, that with relatively simple mental arithmetic, you can obtain bounds for the eigenvalues of a matrix of any size. The bounds are provided by using a marvelous mathematical result known as Gershgorin's Disc Theorem. For certain matrices, you can use Gershgorin's theorem to quickly determine that the matrix is nonsingular or positive definite.

The Gershgorin Disc Theorem appears in Golub and van Loan (p. 357, 4th Ed; p. 320, 3rd Ed), where it is called the Gershgorin Circle Theorem. The theorem states that the eigenvalues of any N x N matrix, A, are contained in the union of N discs in the complex plane. The center of the i_th disc is the i_th diagonal element of A. The radius of the i_th disc is the absolute values of the off-diagonal elements in the i_th row. In symbols,
Di = {z ∈ C | |z - Ai i| ≤ ri }
where ri = Σi ≠ j |Ai j|. Although the theorem holds for matrices with complex values, this article only uses real-valued matrices.

An example of Gershgorin discs is shown to the right. The discs are shown for the following 4 x 4 symmetric matrix:

At first glance, it seems inconceivable that we can know anything about the eigenvalues without actually computing them. However, two mathematical theorems tells us quite a lot about the eigenvalues of this matrix, just by inspection. First, because the matrix is real and symmetric, the Spectral Theorem tells us that all eigenvalues are real. Second, the Gershgorin Disc Theorem says that the four eigenvalues are contained in the union of the following discs:

  • The first row produces a disc centered at x = 200. The disc has radius |30| + |-15| + |5| = 50.
  • The second row produces a disc centered at x = 100 with radius |30| + |5| + |5| = 40.
  • The third row produces a disc centered at x = 55 with radius |-15| + |5| + |0| = 20.
  • The last row produces a disc centered at x = 25 with radius |5| + |5| + |0| = 10.

Although the eigenvalues for this matrix are real, the Gershgorin discs are in the complex plane. The discs are visualized in the graph at the top of this article. The true eigenvalues of the matrix are shown inside the discs.

For this example, each disc contains an eigenvalue, but that is not true in general. (For example, the matrix A = {1 −1, 2 −1} does not have any eigenvalues in the disc centered at x=1.) What is true, however, is that disjoint unions of discs must contain as many eigenvalues as the number of discs in each disjoint region. For this matrix, the discs centered at x=25 and x=200 are disjoint. Therefore they each contain an eigenvalue. The union of the other two discs must contain two eigenvalues, but, in general, the eigenvalues can be anywhere in the union of the discs.

The visualization shows that the eigenvalues for this matrix are all positive. That means that the matrix is not only symmetric but also positive definite. You can predict that fact from the Gershgorin discs because no disc intersects the negative X axis.

Of course, you don't have to perform the disc calculations in your head. You can write a program that computes the centers and radii of the Gershgorin discs, as shown by the following SAS/IML program, which also computes the eigenvalues for the matrix:

proc iml;
A = { 200  30 -15  5,
       30 100   5  5,
      -15   5  55  0, 
        5   5   0 15};
evals = eigval(A);                 /* compute the eigenvalues */
center = vecdiag(A);               /* centers = diagonal elements */
radius = abs(A)[,+] - abs(center); /* sum of abs values of off-diagonal elements of each row */
discs = center || radius || round(evals,0.01);
print discs[c={"Center" "Radius" "Eigenvalue"} L="Gershgorin Discs"];

Diagonally dominant matrices

For this example, the matrix is strictly diagonally dominant. A strictly diagonally dominant matrix is one for which the magnitude of each diagonal element exceeds the sum of the magnitudes of the other elements in the row. In symbols, |Ai i| > Σi ≠ j |Ai j| for each i. Geometrically, this means that no Gershgorin disc intersects the origin, which implies that the matrix is nonsingular. So, by inspection, you can determine that his matrix is nonsingular.

Gershgorin discs for correlation matrices

The Gershgorin theorem is most useful when the diagonal elements are distinct. For repeated diagonal elements, it might not tell you much about the location of the eigenvalues. For example, all diagonal elements for a correlation matrix are 1. Consequently, all Gershgorin discs are centered at (1, 0) in the complex plane. The following graph shows the Gershgorin discs and the eigenvalues for a 10 x 10 correlation matrix. The eigenvalues of any 10 x 10 correlation matrix must be real and in the interval [0, 10], so the only new information from the Gershgorin discs is a smaller upper bound on the maximum eigenvalue.

Gershgorin discs for unsymmetric matrices

Gershgorin's theorem can be useful for unsymmetric matrices, which can have complex eigenvalues. The SAS/IML documentation contains the following 8 x 8 block-diagonal matrix, which has two pairs of complex eigenvalues:

A = {-1  2  0       0       0       0       0  0,
     -2 -1  0       0       0       0       0  0,
      0  0  0.2379  0.5145  0.1201  0.1275  0  0,
      0  0  0.1943  0.4954  0.1230  0.1873  0  0,
      0  0  0.1827  0.4955  0.1350  0.1868  0  0,
      0  0  0.1084  0.4218  0.1045  0.3653  0  0,
      0  0  0       0       0       0       2  2,
      0  0  0       0       0       0      -2  0 };

The matrix has four smaller Gershgorin discs and three larger discs (radius 2) that are centered at (-1,0), (2,0), and (0,0), respectively. The discs and the actual eigenvalues of this matrix are shown in the following graph. Not only does the Gershgorin theorem bound the magnitude of the real part of the eigenvalues, but it is clear that the imaginary part cannot exceed 2. In fact, this matrix has eigenvalues -1 ± 2 i, which are on the boundary of one of the discs, which shows that the Gershgorin bound is tight.


In summary, the Gershgorin Disc Theorem provides a way to visualize the possible location of eigenvalues in the complex plane. You can use the theorem to provide bounds for the largest and smallest eigenvalues.

I was never taught this theorem in school. I learned it from a talented mathematical friend at SAS. I use this theorem to create examples of matrices that have particular properties, which can be very useful for developing and testing software.

This theorem also helped me to understand the geometry behind "ridging", which is a statistical technique in which positive multiples of the identity are added to a nearly singular X`X matrix. The Gershgorin Disc Theorem shows the effect of ridging a matrix is to translate all of the Gershgorin discs to the right, which moves the eigenvalues away from zero while preserving their relative positions.

You can download the SAS program that I used to create the images in this article.

Further reading

There are several papers on the internet about Gershgorin discs. It is a favorite topic for advanced undergraduate projects in mathematics.

The post Gershgorin discs and the location of eigenvalues appeared first on The DO Loop.

9月 242018

Last week I compared the overhand shuffle to the riffle shuffle. I used random operations to simulate both kinds of shuffles and then compared how well they mix cards. The article caused one my colleague and fellow blogger, Rob Pratt, to ask if I was familiar with a bit of shuffling trivia: if you perform a perfect riffle shuffle, the cards return to their original order after exactly eight perfect shuffles! This mathematical curiosity is illustrated in a beautiful animation by James Miles, which shows the results of eight perfect shuffles.

After being reminded of this interesting fact, I wondered how the result generalizes to decks of various sizes. That is, if you use a deck with N cards, what is the minimum number of perfect riffle shuffles (call it P(N)) that you need to restore the deck to its original order? I decided to run a SAS program to discover the answer. The result is summarized in the following graph, which plots P(N) versus the number of cards in a deck. All points are below the identity line, which implies that at most N – 1 shuffles are required for a deck that contains N cards. If you want to learn more about the graph and its interesting patterns, read on.

Number of perfect shuffles for deck of N cards to restore the original order

Coding the perfect riffle shuffle in SAS

The perfect riffle shuffle is easier to program than the Gilbert-Shannon-Reeds (GSR) model, which uses randomness to model how real people shuffle real cards. In a perfect riffle shuffle, you cut the deck exactly two equal parts. Then you interlace the cards from the top stack with the cards from the bottom stack. The new deck ordering is TBTBTB..., where T indicates a card from the top half of the original deck and B indicates a card from the bottom half.

There are actually two perfect riffle shuffles for an even number of cards: you can interlace the cards TBTBTB..., which is called an "out shuffle," or you can interlace the cards BTBTBT..., which is called an "in shuffle." For odd-numbered decks, there is also a choice of where to cut the deck: does the top part of the deck get the extra card or does the bottom part? My program always uses the "out shuffle" convention (the top card stays on top) and gives the top stack the extra card when there is an odd number of cards. Therefore, the shuffled deck looks like TBTB...TB for even-numbered decks and TBTB...TBT for odd-numbered decks. The following SAS/IML function takes a row vector that represents a card deck and performs a perfect riffle shuffle to reorder the cards in the deck:

proc iml;
start PerfectRiffle(deck);
   N = ncol(deck);                        /* send in deck as a row vector */
   shuffle = j(1,N,.);                    /* allocate new deck */
   nT = ceil(N/2);                        /* cut into two stacks; "Top" gets extra card when N odd */
   nB = N - nT;                           /* nT = size of Top; nB = size of Bottom */
   shuffle[, do(1,N,2)] = deck[,1:nT];   /* top cards go into odd positions */
   shuffle[, do(2,N,2)] = deck[,nT+1:N]; /* bottom cards go into even positions */
/* test the algorithm on a deck that has N = 10 cards */
origDeck = 1:10;
deck = PerfectRiffle(origDeck);
print (origdeck//deck)[r={"Orig Deck" "Perfect Riffle"}];
Perfect shuffle on deck of N=10 cards

The output shows one perfect riffle of a deck that has 10 cards that are originally in the order 1, 2, ..., 10.

How many perfect riffle shuffles to restore order?

If you call the PerfectSuffle function repeatedly on the same deck, you can simulate perfect riffle shuffles. After each shuffle, you can test whether the order of the deck is in the original order. If so, you can stop shuffling. If not, you can shuffle again. The following SAS/IML loops test decks that contain up to 1,000 cards. The array nUntilRestore keeps track of how many perfect riffle shuffles were done for each deck size.

decksize = 1:1000;                       /* N = 1, 2, 3, ..., 1000 */
nUntilRestore = j(1, ncol(decksize), .); /* the results vector */
do k = 2 to ncol(decksize);              /* for the deck of size N */
   origDeck = 1:decksize[k];             /* original order is 1:N */
   deck = OrigDeck;                      /* set order of deck */
   origOrder = 0;                        /* flag variable */
   do nShuffles = 1 to decksize[k] until (origOrder); /* repeat shuffling */
      deck = PerfectRiffle( deck );      /* shuffle deck */
      origOrder = all(deck = origDeck);  /* until the order of the cards are restored */
   nUntilRestore[k] = nShuffles;         /* store the number of shuffles */
/* print the results for N=1,..,99 */
x = j(10, 10, .);
x[2:100] = nUntilRestore[1:99];          /* present in 10x10 array */
rr = putn( do(0, 90, 10), "Z2.");        /* each row has 10 elements */
cc = ("0":"9");
print x[r=rr c=cc L="Number of Perfect Shuffles to Restore Order"];
title "Number of Perfect Shuffles to Restore Deck Order";
call scatter(deckSize, nUntilRestore) grid={x y} other="lineparm x=0 y=0 slope=1;"
     label={"N: Number of Cards in Deck" "P(N): Number of Perfect Shuffles to Restore Order"}
Number of perfect shuffles required to restore order in a deck of N cards, N-1..99

The table shows the number of perfect riffle shuffles required for decks that have fewer than 99 cards. The results are in a 10x10 grid. The first row shows decks that have less than 10 cards, the second row shows sizes 10-19, and so on. For example, to find the result for a 52-card deck, move down to the row labeled "50" and over to the column labeled "2". The result in that cell is 8, which is the number of perfect shuffles required to restore a 52-card deck to its original order.

Remarks on the number of perfect riffle shuffles

The graph at the top of this article shows the number of shuffles for decks that contain up to 1,000 cards. To me, the most surprising feature of the graph is the number of points that fall near diagonal lines of various rational slopes. The most apparent diagonal features are the lines whose slopes are approximately equal to 1, 1/2, and 1/3.

A complete mathematical description of this problem is beyond the scope of this blog article, but here are a few hints and links to whet your appetite.

  • The integer sequence P(N) is related to the "multiplicative order of 2 mod 2n+1" in the On-Line Encyclopedia of Integer Sequences (OEIS). The encyclopedia includes a graph that is similar to the one here, but is computed for N ≤ 10,000. That integer sequence applies to the number of perfect riffle shuffles of an EVEN number of cards.
  • The link to the OEIS shows a quick way to explicitly find P(N) for even values of N. P(N) is the smallest value of k for which 2k is congruent to 1 (mod N–1). For example, 8 is the smallest number for which 28 is congruent to 1 (mod 51) as you can compute explicitly: the value of mod(2**8, 51) is 1.
  • The points in the graph for which P(N) = N-1 are all prime numbers: 2, 3, 5, 11, 13, 19, 29, 37, 53, 59, 61, 67, .... However, notice that not all prime numbers are on this list.

There is much more to be said about this topic, but I'll stop here. If you have something to add, please leave a comment.

The post How many perfect riffle shuffles are required to restore a deck to its initial order? appeared first on The DO Loop.

9月 102018

Given a rectangular grid with unit spacing, what is the expected distance between two random vertices, where distance is measured in the L1 metric? (Here "random" means "uniformly at random.") I recently needed this answer for some small grids, such as the one to the right, which is a 7 x 6 grid. The graph shows that the L1 distance between the points (2,6) and (5,2) is 7, the length of the shortest path that connects the two points. The L1 metric is sometimes called the "city block" or "taxicab" metric because it measures the distance along the grid instead of "as the bird flies," which is the Euclidean distance.

The answer to the analogous question for the continuous case (a solid rectangle) is difficult to compute. The main result is that the expected distance is less than half of the diameter of the rectangle. In particular, among all rectangles of a given area, a square is the rectangle that minimizes the expected distance between random points.

Although I don't know a formula for the expected distance on a discrete regular grid, the grids in my application were fairly small so this article shows how to compute all pairwise distances and explicitly find the average (expected) value. The DISTANCE function in SAS/IML makes the computation simple because it supports the L1 metric. It is also simple to perform computer experiments to show that among all grids that have N*M vertices, the grid that is closest to being square minimizes the expected L1 distance.

L1 distance on a regular grid

An N x M grid contains NM vertices. Therefore the matrix of pairwise distances is NM x NM. Without loss of generality, assume that the vertices have X coordinates between 1 and N and Y coordinates between 1 and M. Then the following SAS/IML function defines the distance matrix for vertices on an N x M grid. To demonstrate the computation, the distance matrix for a 4 x 4 grid is shown, along with the average distance between the 16 vertices:

proc iml;
start DistMat(rows, cols);
   s = expandgrid(1:rows, 1:cols);   /* create (x, y) ordered pairs */
   return distance(s, "CityBlock");  /* pairwise L1 distance matrix */
/* test function on a 4 x 4 grid */
D = DistMat(4, 4);
AvgDist = mean(colvec(D));
print D[L="L1 Distance Matrix for 4x4 Grid"];
print AvgDist[L="Avg Distance on 4x4 Grid"];
L1 Distance matrix for vertices on a regular 4 x 4 grid Average L1 distance on a 4 x 4 grid

For an N x M grid, the L1 diameter of the grid is the L1 distance between opposite corners. That distance is always (N-1)+(M-1), which equates to 6 units for a 4 x 4 grid. As for the continuous case, the expected L1 distance is less than half the diameter. In this case, E(L1 distance) = 2.5.

A comparison of distances for grids of different aspect ratios

As indicated previously, the expected distance between two random vertices on a grid depends on the aspect ratio of the grid. A grid that is nearly square has a smaller expected distance than a short-and-wide grid that contains the same number of vertices. You can illustrate this fact by computing the distance matrices for grids that each contain 36 vertices. The following computation computes the distances for five grids: a 1 x 36 grid, a 2 x 18 grid, a 3 x 12 grid, a 4 x 9 grid, and a 6 x 6 grid.

/* average L1 distance on 36 x 36 grid in several configurations */
rows = {1, 2, 3, 4, 6};
cols = N / rows;
AvgDist = j(nrow(rows), 1, .);
do i = 1 to nrow(rows);
   D = DistMat(rows[i], cols[i]);
   AvgDist[i] = mean(colvec(D));
/* show average distance as a decimal and as a fraction */
numer = AvgDist*108; AvgDistFract = char(numer) + " / 108"; 
print rows cols AvgDist AvgDistFract;

The table shows that short-and-wide tables have an average distance that is much greater than a nearly square grid that contains the same number of vertices. When the points are arranged in a 6 x 6 grid, the distance matrix naturally decomposes into a block matrix of 6 x 6 symmetric blocks, where each block corresponds to a row. When the points are arranged in a 3 x 12 grid, the distance matrix decomposes into a block matrix of 12 x 12 blocks The following heat maps visualize the patterns in the distance matrices:

D = DistMat(6, 6);
call heatmapcont(D) title="L1 Distance Between Vertices in a 6 x 6 Grid";
D = DistMat(3, 12);
call heatmapcont(D) title="L1 Distance Between Vertices in a 3 x 12 Grid";
Visualization of L1 distance matrix for items arranged on a 6 x 6 grid Visualization of L1 distance matrix for items arranged on a 3 x 12 grid

Expected (average) distance versus number of rows

You can demonstrate how the average distance depends on the number of rows by choosing the number of vertices to be a highly composite number such as 5,040. A highly composite number has many factors. The following computation computes the average distance between points on 28 grids that each contain 5,040 vertices. A line chart then displays the average distance as a function of the number of rows in the grid:

N = 5040;               /* 5040 is a highly composite number */
rows = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 18, 20, 
       21, 24, 28, 30, 35, 36, 40, 42, 45, 48, 56, 60, 63, 70};
cols = N / rows;
AvgDist = j(nrow(rows), 1, .);
do i = 1 to nrow(rows);
   D = DistMat(rows[i], cols[i]);
   AvgDist[i] = mean(colvec(D));
title "Average L1 Distance Between 5040 Objects in a Regular Grid";
call series(rows, AvgDist) grid={x y} other="yaxis offsetmin=0 grid;";
Expected L1 distance between vertices on a regular grid with 5,040 points. The curve demonstraes that nearly square grids have a much lower average distance than short and wide grids.

Each point on the graph represents the average distance for an N x M grid where NM = 5,040. The horizontal axis displays the value of N (rows). The graph shows that nearly square grids (rows greater than 60) have a much lower average distance than very short and wide grids (rows less than 10). The scale of the graph makes it seem like there is very little difference between the average distance in a grid with 40 versus 70 rows, but that is somewhat of an illusion. A 40 x 126 grid (aspect ratio = 3.15) has an average distance of 55.3; a 70 x 72 grid (aspect ratio = 1.03) has an average distance of 47.3.

Applications and conclusions

In summary, you can use the DISTANCE function in SAS/IML to explicitly compute the expected L1 distance (the "city block" distance) between random points on a regular grid. You can minimize the average pairwise distance by making the grid as square as possible.

City planning provides a real-world application of the L1 distance. If you are tasked with designing a city with N buildings along a grid, then the average distance between buildings is smallest when the grid is square. Of course, in practice, some buildings have a higher probability of being visited than others (such as a school or a grocery). You should position those buildings in the center of town to shorten the average distance that people travel.

The post Distances on rectangular grids appeared first on The DO Loop.

9月 062018

Continued fractions show up in surprising places. They are used in the numerical approximations of certain functions, including the evaluation of the normal cumulative distribution function (normal CDF) for large values of x (El-bolkiny, 1995, p. 75-77) and in approximating the Lambert W function, which has applications in the modeling of heavy-tailed error processes. This article shows how to use SAS to convert a rational number into a finite continued fraction expansion and vice versa.

What is a continued fraction?

I discussed continued fractions in a previous post about the contiunued fraction expansion of pi. Recall that every real number x can be represented as a continued fraction:

In this continued fraction, a0 is the integer portion of the number and the ai for i > 0 are positive integers that represent the noninteger portion. Because all of the fractions have 1 in the numerator, the continued fraction can be compactly represented by specifying only the integers: x = [a0; a1, a2, a3, ...]. Every rational number is represented by a finite sequence; every irrational number requires an infinite sequence. Notice that the decimal representation of a number does NOT have this nice property: 1/3 is rational but its decimal representation is infinite.

Convert a finite continued fraction to a rational number

My article about the continued fraction expansion of pi contains a few lines of SAS code that compute the decimal representation of a finite continued fraction. Since every finite continued fraction corresponds to a rational number, you can modify the algorithm to compute the rational number instead of a decimal approximation.

The algorithm is straightforward. Given a vector
a = [a0; a1, a2, ..., ak],
start at the right side, form the fraction sk = 1 / ak. Then move to the left and compute the fraction sk-1 = ak-1 + sk. Continue moving to the left until all fractions are added. In the SAS/IML language (or in the SAS DATA step), you can represent each fraction as a two-element vector where the first element is the numerator of the fraction and the second is the denominator of the fraction. This leads to the following algorithm that computes a rational number from a finite continued fraction expansion:

proc iml;
start ContinuedFrac(_a);
   a = colvec(_a);
   f = a[nrow(a)] // 1;         /* trick: start with reciprocal */
   do i = nrow(a)-1 to 1 by -1; /* evaluate from right to left */
      f = f[2:1];               /* take reciprocal */
      f[1] = a[i]*f[2] + f[1];  /* compute new numerator */
   return f;
a = {3, 4, 12, 4}; 
f = ContinuedFrac(a);      /* 649 / 200 */
pi = {3, 7,15,1,292,1,1,1,2,1,3,1};
f_pi = ContinuedFrac(pi);  /* 5419351 / 1725033 */
e = {2, 1,2,1,1,4,1,1,6,1,1,8};
f_e = ContinuedFrac(e);    /* 23225 / 8544 */
print (f||f_pi||f_e)[r={"Numer" "Denom"} c={"f" "f_pi" "f_e"}];
Convert (finite) continued fractions to rational numbers

The examples show a few continued fraction expansions and the rational numbers that they represent:

  • The continued fraction expansion (cfe) [3; 4, 12, 4] corresponds to the fraction 649 / 200.
  • The 12-term partial cfe for pi corresponds to the fraction 5419351 / 1725033. You might have learned in grade school that 22/7 is an approximation to pi. So to is this fraction.
  • The partial cfe for e (using 12 terms) corresponds to the fraction 23225 / 8544.

Rational numbers to continued fractions

You can program the inverse algorithm to produce the cfe from a rational number. The algorithm is called the Euclidean algorithm and consists of finding the integer part, subtracting the integer part from the original fraction to obtain a remainder (expressed as a fraction), and then repeating this process. The process terminates for every rational number. In the SAS/IML language, the Euclidean algorithm looks like this:

start EuclidAlgRational(val, nSteps=10);
   a = j(nSteps, 1, .);
   x = val;
   done = 0;
   do i = 1 to nSteps while(^done);
      a[i] = floor(x[1] / x[2]);
      x[1] = x[1] - a[i]*x[2];  /* remainder */
      if x[1]=0 then done=1;
      x = x[2:1];               /* reciprocal */
   idx = loc(a=.);
   if ncol(idx)>0 then a = colvec(remove(a, idx));
   return a;
v = {649, 200};          /* = [3,4,12,4] */
a = EuclidAlgRational(v);
v = {5419351, 1725033};  /* pi = [3; 7,15,1,292,1,1,1,2,1,3,1,...] */
a_pi = EuclidAlgRational(v);
v = {23225, 8544};       /* e = [2; 1,2,1,1,4,1,1,6,1,1,8,...] */
a_e = EuclidAlgRational(v);
print a a_pi a_e;
Convert rational numbers to (finite) continued fractions

Each column of the table shows the continued fraction representations of a rational number. The output shows that the ContinuedFrac and EuclidAlgRational functions are inverse operations: each function undoes the action of the other.

Two amazing properties of the continued fraction representation

There are some very interesting mathematical properties of continued fractions. Two statistical properties are discussed in Barrow (2000), "Chaos in Numberland: The secret life of continued fractions." Both properties are illustrated by the first 97 terms in the continued fraction expansion of pi, which are
π = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1,
15, 3, 13, 1, 4, 2, 6, 6, 99, 1, 2, 2, 6, 3, 5, 1, 1, 6, 8, 1, 7, 1, 2, 3, 7,
1, 2, 1, 1, 12, 1, 1, 1, 3, 1, 1, 8, 1, 1, 2, 1, 6, 1, 1, 5, 2, 2, 3, 1, 2,
4, 4, 16, 1, 161, 45, 1, 22, 1, 2, 2, 1, 4, 1, 2, 24, 1, 2, 1, 3, 1, 2, 1, ...]

The first property concerns the probability distribution of the ak for sufficiently large values of k. Notice that the continued fractions in this article mostly contain small integers such as 1, 2, 3, and 4. It turns out that this is generally true. For almost every irrational number in (0,1) and for k sufficiently large, the k_th term ak is more likely to be a small integer than a large integer. In fact, the Gauss-Kuzmin theorem states that the distribution of ak (thought of as a random variable) asymptotically approaches the probability mass function P(ak = t) = -log2(1 - 1/(1+t)2), which is shown below for t ≤ 20. Thus for a random irrational number, although the value of, say, the 1000th term in the continued fraction expansion could be any positive integer, it is probably less than 20!

The second interesting mathematical property is Khinchin's Theorem, which states that for almost every irrational number in (0,1), the geometric mean of the first t digits in the continued fraction expansion converges to a constant called Khinchin's constant, which has the value 2.685452001.... For example, the first 97 terms in the continued fraction expansion of pi have a geometric mean of 2.685252.

Gauss-Kuzmin distribution of the coefficients of a continued fraction expnsion

The post The continued fraction representation of a rational number appeared first on The DO Loop.

7月 162018

My colleague Robert Allison recently blogged about using the diameter of Texas as a unit of measurement. The largest distance across Texas is about 801 miles, so Robert wanted to find the set of all points such that the distance from the point to Texas is less than or equal to 801 miles.

Robert's implementation was complicated by the fact that he was interested in points on the round earth that are within 801 miles from Texas as measured along a geodesic. However, the idea of "thickening" or "inflating" a polygonal shape is related to a concept in computational geometry called the offset polygon or the inflated polygon. A general algorithm to inflate a polygon is complicated, but this article demonstrates the basic ideas that are involved. This article discusses offset regions for convex and nonconvex polygons in the plane. The article concludes by drawing a planar region for a Texas-shaped polygon that has been inflated by the diameter of the polygon. And, of course, I supply the SAS programs for all computations and images.

Offset regions as a union of circles and rectangles

Assume that a simple polygon is defined by listing its vertices in counterclockwise order. (Recall that a simple polygon is a closed, nonintersecting, shape that has no holes.) You can define the offset region of radius r as the union of the following shapes:

  • The polygon itself
  • Circles of radius r centered at each vertex
  • Rectangles of width r positions outside the polygon along each edge

The following graphic shows the offset region (r = 0.5) for a convex "house-shaped" polygon. The left side of the image shows the polygon with an overlay of circles centered at each vertex and outward-pointing rectangles along each edge. The right side of the graphic shows the union of the offset regions (blue) and the original polygon (red):

The image on the right shows why the process is sometimes called an "inflating" a polygon. For a convex polygon, the edges are pushed out by a distance r and the vertices become rounded. For large values of r, the offset region becomes a nearly circular blob, although the boundary is always the union of line segments and arcs of circles.

You can draw a similar image for a nonconvex polygon. The inflated region near a convex (left turning) vertex looks the same as before. However, for the nonconvex (right turning) vertices, the circles do not contribute to the offset region. Computing the offset region for a nonconvex polygon is tricky because if the distance r is greater than the minimum distance between vertices, nonlocal effects can occur. For example, the following graphic shows a nonconvex polygon that has two "prongs." Let r0 be the distance between the prongs. When you inflate the polygon by an amount r > r0/2, the offset region can contain a hole, as shown. Furthermore, the boundary of the offset regions is not a simple polygon. For larger values of r, the hole can disappear. This demonstrates why it is difficult to construct the boundary of an offset region for nonconvex polygons.

Inflating a Texas-shaped polygon

The shape of the Texas mainland is nonconvex. I used PROC GREDUCE on the MAPS.US data set in SAS to approximate the shape of Texas by a 36-sided polygon. The polygon is in a standardized coordinate system and has a diameter (maximum distance between vertices) of r = 0.2036. I then constructed the inflated region by using the same technique as shown above. The polygon and its inflated region are shown below.

The image on the left, which shows 36 circles and 36 rectangles, is almost indecipherable. However, the image on the right is almost an exact replica of the region that appears in Robert Allison's post. Remember, though, that the distances in Robert's article are geodesic distances on a sphere whereas these distances are Euclidean distances in the plane. For the planar problem, you can classify a point as within the offset region by testing whether it is inside the polygon itself, inside any of the 36 rectangles, or within a distance r of a vertex. That computation is relatively fast because it is linear in the number of vertices in the polygon.

I don't want to dwell on the computation, but I do want to mention that it requires fewer than 20 SAS/IML statements! The key part of the computation uses vector operations to construct the outward-facing normal vector of length r to each edge of the polygon. If v is the vector that connects the i_th and (i+1)_th vertex of the polygon, then the outward-facing normal vector is given by the concise vector expression r * (v / ||v||) * M, where M is a rotation matrix that rotates by 90 degrees. You can download the SAS program that computes all the images in this article.

In conclusion, you can use a SAS program to construct the offset region for an arbitrary simple polygon. The offset region is the union of circles, rectangles, and the original polygon, which means that it is easy to test whether an arbitrary point in the plane is in the offset region. That is, you can test whether any point is within a distance r to an arbitrary polygon.

The post Offset regions: Find all points within a specified distance from a polygon appeared first on The DO Loop.

7月 092018

Back in high school, you probably learned to find the intersection of two lines in the plane. The intersection requires solving a system of two linear equations. There are three cases: (1) the lines intersect in a unique point, (2) the lines are parallel and do not intersect, or (3) the lines are coincident. Thus, for two lines, the intersection problem has either 1, 0, or infinitely many solutions. Most students quickly learn that the lines always intersect when their slopes are different, whereas the special cases (parallel or coincident) occur when the lines have the same slope.

Recently I had to find the intersection between two line segments in the plane. Line segments have finite extent, so segments with different slopes may or may not intersect. For example, the following panel of graphs shows three pairs of line segments in the plane. In the first panel, the segments intersect. In the second panel, the segments have the same slopes as in the first panel, but these segments do not intersect. In the third panel, the segments intersect in an interval. This article shows how to construct a linear system of equations that distinguishes between the three cases and compute an intersection point, if it exists.

Parameterization of line segments

Let p1 and p2 be the endpoints of one segment and let q1 and q2 be the endpoints of the other. Recall that a parametrization of the first segment is (1-s)*p1 + s*p2, where s ∈ [0,1] and the endpoints are treated as 2-D vectors. Similarly, a parametrization of the second segment is (1-t)*q1 + t*q2, where t ∈ [0,1]. Consequently, the segments intersect if and only if there exists values for (s,t) in the unit square such that
(1-s)*p1 + s*p2 = (1-t)*q1 + t*q2
You can rearrange the terms to rewrite the equation as
(p2-p1)*s + (q1-q2)*t = q1 - p1

This is a vector equation which can be rewritten in terms of matrices and vectors. Define the 2 x 2 matrix A whose first column contains the elements of (p2-p1) and whose second column contains the elements of (q1-q2). Define b = q1 - p1 and x = (s,t). If the solution of the linear system A*x = b is in the unit square, then the segments intersect. If the solution is not in the unit square, the segments do not intersect. If the segments have the same slope, then the matrix A is singular and you need to perform additional tests to determine whether the segments intersect.

A vector solution for the intersection of segments

As shown above, the intersection of two planar line segments is neatly expressed in terms of a matrix-vector system. In SAS, the SAS/IML language provides a natural syntax for expressing and solving matrix-vector equations. The following SAS/IML function constructs and solves a linear system. For simplicity, this version does not handle the degenerate case of two segments that have the same slope. That case is handled in the next section.

start IntersectSegsSimple(p1, p2, q1, q2);
   b = colvec(q1 - p1); 
   A = colvec(p2-p1) || colvec(q1-q2); /* nonsingular when segments have different slopes */
   x = solve(A, b);                    /* x = (s,t) */
   if all(0<=x && x<=1) then           /* if x is in [0,1] x [0,1] */
      return (1-x[1])*p1 + x[1]*p2;    /* return intersection */
   else                                /* otherwise, segments do not intersect */
      return ({. .});                  /* return missing values */
/* Test 1: intersection at (0.95, 1.25) */
p1 = {1.8 2.1};   p2 = {0.8 1.1};
q1 = {1 1.25};    q2 = {0 1.25};
z = IntersectSegsSimple(p1,p2,q1,q2);
print z;
/* Test 2: no intersection */
p1 = {-1 0.5};  p2 = {1 0.5};
q1 = {0 1};     q2 = {0 2};
v = IntersectSegsSimple(p1, p2, q1, q2);
print v;

The function contains only a few statements. The function is called to solve the examples in the first two panels of the previous graph. The SOLVE function solves the linear system (assuming that a solution exists), and the IF-THEN statement tests whether the solution is in the unit square [0,1] x [0,1]. If so, the function returns the point of intersection. If not, the function returns a pair of missing values.

The general case: Handling overlapping line segments

For many applications, the function in the previous section is sufficient because it handles the generic cases. For completeness the following module also handles segments that have identical slopes. The DET function determines whether the segments have the same slope. If so, the segments could be parallel or collinear. To determine whether collinear segments intersect, you can test for three conditions:

  • The endpoint q1 is inside the segment [p1, p2].
  • The endpoint q2 is inside the segment [p1, p2].
  • The endpoint p1 is inside the segment [q1, q2].

Notice that the condition "p2 is inside [q1,q2]" does not need to be checked separately because it is already handled by the existing checks. If any of the three conditions are true, there are infinitely many solutions (or the segments share an endpoint). If none of the conditions hold, the segments do not intersect. For overlapping segments, the following function returns an endpoint of the intersection interval.

/* handle all cases: determine intersection of two planar line segments 
   [p1, p2] and [q1, q2] */
start Intersect2DSegs(p1, p2, q1, q2);
   b = colvec(q1 - p1); 
   A = colvec(p2-p1) || colvec(q1-q2);
   if det(A)^=0 then do;         /* nonsingular system: 0 or 1 intersection */
      x = solve(A, b);           /* x = (s,t) */
      if all(0<=x && x<=1) then  /* if x is in [0,1] x [0,1] */
         return (1-x[1])*p1 + x[1]*p2;  /* return intersection */
      else                       /* segments do not intersect */
         return ({. .});         /* return missing values */
   /* segments are collinear: 0 or infinitely many intersections */
   denom = choose(p2-p1=0, ., p2-p1);  /* protect against division by 0 */
   s = (q1 - p1) / denom;        /* Is q1 in [p1, p2]? */
   if any(0<=s && s<=1) then
      return q1;
   s = (q2 - p1) / denom;        /* Is q2 in [p1, p2]? */
   if any(0<=s && s<=1) then
      return q2;
   denom = choose(q2-q1=0, ., q2-q1);  /* protect against division by 0 */
   s = (p1 - q1) / denom;        /* Is p1 in [q1, q2]? */
   if any(0<=s && s<=1) then
      return p1;
   return ({. .});               /* segments are disjoint */
/* test overlapping segments; return endpoint of one segment */
p1 = {-1 1};  p2 = {1 1};
q1 = {0 1};   q2 = {2 1};
w = Intersect2DSegs(p1, p2, q1, q2);
print w;

In summary, by using matrices, vectors, and linear algebra, you can easily solve for the intersection of two line segments or determine that the segments do not intersect. The general case needs some special logic to handle degenerate configurations, but the code that solves the generic cases is straightforward when expressed in a vectorized language such as SAS/IML.

The post The intersection of two line segments appeared first on The DO Loop.

11月 092016

This article uses graphical techniques to visualize one of my favorite geometric objects: the surface of a three-dimensional torus. Along the way, this article demonstrates techniques that are useful for visualizing more mundane 3-D point clouds that arise in statistical data analysis.

Projections of a 3-D torus: Four visualization techniques for high-dimensional data. #DataViz
Click To Tweet

Define points on a torus

A torus is the product of two circles, so it can be parameterized by two angles, usually called θ and φ. You can create a regular grid of points in the square (θ, φ) ∈ [0, 2π] x [0, 2π] and map the points into Euclidean three-space as shown in the following SAS DATA step:

data Torus;
R = 8;       /* radius to center of tube */
A = 3;       /* radius of tube */
pi = constant('pi');
step = 2*pi/50;
/* create torus as parametric image of [0, 2*pi] x [0,2*pi] */
do theta = 0 to 2*pi-step by step;
   do phi = 0 to 2*pi-step by 2*pi/50;
      x = (R + A*cos(phi)) * cos(theta);
      y = (R + A*cos(phi)) * sin(theta);
      z =      A*sin(phi);
keep x y z;
title "Projections of a Standard Torus";
proc sgscatter data=Torus;
   matrix x y z;
Projections of torus onto coordinate planes

The SGSCATTER procedure displays the projection of the torus onto the three coordinate planes. In the (x,y) plane you can see that the object has a hole, but the projection onto the other coordinate planes are not very insightful because there is a lot of overplotting.

Rotating scatter plot in SAS/IML Studio

One way to avoid overplotting is to visualize the torus as a 3-D cloud of points. The SAS/IML Studio environment supports a rotating 3-D scatter plot, as shown to the left. In SAS/IML Studio you can interactively rotate the 3-D cloud of points, change the marker colors, and perform other techniques in exploratory data analysis.

Alternatively, if you want to look at planar projections with PROC SGSCATTER, you can rotate the torus so that the projections onto the coordinate planes are not degenerate.

Rotating a cloud of points

My previous article defined SAS/IML functions that create rotation matrices. The following SAS/IML program is almost identical to the program in the previous article, so I will not explain each statement. The program reads the Torus data set, rotates the points in a particular way, and writes the rotated coordinates to a SAS data set.

proc iml;
/* choose rotation matrix as product of canonical rotations */
load module=Rot3D;         /* see */
pi = constant('pi');
Rz = Rot3D(-pi/6, "Z");    /* rotation matrix for (x,y) plane */
Rx = Rot3D(-pi/3, "X");    /* rotation matrix for (y,z) plane */ 
Ry = Rot3D( pi/6, "Y");    /* rotation matrix for (x,z) plane */
P = Rx*Ry*Rz;              /* cumulative rotation */
use Torus;                 /* read data (points on a torus) */
read all var {x y z} into M;
close Torus;
Rot = M * P`;              /* apply rotation and write to data set */
create RotTorus from Rot[colname={"Px" "Py" "Pz"}];
append from Rot;

You can use PROC SGSCATTER to project these rotated points onto the coordinate planes. The TRANSPARENCY= option creates semi-transparent markers that give the illusion of a ghostly see-through torus. This can be an effective technique for visualizing any point cloud, since it enables you to see regions in which overplotting occurs. The statements also use the COLORRESPONSE= option to color the markers by the (original) Z variable. The COLORRESPONSE= option requires SAS 9.4m3.

data Coords;
merge Torus RotTorus;
title "Projections of a Rotated Torus";
proc sgscatter data=Coords;
matrix Px Py Pz / markerattrs=(size=12 symbol=CircleFilled)
       colorresponse=Z  colormodel=ThreeColorRamp; /* COLORRESPONSE= requires 9.4m3 */
Projections of rotated torus onto coordinate planes

The transparent markers serve a second function. The torus is composed of a sequence of circles. Therefore, the last circles (near θ = 2π) will obscure the first circles (near θ = 0) if opaque markers are used. The parametric construction is very apparent if you remove the TRANSPARENCY= option.

If you want to plot without transparency, you should sort the data, which is a standard technique in the graphics community where it is called z-ordering or depth sorting. For example, in the (Px,Py) plane you can sort by the Pz variable so that negative Pz values are plotted first and positive Pz values are plotted on top of the earlier markers. Unfortunately, you can only use this sorting trick to plot one pair of coordinates at a time. The following code demonstrates this trick for the (Px, Py) plane:

proc sort data=Coords out=SortCoords;
   by Pz;
title "Projection of a Rotated Torus";
title2 "Sorted by Distance Above Coordinate Plane";
proc sgplot data=SortCoords;
   scatter x=Px y=Py / markerattrs=(size=12 symbol=CircleFilled)
           colorresponse=Z  colormodel=ThreeColorRamp;
Projection of torus onto coordinate plane

In conclusion, PROC SGSCATTER enables you to visualize high-dimensional data via projections onto coordinate planes. I demonstrated the techniques on a 3-D torus, but the techniques apply generally to any high-dimensional data. Visualization tricks in this article include:

  • If the projections of the data onto coordinate planes are degenerate, rotate the points.
  • Use semi-transparency to reduce the effect of overplotting.
  • Use the COLORRESPONSE= option to color the markers by one of the variables. This requires SAS 9.4m4.
  • If you do not want to use transparency, sort the data in a direction perpendicular to the projected plane. That makes the rendering look more realistic.

Your high-dimensional point clouds might not look as cool as this torus, but if you use these visualization tips, the data will be easier to interpret and understand. The SGSCATTER procedure in SAS is an easy-to-use routine for investigating relationships among several variables.

tags: Math, SAS/IML Studio, Statistical Graphics

The post Visualize a torus in SAS appeared first on The DO Loop.

7月 052016

Last week I blogged about how to draw the Cantor function in SAS. The Cantor function is used in mathematics as a pathological example of a function that is constant almost everywhere yet somehow manages to "climb upwards," thus earning the nickname "the devil's staircase."

The Cantor function has three properties: (1) it is continuous, (2) it is non-decreasing, and (3) F(0)=0 and F(1)=1. There is a theorem that says that any function with these properties is the cumulative distribution function (CDF) for some random variable. In theory, therefore, you can generate a large number of random variates (a large sample), then use PROC UNIVARIATE to plot the empirical CDF. The empirical CDF should be the devil's staircase function.

A random sample from the Cantor set

My last article showed that every point in the Cantor set can be written as the infinite sum x = Σi ai3-i where the ai equal 0 or 2. You can approximate the Cantor set by truncating the series at a certain number of terms. You can generate random points from the (approximate) Cantor set by choosing the coefficients randomly from the Bernoulli distribution with probability 0.5. The following DATA step generates 10,000 points from the Cantor set by using random values for the first 20 coefficients:

data Cantor;
call streaminit(123456);
k = 20;             /* maximum number of coefficients */
n = 10000;          /* sample size to generate */
do i = 1 to n;      /* generate n random points in Cantor set */
   x = 0;
   do j = 1 to k;   /* each point is sum of powers of 1/3 */
      b = 2 * rand("Bernoulli", 0.5);  /* random value 0 or 2 */
      x + b * 3**(-j);                 /* power of 1/3 */
keep x;
proc univariate data=Cantor;
   cdf x;

The call to PROC UNIVARIATE displays the empirical CDF for the random sample of points. And voilà! The Cantor distribution function appears! As a bonus, the output from PROC UNIVARIATE (not shown) indicates that the mean of the distribution is 1/2 (which is obvious by symmetry) and the variance is 1/8, which you might not have expected.

This short SAS program does two things: it shows how to simulate a random sample uniformly from the (approximate) Cantor set, and it indicates that the devil's staircase function is the distribution function for this a uniform random variable.

Did you know that the Cantor staircase function is a cumulative distribution? Simulate it in #SAS!
Click To Tweet

A random sample in SAS/IML

Alternatively, you can generate the same random sample by using SAS/IML and matrix computations. My previous article drew the Cantor function by systematically using all combinations of 0s and 2s to construct the elements of the Cantor set. However, you can use the same matrix-based approach but generate random coefficients, therefore obtaining a random sample from the Cantor set:

proc iml;
k = 20;          /* maximum number of coefficients */
n = 10##4;       /* sample size to generate */
B = 2 * randfun(n||k, "Bernoulli", 0.5);  /* random n x k matrix of 0/2 */
v = 3##(-(1:k)); /* powers of 1/3 */
x = B * v`;      /* sum of random terms that are either 0*power or 2*power */

Recall that ECDF is a step function that plots the ith ordered datum at the height i/n. You can approximate the empirical CDF in SAS/IML by using the SERIES subroutine. Technically, the lines that appear in the line plot have a nonzero slope, but the approximation looks very similar to the PROC UNIVARIATE output:

call sort(x, 1);             /* order elements in the Cantor set */
x = 0 // x // 1;             /* append end points */
y = do(0, 1, 1/(nrow(x)-1)); /* empirical CDF */
title "Empirical Distribution of a Uniform Sample from the Cantor Set";
call series(x, y);

Interpretation in probability

It is interesting that you can actually define the Cantor set in terms of rolling a fair six-sided die. Suppose that you roll a die infinitely often and we adopt the following rules:

  • If the die shows a 1 or 2, you get zero points.
  • If the die shows a 3 or 4, you get one point.
  • If the die shows a 5 or 6, you get two points.

This game is closely related to the Cantor set. Recall that the Cantor set can be written as the set of all base-3 decimal values in [0,1] for which the decimal expansion does not contain a 1. In this game, each point in the Cantor set corresponds to a sequence of rolls that never contain a 3 or 4. (Equivalently, the score is always even.) Obviously this would never happen in real life, which is an intuitive way of saying that the Cantor set has "measure zero."


The first time I saw the Cantor function depicted as an empirical distribution function was when I saw a very compact MATLAB formula like this:

stairs([min(x) sort(x)],0:1/length(x):1) % Plot the c.d.f of x

The previous formula is equivalent to my SAS/IML program, but in my program I broke the formula apart so that the components could be understood more easily. This formula appears in a set of probability demos by Peter Doyle. I had the privilege to interact with Professor Doyle at The Geometry Center (U. MN) in the early 1990s, so perhaps he was responsible for showing me the Cantor distribution.

These UNC course notes from Jan Hannig also discuss the Cantor distribution, but the original source is not cited.

tags: Just for Fun, Math

The post Cantor sets, the devil's staircase, and probability appeared first on The DO Loop.

6月 032016

In a world filled with jargon, it’s refreshing to hear from a subject-matter expert who can communicate in a direct and uncomplicated fashion – so that even a layperson would understand. You could say this is Randall Munroe’s mission. Munroe is masterful at using math, science and comics to make […]

The post Complicated stuff in simple words from Randall Munroe appeared first on JMP Blog.

1月 252016

A moving average (also called a rolling average) is a satistical technique that is used to smooth a time series. Moving averages are used in finance, economics, and quality control. You can overlay a moving average curve on a time series to visualize how each value compares to a rolling average of previous values. For example, the following graph shows the monthly closing price of IBM stock over a 20-year period. Three kinds of moving averages are overlaid on a scatter plot of the data.

Moving average of stock price

The IBM stock price increased in some time periods and decreased in others. The moving-average curves help to visualize these trends and identify these time periods. For a simple moving average, the smoothness of a curve is determined by the number of time points, k, that is used to compute the moving average. Small values of k result in curves that reflect the short-term ups and downs of the data; large values of k undulate less. For stock charts that show daily prices, the 30-day moving average and the 5-day moving average are popular choices.

How do you define a moving average?

The most common moving averages are the simple moving average (MA), the weighted moving average (WMA), and the exponentially weighted moving average (EWMA). The following list provides a brief description and mathematical formula for these kinds of moving averages. See the Wikipedia article on moving averages for additional details.

Let {y0, y1, ..., yt, ...} be the time series that you want to smooth, where yt is the value of the response at time t.

  • The simple moving average at time t is the arithmetic mean of the series at yt and the previous k-1 time points. In symbols,
          MA(t; k) = (1/k) Σ yi
    where the summation is over the k values {yt-k+1, ..., yt}.
  • The weighted moving average (WMA) at time t is a weighted average of the series at yt and the previous k-1 time points. Typically the weights monotonically decrease so that data from "long ago" contribute less to the average than recent data. If the weights sum to unity (Σ wi = 1) then
          WMA(t; k) = Σ wi yi
    If the weights do not sum to unity, then divide that expression by Σ wi.
  • The exponentially weighted moving average (EWMA) does not use a finite rolling window. Instead of the parameter k, the EWMA uses a decay parameter α, where 0 < α < 1. The smoothed value at time t is defined recursively as
          EWMA(t; α) = α yt + (1 - α) EWMA(t-1; α)
    You can "unwind" this equation to obtain the EWMA as a WMA where the weights decrease geometrically. The choice of α determines the smoothness of the EWMA. A value of α ≈ 1 implies that older data contribute very little to the average. Conversely, small values of α imply that older data contribute to the moving average almost as much as newer data.

Each of these definitions contains an ambiguity for the first few values of the moving average. For example, if t < k, then there are fewer than k previous values in the MA and WMA methods. Some practitioners assign missing values to the first k-1 values, whereas others average the values even when fewer than k previous data points exist. For the EWMA, the recursive definition requires a value for EWMA(0; α), which is often chosen to be y0.

My next blog post shows how to compute various moving averages in SAS. The article shows how to create the IBM stock price example, which is a time series plot overlaid with MA, WMA, and EWMA curves.

tags: Data Analysis, Getting Started, Math

The post What is a moving average? appeared first on The DO Loop.