
November 27, 2019
 
[Figure: Visualization of a quadratic function and a linear subspace]

This article discusses how to restrict a multivariate function to a linear subspace. This is a useful technique in many situations, including visualizing an objective function that is constrained by linear equalities. For example, the graph to the right is from a previous article about how to evaluate quadratic polynomials. The graph shows a heat map for a quadratic polynomial of two variables. The diagonal line represents a linear constraint between the X and Y variables. If you restrict the polynomial to the diagonal line, you obtain a one-dimensional function that you can easily graph and visualize. By repeating this process for other lines, you can construct a "stack" of lower-dimensional slices that enable you to understand the graph of the original bivariate function.

This example generalizes. For a function of many variables, you can restrict the function to a lower-dimensional subspace. By visualizing the restricted function, you can understand the high-dimensional function better. I previously demonstrated this technique for multidimensional regression models by using the SLICEFIT option in the EFFECTPLOT statement in SAS. This article shows how to use ideas from vector calculus to restrict a function to a parameterized linear subspace.

Visualize high-dimensional data and functions

There are basically two techniques for visualizing high-dimensional objects: projection and slicing. Many methods in multivariate statistics compute a linear subspace such that the projection of the data onto the subspace has a desirable property. One example is principal component analysis, which endeavors to find a low dimensional subspace that captures most of the variation in the data. Another example is linear discriminant analysis, which aims to find a linear subspace that maximally separates groups in the data. In both cases, the data are projected onto the computed subspaces to reveal characteristics of the data. There are also statistical methods that attempt to find nonlinear subspaces.

Whereas projection is used for data, slicing is often used to visualize the graphs of functions. In regression, you model a response variable as a function of the explanatory variables. Slicing the function along a linear subspace of the domain can reveal important characteristics about the function. That is the method used by the SLICEFIT option in the EFFECTPLOT statement, which enables you to visualize multidimensional regression models. The most common "slice" for a regression model is to specify constant values for all but one continuous variable in the model.

When visualizing an objective function that is constrained by a set of linear equalities, the idea is similar. The main difference is that the "slice" is usually determined by a general linear combination of variables.

Restrict a function to a one-dimensional subspace

A common exercise in multivariate calculus is to restrict a bivariate function to a line. (This is equivalent to a "slice" of the bivariate function.) Suppose that you choose a point, x0, and a unit vector, u, which represents the "direction". Then a parametric line through x0 in the direction of u is given by the vector expression v(t) = x0 + t u, where t is any real number. If you restrict a multivariate function to the image of v, you obtain a one-dimensional function.

For example, consider the quadratic polynomial in two variables:
f(x,y) = 9x^2 + xy + 4y^2 - 12x - 4y + 6
The function is visualized by the heat map at the top of this article. Suppose x0 = {0, -1} and choose u to be the unit vector in the direction of the vector {3, 1}. (This choice corresponds to a linear constraint of the form x – 3y = 3.)

The following SAS/IML program constructs the parameterized line v(t) for a uniform set of t values. The quadratic function is evaluated at this set of values and the resulting one-dimensional function is graphed:

proc iml;
/* Evaluate the quadratic function at each column of XY and return a row vector. */
start Func(XY);
   x = XY[1,]; y = XY[2,];
   return (9*x##2  + x#y + 4*y##2) - 12*x - 4*y + 6;
finish;
 
x0 = {0, -1};          /* evaluate polynomial at this point */
d  = {3, 1};           /* vector that determines direction */
u  = d / norm(d);      /* unit vector (direction) */
 
t = do(-2, 2, 0.1);    /* parameter values */
v = x0 + t @ u;        /* evenly spaced points along the linear subspace */
f = Func(v);           /* evaluate the 2-D function along the 1-D subspace */
title "Quadratic Form Evaluated on Linear Subspace";
call series(t, f) grid={x y};
[Figure: Quadratic function restricted to a linear subspace]

The graph shows the restriction of the 2-D function to the 1-D subspace. According to the graph, the restricted function reaches a minimum value for t ≈ 0.92. The corresponding (x, y) values and the corresponding function value are shown below:

tMin = 0.92;           /* t* = parameter near the minimum */
vMin = x0 + tMin * u;  /* corresponding (x(t*), y(t*)) */
fMin = Func(vMin);     /* f(x(t*), y(t*)) */
print tMin vMin fMin;

If you look back at the heat map at the top of this article, you will see that the point (x,y) = (0.87, -0.71) corresponds to the location at which the function achieves its minimum value when restricted to the linear subspace. The value of the function at that point is approximately 6.6.

The SAS/IML program uses the Kronecker direct-product operator (@) to create points along the line. The operation is v = x0 + t @ u. The symbols x0 and u are both column vectors with two elements. The Kronecker product creates a 2 x k matrix, where k is the number of elements in the row vector t. The Kronecker product enables you to vectorize the program instead of looping over the elements of t and forming the individual vectors x0 + t[i]*u.
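To see the trick in miniature, the following sketch (my addition, not part of the original program) prints the Kronecker product of a small row vector with a column vector:

proc iml;
t = {-1 0 1};          /* 1 x 3 row vector of parameters */
u = {3, 1};            /* 2 x 1 column vector */
v = t @ u;             /* 2 x 3 matrix whose ith column is t[i]*u */
print v;

Each column of v is a multiple of u, which is exactly the set of vectors you need in order to evaluate the function along the line.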

You can generalize this example to evaluate a multivariate function on a two-dimensional subspace. If u1 and u2 are two orthogonal, p-dimensional, unit vectors, the expression x0 + s*u1 + t*u2 spans a two-dimensional subspace of Rp. You can use the ExpandGrid function in SAS/IML to generate ordered pairs (s, t) on a two-dimensional grid.
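As a sketch of the two-dimensional case (continuing the PROC IML session from the earlier program; the directions u1 and u2 are my choices, not from the original article), you might evaluate the quadratic function on a grid of (s, t) values as follows:

u1 = {3, 1} / sqrt(10);      /* unit vector in the direction {3, 1} */
u2 = {-1, 3} / sqrt(10);     /* orthogonal unit vector */
st = expandgrid( do(-1, 1, 0.25), do(-1, 1, 0.25) );  /* (s,t) pairs, one pair per row */
v = x0 + (st[,1]` @ u1) + (st[,2]` @ u2);  /* 2 x k matrix of points in the 2-D slice */
f = Func(v);                 /* evaluate the function on the slice */

You could then visualize f as a heat map or surface over the (s, t) grid.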

Summary

In summary, you can slice a multivariate function along a linear subspace. Usually, 1-D or 2-D subspaces are used and the resulting (restricted) function is graphed. This helps you to understand the function. You can choose a series of slices (often parallel to each other) and "stack" the slices in order to visualize the multivariate function. Typically, this technique works best for functions that have three or four continuous variables.

You can download the SAS program that creates the graphs in this article.

The post Evaluate a function on a linear subspace appeared first on The DO Loop.

September 30, 2019
 

There are several different kinds of means. They all try to find an average value from among a set of numbers. Although the most popular mean is the arithmetic mean, the geometric mean can be useful for problems in statistics, finance, and biology. A common application of the geometric mean is to find an average growth rate for an asset or for a population.

What is the geometric mean?

The geometric mean of n nonnegative numbers is the nth root of the product of the numbers:
GM = (x1 * x2 * ... * xn)^(1/n) = (Π xi)^(1/n)
When the numbers are all positive, the geometric mean is equivalent to computing the arithmetic mean of the log-transformed data and then using the exponential function to back-transform the result: GM = exp( (1/n) Σ log(xi) ).
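The following sketch (my illustration, not from the original post) computes the geometric mean both ways; the data are the three numbers that are used in the next section:

proc iml;
x = {1.5 2 3};                    /* positive numbers */
n = ncol(x);
GM_root = (x[#])##(1/n);          /* x[#] is the product of the elements */
GM_log  = exp( mean(log(x`)) );   /* back-transform the arithmetic mean of the logs */
print GM_root GM_log;             /* both are approximately 2.08 */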

Physical interpretation of the geometric mean

The geometric mean has a useful interpretation in terms of the volume of an n-dimensional rectangular solid. If GM is the geometric mean of n positive numbers, then GM is the length of the side of an n-dimensional cube that has the same volume as the rectangular solid with side lengths x1, x2, ..., xn. For example, a rectangular solid with sides 1.5, 2, and 3 has a volume V = 9. The geometric mean of those three numbers is ((1.5)(2)(3))^(1/3) ≈ 2.08. The volume of a cube with sides of length 2.08 is approximately 9.

This interpretation in terms of volume is analogous to interpreting the arithmetic mean in terms of length. Namely, if you have n line segments of lengths x1, x2, ..., xn, then the total length of the segments is the same as the length of n copies of a segment of length AM, where AM is the arithmetic mean.

What is the geometric mean good for?

The geometric mean can be used to estimate the "center" of any set of positive numbers but is frequently used to estimate an average value in problems that deal with growth rates or ratios. For example, the geometric mean is an average growth rate for an asset or for a population. The following example uses the language of finance, although you can replace "initial investment" by "initial population" if you are interested in population growth.

In precalculus, growth rates are introduced in terms of the growth of an initial investment amount (the principal) compounded yearly at a fixed interest rate, r. After n years, the principal (P) has grown to
An = P(1 + r)^n
The quantity 1 + r sometimes confuses students. It appears because when you add the principal (P) and the interest (Pr), you get P(1 + r).

In precalculus, the interest rate is assumed to be a constant, which is fine for fixed-rate investments like bank CDs. However, many investments have a growth rate that is not fixed but varies from year to year. If the growth rate of the investment is r1 during the first year, r2 during the second year, and rn during the nth year, then after n years the investment is worth
An = P(1 + r1)(1 + r2)...(1 + rn)
  = P (Π xi), where xi = 1 + ri.

What is the average growth rate for the investment? One interpretation of the average growth rate is the fixed rate that would give the same return after n years. That hypothetical fixed-rate growth is found by using the geometric mean of the values x1, x2, ..., xn. That is, if GM is the geometric mean of the xi, the value
An = P*GM^n,
which assumes a fixed interest rate, is exactly the same as for the varying-rate computation.
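A quick numerical check confirms that the two computations agree. In the following sketch, the yearly rates are hypothetical values that I made up for illustration:

proc iml;
P = 1000;                        /* initial investment */
r = {0.10, -0.05, 0.08};         /* hypothetical yearly growth rates */
x = 1 + r;                       /* growth factors */
n = nrow(x);
GM = (x[#])##(1/n);              /* geometric mean of the growth factors */
A_varying = P * x[#];            /* value after n years with varying rates */
A_fixed   = P * GM##n;           /* value after n years at the fixed rate GM - 1 */
print GM A_varying A_fixed;      /* A_varying equals A_fixed */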

An example of the geometric mean: The growth rate of gold

Let's apply the ideas in the preceding section. Suppose that you bought $1000 of gold on Jan 1, 2010. The following table gives the yearly rate of return for gold during the years 2010–2018, along with the value of the $1000 investment at the end of each year.

According to the table, the value of the investment after 9 years is $1160.91, which represents a total return of about 16%. What is the fixed rate that would give the same return after 9 years when compounded annually? That rate is found by computing the geometric mean of the numbers in the third column:
GM = (1.2774 * 1.1165 * ... * 0.9885)^(1/9) = 1.01672
In other words, the investment in gold yielded the same return as a fixed-rate bank CD at 1.672% that is compounded yearly for 9 years. The end-of-year values for both investments are shown in the following graph.

The geometric mean in statistics

The geometric mean arises naturally in situations in which quantities are multiplied together. This happens so often that there is a probability distribution, called the lognormal distribution, that models this situation. If Z has a normal distribution, then you can obtain a lognormal distribution by applying the exponential transformation: X = exp(Z) is lognormal. The Wikipedia article for the lognormal distribution states, "The lognormal distribution is important in the description of natural phenomena... because many natural growth processes are driven by the accumulation of many small percentage changes." For lognormal data, the geometric mean is often more useful than the arithmetic mean. In my next blog post, I will show how to compute the geometric mean and other associated statistics in SAS.
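As a small illustration (a sketch of my own, not from the original post), the following program simulates lognormal data and compares the arithmetic and geometric means:

proc iml;
call randseed(12345);
Z = randfun(10000, "Normal");    /* Z ~ N(0,1) */
X = exp(Z);                      /* X is lognormal */
AM = mean(X);                    /* arithmetic mean is inflated by the long right tail */
GM = exp( mean(log(X)) );        /* geometric mean estimates the median, exp(0) = 1 */
print AM GM;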

The post What is a geometric mean? appeared first on The DO Loop.

August 12, 2019
 
What is this math good for, anyway?
     –Every student, everywhere

I am a professional applied mathematician, yet many of the mathematical and statistical techniques that I use every day are not from advanced university courses but are based on simple ideas taught in high school or even in grade school. I've written many blog posts in which the solution to an interesting problem requires little more than high-school math. Even when the solution requires advanced techniques, high-school math often provides the basis for solving the problem.

In celebration of the upcoming school year, here are 12 articles that show connections between advanced topics and mathematical ideas that are introduced in high-school or earlier. If you have a school-age child, read some of the articles to prepare yourself for the inevitable mid-year whine, "But, Mom/Dad, when will I ever use this stuff?!"

Grade-school math

Obviously, most adults use basic arithmetic, fractions, decimals, and percents, but here are some less obvious uses of grade-school math:

High-school math

Algebra, linear transformations, geometry, and trigonometry are the main topics in high school mathematics. These topics are the bread-and-butter of applied mathematics:

  • Linear transformations: Anytime you create a graph on the computer screen, a linear transformation transforms the data from physical units (weight, cost, time) into pixel values. Although modern graphical software performs that linear transformation for you, a situation in which you have to manually apply a linear transformation is when you want to display data in two units (pounds and kilograms, dollars and Euros, etc). You can use a simple linear transformation to align the tick labels on the two axes.
  • Intersections: In high school, you learn to compute the intersection between two lines. You can extend the problem to find the intersection of two line segments. I needed that result to solve a problem in probability theory.
  • Solve a system of equations: In Algebra II, you learn to solve a system of equations. Solving linear systems is the foundation of linear algebra. Solving nonlinear systems is among the most important skills in applied mathematics.
  • Find the roots of a nonlinear equation: Numerically finding the roots of a function is taught in pre-calculus. It is the basis for all "inverse problems" in which you want to find inputs to a function that produce a specified output. In statistics, a common "inverse problem" is finding the quantile of a cumulative probability distribution.
  • Binomial coefficients: In algebra, many teachers use the mnemonic FOIL (First, Outer, Inner, Last) to teach students how to compute the quadratic expansion of (a + b)^2. Later, students learn the binomial expansion of an arbitrary power, (a + b)^n. The coefficients in this expansion are called the binomial coefficients and appear in Pascal's triangle as well as in many discrete probability distributions such as the negative binomial and hypergeometric distributions.
  • Pythagorean triples: In trigonometry, a huge number of homework problems involve using right triangles with side lengths that are proportional to (3, 4, 5) and (5, 12, 13). These are two examples of Pythagorean triples: right triangles whose side lengths are all integers. It turns out that you can use linear transformations to generate all primitive triples from the single triple (3, 4, 5). A high school student can understand this process, although the process is most naturally expressed in terms of matrix multiplication, which is not always taught in high school. A sketch of the matrix computation appears after this list.
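The following sketch shows the idea. The three matrices are the classical ones for generating the tree of primitive Pythagorean triples (often attributed to Berggren); multiplying any primitive triple by each matrix produces three new primitive triples:

proc iml;
U1 = { 1 -2  2,
       2 -1  2,
       2 -2  3};
U2 = { 1  2  2,
       2  1  2,
       2  2  3};
U3 = {-1  2  2,
      -2  1  2,
      -2  2  3};
t = {3, 4, 5};                /* the primitive triple (3, 4, 5) */
print (U1*t)[L="U1*t"] (U2*t)[L="U2*t"] (U3*t)[L="U3*t"];  /* (5,12,13), (21,20,29), (15,8,17) */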

High-school statistics

Many high schools offer a unit on probability and statistics, and some students take AP Statistics.

Einstein famously said, "Everything should be made as simple as possible, but not simpler." It is surprising to me how often an advanced technique can be simplified and explained by using elementary math. I don't claim that "everything I needed to know about math I learned in kindergarten," but I often return to elementary techniques when I describe how to solve non-elementary problems.

What about you? What are some elementary math or statistics concepts that you use regularly in your professional life? Are there fundamental topics that you learned in high school that are deeper and more useful than you realized at the time? Leave a comment.

The post The math you learned in school: Yes, it’s useful! appeared first on The DO Loop.

May 22, 2019
 

The eigenvalues of a matrix are not easy to compute. It is remarkable, therefore, that with relatively simple mental arithmetic, you can obtain bounds for the eigenvalues of a matrix of any size. The bounds are provided by using a marvelous mathematical result known as Gershgorin's Disc Theorem. For certain matrices, you can use Gershgorin's theorem to quickly determine that the matrix is nonsingular or positive definite.

The Gershgorin Disc Theorem appears in Golub and van Loan (p. 357, 4th Ed; p. 320, 3rd Ed), where it is called the Gershgorin Circle Theorem. The theorem states that the eigenvalues of any N x N matrix, A, are contained in the union of N discs in the complex plane. The center of the ith disc is the ith diagonal element of A. The radius of the ith disc is the sum of the absolute values of the off-diagonal elements in the ith row. In symbols,
Di = { z ∈ C : |z - Aii| ≤ ri }
where ri = Σj ≠ i |Aij|. Although the theorem holds for matrices with complex values, this article only uses real-valued matrices.

An example of Gershgorin discs is shown to the right. The discs are shown for the following 4 x 4 symmetric matrix (the same matrix that is used in the program later in this article):

       200   30  -15    5
        30  100    5    5
       -15    5   55    0
         5    5    0   15
At first glance, it seems inconceivable that we can know anything about the eigenvalues without actually computing them. However, two mathematical theorems tell us quite a lot about the eigenvalues of this matrix, just by inspection. First, because the matrix is real and symmetric, the Spectral Theorem tells us that all eigenvalues are real. Second, the Gershgorin Disc Theorem says that the four eigenvalues are contained in the union of the following discs:

  • The first row produces a disc centered at x = 200. The disc has radius |30| + |-15| + |5| = 50.
  • The second row produces a disc centered at x = 100 with radius |30| + |5| + |5| = 40.
  • The third row produces a disc centered at x = 55 with radius |-15| + |5| + |0| = 20.
  • The last row produces a disc centered at x = 15 with radius |5| + |5| + |0| = 10.

Although the eigenvalues for this matrix are real, the Gershgorin discs are in the complex plane. The discs are visualized in the graph at the top of this article. The true eigenvalues of the matrix are shown inside the discs.

For this example, each disc contains an eigenvalue, but that is not true in general. (For example, the matrix A = {1 −1, 2 −1} does not have any eigenvalues in the disc centered at x=1.) What is true, however, is that disjoint unions of discs must contain as many eigenvalues as the number of discs in each disjoint region. For this matrix, the discs centered at x=15 and x=200 are disjoint. Therefore they each contain an eigenvalue. The union of the other two discs must contain two eigenvalues, but, in general, the eigenvalues can be anywhere in the union of the discs.

The visualization shows that the eigenvalues for this matrix are all positive. That means that the matrix is not only symmetric but also positive definite. You can predict that fact from the Gershgorin discs because no disc intersects the negative X axis.

Of course, you don't have to perform the disc calculations in your head. You can write a program that computes the centers and radii of the Gershgorin discs, as shown by the following SAS/IML program, which also computes the eigenvalues for the matrix:

proc iml;
A = { 200  30 -15  5,
       30 100   5  5,
      -15   5  55  0, 
        5   5   0 15};
 
evals = eigval(A);                 /* compute the eigenvalues */
center = vecdiag(A);               /* centers = diagonal elements */
radius = abs(A)[,+] - abs(center); /* sum of abs values of off-diagonal elements of each row */
discs = center || radius || round(evals,0.01);
print discs[c={"Center" "Radius" "Eigenvalue"} L="Gershgorin Discs"];

Diagonally dominant matrices

For this example, the matrix is strictly diagonally dominant. A strictly diagonally dominant matrix is one for which the magnitude of each diagonal element exceeds the sum of the magnitudes of the other elements in the row. In symbols, |Aii| > Σj ≠ i |Aij| for each i. Geometrically, this means that no Gershgorin disc intersects the origin, which implies that the matrix is nonsingular. So, by inspection, you can determine that this matrix is nonsingular.
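You can make the test explicit by continuing the earlier PROC IML session (this check is my addition; it reuses the center and radius vectors that were computed previously):

isDominant = all( abs(center) > radius );   /* 1 if every row is strictly dominant */
print isDominant[L="Strictly diagonally dominant?"];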

Gershgorin discs for correlation matrices

The Gershgorin theorem is most useful when the diagonal elements are distinct. For repeated diagonal elements, it might not tell you much about the location of the eigenvalues. For example, all diagonal elements for a correlation matrix are 1. Consequently, all Gershgorin discs are centered at (1, 0) in the complex plane. The following graph shows the Gershgorin discs and the eigenvalues for a 10 x 10 correlation matrix. The eigenvalues of any 10 x 10 correlation matrix must be real and in the interval [0, 10], so the only new information from the Gershgorin discs is a smaller upper bound on the maximum eigenvalue.
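The following sketch computes the centers and radii for a small correlation matrix; the 3 x 3 matrix is a hypothetical example of my choosing:

proc iml;
R = {1.0 0.4 0.2,
     0.4 1.0 0.6,
     0.2 0.6 1.0};
center = vecdiag(R);                 /* every center is 1 */
radius = abs(R)[,+] - abs(center);   /* sum of |off-diagonal| elements in each row */
print center radius (eigval(R))[L="Eigenvalues"];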

Gershgorin discs for unsymmetric matrices

Gershgorin's theorem can be useful for unsymmetric matrices, which can have complex eigenvalues. The SAS/IML documentation contains the following 8 x 8 block-diagonal matrix, which has two pairs of complex eigenvalues:

A = {-1  2  0       0       0       0       0  0,
     -2 -1  0       0       0       0       0  0,
      0  0  0.2379  0.5145  0.1201  0.1275  0  0,
      0  0  0.1943  0.4954  0.1230  0.1873  0  0,
      0  0  0.1827  0.4955  0.1350  0.1868  0  0,
      0  0  0.1084  0.4218  0.1045  0.3653  0  0,
      0  0  0       0       0       0       2  2,
      0  0  0       0       0       0      -2  0 };

The matrix has four smaller Gershgorin discs and three larger discs (radius 2) that are centered at (-1,0), (2,0), and (0,0), respectively. The discs and the actual eigenvalues of this matrix are shown in the following graph. Not only does the Gershgorin theorem bound the magnitude of the real part of the eigenvalues, but it is clear that the imaginary part cannot exceed 2. In fact, this matrix has eigenvalues -1 ± 2 i, which are on the boundary of one of the discs, which shows that the Gershgorin bound is tight.

Conclusions

In summary, the Gershgorin Disc Theorem provides a way to visualize the possible location of eigenvalues in the complex plane. You can use the theorem to provide bounds for the largest and smallest eigenvalues.

I was never taught this theorem in school. I learned it from a talented mathematical friend at SAS. I use this theorem to create examples of matrices that have particular properties, which can be very useful for developing and testing software.

This theorem also helped me to understand the geometry behind "ridging," which is a statistical technique in which positive multiples of the identity are added to a nearly singular X`X matrix. The Gershgorin Disc Theorem shows that the effect of ridging a matrix is to translate all of the Gershgorin discs to the right, which moves the eigenvalues away from zero while preserving their relative positions.
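The following sketch (my illustration) ridges the 4 x 4 example matrix from earlier in this article. The disc centers shift to the right by the ridge amount, the radii are unchanged, and each eigenvalue increases by the ridge amount:

proc iml;
A = { 200  30 -15  5,
       30 100   5  5,
      -15   5  55  0,
        5   5   0 15};
lambda = 10;                    /* hypothetical ridge parameter */
B = A + lambda*I(nrow(A));      /* add a positive multiple of the identity */
print (vecdiag(B))[L="New centers"] (eigval(B))[L="Eigenvalues of B"];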

You can download the SAS program that I used to create the images in this article.

Further reading

There are several papers on the internet about Gershgorin discs. It is a favorite topic for advanced undergraduate projects in mathematics.

The post Gershgorin discs and the location of eigenvalues appeared first on The DO Loop.

September 24, 2018
 

Last week I compared the overhand shuffle to the riffle shuffle. I used random operations to simulate both kinds of shuffles and then compared how well they mix cards. The article prompted my colleague and fellow blogger, Rob Pratt, to ask if I was familiar with a bit of shuffling trivia: if you perform a perfect riffle shuffle, the cards return to their original order after exactly eight perfect shuffles! This mathematical curiosity is illustrated in a beautiful animation by James Miles, which shows the results of eight perfect shuffles.

After being reminded of this interesting fact, I wondered how the result generalizes to decks of various sizes. That is, if you use a deck with N cards, what is the minimum number of perfect riffle shuffles (call it P(N)) that you need to restore the deck to its original order? I decided to run a SAS program to discover the answer. The result is summarized in the following graph, which plots P(N) versus the number of cards in a deck. All points are below the identity line, which implies that at most N – 1 shuffles are required for a deck that contains N cards. If you want to learn more about the graph and its interesting patterns, read on.

[Figure: Number of perfect shuffles for a deck of N cards to restore the original order]

Coding the perfect riffle shuffle in SAS

The perfect riffle shuffle is easier to program than the Gilbert-Shannon-Reeds (GSR) model, which uses randomness to model how real people shuffle real cards. In a perfect riffle shuffle, you cut the deck into exactly two equal parts. Then you interlace the cards from the top stack with the cards from the bottom stack. The new deck ordering is TBTBTB..., where T indicates a card from the top half of the original deck and B indicates a card from the bottom half.

There are actually two perfect riffle shuffles for an even number of cards: you can interlace the cards TBTBTB..., which is called an "out shuffle," or you can interlace the cards BTBTBT..., which is called an "in shuffle." For odd-numbered decks, there is also a choice of where to cut the deck: does the top part of the deck get the extra card or does the bottom part? My program always uses the "out shuffle" convention (the top card stays on top) and gives the top stack the extra card when there is an odd number of cards. Therefore, the shuffled deck looks like TBTB...TB for even-numbered decks and TBTB...TBT for odd-numbered decks. The following SAS/IML function takes a row vector that represents a card deck and performs a perfect riffle shuffle to reorder the cards in the deck:

proc iml;
start PerfectRiffle(deck);
   N = ncol(deck);                        /* send in deck as a row vector */
   shuffle = j(1,N,.);                    /* allocate new deck */
   nT = ceil(N/2);                        /* cut into two stacks; "Top" gets extra card when N odd */
   nB = N - nT;                           /* nT = size of Top; nB = size of Bottom */
   shuffle[, do(1,N,2)] = deck[,1:nT];   /* top cards go into odd positions */
   shuffle[, do(2,N,2)] = deck[,nT+1:N]; /* bottom cards go into even positions */
   return(shuffle);
finish;
 
/* test the algorithm on a deck that has N = 10 cards */
origDeck = 1:10;
deck = PerfectRiffle(origDeck);
print (origdeck//deck)[r={"Orig Deck" "Perfect Riffle"}];
[Figure: Perfect shuffle on a deck of N=10 cards]

The output shows one perfect riffle of a deck that has 10 cards that are originally in the order 1, 2, ..., 10.

How many perfect riffle shuffles to restore order?

If you call the PerfectRiffle function repeatedly on the same deck, you can simulate perfect riffle shuffles. After each shuffle, you can test whether the deck is in its original order. If so, you can stop shuffling. If not, you can shuffle again. The following SAS/IML loop tests decks that contain up to 1,000 cards. The array nUntilRestore keeps track of how many perfect riffle shuffles were done for each deck size.

decksize = 1:1000;                       /* N = 1, 2, 3, ..., 1000 */
nUntilRestore = j(1, ncol(decksize), .); /* the results vector */
do k = 2 to ncol(decksize);              /* for the deck of size N */
   origDeck = 1:decksize[k];             /* original order is 1:N */
   deck = OrigDeck;                      /* set order of deck */
   origOrder = 0;                        /* flag variable */
   do nShuffles = 1 to decksize[k] until (origOrder); /* repeat shuffling */
      deck = PerfectRiffle( deck );      /* shuffle deck */
      origOrder = all(deck = origDeck);  /* until the order of the cards are restored */
   end;
   nUntilRestore[k] = nShuffles;         /* store the number of shuffles */
end;
 
/* print the results for N=1,..,99 */
x = j(10, 10, .);
x[2:100] = nUntilRestore[1:99];          /* present in 10x10 array */
rr = putn( do(0, 90, 10), "Z2.");        /* each row has 10 elements */
cc = ("0":"9");
print x[r=rr c=cc L="Number of Perfect Shuffles to Restore Order"];
 
title "Number of Perfect Shuffles to Restore Deck Order";
call scatter(deckSize, nUntilRestore) grid={x y} other="lineparm x=0 y=0 slope=1;"
     label={"N: Number of Cards in Deck" "P(N): Number of Perfect Shuffles to Restore Order"}
     procopt="noautolegend";
[Figure: Number of perfect shuffles required to restore order in a deck of N cards, N = 1..99]

The table shows the number of perfect riffle shuffles required for decks that have at most 99 cards. The results are in a 10x10 grid. The first row shows decks that have fewer than 10 cards, the second row shows sizes 10-19, and so on. For example, to find the result for a 52-card deck, move down to the row labeled "50" and over to the column labeled "2". The result in that cell is 8, which is the number of perfect shuffles required to restore a 52-card deck to its original order.

Remarks on the number of perfect riffle shuffles

The graph at the top of this article shows the number of shuffles for decks that contain up to 1,000 cards. To me, the most surprising feature of the graph is the number of points that fall near diagonal lines of various rational slopes. The most apparent diagonal features are the lines whose slopes are approximately equal to 1, 1/2, and 1/3.

A complete mathematical description of this problem is beyond the scope of this blog article, but here are a few hints and links to whet your appetite.

  • The integer sequence P(N) is related to the "multiplicative order of 2 mod 2n+1" in the On-Line Encyclopedia of Integer Sequences (OEIS). The encyclopedia includes a graph that is similar to the one here, but is computed for N ≤ 10,000. That integer sequence applies to the number of perfect riffle shuffles of an EVEN number of cards.
  • The link to the OEIS shows a quick way to explicitly find P(N) for even values of N. P(N) is the smallest value of k for which 2^k is congruent to 1 (mod N–1). For example, 8 is the smallest number for which 2^8 is congruent to 1 (mod 51), as you can compute explicitly: the value of mod(2**8, 51) is 1. A sketch of this computation appears after this list.
  • The points in the graph for which P(N) = N-1 are all prime numbers: 2, 3, 5, 11, 13, 19, 29, 37, 53, 59, 61, 67, .... However, notice that not all prime numbers are on this list.
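The following sketch (my addition) implements the computation in the second bullet for a 52-card deck. It keeps each power of 2 reduced mod N–1 so that the powers never overflow; it assumes an even deck size N > 2:

proc iml;
N = 52;                      /* an even deck size */
k = 1;                       /* current exponent */
p = mod(2, N-1);             /* p always holds mod(2##k, N-1) */
do while( p ^= 1 );
   k = k + 1;
   p = mod(2*p, N-1);
end;
print k;                     /* k = 8 perfect shuffles for N = 52 */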

There is much more to be said about this topic, but I'll stop here. If you have something to add, please leave a comment.

The post How many perfect riffle shuffles are required to restore a deck to its initial order? appeared first on The DO Loop.

September 10, 2018
 

Given a rectangular grid with unit spacing, what is the expected distance between two random vertices, where distance is measured in the L1 metric? (Here "random" means "uniformly at random.") I recently needed this answer for some small grids, such as the one to the right, which is a 7 x 6 grid. The graph shows that the L1 distance between the points (2,6) and (5,2) is 7, the length of the shortest path that connects the two points. The L1 metric is sometimes called the "city block" or "taxicab" metric because it measures the distance along the grid instead of "as the bird flies," which is the Euclidean distance.

The answer to the analogous question for the continuous case (a solid rectangle) is difficult to compute. The main result is that the expected distance is less than half of the diameter of the rectangle. In particular, among all rectangles of a given area, a square is the rectangle that minimizes the expected distance between random points.

Although I don't know a formula for the expected distance on a discrete regular grid, the grids in my application were fairly small so this article shows how to compute all pairwise distances and explicitly find the average (expected) value. The DISTANCE function in SAS/IML makes the computation simple because it supports the L1 metric. It is also simple to perform computer experiments to show that among all grids that have N*M vertices, the grid that is closest to being square minimizes the expected L1 distance.

L1 distance on a regular grid

An N x M grid contains NM vertices. Therefore the matrix of pairwise distances is NM x NM. Without loss of generality, assume that the vertices have X coordinates between 1 and N and Y coordinates between 1 and M. Then the following SAS/IML function defines the distance matrix for vertices on an N x M grid. To demonstrate the computation, the distance matrix for a 4 x 4 grid is shown, along with the average distance between the 16 vertices:

proc iml;
start DistMat(rows, cols);
   s = expandgrid(1:rows, 1:cols);   /* create (x, y) ordered pairs */
   return distance(s, "CityBlock");  /* pairwise L1 distance matrix */
finish;
 
/* test function on a 4 x 4 grid */
D = DistMat(4, 4);
AvgDist = mean(colvec(D));
print D[L="L1 Distance Matrix for 4x4 Grid"];
print AvgDist[L="Avg Distance on 4x4 Grid"];
[Figures: L1 distance matrix for vertices on a regular 4 x 4 grid; average L1 distance on a 4 x 4 grid]

For an N x M grid, the L1 diameter of the grid is the L1 distance between opposite corners. That distance is always (N-1)+(M-1), which equates to 6 units for a 4 x 4 grid. As in the continuous case, the expected L1 distance is less than half the diameter. In this case, E(L1 distance) = 2.5.

A comparison of distances for grids of different aspect ratios

As indicated previously, the expected distance between two random vertices on a grid depends on the aspect ratio of the grid. A grid that is nearly square has a smaller expected distance than a short-and-wide grid that contains the same number of vertices. You can illustrate this fact by computing the distance matrices for grids that each contain 36 vertices. The following program computes the distances for five grids: a 1 x 36 grid, a 2 x 18 grid, a 3 x 12 grid, a 4 x 9 grid, and a 6 x 6 grid.

/* average L1 distance for grids that contain 36 vertices, in several configurations */
N=36;
rows = {1, 2, 3, 4, 6};
cols = N / rows;
AvgDist = j(nrow(rows), 1, .);
do i = 1 to nrow(rows);
   D = DistMat(rows[i], cols[i]);
   AvgDist[i] = mean(colvec(D));
end;
/* show average distance as a decimal and as a fraction */
numer = AvgDist*108; AvgDistFract = char(numer) + " / 108"; 
print rows cols AvgDist AvgDistFract;

The table shows that short-and-wide grids have an average distance that is much greater than a nearly square grid that contains the same number of vertices. When the points are arranged in a 6 x 6 grid, the distance matrix naturally decomposes into a block matrix of 6 x 6 symmetric blocks, where each block corresponds to a row. When the points are arranged in a 3 x 12 grid, the distance matrix decomposes into a block matrix of 12 x 12 blocks. The following heat maps visualize the patterns in the distance matrices:

D = DistMat(6, 6);
call heatmapcont(D) title="L1 Distance Between Vertices in a 6 x 6 Grid";
D = DistMat(3, 12);
call heatmapcont(D) title="L1 Distance Between Vertices in a 3 x 12 Grid";
[Figures: Visualization of the L1 distance matrix for items arranged on a 6 x 6 grid and on a 3 x 12 grid]

Expected (average) distance versus number of rows

You can demonstrate how the average distance depends on the number of rows by choosing the number of vertices to be a highly composite number such as 5,040. A highly composite number has many factors. The following computation computes the average distance between points on 30 grids that each contain 5,040 vertices. A line chart then displays the average distance as a function of the number of rows in the grid:

N = 5040;               /* 5040 is a highly composite number */
rows = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 18, 20, 
       21, 24, 28, 30, 35, 36, 40, 42, 45, 48, 56, 60, 63, 70};
cols = N / rows;
AvgDist = j(nrow(rows), 1, .);
do i = 1 to nrow(rows);
   D = DistMat(rows[i], cols[i]);
   AvgDist[i] = mean(colvec(D));
end;
 
title "Average L1 Distance Between 5040 Objects in a Regular Grid";
call series(rows, AvgDist) grid={x y} other="yaxis offsetmin=0 grid;";
[Figure: Expected L1 distance between vertices on a regular grid with 5,040 points. The curve demonstrates that nearly square grids have a much lower average distance than short and wide grids.]

Each point on the graph represents the average distance for an N x M grid where NM = 5,040. The horizontal axis displays the value of N (rows). The graph shows that nearly square grids (rows greater than 60) have a much lower average distance than very short and wide grids (rows less than 10). The scale of the graph makes it seem like there is very little difference between the average distance in a grid with 40 versus 70 rows, but that is somewhat of an illusion. A 40 x 126 grid (aspect ratio = 3.15) has an average distance of 55.3; a 70 x 72 grid (aspect ratio = 1.03) has an average distance of 47.3.

Applications and conclusions

In summary, you can use the DISTANCE function in SAS/IML to explicitly compute the expected L1 distance (the "city block" distance) between random points on a regular grid. You can minimize the average pairwise distance by making the grid as square as possible.

City planning provides a real-world application of the L1 distance. If you are tasked with designing a city with N buildings along a grid, then the average distance between buildings is smallest when the grid is square. Of course, in practice, some buildings have a higher probability of being visited than others (such as a school or a grocery). You should position those buildings in the center of town to shorten the average distance that people travel.

The post Distances on rectangular grids appeared first on The DO Loop.

September 6, 2018
 

Continued fractions show up in surprising places. They are used in the numerical approximations of certain functions, including the evaluation of the normal cumulative distribution function (normal CDF) for large values of x (El-bolkiny, 1995, p. 75-77) and in approximating the Lambert W function, which has applications in the modeling of heavy-tailed error processes. This article shows how to use SAS to convert a rational number into a finite continued fraction expansion and vice versa.

What is a continued fraction?

I discussed continued fractions in a previous post about the continued fraction expansion of pi. Recall that every real number x can be represented as a continued fraction:
x = a0 + 1/(a1 + 1/(a2 + 1/(a3 + ...)))
In this continued fraction, a0 is the integer portion of the number and the ai for i > 0 are positive integers that represent the noninteger portion. Because all of the fractions have 1 in the numerator, the continued fraction can be compactly represented by specifying only the integers: x = [a0; a1, a2, a3, ...]. Every rational number is represented by a finite sequence; every irrational number requires an infinite sequence. Notice that the decimal representation of a number does NOT have this nice property: 1/3 is rational but its decimal representation is infinite.

Convert a finite continued fraction to a rational number

My article about the continued fraction expansion of pi contains a few lines of SAS code that compute the decimal representation of a finite continued fraction. Since every finite continued fraction corresponds to a rational number, you can modify the algorithm to compute the rational number instead of a decimal approximation.

The algorithm is straightforward. Given a vector
a = [a0; a1, a2, ..., ak],
start at the right side with s[k] = a[k]. Then move to the left: for i = k-1, k-2, ..., 0, compute s[i] = a[i] + 1/s[i+1]. The final value, s[0], is the rational number that the continued fraction represents. In the SAS/IML language (or in the SAS DATA step), you can represent each fraction as a two-element vector where the first element is the numerator of the fraction and the second is the denominator of the fraction. This leads to the following algorithm that computes a rational number from a finite continued fraction expansion:

proc iml;
start ContinuedFrac(_a);
   a = colvec(_a);
   f = a[nrow(a)] // 1;         /* trick: start with reciprocal */
   do i = nrow(a)-1 to 1 by -1; /* evaluate from right to left */
      f = f[2:1];               /* take reciprocal */
      f[1] = a[i]*f[2] + f[1];  /* compute new numerator */
   end;
   return f;
finish;
 
a = {3, 4, 12, 4}; 
f = ContinuedFrac(a);      /* 649 / 200 */
pi = {3, 7,15,1,292,1,1,1,2,1,3,1};
f_pi = ContinuedFrac(pi);  /* 5419351 / 1725033 */
e = {2, 1,2,1,1,4,1,1,6,1,1,8};
f_e = ContinuedFrac(e);    /* 23225 / 8544 */
print (f||f_pi||f_e)[r={"Numer" "Denom"} c={"f" "f_pi" "f_e"}];
[Figure: Convert (finite) continued fractions to rational numbers]

The examples show a few continued fraction expansions and the rational numbers that they represent:

  • The continued fraction expansion (cfe) [3; 4, 12, 4] corresponds to the fraction 649 / 200.
  • The 12-term partial cfe for pi corresponds to the fraction 5419351 / 1725033. You might have learned in grade school that 22/7 is an approximation to pi. So too is this fraction.
  • The partial cfe for e (using 12 terms) corresponds to the fraction 23225 / 8544.

Rational numbers to continued fractions

You can program the inverse algorithm to produce the cfe from a rational number. The algorithm is called the Euclidean algorithm and consists of finding the integer part, subtracting the integer part from the original fraction to obtain a remainder (expressed as a fraction), and then repeating this process. The process terminates for every rational number. In the SAS/IML language, the Euclidean algorithm looks like this:

start EuclidAlgRational(val, nSteps=10);
   a = j(nSteps, 1, .);
   x = val;
   done = 0;
   do i = 1 to nSteps while(^done);
      a[i] = floor(x[1] / x[2]);
      x[1] = x[1] - a[i]*x[2];  /* remainder */
      if x[1]=0 then done=1;
      x = x[2:1];               /* reciprocal */
   end;
   idx = loc(a=.);
   if ncol(idx)>0 then a = colvec(remove(a, idx));
   return a;
finish;
 
v = {649, 200};          /* = [3,4,12,4] */
a = EuclidAlgRational(v);
v = {5419351, 1725033};  /* pi = [3; 7,15,1,292,1,1,1,2,1,3,1,...] */
a_pi = EuclidAlgRational(v);
v = {23225, 8544};       /* e = [2; 1,2,1,1,4,1,1,6,1,1,8,...] */
a_e = EuclidAlgRational(v);
print a a_pi a_e;
[Figure: Convert rational numbers to (finite) continued fractions]

Each column of the table shows the continued fraction representation of a rational number. The output shows that the ContinuedFrac and EuclidAlgRational functions are inverse operations: each function undoes the action of the other.

Two amazing properties of the continued fraction representation

There are some very interesting mathematical properties of continued fractions. Two statistical properties are discussed in Barrow (2000), "Chaos in Numberland: The secret life of continued fractions." Both properties are illustrated by the first 97 terms in the continued fraction expansion of pi, which are
π = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1,
15, 3, 13, 1, 4, 2, 6, 6, 99, 1, 2, 2, 6, 3, 5, 1, 1, 6, 8, 1, 7, 1, 2, 3, 7,
1, 2, 1, 1, 12, 1, 1, 1, 3, 1, 1, 8, 1, 1, 2, 1, 6, 1, 1, 5, 2, 2, 3, 1, 2,
4, 4, 16, 1, 161, 45, 1, 22, 1, 2, 2, 1, 4, 1, 2, 24, 1, 2, 1, 3, 1, 2, 1, ...]

The first property concerns the probability distribution of the ak for sufficiently large values of k. Notice that the continued fractions in this article mostly contain small integers such as 1, 2, 3, and 4. It turns out that this is generally true. For almost every irrational number in (0,1) and for k sufficiently large, the kth term ak is more likely to be a small integer than a large integer. In fact, the Gauss-Kuzmin theorem states that the distribution of ak (thought of as a random variable) asymptotically approaches the probability mass function P(ak = t) = -log2(1 - 1/(1+t)^2), which is shown below for t ≤ 20. Thus for a random irrational number, although the value of, say, the 1000th term in the continued fraction expansion could be any positive integer, it is probably less than 20!
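The following sketch evaluates the Gauss-Kuzmin probability mass function for small values of t:

proc iml;
t = T(1:8);
P = -log2( 1 - 1/(1+t)##2 );   /* Gauss-Kuzmin probabilities */
print t P;                     /* P(1) ~ 0.415, P(2) ~ 0.170, P(3) ~ 0.093, ... */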

The second interesting mathematical property is Khinchin's Theorem, which states that for almost every irrational number in (0,1), the geometric mean of the first t digits in the continued fraction expansion converges to a constant called Khinchin's constant, which has the value 2.685452001.... For example, the first 97 terms in the continued fraction expansion of pi have a geometric mean of 2.685252.
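You can check the computation on the 12-term expansion of pi from earlier in this article. This is only a sketch: with so few terms (and with the large term 292 in the product), the geometric mean is not yet close to Khinchin's constant:

proc iml;
a = {7 15 1 292 1 1 1 2 1 3 1};     /* terms a1-a11 of the cfe of pi; a0 is excluded */
GM = exp( mean(log(colvec(a))) );   /* geometric mean of the terms */
print GM;                           /* about 3.0; approaches 2.6854... as more terms are used */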

[Figure: Gauss-Kuzmin distribution of the coefficients of a continued fraction expansion]

The post The continued fraction representation of a rational number appeared first on The DO Loop.

July 16, 2018
 

My colleague Robert Allison recently blogged about using the diameter of Texas as a unit of measurement. The largest distance across Texas is about 801 miles, so Robert wanted to find the set of all points such that the distance from the point to Texas is less than or equal to 801 miles.

Robert's implementation was complicated by the fact that he was interested in points on the round earth that are within 801 miles from Texas as measured along a geodesic. However, the idea of "thickening" or "inflating" a polygonal shape is related to a concept in computational geometry called the offset polygon or the inflated polygon. A general algorithm to inflate a polygon is complicated, but this article demonstrates the basic ideas that are involved. This article discusses offset regions for convex and nonconvex polygons in the plane. The article concludes by drawing a planar region for a Texas-shaped polygon that has been inflated by the diameter of the polygon. And, of course, I supply the SAS programs for all computations and images.

Offset regions as a union of circles and rectangles

Assume that a simple polygon is defined by listing its vertices in counterclockwise order. (Recall that a simple polygon is a closed, nonintersecting, shape that has no holes.) You can define the offset region of radius r as the union of the following shapes:

  • The polygon itself
  • Circles of radius r centered at each vertex
  • Rectangles of width r positioned outside the polygon along each edge

The following graphic shows the offset region (r = 0.5) for a convex "house-shaped" polygon. The left side of the image shows the polygon with an overlay of circles centered at each vertex and outward-pointing rectangles along each edge. The right side of the graphic shows the union of the offset regions (blue) and the original polygon (red):

The image on the right shows why the process is sometimes called "inflating" a polygon. For a convex polygon, the edges are pushed out by a distance r and the vertices become rounded. For large values of r, the offset region becomes a nearly circular blob, although the boundary is always the union of line segments and arcs of circles.

You can draw a similar image for a nonconvex polygon. The inflated region near a convex (left turning) vertex looks the same as before. However, for the nonconvex (right turning) vertices, the circles do not contribute to the offset region. Computing the offset region for a nonconvex polygon is tricky because if the distance r is greater than the minimum distance between vertices, nonlocal effects can occur. For example, the following graphic shows a nonconvex polygon that has two "prongs." Let r0 be the distance between the prongs. When you inflate the polygon by an amount r > r0/2, the offset region can contain a hole, as shown. Furthermore, the boundary of the offset regions is not a simple polygon. For larger values of r, the hole can disappear. This demonstrates why it is difficult to construct the boundary of an offset region for nonconvex polygons.

Inflating a Texas-shaped polygon

The shape of the Texas mainland is nonconvex. I used PROC GREDUCE on the MAPS.US data set in SAS to approximate the shape of Texas by a 36-sided polygon. The polygon is in a standardized coordinate system and has a diameter (maximum distance between vertices) of r = 0.2036. I then constructed the inflated region by using the same technique as shown above. The polygon and its inflated region are shown below.

The image on the left, which shows 36 circles and 36 rectangles, is almost indecipherable. However, the image on the right is almost an exact replica of the region that appears in Robert Allison's post. Remember, though, that the distances in Robert's article are geodesic distances on a sphere whereas these distances are Euclidean distances in the plane. For the planar problem, you can classify a point as within the offset region by testing whether it is inside the polygon itself, inside any of the 36 rectangles, or within a distance r of a vertex. That computation is relatively fast because it is linear in the number of vertices in the polygon.

I don't want to dwell on the computation, but I do want to mention that it requires fewer than 20 SAS/IML statements! The key part of the computation uses vector operations to construct the outward-facing normal vector of length r to each edge of the polygon. If v is the vector that connects the ith and (i+1)th vertices of the polygon, then the outward-facing normal vector is given by the concise vector expression r * (v / ||v||) * M, where M is a rotation matrix that rotates by 90 degrees. You can download the SAS program that computes all the images in this article.
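The following sketch shows the key computation for a hypothetical square polygon (my example, not the Texas polygon). The matrix M rotates each row vector by -90 degrees, which points the normals outward when the vertices are listed in counterclockwise order:

proc iml;
P = {0 0, 2 0, 2 2, 0 2};       /* hypothetical polygon, counterclockwise vertices */
r = 0.5;
M = {0 -1, 1  0};               /* [x y]*M = [y, -x]: rotation by -90 degrees */
Q = P[2:nrow(P),] // P[1,];     /* the next vertex for each edge */
v = Q - P;                      /* edge vectors */
len = sqrt(v[,##]);             /* length of each edge (row sums of squares) */
normals = r * (v/len) * M;      /* outward normal of length r for each edge */
print normals;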

In conclusion, you can use a SAS program to construct the offset region for an arbitrary simple polygon. The offset region is the union of circles, rectangles, and the original polygon, which means that it is easy to test whether an arbitrary point in the plane is in the offset region. That is, you can test whether any point is within a distance r to an arbitrary polygon.

The post Offset regions: Find all points within a specified distance from a polygon appeared first on The DO Loop.

July 9, 2018
 

Back in high school, you probably learned to find the intersection of two lines in the plane. The intersection requires solving a system of two linear equations. There are three cases: (1) the lines intersect in a unique point, (2) the lines are parallel and do not intersect, or (3) the lines are coincident. Thus, for two lines, the intersection problem has either 1, 0, or infinitely many solutions. Most students quickly learn that the lines always intersect when their slopes are different, whereas the special cases (parallel or coincident) occur when the lines have the same slope.

Recently I had to find the intersection between two line segments in the plane. Line segments have finite extent, so segments with different slopes may or may not intersect. For example, the following panel of graphs shows three pairs of line segments in the plane. In the first panel, the segments intersect. In the second panel, the segments have the same slopes as in the first panel, but these segments do not intersect. In the third panel, the segments intersect in an interval. This article shows how to construct a linear system of equations that distinguishes between the three cases and compute an intersection point, if it exists.

Parameterization of line segments

Let p1 and p2 be the endpoints of one segment and let q1 and q2 be the endpoints of the other. Recall that a parametrization of the first segment is (1-s)*p1 + s*p2, where s ∈ [0,1] and the endpoints are treated as 2-D vectors. Similarly, a parametrization of the second segment is (1-t)*q1 + t*q2, where t ∈ [0,1]. Consequently, the segments intersect if and only if there exists values for (s,t) in the unit square such that
(1-s)*p1 + s*p2 = (1-t)*q1 + t*q2
You can rearrange the terms to rewrite the equation as
(p2-p1)*s + (q1-q2)*t = q1 - p1

This is a vector equation which can be rewritten in terms of matrices and vectors. Define the 2 x 2 matrix A whose first column contains the elements of (p2-p1) and whose second column contains the elements of (q1-q2). Define b = q1 - p1 and x = (s,t). If the solution of the linear system A*x = b is in the unit square, then the segments intersect. If the solution is not in the unit square, the segments do not intersect. If the segments have the same slope, then the matrix A is singular and you need to perform additional tests to determine whether the segments intersect.

A vector solution for the intersection of segments

As shown above, the intersection of two planar line segments is neatly expressed in terms of a matrix-vector system. In SAS, the SAS/IML language provides a natural syntax for expressing and solving matrix-vector equations. The following SAS/IML function constructs and solves a linear system. For simplicity, this version does not handle the degenerate case of two segments that have the same slope. That case is handled in the next section.

start IntersectSegsSimple(p1, p2, q1, q2);
   b = colvec(q1 - p1); 
   A = colvec(p2-p1) || colvec(q1-q2); /* nonsingular when segments have different slopes */
   x = solve(A, b);                    /* x = (s,t) */
   if all(0<=x && x<=1) then           /* if x is in [0,1] x [0,1] */
      return (1-x[1])*p1 + x[1]*p2;    /* return intersection */
   else                                /* otherwise, segments do not intersect */
      return ({. .});                  /* return missing values */
finish;
 
/* Test 1: intersection at (0.95, 1.25) */
p1 = {1.8 2.1};   p2 = {0.8 1.1};
q1 = {1 1.25};    q2 = {0 1.25};
z = IntersectSegsSimple(p1,p2,q1,q2);
print z;
 
/* Test 2: no intersection */
p1 = {-1 0.5};  p2 = {1 0.5};
q1 = {0 1};     q2 = {0 2};
v = IntersectSegsSimple(p1, p2, q1, q2);
print v;

The function contains only a few statements. The function is called to solve the examples in the first two panels of the previous graph. The SOLVE function solves the linear system (assuming that a solution exists), and the IF-THEN statement tests whether the solution is in the unit square [0,1] x [0,1]. If so, the function returns the point of intersection. If not, the function returns a pair of missing values.

The general case: Handling overlapping line segments

For many applications, the function in the previous section is sufficient because it handles the generic cases. For completeness, the following module also handles segments that have identical slopes. The DET function determines whether the segments have the same slope. If so, the segments could be parallel or collinear. To determine whether collinear segments intersect, you can test for three conditions:

  • The endpoint q1 is inside the segment [p1, p2].
  • The endpoint q2 is inside the segment [p1, p2].
  • The endpoint p1 is inside the segment [q1, q2].

Notice that the condition "p2 is inside [q1,q2]" does not need to be checked separately because it is already handled by the existing checks. If any of the three conditions are true, there are infinitely many solutions (or the segments share an endpoint). If none of the conditions hold, the segments do not intersect. For overlapping segments, the following function returns an endpoint of the intersection interval.

/* handle all cases: determine intersection of two planar line segments 
   [p1, p2] and [q1, q2] */
start Intersect2DSegs(p1, p2, q1, q2);
   b = colvec(q1 - p1); 
   A = colvec(p2-p1) || colvec(q1-q2);
   if det(A)^=0 then do;         /* nonsingular system: 0 or 1 intersection */
      x = solve(A, b);           /* x = (s,t) */
      if all(0<=x && x<=1) then  /* if x is in [0,1] x [0,1] */
         return (1-x[1])*p1 + x[1]*p2;  /* return intersection */
      else                       /* segments do not intersect */
         return ({. .});         /* return missing values */
   end;
   /* segments are collinear: 0 or infinitely many intersections */
   denom = choose(p2-p1=0, ., p2-p1);  /* protect against division by 0 */
   s = (q1 - p1) / denom;        /* Is q1 in [p1, p2]? */
   if any(0<=s && s<=1) then
      return q1;
   s = (q2 - p1) / denom;        /* Is q2 in [p1, p2]? */
   if any(0<=s && s<=1) then
      return q2;
   denom = choose(q2-q1=0, ., q2-q1);  /* protect against division by 0 */
   s = (p1 - q1) / denom;        /* Is p1 in [q1, q2]? */
   if any(0<=s && s<=1) then
      return p1;
   return ({. .});               /* segments are disjoint */
finish;
 
/* test overlapping segments; return endpoint of one segment */
p1 = {-1 1};  p2 = {1 1};
q1 = {0 1};   q2 = {2 1};
w = Intersect2DSegs(p1, p2, q1, q2);
print w;

In summary, by using matrices, vectors, and linear algebra, you can easily solve for the intersection of two line segments or determine that the segments do not intersect. The general case needs some special logic to handle degenerate configurations, but the code that solves the generic cases is straightforward when expressed in a vectorized language such as SAS/IML.

The post The intersection of two line segments appeared first on The DO Loop.