Pi Day

March 11, 2020
Regular polygons approximate a circle

Recently, I saw a graphic on Twitter by @neilrkaye that showed the rapid convergence of a regular polygon to a circle as you increase the number of sides for the polygon. The author remarked that polygons that have 40 or more sides "all look like circles to me." That is, a regular polygon with a large number of sides is visually indistinguishable from a circle.

I had two reactions to the graphic. The first was "Cool! I want to create a similar figure in SAS!" (I have done so on the right.) The second reaction was that the idea is both mathematically trivial and mathematically sophisticated. It is trivial because it was known to Archimedes more than 2,000 years ago and is readily understood by using high school math. But it is sophisticated because it is a particular instance of a mathematical limit, which is the foundation of calculus and the mathematical field of analysis.

The figure also demonstrates the fact that you can approximate a smooth object (a circle, a curve, a surface, ...) by a large number of small, local, linear approximations. It is not an exaggeration to say that that concept is among the most important concepts in applied mathematics. It is used in numerical methods to solve nonlinear equations, integrals, differential equations, and more. It is used in numerical optimization. It forms the foundations of computer graphics. A standard technique in computational algorithms that involve a nonlinear function is to approximate the function by its first-order Taylor expansion, a process known as linearization.

An approximation of pi

Archimedes used this process (local approximation by a linear function) to approximate pi (π), which is the ratio of the circumference of a circle to its diameter. He used two sets of regular polygons: one that is inscribed in the unit circle and the other that circumscribes the unit circle. The circumference of an inscribed polygon is less than the circumference of the circle. The circumference of a circumscribed polygon is greater than the circumference of the circle. They converge to a common value, which is the circumference of the circle. If you apply this process to a unit circle, you approximate the value 2π. Archimedes used the same construction to approximate the area of the unit circle, which is π.

You can use Archimedes's ideas to approximate π by using trigonometry and regular polygons, as shown in the following paragraph.

Suppose that a regular polygon has n sides. Then the central angle between adjacent vertices is θ = 2π/n radians. The following diagram illustrates the geometry of inscribed and circumscribed polygons. The right triangle that is shown is half of the triangle between adjacent vertices. Consequently,

  • The half-length of a side is b, where b = sin(θ/2) for the inscribed polygon and b = tan(θ/2) for the circumscribed polygon.
  • The height of the triangle is h, where h = cos(θ/2) for the inscribed polygon and h = 1 for the circumscribed polygon.
  • The circumference of the regular polygon is 2 n b.
  • The area of the regular polygon is 2 n ( b h/2 ).
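As a worked restatement of these quantities, substituting θ = 2π/n (so that θ/2 = π/n) gives closed-form expressions for the unit circle. The subscripts "in" and "out" are labels introduced here for the inscribed and circumscribed polygons; for the inscribed polygon the half-side is sin(π/n) and the height is cos(π/n).

```latex
\begin{aligned}
C_{\text{in}}  &= 2n \sin(\pi/n), &
C_{\text{out}} &= 2n \tan(\pi/n), \\
A_{\text{in}}  &= n \sin(\pi/n)\cos(\pi/n) = \tfrac{n}{2}\sin(2\pi/n), &
A_{\text{out}} &= n \tan(\pi/n).
\end{aligned}
```

Because sin(x)/x and tan(x)/x both tend to 1 as x tends to 0, the two circumferences tend to 2π and the two areas tend to π as n grows.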

Approximate pi by using Archimedes's method

Although Archimedes did not have access to a modern computer, you can easily write a SAS DATA step program to reproduce Archimedes's approximations to the circumference and area of the unit circle, as shown below:

/* area and circumference */
data ApproxCircle;
pi = constant('pi');           /* Hah! We must know pi in order to evaluate the trig functions! */
south = 3*pi/2;                /* angle that is straight down */
do n = 3 to 100;
   angle = 2*pi/n;
   theta = south + angle/2;    /* first vertex for this n-gon */
   /* Circumference and Area for circumscribed polygon */
   b = tan(angle/2);
   h = 1;
   C_Out = 2*n * b;
   A_Out = 2*n * 0.5*b*h;
   /* Circumference and Area for inscribed polygon */
   b = cos(theta);
   h = abs( sin(theta) );
   C_In = 2*n * b;
   A_In = 2*n * 0.5*b*h;
   /* difference between the circumscribed and inscribed polygons */
   CDiff = C_Out - C_In;
   ADiff = A_Out - A_In;
   output;                     /* write one observation per n-gon */
end;
label CDiff="Circumference Difference" ADiff="Area Difference";
keep n C_In C_Out A_In A_Out CDiff ADiff;
run;

/* graph the circumference and area of the n-gons as a function of n */
ods graphics / reset height=300px width=640px;
%let curveoptions = curvelabel curvelabelpos=min;
title "Circumference of Regular Polygons";
proc sgplot data=ApproxCircle;
   series x=n y=C_Out / &curveoptions;
   series x=n y=C_In  / &curveoptions;
   refline 6.2831853 / axis=y;              /* reference line at 2 pi */
   xaxis grid values=(0 to 100 by 10) valueshint;
   yaxis values=(2 to 10) valueshint label="Regular Polygon Approximation";
run;

title "Area of Regular Polygons";
proc sgplot data=ApproxCircle;
   series x=n y=A_Out / &curveoptions;
   series x=n y=A_In  / &curveoptions;
   refline 3.1415927 / axis=y;              /* reference line at pi */
   xaxis grid values=(0 to 100 by 10) valueshint;
   yaxis  values=(2 to 10) valueshint label="Regular Polygon Approximation";
run;

As you can see from the graph, the circumference and area of the regular polygons converge quickly to 2π and π, respectively. After n = 40 sides, the curves are visually indistinguishable from their limits, which is the same result that we noticed visually when looking at the grid of regular polygons.

The DATA step also computes the difference between the measurements of the circumscribed and inscribed polygons. You can print out a few of the differences to determine how close these estimates are to the circumference and area of the unit circle:

proc print data=ApproxCircle noobs;
   where n in (40, 50, 56, 60, 80, 100);
   var n CDiff ADiff;
run;

The table shows that the difference between the circumference of the circumscribed and inscribed polygons is about 0.02 when n = 40. For n = 56, the difference is less than 0.01, which means that the circumference of a regular polygon approximates the circumference of the unit circle to two decimal places when n ≥ 56. If you use a regular 100-gon, the circumference is within 0.003 of the circumference of the unit circle. Although it is not shown, it turns out you need to use 177 sides before the difference is within 0.001, meaning that a 177-gon approximates the circumference of the unit circle to three decimal places.
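For readers without SAS, the thresholds quoted above can be checked with a short Python sketch. It is not the author's program: the function names `circumference_gap` and `first_n_within` and the cap `n_max` are hypothetical, and the sketch assumes the same unit-circle formulas (2n·tan(π/n) for the circumscribed polygon and 2n·sin(π/n) for the inscribed one).

```python
import math

def circumference_gap(n):
    """Circumscribed minus inscribed perimeter of a regular n-gon (unit circle)."""
    half_angle = math.pi / n                  # half of the central angle 2*pi/n
    c_out = 2 * n * math.tan(half_angle)      # circumscribed perimeter
    c_in = 2 * n * math.sin(half_angle)       # inscribed perimeter
    return c_out - c_in

def first_n_within(tol, n_max=10_000):
    """Smallest n >= 3 whose gap is below tol (n_max is an arbitrary search cap)."""
    for n in range(3, n_max + 1):
        if circumference_gap(n) < tol:
            return n
    return None

for tol in (0.01, 0.001):
    print(tol, first_n_within(tol))
```

The gap shrinks roughly like π³/n², which is why each extra decimal place requires about three times as many sides.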

Similar results hold for the area of the polygons and the area of a unit circle.

In conclusion, not only does a regular n-gon look very similar to the circle when n is large, but you can quantify how quickly the circumference and area of an n-gon converge to the values 2π and π, respectively. For n=56, the polygon values are accurate to two decimal places; for n=177, the polygon values are accurate to three decimal places.

Approximating a smooth curve by a series of discrete approximations is the foundation of calculus and modern numerical methods. The idea had its start in ancient Greece, but the world had to wait for Newton, Leibniz, and others to provide the mathematical machinery (limits, convergence, ...) to understand the concepts rigorously.

The post Polygons, pi, and linear approximations appeared first on The DO Loop.

March 13, 2019

It's time to celebrate Pi Day! Every year on March 14th (written 3/14 in the US), math-loving folks celebrate "all things pi-related" because 3.14 is the three-decimal approximation to the mathematical constant, π. Although children learn that pi is approximately 3.14159..., the actual definition of π is the ratio of a circle's circumference to its diameter. Equivalently, it is the distance around half of the unit circle. (The unit circle has a unit radius, so its diameter is 2.) The value for pi, therefore, depends on the definition of a circle.

But we all know what a circle looks like, don't we? How can there be more than one circle?

Generalizing the circle

A circle is defined as the locus of points in the plane that are a given distance from a given point. This definition depends on the definition of a "distance," and it turns out that there are infinitely many ways to measure the distance between two points in the plane. The Euclidean distance between two points is the most familiar distance, but there are other definitions. For two points a = ($x_1$, $y_1$) and b = ($x_2$, $y_2$), you can define the "Lp distance" between a and b by the formula
$D_p = \left( |x_1 - x_2|^p + |y_1 - y_2|^p \right)^{1/p}$
This formula defines a distance metric for every value of p ≥ 1. If you set p=2 in the formula, you get the usual L2 (Euclidean) distance. If you set p=1, you get the L1 metric, which is known as the "taxicab" or "city block" distance.
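The Lp distance formula translates directly into code. The following Python function is a minimal sketch; the name `lp_distance` and the tuple representation of points are assumptions made for illustration.

```python
def lp_distance(a, b, p):
    """Lp distance between two points a = (x1, y1) and b = (x2, y2) in the plane."""
    if p < 1:
        raise ValueError("the formula defines a metric only for p >= 1")
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return (dx ** p + dy ** p) ** (1.0 / p)

origin, point = (0, 0), (3, 4)
print(lp_distance(origin, point, 2))   # Euclidean distance: 5.0
print(lp_distance(origin, point, 1))   # taxicab distance: 7.0
```

For the same pair of points, the distance decreases as p increases and approaches the larger of the two coordinate differences (here, 4).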

You might think that the Euclidean distance is the only relevant distance, but it turns out that some of these other distances have practical applications in statistics, machine learning, linear algebra, and many fields of applied mathematics. For example, the 2-norm (L2) distance is used in least-squares regression whereas the 1-norm (L1) distance is used in robust regression and quantile regression. The two norms also serve as penalty terms: ridge regression uses an L2 penalty, LASSO regression uses an L1 penalty, and "elastic net" regression uses a combination of the two.

Here's the connection to pi: If you can define infinitely many distance formulas, then there are infinitely many unit circles, one for each value of p ≥ 1. And if there are infinitely many circles, there might be infinitely many values of pi. (Spoiler alert: There are!)

Would the real circle please stand up?

You can easily solve for y as a function of x and draw the unit circle for a representative set of values for p. The following graph was generated by the SAS DATA step and PROC SGPLOT. You can download the SAS program that generates the graphs in this article.

The L1 unit circle is a diamond (the top half is shown), the L2 unit circle is the familiar round shape, and as p gets large the unit circle for the Lp distance approaches the boundary of the square defined by the four points (±1, ±1). For more information about Lp circles and metrics, see the Wikipedia article "Lp Space: The p-norm in finite dimensions."

Here comes the surprise: Just as each Lp metric has its own unit circle, each metric has its own numerical value for pi, which is the length of the unit semicircle as measured by that metric.

π(p): The length of the unit semicircle for the Lp distance metric

So far, we've only used geometry, but it's time to use a little calculus. This presentation is based on Keller and Vakil (2009, p. 931-935), who give more details about the formulas in this section.

For a curve that is represented as a graph (y as a function of x), you can obtain the length of the curve by integrating the arclength. In Calculus 2, the arclength formula is derived for Euclidean distance, but it is straightforward to give the formula for the Lp distance:
$s(p) = \int \left( 1 + |dy/dx|^p \right)^{1/p} \, dx$

To obtain a value for pi in the Lp metric, you can integrate the arclength for the upper half of the Lp unit circle. Equivalently, by symmetry, you can integrate one-eighth of the unit circle and multiply by 4. A convenient choice for the limits of integration is [0, $2^{-1/p}$] because $2^{-1/p}$ is the x value where the 45-degree line intersects the unit circle for the Lp metric.

Substituting for the derivative gives the following formula (Keller and Vakil, 2009, p. 932):
$\pi(p) = 4 \int_0^{2^{-1/p}} \left( 1 + u(x) \right)^{1/p} \, dx$, where $u(x) = |x^{-p} - 1|^{1-p}$.
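The substitution step can be spelled out as follows. This is a sketch of the algebra for the upper half of the Lp unit circle in the first quadrant (0 < x < 1), not a quotation from Keller and Vakil.

```latex
\begin{aligned}
y &= (1 - x^p)^{1/p}, \\
\frac{dy}{dx} &= -x^{p-1} (1 - x^p)^{1/p - 1}, \\
\left| \frac{dy}{dx} \right|^p &= x^{p(p-1)} (1 - x^p)^{1-p}
   = \left( x^{-p} (1 - x^p) \right)^{1-p}
   = \left( x^{-p} - 1 \right)^{1-p} = u(x).
\end{aligned}
```

As a check, p = 2 gives 1 + u(x) = 1/(1 - x²), so the integrand is 1/sqrt(1 - x²) and the integral over [0, 1/sqrt(2)] is arcsin(1/sqrt(2)) = π/4, which recovers the familiar value of π after multiplying by 4.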

A pi for each Lp metric

For each value of p, you get a different value for pi. You can use your favorite numerical integration routine to approximate π(p) by integrating the formula for various values of p ≥ 1. I used SAS/IML, which supports the QUAD function for numerical integration. The arclength computation for a variety of values for p is summarized by the following graph. The graph shows the computation of π(p), which is the length of the semicircle in the Lp metric, versus values of p for p in [1, 11].
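If you do not have SAS/IML, a rough version of the computation can be sketched in Python with the standard library only. This is not the author's QUAD-based program: it swaps in a simple midpoint rule, and the function name `pi_p` and the step count are assumptions. The midpoint rule never evaluates the integrand at x = 0, where the term x^(-p) is undefined.

```python
import math

def pi_p(p, n_steps=20_000):
    """Approximate pi(p) = 4 * integral of (1 + u(x))^(1/p) on [0, 2^(-1/p)],
    where u(x) = |x^(-p) - 1|^(1-p), by using the midpoint rule."""
    upper = 2.0 ** (-1.0 / p)        # x value where the line y = x meets the Lp unit circle
    h = upper / n_steps
    total = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * h            # midpoint of the i-th subinterval
        u = abs(x ** (-p) - 1.0) ** (1.0 - p)
        total += (1.0 + u) ** (1.0 / p)
    return 4.0 * h * total

for p in (1, 1.5, 2, 3, 11):
    print(p, round(pi_p(p), 5))
```

For p close to 1 the integrand is not smooth at x = 0, so the midpoint rule converges more slowly there; an adaptive routine is the better choice for accurate values.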

The graph shows that the L1 value for pi is 4. The value decreases rapidly as p approaches 2 and reaches its minimum at p=2, where the value of pi is 3.14159.... For p > 2, the graph of π(p) increases slowly. You can show that π(p) asymptotically approaches the value 4 as p approaches infinity.

On Pi Day, some places have contests to see who can recite the most digits of pi. I encourage you to enter the contest and say "Pi, in the L1 metric, is FOUR point zero, zero, zero, zero, ...." If they refuse to give you the prize, tell them to read this article! 😉

Reflections on pi

On the one hand, this article shows that there is nothing special about the value 3.14159.... For an Lp metric, the ratio of the circumference of a circle to its diameter can be any value between π and 4. On the other hand, the graph shows that π is the unique minimum value of the graph. Among an infinitude of circles and metrics, the well-known Euclidean distance is the only Lp metric for which pi is 3.14159....

If you ask me, our value of π is special, without a doubt!


Download the SAS program that creates the graphs in this article.

The post The value of pi depends on how you measure distance appeared first on The DO Loop.

March 12, 2018

Welcome to my annual Pi Day post. Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-decimal approximation to pi.

Pi is a mathematical constant that never changes. Pi is the same value today as it was in ancient Babylon and Greece. The timeless constancy of pi is a comforting presence in a world of rapid change.

Abramowitz and Stegun, Handbook of Mathematical Functions

But even though the value of pi does not change, our knowledge about pi does change and grow. I was reminded of this recently when I opened my worn copy of the Handbook of Mathematical Functions (more commonly known as "Abramowitz and Stegun," the names of its editors). When the 1,046-page Handbook was published in 1964, it was the premier reference volume for applied mathematicians and mathematical scientists. Interestingly, pi is not even listed in the index! It does appear on p. 3 under "Mathematical Constants," which gives a 25-digit approximation of many mathematical constants such as pi, e, and sqrt(2).

How to define pi?

Fast forward to the age of the internet. In 2010, the Handbook was transformed into an expanded online, searchable, interactive web site. The new Handbook is called The NIST Digital Library of Mathematical Functions. This is very exciting because the Handbook is now available (for free!) to everyone!

If you search for pi in the online Digital Library, you find that the editors chose to define pi as the value of the integral

$\pi = 4 \int_0^1 \frac{dt}{1+t^2}$

This seems to be a strange way to define pi. Pi is the ratio of the circumference of a circle to its diameter, and at first glance that formula doesn't seem related to a circle. A more geometric choice would be an integrand such as sqrt(1 - t^2), which connects pi to the area under the unit circle.

Of course, the integral in the Digital Library is equal to pi, but it is not obvious. You might recall from calculus that the antiderivative of 1/(1+t^2) is arctan(t). Therefore the expression is just a complicated way to write 4 arctan(1). Ah! This makes more sense because arctan(1) is equal to π/4. In fact, before SAS introduced the CONSTANT function, SAS programmers used to define pi by using the computation pi = 4*ATAN(1). Nevertheless, I think expressing arctan(1) as an integral is unnecessarily obscure.
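Both routes to pi are easy to check numerically. The following Python sketch evaluates 4*atan(1) and approximates the integral of 4/(1+t^2) on [0, 1] by using the composite Simpson rule; the helper name `simpson` and the number of subintervals are assumptions for illustration.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1                        # Simpson's rule needs an even count
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2    # interior weights alternate 4, 2, 4, ...
        total += weight * f(a + i * h)
    return total * h / 3

pi_from_atan = 4 * math.atan(1.0)
pi_from_integral = simpson(lambda t: 4.0 / (1.0 + t * t), 0.0, 1.0)

print(pi_from_atan, pi_from_integral, math.pi)
```

The two computed values agree with the library constant to many digits, which is the numerical counterpart of the antiderivative argument.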

Using the Cauchy distribution to define pi

I am not enamored with the editors' choice of an integral to define pi, but if I were to use that integrand to define pi, I would use a variation that has applications in probability and statistics. Statisticians sometimes use the Cauchy distribution, which is a fat-tailed distribution that has the interesting mathematical property that the distribution has no mean (expected) value! (Mathematicians say that "the first moment does not exist.") Researchers in robust statistical methods sometimes use Cauchy-distributed errors to generate extreme outliers in simulated data.

The Cauchy probability density function (PDF) is (1/π) · 1/(1+t^2), which means that the integral of the PDF on the interval (-∞, ∞) is 1. Equivalently, the integral of 1/(1+t^2) on the interval (-∞, ∞) is π:

$\int_{-\infty}^{\infty} \frac{dt}{1+t^2} = \pi$

This definition of pi seems more natural than the integral on [0, 1]. I could make other suggestions (such as the integral of arccos on [-1, 1]), but I think I'll stop here.

The purpose of this post is to celebrate pi, which is so ubiquitous and important that it can be defined in numerous ways. A secondary purpose is to highlight the availability of the NIST Digital Library of Mathematical Functions, which is an online successor of the venerable Handbook of Mathematical Functions. I am thrilled with the availability of this amazing resource, regardless of how they define pi!

To complete this Pi Day post, I leave you with a pi-ku. A pi-ku is like a haiku, but the number of syllables in each line follows the digits in the decimal expansion of pi. A common structure for a pi-ku is 3-1-4. The following pi-ku celebrates the new Digital Library:

Handbook of
Functions? Online!

The post Pi, special functions, and distributions appeared first on The DO Loop.

March 13, 2017

It is time for Pi Day, 2017! Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-decimal approximation to pi. This year I use SAS software to show an amazing fact: you can find your birthday (or any other date) within the first 10 million digits of pi!

Patterns within the digits in pi

Mathematicians conjecture that the decimal expansion of pi exhibits many properties of a random sequence of digits. If so, you should be able to find any sequence of digits within the decimal digits of pi.

If you want to search for a particular date, such as your birthday, you need to choose a pattern of digits that represents the date. For example, Pi Day was first celebrated on 14MAR1988. You can represent that date in several ways. This article uses the MMDDYY representation, which is 031488. You could also use a representation such as 31488, which drops the leading zero for months or days less than 10. Or use the DDMMYY convention, which is 140388.
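The three representations can be generated mechanically from a date. The following Python sketch is illustrative only; the function name `date_patterns` and the dictionary keys (including the label "MDYY" for the variant without leading zeros) are assumptions, not SAS format names.

```python
from datetime import date

def date_patterns(d):
    """Return digit patterns for a date under three conventions."""
    yy = d.year % 100                                   # two-digit year
    return {
        "MMDDYY": f"{d.month:02d}{d.day:02d}{yy:02d}",  # zero-padded month and day
        "MDYY":   f"{d.month}{d.day}{yy:02d}",          # no leading zeros on month or day
        "DDMMYY": f"{d.day:02d}{d.month:02d}{yy:02d}",  # day first
    }

print(date_patterns(date(1988, 3, 14)))
# {'MMDDYY': '031488', 'MDYY': '31488', 'DDMMYY': '140388'}
```

Note that the variant without leading zeros is ambiguous for some dates (for example, 11188 could be January 11 or November 1), which is one reason to prefer the six-digit pattern.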


In 2015 I showed how to use SAS software to download the first ten million digits of pi from an internet site. The program then uses PROC PRINT to print six consecutive digits of pi beginning at the 433,422nd digit:

/* read data over the internet from a URL */
filename rawurl url "http://www.cs.princeton.edu/introcs/data/pi-10million.txt"
                /* proxy='http://yourproxy.company.com:80' */ ;
data PiDigits;
   infile rawurl lrecl=10000000;
   input Digit 1. @@;
   Position = _n_;
run;

/* Pi Day "birthday" 03/14/88 represented as 031488 */
proc print noobs data=PiDigits(firstobs=433422 obs=433427);
   var Position Digit;
run;

Look at that! The six-digit pattern 031488 appears in the decimal digits of pi! This location also contains the alternative five-digit representation 31488, but you can find that five-digit sequence much earlier, at the 19,466th digit:

/* Alternative representation: Pi Day birthday = 31488 */
proc print noobs data=PiDigits(firstobs=19466 obs=19470);
   var Position Digit;
run;

How did I know where to look for these patterns? Read on to discover how to use SAS to find a particular pattern of digits within the decimal expansion of pi.

Finding patterns within the digits in pi

Last week I showed how to use SAS to search for a particular pattern within a long sequence of digits. Let's use that technique to search for the six-digit Pi Day "birthday," pattern 031488. The following call to PROC IML in SAS defines a function that implements the search algorithm. The program then reads in the first 10 million digits of pi and conducts the search for the pattern:

proc iml;
/* FindPattern: Finds a specified pattern within a long sequence of digits.
   Input: target : row vector of the target pattern, such as {0 3 1 4 8 8}
          digits : col vector of the digits in which to search
   Prints the number of times the pattern appears and the first location of the pattern.
   See https://blogs.sas.com/content/iml/2017/03/10/find-pattern-in-sequence-of-digits.html
*/
start FindPattern(target, digits);
   p = ncol(target);               /* length of target sequence */
   D = lag(digits, (p-1):0);       /* columns shift the digits */
   D = D[p:nrow(digits),];         /* delete first p-1 rows, which contain missing values */
   X = (D=target);                 /* binary matrix */
   /* sum across columns. Which rows contain all 1s? */
   b = (X[,+] = p);                /* b[i]=1 if i_th row matches target */
   NumRepl = sum(b);               /* how many times does target appear? */
   if NumRepl=0 then FirstLoc = 0;  else FirstLoc = loc(b)[1];
   result = NumRepl // FirstLoc;
   labl = "Pattern = "  + rowcat(char(target,1));  /* convert to string */
   print result[L=labl F=COMMA9. rowname={"Num Repl", "First Loc"}];
finish;

/* read in 10 million digits of pi */
use PiDigits;  read all var {"Digit"};  close;
target = {0 3  1 4  8 8};  /* six-digit "birthday" of Pi Day */
call FindPattern(target, Digit);
target = {3  1 4  8 8};    /* five-digit "birthday" */
call FindPattern(target, Digit);

Success! The program shows the starting location for each pattern within the digits of pi. The starting locations match the values of the FIRSTOBS= option that was used in PROC PRINT in the previous section.
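If you do not have SAS/IML, the same kind of search can be sketched in Python by treating the digits as one long string. This sketch swaps the lag-matrix technique for repeated substring searches; the function name `find_pattern` is hypothetical, and the demonstration uses only the first 50 decimal digits of pi rather than the 10-million-digit file. Positions are 1-based and start at the tenths place, which matches the convention of the SAS program.

```python
def find_pattern(pattern, digits):
    """Return (number of occurrences, 1-based first location) of pattern in digits.
    Overlapping occurrences are counted; the location is 0 if there is no match."""
    if not pattern:
        raise ValueError("pattern must be a nonempty string of digits")
    count = 0
    first_loc = 0
    start = digits.find(pattern)
    while start != -1:
        count += 1
        if first_loc == 0:
            first_loc = start + 1                      # convert 0-based index to 1-based
        start = digits.find(pattern, start + 1)        # step by 1 to allow overlaps
    return count, first_loc

# first 50 decimal digits of pi (the leading "3" is omitted)
PI_50 = "14159265358979323846264338327950288419716939937510"

print(find_pattern("26", PI_50))       # (2, 6)
print(find_pattern("031488", PI_50))   # (0, 0): not within the first 50 digits
```

To search the full file, you could read the downloaded digits into a single string and pass it as the second argument.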

Search for your birthday within the digits of pi

You can use this program to search for your birthday, your anniversary, or any other special date. (If you prefer to use the SAS DATA step, see the comments of my previous article.) If you don't have SAS, don't despair! I got the idea for this article from a nifty web page on PBS.org that contains an applet that you can use to find your birthday among the digits of pi.

The PBS applet does not require any special software. However, I noticed that it gives slightly different answers from the SAS program I wrote. One trivial difference is that the applet starts with the "3" digit of pi, whereas the SAS program starts with the "1" in the tenths decimal place. So the two programs should give locations that differ by one place. Another difference is that the applet appears to always represent months and days that are less than 10 as a one-digit value, so that the PBS applet represents 02JAN2003 as "1203" rather than "010203." However, I have observed (but cannot explain) that the PBS applet seems to consistently report a location that is three digits more than the SAS-reported location. For example, the applet reports 02JAN2003 (1203) as occurring at the 60,875th digit, whereas the SAS program reports the location as the 60,872nd digit.

Some unique dates within the digits of pi

We know that the Pi Day "birthday" date appears, but what about other dates? I wrote a SAS program that searches for all six-digit MMDDYY representations of dates from 01JAN1900 to 21DEC1999. I verified that all dates are contained in the first 10 million digits of pi except for one. The date 01DEC1954 (120154) is the only date that does not appear!

I also discovered some other interesting properties while searching for dates (in the MMDDYY format) within the first 10 million digits of pi:

  1. First appearance: The first date to appear is 28JUN1962 (062862), which appears in the 71st decimal location.
  2. Latest (first) appearance: The date 23NOV1960 (112360) does not appear until the 9,982,545th location.
  3. Rarest: The date 01DEC1954 (120154) is the only date that does not appear. (But the five-digit representation (12154) does appear.)
  4. Second rarest: There are 15 dates that only appear one time.
  5. Most frequent: The date 22JUL1982 (072282) appears 25 times.
  6. Distribution of appearances: Most dates appear between seven and 12 times. The following graph shows the distribution of the number of times that each date appears.
Distribution of the number of times that each date MMDDYY appears in first 10M digits of pi

If you want to discover other awesome facts, you can explore the data yourself. You can download the results (in CSV format) of the exhaustive search. If you want to see how I searched the set of all MMDDYY patterns, you can download the SAS program that I used to create the analyses in this article.
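For readers who want to experiment without SAS, the set of MMDDYY patterns for a range of dates can be generated with a few lines of Python. This is a sketch, not the author's program: the function name `mmddyy_patterns` is hypothetical, and the example covers only the year 1954 so that it stays small.

```python
from datetime import date, timedelta

def mmddyy_patterns(first, last):
    """List the six-digit MMDDYY pattern for every date from first to last, inclusive."""
    patterns = []
    d = first
    while d <= last:
        patterns.append(d.strftime("%m%d%y"))   # zero-padded month, day, two-digit year
        d += timedelta(days=1)
    return patterns

patterns_1954 = mmddyy_patterns(date(1954, 1, 1), date(1954, 12, 31))
print(len(patterns_1954), patterns_1954[0], patterns_1954[-1])
```

Each pattern in the list can then be passed to whatever search routine you use for the digits of pi, and the counts tabulated to reproduce a distribution like the one in the graph.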

The post Find your birthday in the digits of pi appeared first on The DO Loop.

March 14, 2016

Math lovers, do you know what day it is? It's Pi Day, which we celebrate every year on March 14 because the date 3-14 matches the first three digits of pi, 3.14. This year, I'm celebrating with poetry, combining my love of math with my love of language. Word Spy explains that a pi-ku is […]

Let's celebrate Pi Day with a pi-ku was published on SAS Voices.

March 8, 2016

Pi Day (3/14) is next week, and once again, we are encouraging everyone in the data visualization community to use this day as motivation to help clean up ineffective pie charts within their organization or in public spaces like Wikipedia. I think there is still a mindset outside of our community that a pie […]

The post Progress on #OneLessPie appeared first on JMP Blog.

March 15, 2014
It's wonderful to see the #onelesspie effort gathering interest, especially on twitter. For my introductory post yesterday, I wanted to focus on encouraging people to improve pie charts everywhere. In this post, I want to show you how I remade the pie chart in JMP. Here's the original: The first [...]