Numerical Analysis

2月 192018

Your statistical software probably provides a function that computes quantiles of common probability distributions such as the normal, exponential, and beta distributions. Because there are infinitely many probability distributions, you might encounter a distribution for which a built-in quantile function is not implemented. No problem! This article shows how to numerically compute the quantiles of any probability distribution from the definition of the cumulative distribution (CDF).

In SAS, the QUANTILE function computes the quantiles for about 25 distributions. This article shows how you can use numerical root-finding methods (and possibly numerical integration) in SAS/IML software to compute the quantile function for ANY continuous distribution. I have previously written about related topics and particular examples, such as the following:

The quantile is the root of an integral equation

Quantiles are the solutions to the equation CDF(x)-p=0, where p is a probability

Computing a quantile would make a good final exam question for an undergraduate class in numerical analysis. Although some distributions have an explicit CDF, many distributions are defined only by a probability density function (the PDF, f(x)) and numerical integration must be used to compute the cumulative distribution (the CDF, F(x)). A canonical example is the normal distribution. I've previously shown how to use numerical integration to compute a CDF from a PDF by using the definition F(x) = ∫ f(t) dt, where the lower limit of the integral is –∞ and the upper limit is x.

Whether the CDF is defined analytically or through numerical integration, the quantile for p is found implicitly as the solution to the equation F(x) = p, where p is a probability in the interval (0,1). This is illustrated by the graph at the right.

Equivalently, you can define G(x; p) = F(x) – p so that the quantile is the root of the equation G(x; p) = 0. For well-behaved densities that occur in practice, a numerical root is easily found because the CDF is monotonically increasing. (If you like pathological functions, see the Cantor staircase distribution.)

Example: Create a custom distribution

SAS/IML software provides the QUAD subroutine, which provides numerical integration, and the FROOT function, which solves for roots. Thus SAS/IML is an ideal computational environment for computing quantiles for custom distributions.

As an example, consider a distribution that is a mixture of an exponential and a normal distribution:
F(x) = 0.4 Fexp(x; 0.5) + 0.6 Φ(x; 10, 2),
where Fexp(x; 0.5) is the exponential distribution with scale parameter 0.5 and Φ(x; 10, 2) is the normal CDF with mean 20 and standard deviation 2. In this case, you do not need to use numerical integration to compute the CDF. You can compute the CDF as a linear combination of the exponential and normal CDFs, as shown in the following SAS/IML function:

/* program to numerically find quantiles for a custom distribution */
proc iml;
/* Define the cumulative distribution function here. */
start CustomCDF(x); 
   F = 0.4*cdf("Exponential", x, 0.5) +
       0.6*cdf("Normal", x, 10, 2);
   return F;

The previous section shows the graph of the CDF on the interval [0, 16]. The vertical and horizontal lines correspond to the first, second and third quartiles of the distribution. The quartiles are close to the values Q1 ≈ 0.5, Q2 ≈ 8, and Q3 ≈ 10.5. The next section shows how to compute the quantiles.

Compute quantiles for an arbitrary distribution

As long as you can define a function that evaluates the CDF, you can find quantiles. For unbounded distributions, it is usually helpful to plot the CDF so that you can visually estimate an interval that contains the quantile. (For bounded distributions, the support of the distribution contains all quantiles.) For the mixture distribution in the previous section, it is clear that the quantiles are in the interval [0, 16].

The following program finds arbitrary quantiles for whichever CDF is evaluated by the CustomCDF function. To find quantiles for a different function, you can modify the CustomCDF and change the interval on which to find the quantiles. You do not need to modify the RootFunc or CustomQuantile functions.

/* Express CDF(x)=p as the root for the function CDF(x)-p. */
start RootFunc(x) global(gProb);
   return CustomCDF(x) - gProb;  /* quantile for p is root of CDF(x)-p */
/* You need to provide an interval on which to search for the quantiles. */
start CustomQuantile(p, Interval) global(gProb);
   q = j(nrow(p), ncol(p), .);              /* allocate result matrix */
   do i = 1 to nrow(p)*ncol(p);             /* for each element of p... */
      gProb = p[i];                         /*    set global variable   */
      q[i] = froot("RootFunc", Interval);   /*    find root (quantile)  */ 
   return q;
/* Example: look for quartiles in interval [0, 16] */
probs = {0.25 0.5 0.75};         /* Q1, Q2, Q3 */
intvl = {0 16};                  /* interval on which to search for quantiles */
quartiles = CustomQuantile(probs, intvl);
print quartiles[colname={Q1 Q2 Q3}];
Quartiles for mixture of exponential and normal distributions

In summary, you can compute an arbitrary quantile of an arbitrary continuous distribution if you can (1) evaluate the CDF at any point and (2) numerically solve for the root of the equation CDF(x)-p for a probability value, p. Because the support of the distribution is arbitrary, the implementation requires that you provide an interval [a,b] that contains the quantile.

The computation should be robust and accurate for non-pathological distributions provided that the density is not tiny or zero at the value of the quantile. Although this example is illustrated in SAS, the same method will work in other software.

The post Compute the quantiles of any distribution appeared first on The DO Loop.

11月 202017

This article shows how to simulate beta-binomial data in SAS and how to compute the density function (PDF). The beta-binomial distribution is a discrete compound distribution. The "binomial" part of the name means that the discrete random variable X follows a binomial distribution with parameters N (number of trials) and p, but there is a twist: The parameter p is not a constant value but is a random variable that follows the Beta(a, b) distribution.

The beta-binomial distribution is used to model count data where the counts are "almost binomial" but have more variance than can be explained by a binomial model. Therefore this article also compares the binomial and beta-binomial distributions.

Simulate data from the beta-binomial distribution

To generate a random value from the beta-binomial distribution, use a two-step process. The first step is to draw p randomly from the Beta(a, b) distribution. Then you draw x from the binomial distribution Bin(p, N). The beta-binomial distribution is not natively supported by the RAND function SAS, but you can call the RAND function twice to simulate beta-binomial data, as follows:

/* simulate a random sample from the beta-binomial distribution */
%let SampleSize = 1000;
data BetaBin;                         
a = 6; b = 4; nTrials = 10;           /* parameters */
call streaminit(4321);
do i = 1 to &SampleSize;
   p = rand("Beta", a, b);            /* p[i] ~ Beta(a,b)  */
   x = rand("Binomial", p, nTrials);  /* x[i] ~ Bin(p[i], nTrials) */
keep x;

The result of the simulation is shown in the following bar chart. The expected values are overlaid. The next section shows how to compute the expected values.

Beta-binomial distribution and expected values in SAS

The PDF of the beta-binomial distribution

The Wikipedia article about the beta-binomial distribution contains a formula for the PDF of the distribution. Since the distribution is discrete, some references prefer to use "PMF" (probability mass function) instead of PDF. Regardless, if X is a random variable that follows the beta-binomial distribution then the probability that X=x is given by
PDF of the beta-binomial distribution
where B is the complete beta function.

The binomial coefficients ("N choose x") and the beta function are defined in terms of factorials and gamma functions, which get big fast. For numerical computations, it is usually more stable to compute the log-transform of the quantities and then exponentiate the result. The following DATA step computes the PDF of the beta-binomial distribution. For easy comparison with the distribution of the simulated data, the DATA step also computes the expected count for each value in a random sample of size N. The PDF and the simulated data are merged and plotted on the same graph by using the VBARBASIC statement in SAS 9.4M3. The graph was shown in the previous section.

data PDFBetaBinom;           /* PMF function */
a = 6; b = 4; nTrials = 10;  /* parameters */
do x = 0 to nTrials;
   logPMF = lcomb(nTrials, x) +
            logbeta(x + a, nTrials - x + b) -
            logbeta(a, b);
   PMF = exp(logPMF);        /* probability that X=x */
   EX = &SampleSize * PMF;   /* expected value in random sample */
keep x PMF EX;
/* Merge simulated data and PMF. Overlay PMF on data distribution. */
data All;
merge BetaBin PDFBetaBinom(rename=(x=t));
title "The Beta-Binomial Distribution";
title2 "Sample Size = &SampleSize";
proc sgplot data=All;
   vbarbasic x / barwidth=1 legendlabel='Simulated Sample';      /* requires SAS 9.4M3 */
   scatter x=t y=EX /  legendlabel='Expected Value'
      markerattrs=GraphDataDefault(symbol=CIRCLEFILLED size=10);
   inset "nTrials = 10"  "a = 6"  "b = 4" / position=topleft border;
   yaxis grid;
   xaxis label="x" integer type=linear;             /* force TYPE=LINEAR */

Compare the binomial and beta-binomial distributions

One application of the beta-binomial distribution is to model count data that are approximately binomial but have more variance ("thicker tails") than the binomial model predicts. The expected value of a Beta(a, b) distribution is a/(a + b), so let's compare the beta-binomial distribution to the binomial distribution with p = a/(a + b).

The following graph overlays the two PDFs for a = 6, b = 4, and nTrials = 10. The blue distribution is the binomial distribution with p = 6/(6 + 4) = 0.6. The pink distribution is the beta-binomial. You can see that the beta-binomial distribution has a shorter peak and thicker tails than the corresponding binomial distribution. The expected value for both distributions is 6, but the variance of the beta-binomial distribution is greater. Thus you can use the beta-binomial distribution as an alternative to the binomial distribution when the data exhibit greater variance than expected under the binomial model (a phenomenon known as overdispersion).

Compare the binomial and beta-binomial distributions with the same mean. The beta-binomial distribution has greater variance than the binomial.


The beta-binomial distribution is an example of a compound distribution. You can simulate data from a compound distribution by randomly drawing the parameters from some distribution and then using those random parameters to draw the data. For the beta-binomial distribution, the probability parameter p is drawn from a beta distribution and then used to draw x from a binomial distribution where the probability of success is the value of p. You can use the beta-binomial distribution to model data that have greater variance than expected under the binomial model.

The post Simulate data from the beta-binomial distribution in SAS appeared first on The DO Loop.

9月 112017

Did you know that you can get SAS to compute symbolic (analytical) derivatives of simple functions, including applying the product rule, quotient rule, and chain rule? SAS can form the symbolic derivatives of single-variable functions and partial derivatives of multivariable functions. Furthermore, the derivatives are output in a form that can be pasted into a SAS program. The trick is to use PROC NLIN with the LIST option.

In SAS/IML, the nonlinear optimization routines will use analytical derivatives if they are provided; otherwise, they will automatically generate numerical finite-difference approximations to the derivatives. I rarely specify derivatives because it can be time-consuming and prone to error. But recently I realized that PROC NLIN and other SAS procedures automatically generate symbolic derivatives, and you can "trick" the procedure into displaying the derivatives so that they can be used in other applications. To get PROC NLIN to output symbolic derivatives, do the following:

  • Create a data set to use for the DATA= option of PROC NLIN.
  • List the variables that you want to take derivatives of on the PARMS statement. In open code, assign a value to constants that appear in the function.
  • Specify the function to differentiate on the MODEL statement.

Symbolic derivatives of functions of one variable

Here is a one-variable example. Suppose you want to take the derivative of
f(x) = x3 + x sin(x) + exp( x2 ).
The function and its derivative is shown to the right.

The following SAS statements create a "fake" data set that defines a variable named F. The call to PROC NLIN sets up the problem. The LIST option on the PROC NLIN statement generates the 'ProgList' table, which contains the derivative function:

data _dummy;
   f = 1;                                    /* name of function */
proc nlin data=_dummy list;
   parms x=0;                                /* list variables */
   model f = x**3 + x*sin(x) + exp(x**2);    /* define function */
   ods select ProgList;                      /* output symbolic derivatives */
Symbolic derivatibves in SAS for a univariate function

The output shows the expressions for the function ("MODEL.f") and the derivative with respect to x. The derivative df/dx is written as "@MODEL.f/@x", where the symbol '@' is used for the 'd'. You can see that SAS took the simple derivative of the x3 term, applied the product rule to the x*sin(x) term, and applied the chain rule to the exp( x2 ) term. You can also see that the expressions are written in SAS notation, with "**" indicating the power operator. SAS functions are written in upper case (SIN, COS, and EXP).

Symbolic partial derivatives of multivariate functions

In exactly the same way, you can obtain partial derivatives. In a partial derivative, all variables are held constant except one. Suppose you have the function for the normal density,
f(x) = 1/(sqrt(2 π) σ) exp( -(x - μ)**2 / (2 σ**2) )
Suppose that you want to compute the partial derivatives ∂f/∂μ and ∂f/∂σ. To compute these derivatives, list μ and σ on the PARMS statement and assign values to the other symbols (x and π) that appear in the function. It does not matter what value you assign to the x and π, you are just preventing PROC NLIN from complaining that there are unassigned variables. You could assign the value 1 to both variables, but I prefer to tell the procedure that π equals 3.14. In the following program, note that PROC NLIN reuses the fake data set that defines the F variable:

proc nlin data=_dummy list;
   parms mu=0 sigma=1;                        /* list variables */
   x=1; pi = constant('pi');                  /* assign arbitrary values to constants */
   model f = exp( -(x-mu)**2 / (2*sigma**2) ) / (sqrt(2*pi)*sigma); /* define function */
   ods select ProgList;                      /* output symbolic derivatives */
Symbolic partial derivatibves in SAS for a multivariate function

The derivative with respect to σ requires using the chain rule and the quotient rule. Notice that the derivative includes a call to the original function ("MODEL.f"). Notice also that there is repetition of various expressions such as -(x - μ)**2 / (2 σ**2). If you copy and paste these expressions into a SAS program that computes the derivative, the computation will be slightly inefficient because the program will evaluate the same expression multiple times. One way to improve the efficiency is to collect sub-expressions into a temporary variable, as shown in the next section.

Sub-expressions and the chain rule

An experienced statistician will recognize that the expression z = (x - μ) / σ arises naturally as the standardized variable. Using such an expression might simplify the derivatives.

The following program computes the same derivatives as before, except this time the programmer defines the expression z = (x - μ) / σ and uses z in the function. When SAS takes the derivative, it computes dz/dμ and dz/dσ as part of the computation, as shown below:

proc nlin data=_dummy list;
   parms mu=0 sigma=1;                        /* list variables */
   x=1; pi = constant('pi');                  /* assign arbitrary values to constants */
   const = sqrt(2*pi);
   z = (x-mu) / sigma;                        /* intermediate expression */
   model f = exp(-z**2 / 2) / (const*sigma);  /* define function */
   ods select ProgList;                       /* output symbolic derivatives */
Symbolic derivatibves in SAS that use sub-expressions and apply the chain rule

Notice that the derivatives use several sub-expressions such as dz/dμ and dz/dσ. This computation is more efficient than the previous because it reuses previously computed quantities. For another example, define f1 = 1 / sqrt(2*pi*sigma) and f2 = exp(-z**2 / 2) and define the MODEL as f = f1*f2. You will see that each term in the derivative is efficiently evaluated in terms of previously expressions.

Discussion of symbolic derivatives

As I said previously, I do not usually specify analytical derivatives for my optimizations. I find that numerical derivatives (which are automatically computed) are fast and accurate for 99% of the optimizations that I perform.

Furthermore, some of the objective functions that I optimize do not have analytical derivatives. Others are matrix computations for which it would be difficult and time-consuming to write down the analytical derivative. (For an example, see the article on the derivative of the bivariate normal cumulative distribution.)

If you decide to specify an analytical derivative in SAS, make sure that the derivative is correct and efficient. The technique in this article will help to ensure that the derivative is correct. By strategically defining sub-expressions as in the previous section, you can help SAS generate derivatives that are computationally efficient.

Be aware that the derivatives are generated by applying the chain rule, product rule, and quotient rule without any algebraic simplifications of the result. Thus the result can be unnecessarily complicated. For an example, ask SAS to display the derivative of log( (1+x)/(1-x) ) / 2. The naive result is complicated. If you rewrite the function as log(1+x)/2 - log(1-x)/2, then the derivative is much simpler. However, in neither case will SAS simplify the derivative to obtain 1 / (1-x**2).

For more of my thoughts on using derivatives in SAS, see "Two hints for specifying derivatives." What about your thoughts? What do you think about using PROC NLIN to generate automatic derivatives of simple functions? Do you think this technique will be useful to you?

The post Symbolic derivatives in SAS appeared first on The DO Loop.

8月 282017

The singular value decomposition (SVD) could be called the "billion-dollar algorithm" since it provides the mathematical basis for many modern algorithms in data science, including text mining, recommender systems (think Netflix and Amazon), image processing, and classification problems. Although the SVD was mathematically discovered in the late 1800s, computers have made the SVD an indispensable tool in computational statistics and data science.

SVD: A fundamental theorem of linear algebra

Mathematically, the singular value decomposition is a fundamental theorem of linear algebra. )You could argue that it is THE fundamental theorem, but Gil Strang names a different result.) The singular value decomposition says that every n x p matrix can be written as the product of three matrices: A = U Σ VT where

  • U is an othogonal n x n matrix
  • Σ is a diagonal n x p matrix. In practice, the diagonal elements are ordered so that Σii ≥ Σjj for all i < j.
  • V is an othogonal p x p matrix and VT represents a matrix transpose.

The SVD represents the essential geometry of a linear transformation. It tells us that every linear transformation is a composition of three fundamental actions. Reading the equation from right to left:

  1. The matrix V represents a rotation or reflection of vectors in the p-dimensional domain.
  2. The matrix Σ represents a linear dilation or contraction along each of the p coordinate directions. If np, this step also canonically embeds (or projects) the p-dimensional domain into (or onto) the n-dimensional range.
  3. The matrix U represents a rotation or reflection of vectors in the n-dimensional range.

Thus the SVD specifies that every linear transformation is fundamentally a rotation or reflection, followed by a scaling, followed by another rotation or reflection. The Strang (1993) article about the fundamental theorem of linear algebra includes the following geometric interpretation of the singular value decomposition of a 2 x 2 matrix:

Geometric interpretation of the singular value decomposition (SVD) as the product of a rotation/reflection, followed by a scaling, followed by another rotation/reflection.

The diagram shows that the transformation induced by the matrix A (the long arrow across the top of the diagram) is equivalent to the composition of the three fundamental transformations, namely a rotation, a scaling, and another rotation.

SVD: The fundamental theorem of multivariate data analysis

Because of its usefulness, the singular value decomposition is a fundamental technique for multivariate data analysis. A common goal of multivariate data analysis is to reduce the dimension of the problem by choosing a small linear subspace that captures important properties of the data. The SVD is used in two important dimension-reducing operations:

  • Low-rank approximations: Recall that the diagonal elements of the Σ matrix (called the singular values) in the SVD are computed in decreasing order. The SVD has a wonderful mathematical property: if you choose some integer k ≥ 1 and let D be the diagonal matrix formed by replacing all singular values after the k_th by 0, then then matrix U D VT is the best rank-k approximation to the original matrix A.
  • Principal components analysis: The principal component analysis is usually presented in terms of eigenvectors of a correlation matrix, but you can show that the principal component analysis follows directly from the SVD. (In fact, you can derive the eigenvalue decomposition of a matrix, from the SVD.) The principal components of AT A are the columns of the V matrix; the scores are the columns of U. Dimension reduction is achieved by truncating the number of columns in U, which results in the best rank-k approximation of the data.

Compute the SVD in SAS

In SAS, you can use the SVD subroutine in SAS/IML software to compute the singular value decomposition of any matrix. To save memory, SAS/IML computes a "thin SVD" (or "economical SVD"), which means that the U matrix is an n x p matrix. This is usually what the data analyst wants, and it saves considerable memory in the usual case where the number of observations (n) is much larger than the number of variables (p). Technically speaking, the U for an "economical SVD" is suborthogonal: UT U is the identity when np and U UT is the identity when np.

As for eigenvectors, the columns of U and V are not unique, so be careful if you compare the results in SAS to the results from MATLAB or R. The following example demonstrates a singular value decomposition for a 3 x 2 matrix A. For the full SVD, the U matrix would be a 3 x 3 matrix and Σ would be a 3 x 2 diagonal matrix. For the "economical" SVD, U is 3 2 and Σ is a 2 x 2 diagonal matrix, as shown:

proc iml;
A = {0.3062 0.1768,
    -0.9236 1.0997,
    -0.4906 1.3497 };
call svd(U, D, V, A);  /* A = U*diag(D)*V` */
print U[f=6.3], D[f=6.3], V[f=6.3];

Notice that, to save memory, only the diagonal elements of the matrix &Signa; are returns in the vector D. You can explicitly form &Sigma = diag(D) if desired. Geometrically, the linear transformation that corresponds to V rotates a vector in the domain by about 30 degrees, the matrix D scales it by 2 and 0.5 in the coordinate directions, and U inserts it into a three-dimensional space, and applies another rotation.

My next blog post shows how the SVD enables you to reduce the dimension of a data matrix by using a low-rank approximation, which has applications to image compression and de-noising.

The post The singular value decomposition: A fundamental technique in multivariate data analysis appeared first on The DO Loop.

8月 232017

All statisticians are familiar with the classical arithmetic mean. Some statisticians are also familiar with the geometric mean. Whereas the arithmetic mean of n numbers is the sum divided by n, the geometric mean of n nonnegative numbers is the n_th root of the product of the numbers. The geometric mean is always less than the arithmetic mean.

Did you know that there is a hybrid quantity, called the arithmetic-geometric mean, which is defined by combining the two quantities? The arithmetic-geometric mean is only defined for two positive numbers, x and y. It is defined as the limit of an alternating iterative process:

  • Define a1 = (x + y)/2 and g1 = sqrt(x y) to be the arithmetic and geometric means, respectively.
  • Iteratively define an+1 = (an + gn)/2 and gn+1 = sqrt(an gn).

Interestingly, this process always converges. The number that it converges to is called the arithmetic-geometric mean of x and y, which is denoted by AGM(x,y). The AGM is between the geometric mean and the arithmetic mean. I am not aware of a multivariate generalization of the AGM function.

In SAS you can use the SAS/IML language or PROC FCMP to create a user-defined function that computes the arithmetic-geometric mean. Since the AGM is not a vector operation, I show the PROC FCMP approach:

proc fcmp outlib=work.funcs.MathFuncs;
function AGM(x, y);
   if x<0 | y<0 then return( . );
   epsilon = 1e-10;               /* precision of computation */
   a = mean(x, y);   
   g = geomean(x,y);   
   do while (abs(a - g) > epsilon);
      a_new = mean(a, g);   
      g_new = geomean(a, g);   
      a = a_new; 
      g = g_new;
   return( mean(a, g) );
/* test the function */
options cmplib=work.funcs;  /* define location of function */
data _NULL_;
   agm = AGM(1, 0.8); 
   put agm 18.14;

An example of calling the new AGM function is shown for the two numbers 1 and 0.8. The arithmetic mean of 1 and 0.8 is 0.9. The geometric mean of 1 and 0.8 is sqrt(0.8) = 0.8944. The computation shows that the arithmetic-geometric mean is 0.8972, which is between the arithmetic and geometric means.

The following program computes the graph of the AGM function and displays a contour plot of the result. Note that AGM(x, x) = x for any nonnegative value of x.

data grid;
do x = 0 to 5 by 0.1;
   do y = 0 to 5 by 0.1;
      agm = AGM(x, y);
/* see */
proc sgrender data=grid template=ContourPlotParm;
dynamic _TITLE="Graph of Arithmetic-Geometric Mean AGM(x,y)"
        _X="x" _Y="y" _Z="AGM";
Graph of the arithmetic-geometric mean

So, what is the arithmetic-geometric mean good for? The AGM seems mainly useful in applied mathematics and number theory for computing elliptic functions and elliptic integrals. I am not aware of any applications to statistics, perhaps because the AGM is not defined for a sample that contains n elements when n > 2.

Although the AGM function is not used in statistics, the iterative algorithm that defines the AGM is a technique that I have seen often in computational statistics. When a researcher wants two conditions to hold simultaneously, one possible method is to alternately solve for each condition and prove that this iterative process converges to a solution that satisfies both conditions simultaneously. The iterative process can be an effective way to implement an optimization problem if each sub-optimization is simple.

In multivariate statistics, this technique is used in the method of alternating least squares, which is used in several SAS procedures such as PROC TRANSREG, VARCLUS, and MDS. In linear algebra, this technique is used in the method of alternating projections. I have used this technique to compute the closest correlation matrix to an arbitrary matrix.

One last interesting fact. There is another kind of mean called the harmonic mean. You can compute the arithmetic-harmonic mean by applying the same iterative algorithm, but replace the GEOMEAN function with the HARMEAN function. The process converges and the arithmetic-harmonic mean converges is equal to the geometric mean! Thus although this two-step definition might seem contrived, it is "natural" in the sense that it produces the geometric mean (of two numbers) from the arithmetic and harmonic means.

The post The arithmetic-geometric mean appeared first on The DO Loop.

8月 072017

A SAS customer asked, "I computed the eigenvectors of a matrix in SAS and in another software package. I got different answers? How do I know which answer is correct?"

I've been asked variations of this question dozens of times. The answer is usually "both answers are correct."

The mathematical root of the problem is that eigenvectors are not unique. It is easy to show this: If v is an eigenvector of the matrix A, then by definition A v = λ v for some scalar eigenvalue λ. Notice that if you define u = α v for a scalar α ≠ 0, then u is also an eigenvector because A u = α A v = α λ v = λ u. Thus a multiple of an eigenvector is also an eigenvector.

Most statistical software (including SAS) tries to partially circumvent this problem by standardizing an eigenvector to have unit length (|| v || = 1). However, note that v and -v are both eigenvectors that have the same length. Thus even a standardized eigenvector is only unique up to a ± sign, and different software might return eigenvectors that differ in sign. In fact for some problems, the same software can return different answers when run on different operating systems (Windows versus Linux), or when using vendor-supplied basic linear algebra subroutines such as the Intel Math Kernel Library (MKL).

To further complicate the issue, software might sort the eigenvalues and eigenvectors in different ways. Some software (such as MATLAB) orders eigenvalues by magnitude, which is the absolute value of the eigenvalue. Other software (such as SAS) orders eigenvalues according to the value (of the real part) of the eigenvalues. (For most statistical computations, the matrices are symmetric and positive definite (SPD). For SPD matrices, which have real nonnegative eigenvalues, these two orderings are the same.)

Eigenvectors of an example matrix

To illustrate the fact that different software and numerical algorithms can produce different eigenvectors, let's examine the eigenvectors of the following 3 x 3 matrix:

The eigenvectors of this matrix will be computed by using five different software packages: SAS, Intel's MKL, MATLAB, Mathematica, and R. The eigenvalues for this matrix are unique and are approximately 16.1, 0, and -1.1. Notice that this matrix is not positive definite, so the order of the eigenvectors will vary depending on the software. Let's compute the eigenvectors in five different ways.

Method 1: SAS/IML EIGEN Call: The following statements compute the eigenvalues and eigenvectors of M by using a built-in algorithm in SAS. This algorithm was introduce in SAS version 6 and was the default algorithm until SAS 9.4.

proc iml;
reset FUZZ;         /* print very small numbers as 0 */
M = {1  2  3,
     4  5  6,
     7  8  9};
reset EIGEN93;      /* use "9.3" algorithm; no vendor BLAS (option required for SAS 9.4m3) */
call eigen(EigVal, SASEigVec, M);
print EigVal, SASEigVec[colname=("E1":"E3")];

Notice that the eigenvalues are sorted by their real part, not by their magnitude. The eigenvectors are returned in a matrix. The i_th column of the matrix is an eigenvector for the i_th eigenvalue. Notice that the eigenvector for the largest eigenvalue (the first column) has all positive components. The eigenvector for the zero eigenvalue (the second column) has a negative component in the second coordinate. The eigenvector for the negative eigenvalue (the third column) has a negative component in the third coordinate.

Method 2: Intel MKL BLAS: Starting with SAS/IML 14.1, you can instruct SAS/IML to call the Intel Math Kernel Library for eigenvalue computation if you are running SAS on a computer that has the MKL installed. This feature is the default behavior in SAS/IML 14.1 (SAS 9.4m3), which is why the previous example used RESET NOEIGEN93 to get the older "9.3 and before" algorithm. The output for the following statements assumes that you are running SAS 9.4m3 or later and your computer has Intel's MKL.

reset NOEIGEN93;   /* use Intel MKL, if available */
call eigen(EigVal, MKLEigVec, M);
print MKLEigVec[colname=("E1":"E3")];

This is a different result than before, but it is still a valid set of eigenvectors The first and third eigenvectors are the negative of the eigenvectors in the previous experiment. The eigenvectors are sorted in the same order, but that is because SAS (for consistency with earlier releases) internally sorts the eigenvectors that the MKL returns.

Method 3: MATLAB: The following MATLAB statements compute the eigenvalue and eigenvectors for the same matrix:

M = [1, 2, 3; 4, 5, 6; 7, 8, 9];
[EigVec, EigVal] = eig(M);
EigVec =
-0.2320   -0.7858    0.4082
-0.5253   -0.0868   -0.8165
-0.8187    0.6123    0.4082

The eigenvalues are not displayed, but you can tell from the output that the eigenvalues are ordered by magnitude: 16.1, -1.1, and 0. The eigenvectors are the same as the MKL results (within rounding precision), but they are presented in a different order.

Method 4: Mathematica: This example matrix is used in the Mathematica documentation for the Eigenvectors function:

Eigenvectors[{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}]
0.283349  -1.28335    1
0.641675  -0.141675  -2
1          1          1

This is a different result, but still correct. The symbolic computations in Mathematica do not standardize the eigenvectors to unit length. Instead, they standardize them to have a 1 in the last component. The eigenvalues are sorted by magnitude (like the MATLAB output), but the first column has opposite signs from the MATLAB output.

Method 5: R: The R documentation states that the eigen function in R calls the LAPACK subroutines. Thus I expect it to get the same result as MATLAB.

M <- matrix(c(1:9), nrow=3, byrow=TRUE)
r <- eigen(M)
           [,1]        [,2]       [,3]
[1,] -0.2319707 -0.78583024  0.4082483
[2,] -0.5253221 -0.08675134 -0.8164966
[3,] -0.8186735  0.61232756  0.4082483

Except for rounding, this result is the same as the MATLAB output.


This article used a simple 3 x 3 matrix to demonstrate that different software packages might produce different eigenvectors for the same input matrix. There were four different answers produced, all of which are correct. This is a result of the mathematical fact that eigenvectors are not unique: any multiple of an eigenvector is also an eigenvector! Different numerical algorithms can produce different eigenvectors, and this is compounded by the fact that you can standardize and order the eigenvectors in several ways.

Although it is hard to compare eigenvectors from different software packages, it is not impossible. First, make sure that the eigenvectors are ordered the same way. (You can skip this step for symmetric positive definite matrices.) Then make sure they are standardized to unit length. If you do those two steps, then the eigenvectors will agree up to a ± sign.

The post The curse of non-unique eigenvectors appeared first on The DO Loop.

3月 012017

Monte Carlo techniques have many applications, but a primary application is to approximate the probability that some event occurs. The idea is to simulate data from the population and count the proportion of times that the event occurs in the simulated data.

For continuous univariate distributions, the probability of an event is the area under a density curve. The integral of the density from negative infinity to a particular value is the definition of the cumulative distribution function (CDF) for a distribution. Instead of performing numerical integration, you can use Monte Carlo simulation to approximate the probability.

One-dimensional CDFs

In SAS software, you can use the CDF function to compute the CDF of many standard univariate distributions. For example, the statement prob = cdf("Normal", -1) computes the probability that a standard normal random variable takes on a value less than -1.

The CDF function is faster and more accurate than a Monte Carlo approximation, but let's see how the two methods compare. You can estimate the probability P(X < -1) by generating many random values from the N(0,1) distribution and computing the proportion that is less than -1, as shown in the following SAS DATA step

data _NULL_;
call streaminit(123456);
N = 10000;                       /* sample size */
do i = 1 to N;
   x = rand("Normal");           /* X ~ N(0,1) */
   cnt + (x < -1);               /* sum of counts for value less than -1 */
Prob = cdf("Normal", -1);        /* P(x< -1) */
MCEst = cnt / N;                 /* Monte Carlo approximation */
put Prob=;
put MCEst=;
Prob =0.1586552539

The Monte Carlo estimate is correct to two decimal places. The accuracy of this Monte Carlo computation is proportional to 1/sqrt(N), where N is the size of the Monte Carlo sample. Thus if you want to double the accuracy you need to quadruple the sample size.

Two-dimensional CDFs

SAS provides the PROBBNRM function for computing the CDF of a bivariate normal distribution, but does not provide a built-in function that computes the CDF for other multivariate probability distributions. However, you can use Monte Carlo techniques to approximate multivariate CDFs for any multivariate probability distribution for which you can generate random variates.

I have previously blogged about how to use the PROBBNRM function to compute the bivariate normal CDF. The following SAS/IML statements demonstrate how to use a Monte Carlo computation to approximate the bivariate normal CDF. The example uses a bivariate normal random variable Z ~ MVN(0, Σ), where Σ is the correlation matrix with Σ12 = 0.6.

The example computes the probability that a bivariate normal random variable is in the region G = {(x,y) | x<x0 and y<y0}. The program first calls the built-in PROBBNRM function to compute the probability. Then the program calls the RANDNORMAL function to generate 100,000 random values from the bivariate normal distribution. A binary vector (group) indicates whether each observation is in G. The MEAN function computes the proportion of observations that are in the region.

proc iml;
x0  = 0.3; y0 = 0.4; rho = 0.6; 
Prob = probbnrm(x0, y0, rho);      /* P(x<x0 and y<y0) */
call randseed(123456);
N = 1e5;                           /* sample size */
Sigma = { 1   &rho,                /* correlation matrix */
         &rho 1 };
mean = {0 0};
Z = randnormal(N, mean, Sigma);    /* sample from MVN(0, Sigma) */
group = (Z[,1] < x0 & Z[,2] < y0); /* binary vector */
MCEst = mean(group);               /* = sum(group=1) / N  */
print Prob MCEst;

You can use a scatter plot to visualize the Monte Carlo technique. The following statements create a scatter plot and use the DROPLINE statement in PROC SGPLOT to indicate the region G. Of the 100000 random observations, 49750 of them were in the region G. These observations are drawn in red. The observations that are outside the region are drawn in blue.

ods graphics / width=400px height=400px;
title "Estimate of P(x < x0  and y < y0) is 0.4978";
title2 "x0 = 0.3; y0 = 0.4; rho = 0.6";
call scatter(Z[,1], Z[,2]) group=group grid={x y}
     procopt="noautolegend aspect=1"
     option="transparency=0.9  markerattrs=(symbol=CircleFilled)"
     other="dropline x=0.3 y=0.4 / dropto=both;";

Higher dimensions

The Monte Carlo technique works well in low dimensions. As the dimensions get larger, you need to generate a lot of random variates in order to obtain an accurate estimate. For example, the following statements generate 10 million random values from the five-dimensional distribution of uncorrelated normal variates and estimate the probability of all components being less than 0:

d = 5;                         /* dimension */
N = 1e7;                       /* sample size */
mean = j(1, d, 0);             /* {0,0,...,0} */
Z = randnormal(N, mean, I(d)); /* Z ~  MVN (0, I)  */
v0 = {0 0 0 0 0};              /* cutoff values in each component */
ComponentsInRegion = (Z < v0)[,+]; /* number of components in region */
group = (ComponentsInRegion=d);    /* binary indicator vector */
MCEst = mean(group);               /* proportion of obs in region */
print (1/2**d)[label="Prob"] MCEst;

Because the normal components are independent, the joint probability is the product of the probabilities for each component: 1/25 = 0.03125. The Monte Carlo estimate is accurate to three decimal places.

The Monte Carlo technique can also handle non-rectangular regions. For example, you can compute the probability that a random variable is in a spherical region.

The Monte Carlo method is computationally expensive for high-dimensional distributions. In high dimensions (say, d > 10), you might need billions of random variates to obtain a reasonable approximation to the true probability. This is another example of the curse of dimensionality.

The post Monte Carlo estimates of joint probabilities appeared first on The DO Loop.

11月 162016

At a conference last week, a presenter showed SAS statements that compute the logarithm of a probability density function (PDF). The log-PDF is a a common computation because it occurs when maximizing the log-likelihood function.

The presenter computed the expression in SAS by using an expression that looked like
y = LOG(PDF("distribution", x, params));
In other words, he computed the PDF and then transformed the density by applying the LOG function.

There is a better way. It is more efficient and more accurate to use the LOGPDF function to compute the log-PDF directly.

Special functions for log-transformed distributions #SASTip
Click To Tweet

Log-transformed distributions

SAS provides several functions for computing with log-transformed distributions. In addition to the LOGPDF function, you can use the LOGCDF function to compute probabilities for the log-distribution.

Let's use the gamma distribution to see why the LOGPDF function is usually more efficient. The gamma density function with unit scale parameter is given by
f(x; a) = xa-1 exp(-x) / Γ(a)
where Γ(a) is the value of the gamma function evaluated at a.

The log-transform of the gamma density is
log(f(x; a)) = (a-1)log(x) – x – log(Γ(a))
This formula, which is used when you compute LOGPDF("gamma", x, 2), is cheap to evaluate and does not suffer from numerical underflow when x is large. In particular, recall that in double-precision arithmetic, exp(-x) underflows if x > 709.78, so the PDF function cannot support extremely large values of x. In contrast, the formula for the log-PDF does not experience any numerical problems when x is large. The log-PDF function is also very fast to compute.

The following DATA step illustrates how to use the LOGPDF function to compute the log-gamma density. It computes the log-gamma PDF two ways: by using the LOGPDF function and by using the log-transform of the gamma PDF. The results show that the numerical values are nearly the same, but that the PDF function fails for large values of x:

data LogPDF(drop=a);
a = 2;
do x = 100, 700, 720, 1000;
   LogOfPDF = log( pdf("Gamma", x, a) ); 
   LOGPDF = logpdf("Gamma", x, a);        /* faster and more accurate */
   diff = LOGPDF - LogOfPDF;
label LOGOfPDF = "LOG(PDF(x))";
proc print noobs label; run;
Density of the log-gamma(2) distribution for large x

By the way, SAS provides other functions that are optimized for computing the log-transformation of several well known funcions and quantities:

  • The LOGBETA function computes the logarithm of the beta function. You should use logbeta(a,b) instead of log(beta(a,b)).
  • The LGAMMA function computes the logarithm of the gamma function. You should use lgamma(a) instead of log(gamma(a)).
  • The LCOMB function computes the logarithm of the number of combinations of n objects taken k at a time. You should use lcomb(n,k) instead of log(comb(n,k)).
  • The LFACT function computes the quantity log(n!) for specified values of n. You should use lfact(n) instead of log(fact(n)).

In general, when software provides a function for directly computing the logarithm of a quantity, you should use it. The special-purpose function is typically faster, more accurate, and will handle arguments that are beyond the domain of the non-specialized function.

tags: Numerical Analysis

The post Need to log-transform a distribution? There's a SAS function for that! appeared first on The DO Loop.

8月 312016
Branches of the real-valued Lambert W function

This article describes how you can evaluate the Lambert W function in SAS/IML software. The Lambert W function is defined implicitly: given a real value x, the function's value w = W(x) is the value of w that satisfies the equation w exp(w) = x. Thus W is the inverse of the function g(w) = w exp(w).

Because the function g has a minimum value at w=-1, there are two branches to the Lambert W function. Both branches are shown in the adjacent graph. The top branch is called the principal (or positive) branch and is denoted Wp. The principal branch has domain x ≥ -1/e and range W(x) ≥ -1. The lower branch, denoted by Wm, is defined on -1/e ≤ x < 0 and has range W(x) ≤ = -1. The subscript "m" stands for "minus" since the range contains only negative values. Some authors use W0 for the upper branch and W-1 for the lower branch.

Both branches are used in applications, but the lower branch Wm is useful for the statistical programmer because you can use it to simulate data from heavy-tailed distribution. Briefly: for a class of heavy-tailed distributions, you can simulate data by using the inverse CDF method, which requires evaluating the Lambert W function.

Defining the Lambert W function in SAS/IML

You can download the SAS/IML functions that evaluate each branch of the Lambert W function. Both functions use an analytical approximation for the Lambert W function, followed by one or two iterations of Halley's method to ensure maximum precision:

  • The LambertWp function implements the principal branch Wp. The algorithm constructs a direct global approximation on [-1/e, ∞) based on Boyd (1998).
  • The LambertWm function implements the second branch Wm. The algorithm follows Chapeau-Blondeau and Monir (2002) (hereafter CBM). The CBM algorithm approximates Wm(x) on three different domains. The approximation uses a series expansion on [-1/e, -0.333), a rational-function approximation on [-0.333, -0.033], and an asymptotic series expansion on (-0.033, 0).

Calling the Lambert W function in SAS/IML

Download the SAS/IML program for this article and save it to a file called Then the following SAS/IML program shows how to call the functions for the upper and lower branches and plot the graphs:

%include "";      /* specify path to file */
proc iml;
load module=(LambertWp LambertWm);
title "Lambert W Function: Principal Branch";
x = do(-1/exp(1), 1, 0.01);
Wp = LambertWp(x);
call series(x, Wp) grid={x y} other="refline 0/axis=x; refline 0/axis=y";
title "Lambert W Function: Lower Branch";
x = do(-1/exp(1), -0.01, 0.001);
Wm = LambertWm(x);
call series(x, Wm) grid={x y};

The graphs are not shown because they are indistinguishable from the graph at the beginning of article.

You can use a simple error analysis to investigate the accuracy f these functions. After computing w = W(x), apply the inverse function g(w) = w exp(w) and see how close the result is to the input values. The LambertWp and LambertWm functions support an optional second parameter that specifies how many Halley iterations are performed. For LambertWm, only one iteration is performed by default. You can specify "2" as the second parameter to get two Halley iterations. The following statements show that the maximum error for the default computation is 2E-11 on (-1/e, -0.01). The error decreases to 5E-17 if you request two Halley iterations.
x = do(-1/exp(1), -0.01, 0.001);
Wm = LambertWm(x);            /* default precision: 1 Halley iteration */
g = Wm # exp(Wm);             /* apply inverse function */
maxError1 = max(abs(x - g));  /* maximum error */
Wm = LambertWm(x, 2);         /* higher precision: 2 Halley iterations */
g = Wm # exp(Wm);
maxError2 = max(abs(x - g));
print maxError1 maxError2;

In summary, the SAS/IML language provides an excellent environment for implementing new functions or statistical procedures that are not built into SAS. In this article I implemented methods from journal articles for evaluating the upper and lower branches of the Lambert W function, which is the inverse function of g(w) = w exp(w). Although the Lambert W function does not have a closed-form solution, you can implement an approximation to the function and then use one or two Halley iterations to quickly converge within machine precision.

tags: Numerical Analysis

The post The Lambert W function in SAS appeared first on The DO Loop.

8月 242016

Edmond Halley (1656-1742) is best known for computing the orbit and predicting the return of the short-period comet that bears his name. However, like many scientists of his era, he was involved in a variety of mathematical and scientific activities. One of his mathematical contributions is a numerical method for finding the roots of a real-valued function. The technique, which is called Halley's method, is similar to Newton's method, but converges more rapidly in the neighborhood of a root.

There are dozens of root-finding methods, but Newton's method is the gold standard for finding roots of general nonlinear functions. It converges quadratically and is easy to implement because the formula requires only the evaluation of a function and its first derivative. Other competing methods do not use derivatives at all, or they use higher-order derivatives.

Halley's method falls into the latter category. Like Newton's method, it requires an initial guess for the root. It evaluates the function and its first two derivatives at an approximate location of the root and uses that information to iteratively improve the approximation. This article compares Halley's method with Newton's method and suggests a class of functions for which Halley's method is preferable.

Halley's root-finding method: An alternative to Newton's method #SASTip
Click To Tweet

An example of finding a root in SAS/IML

Consider the function f(x) = x exp(x). If you want to find the values of x for which f(x)=c, you can recast the problem as a root-finding problem and find the roots of the function f(x)-c. For this particular function, the equation f(x)-c has one root if c ≥ 0 and two roots if -1/e < c < 0.

Halley's method: The roots of the function x*exp(x) + 1/(2e)

To be concrete, let c = -1/(2e) so that the equation has two roots. The function f(x)-c is shown to the right and the two roots are marked by red line segments. You can use the built-in FROOT function in SAS/IML to locate the roots. The FROOT function does not use Newton's method. Instead, it requires that you specify an interval on which to search for a root. From the graph, you can specify two intervals that bound roots, as follows:

proc iml;
/* The equation f(x)-c for f(x) = x exp(x) */
start func(x) global(g_target);
   return x # exp(x) - g_target;
g_target = -1 / (2*constant('e')); /* parameter; function has two roots */
intervals = {-3 -2,                /* one root in [-3,-2] */
             -1  0};               /* another root in [-1,0] */
roots = froot("func", intervals);  /* find roots in each interval */
print roots;

The FROOT function is effective and efficient. It always finds an approximate root provided that you can supply a bounding interval on which the function changes signs. However, sometimes you don't know a bounding interval, you only know an approximate value for the root. In that case, Newton-type methods are preferable.

Newton's method versus Halley's method

Halley's method is an extension of Newton's method that incorporates the second derivative of the target function. Whereas Newton's method iterates the formula xn+1 = xn - f(xn) / f′(xn), Halley's method contains an extra term in the denominator:
xn+1 = xn - f(xn) / [f′(xn+1) - 0.5 f(xn) f″(xn) / f′(xn)].
Notice that if f″(xn)=0, then Halley's method reduced to Newton's method.

For many functions, the computational cost of evaluating the extra term in Halley's formula does not offset the gain in convergence speed. In other words, it is often quicker and easier to use Newton's simpler formula than Halley's more complicated formula. However, Halley's method is worth implementing when the ratio f″(x) / f′(x) can be evaluated cheaply. The current function provides an example: f′(x) = (x+1) exp(x) and f″(x) = (x+2) exp(x), so the ratio is simply (x+2) / (x+1). This leads to the following SAS/IML functions. One function implements Newton's iteration and the other implements Halley's iteration:

/* Compute iterations of Newton's method for several initial guesses:
   f(x) = x#exp(x) - c */
start NewtonIters(x0, numIters=1) global(g_target);
   z = j(numIters+1, ncol(x0));
   z[1,] = x0;
   do i = 2 to numIters+1;
      x = z[i-1,];
      fx = x # exp(x) - g_target;         /* f(x)   */
      dfdx = (1+x) # exp(x);              /* f'(x)  */
      z[i,] = x - fx / dfdx;              /* new approximate root */
   return z;
/* Compute iterations of Halley's method for several initial guesses:
   f(x) = x#exp(x) - c */
start HalleyIters(x0, numIters=1) global(g_target);
   z = j(numIters+1, ncol(x0));
   z[1,] = x0;
   do i = 2 to numIters+1;
      x = z[i-1,];
      fx = x # exp(x) - g_target;         /* f(x)   */
      dfdx = (1+x) # exp(x);              /* f'(x)  */
      R = (2+x) / (1+x);                  /* ratio f''(x) / f'(x) */
      z[i,] = x - fx / (dfdx - 0.5*fx#R); /* new approximate root */
   return z;

Notice that the functions are practically identical. Halley's method computes an extra term (R) and includes that term in the iterative formula that converges to the root. I wrote the function in vectorized form so that you can pass in multiple initial guesses and the functions will iterate all guesses simultaneously. The following statements provide four initial guesses and apply both Newton's and Halley's method:

x0 = {-3 -2 -0.5 0.25};        /* multiple initial guesses */
N = NewtonIters(x0, 5);        /* compute 5 Newton iterations */
H = HalleyIters(x0, 3);        /* compute 3 Halley iterations */                
rNames = "Iter=0":"Iter=5";
print N[L="Newton's Method" c=("Guess1":"Guess4") r=rNames];
print H[L="Halley's Method" c=("Guess1":"Guess4") r=rNames];

The tables show that Halley's method tends to converge to a root faster than Newton's method. For the four initial conditions, Newton's method required three, four, or five iterations to converge to within 1e-6 of the root. In contrast, Haley's method used three or fewer iterations to reach the same level of convergence.

Again, for Halley's method to be useful in practice, the ratio f″(x) / f′(x) must be easy to evaluate. To generalize this example, the ratio simplifies if the function is the product of a simple function and an exponential: f(x) = P(x)*exp(x). For example, if P(x) is a polynomial, then f″ / f′ is a rational function.

In summary, Halley's method is a powerful alternative to Newton's method for finding roots of a function f for which the ratio f″(x) / f′(x) has a simple expression. In that case, Halley's root-finding method is easy to implement and converges to roots of f faster than Newton's method for the same initial guess.

tags: Numerical Analysis

The post Halley's method for finding roots appeared first on The DO Loop.