Numerical Analysis

September 7, 2022
 

Monotonic transformations occur frequently in math and statistics. Analysts use monotonic transformations to transform variable values, with Tukey's ladder of transformations and the Box-Cox transformations being familiar examples. Monotonic functions figure prominently in probability theory because the cumulative distribution function (CDF) is a monotonic increasing function. For a continuous distribution whose CDF is strictly monotonic, the quantile function is also monotonic.

In a recent project, I needed to determine whether a certain function is monotonic increasing. Because the function is only known at a finite sequence of values, the reduced problem is to determine whether a sequence is increasing. I have previously shown that you can use the DIF function in SAS to test for an increasing sequence. This article shows how to apply the method to the problem of deciding whether a function is increasing.

What is a monotone sequence?

There are two types of monotonicity. In a weakly monotonic sequence, adjacent terms can be equal. That is, the difference between adjacent terms is allowed to be 0. In a strongly monotonic sequence (also called strictly monotonic), adjacent terms cannot be equal. For a strictly increasing sequence, the difference between adjacent terms is always positive. For a strictly decreasing sequence, the difference between adjacent terms is always negative. By itself, the term monotonic means the sequence is either increasing or decreasing.

Thus, there are four different tests for monotonicity. You can test whether a sequence is weakly increasing, strictly increasing, weakly decreasing, or strictly decreasing.

Test for monotone increasing in SAS

The following SAS/IML module evaluates a vector and tests whether the elements are increasing. It uses the DIF function to produce a vector that contains the difference between adjacent elements of the input vector. That is, if x = {x1, x2, x3, ..., xn} is the input vector, then d = dif(x,1,1) is the vector {x2-x1, x3-x2, ..., xn-x(n-1)}. The second argument specifies the lag. The third argument is a flag that specifies whether the result is padded with missing values. (See the Appendix for more information.) By default, the function tests for a weakly increasing sequence.

proc iml;
/* Test whether a sequence of elements is monotonic increasing.
   Valid options are 
   strict=0 : (Default) Return 1 if a sequence is nondecreasing
   strict=1 : Return 1 if a sequence is strictly increasing
*/
start IsIncr(_x, strict=0);
   x = colvec(_x);
   if nrow(x)=1 then return(1);      /* one element is always monotonic! */
   d = dif(x,1,1);                   /* lag=1; delete initial missing value */
   if strict then 
      return( all(d > 0) );
   return( all(d >= 0) );
finish;
 
/* test whether sequences are increasing */
x = {0,2,2,2,6,7,9};
y = {0,1,3,4,6,7,9};
z = {0,1,3,4,2,7,9};
b1 = IsIncr(x);     /* test weakly increasing */
b2 = IsIncr(x, 1);  /* test strictly increasing */
b3 = IsIncr(y, 1);  /* test strictly increasing */
b4 = IsIncr(z);     /* test weakly increasing */
 
print b1 b2 b3 b4;

The IsIncr function is called four times:

  1. The first call returns the value 1 because the elements of x are weakly monotonic (nondecreasing).
  2. The second call returns the value 0 because the elements of x are not strictly increasing.
  3. The third call returns the value 1 because the elements of y are strictly increasing.
  4. The fourth call returns the value 0 because the elements of z are not monotonic.

Test for monotone decreasing in SAS

If a sequence {x1, x2, x3, ...} is monotone increasing, then the sequence obtained by multiplying each element by -1 is monotone decreasing. Therefore, it is trivial to write a function that tests whether a sequence is decreasing: simply test whether the negative of the sequence is increasing! This is accomplished by the following SAS/IML function:

/* Test whether a sequence of elements is monotonic decreasing.
   strict=0 : (Default) Return 1 if a sequence is nonincreasing
   strict=1 : Return 1 if a sequence is strictly decreasing
*/
start IsDecr(x, strict=0);
   return IsIncr(-x, strict);
finish;
 
/* test whether sequences are decreasing */
u = {9,8,7,7,6,2,0};
b5 = IsDecr(u);     /* test weakly decreasing */
b6 = IsDecr(u, 1);  /* test strictly decreasing */
print b5 b6;

The sequence is weakly decreasing but not strictly decreasing. The first call to the IsDecr function returns 1. The second call returns 0.

An application: Test whether a transformation is monotonic

I used the IsIncr function to address a general question: Can you detect whether an unknown univariate function is monotonic increasing?

In a recent project, I was modeling the cumulative distribution function (CDF) of an unknown continuous distribution. By definition, a CDF must be increasing, but for some values of the parameters, the model is not an increasing function. I needed to identify these "infeasible" parameter values in a quick and reliable manner.

Mathematically, an increasing function, F, has the property that for every increasing sequence, {x_i}, the image of that sequence, {F(x_i)}, is also increasing. Numerically, you can generate an increasing sequence in the domain and ask whether the image of that sequence is also increasing.

Here's an example. Suppose you want to test whether the following functions are increasing:
F1(x) = (5 - 4*x) log( x/(1-x) )
F2(x) = (5 - 5.2*x) log( x/(1-x) )
The functions are defined on the domain (0, 1). The following program defines an increasing sequence on (0,1) and tests whether the image of the sequence is also increasing for F1 and for F2:

start Func1(x);
   return (5 - 4*x)#log( x/(1-x) );
finish;
start Func2(x);
   return (5 - 5.2*x)#log( x/(1-x) );
finish;
 
dt = 0.005;
x = do(dt, 1-dt, dt);  /* increasing sequence on a fine grid */
y1 = Func1(x);         /* image of sequence under F1 */
y2 = Func2(x);         /* image of sequence under F2 */
b1 = IsIncr(y1);
b2 = IsIncr(y2);
print b1 b2;

The output indicates that the first function is increasing, but the second function is not. The following graph of the second function shows, indeed, that the function is not strictly increasing.

The accuracy of this method depends on choosing a sequence on a fine grid of points in the domain. It assumes that the derivative of the function is bounded so that the function cannot quickly increase and then decrease within one of the small gaps between consecutive elements of the sequence.
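
To make the limitation concrete, the following sketch (my own example, not part of the original project) defines a function whose small oscillations make it non-monotonic. The sketch assumes the IsIncr module defined earlier is still available; the Wiggle function and its coefficients are arbitrary choices. On a coarse grid, the sampled sequence happens to be increasing, so the test is fooled; on a fine grid, the test detects a decrease:

start Wiggle(x);
   return( x + 0.05#sin(20*constant('pi')#x) );  /* small oscillation; slope is negative in places */
finish;
 
xCoarse = do(0, 1, 0.1);              /* the sine term vanishes at multiples of 0.1 */
xFine   = do(0, 1, 0.001);
bCoarse = IsIncr( Wiggle(xCoarse) );  /* returns 1: the coarse grid misses the dips */
bFine   = IsIncr( Wiggle(xFine) );    /* returns 0: the fine grid detects a decrease */
print bCoarse bFine;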

Summary

This article shows a way to test whether a function is monotonic on an interval. A common application is to test for an increasing function, but it is equally easy to test for a decreasing function. You can test whether the function is weakly increasing or strictly increasing.

These numerical tests evaluate a function at a finite set of points. Consequently, they can rule out the hypothesis of monotonicity, but they do not prove that the function is increasing everywhere. Nevertheless, these tests work well in practice if you choose the input sequence on a fine grid in the domain. If you have a mathematical bound on the derivative of the function, you can say more.

Appendix: A useful option for the DIF function

By default, the DIF function in SAS/IML returns a vector that is the same size as the input vector. If k is the lag parameter, the first k elements are missing values. The differences between elements are contained in the elements k+1, k+2, and so forth. If you specify the value 1 for the third argument, the DIF function deletes the leading missing values. The result is a vector that has n – k elements, where n is the length of the input vector and k is the lag parameter.

For example, the following example passes in k=1 and deletes the leading missing value for the result:

t = {0,3,3,2,7};   /* input argument has 5 elements */
d = dif(t, 1, 1);  /* LAG=1; delete leading missing value */
print d;           /* result has 4 elements */

The post A test for monotonic sequences and functions appeared first on The DO Loop.

July 18, 2022
 

A colleague was struggling to compute a right-tail probability for a distribution. Recall that the cumulative distribution function (CDF) is defined as a left-tail probability. For a continuous random variable, X, with density function f, the CDF at the value x is
    F(x) = Pr(X ≤ x) = ∫ f(t) dt
where the integral is evaluated on the interval (-∞, x). That is, it is the area under the density function "to the left" of x. In contrast, the right-tail probability is the complement of this quantity. The right-tail probability is defined by
    S(x) = Pr(X ≥ x) = 1 – F(x)
It is the area under the density function "to the right" of x.

The right-tail probability is also called the right-side or upper-tail probability. The function, S, that gives the right-tail probability is called the survival distribution function (SDF). Similarly, a quantile that is associated with a probability that is close to 0 or 1 might be called an extreme quantile. This article discusses why a naive computation of an extreme quantile can be inaccurate. It shows how to compute extreme probabilities and extreme quantiles by using two special functions in SAS: the SQUANTILE function for extreme quantiles and the SDF function for extreme right-tail probabilities.

Accurate computations of extreme probabilities

My colleague contacted me to discuss some issues related to computing an extreme quantile for p ≈ 1. To demonstrate, I will use the standard lognormal distribution, whose PDF is shown to the right. (However, this discussion applies equally well to other distributions.) When p is small, the lognormal quantile is close to 0. When p is close to 1, the lognormal quantile is arbitrarily large because the lognormal distribution is unbounded.

The following SAS DATA step uses the QUANTILE function to compute extreme quantiles in the left and right tails of the lognormal distribution. The quantiles are associated with the probabilities p=1E-k and p=1 – 1E-k for k=8, 9, 10, ..., 15.

%let Distrib = Lognormal;
data Have;
do pow = -8 to -15 by -1;
   p = 10**pow;                   
   Lx = quantile("&Distrib", p);   /* left-tail quantile: find x such that CDF(x) = p */
   Rp = 1 - p;                     /* numerically, 1-p = 1 for p < MACEPS */
   Rx = quantile("&Distrib", Rp);  /* right-tail quantile: find x such that CDF(x) = 1-p */
   output;
end;
format Rp 18.15;
run;
 
proc print noobs; run;

The second column shows the probability, p, and the Lx column shows the corresponding left-tail quantile. For example, the first row shows that 1E-8 is the probability that a standard lognormal random variate has a value less than 0.00365. The Rp column shows the probability 1 – p. The Rx column shows the corresponding quantile. For example, the first row shows that 1 – 1E-8 is the probability that a standard lognormal random variate has a value less than 273.691.

From experimentation, it appears that the QUANTILE function returns a missing value when the probability argument is greater than or equal to 1 – 1E-12. The SAS log displays a note that states
    NOTE: Argument 2 to function QUANTILE('Normal',1) at line 5747 column 9 is invalid.
This means that the function will not compute extreme-right quantiles. Why? Well, I can think of two reasons:

  • When p is very small, the quantity 1 – p does not have full precision. (This is called the "loss-of-precision problem" in numerical analysis.) In fact, if p is less than machine epsilon, the expression 1 – p EXACTLY EQUALS 1 in floating-point computations! Thus, we need a better way to specify probabilities that are close to 1. (A sketch after this list demonstrates the problem.)
  • For unbounded distributions, the CDF function is extremely flat for values in the right tail. Because the pth quantile is obtained by solving for the root of the function CDF(x) - p = 0, the computation of an accurate extreme-right quantile is numerically imprecise.
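
The following DATA step is a minimal sketch of the loss-of-precision problem; the specific value p = 1E-17 is an arbitrary choice that is smaller than machine epsilon:

data LossOfPrecision;
   maceps = constant('maceps');   /* machine epsilon, about 2.22E-16 */
   p = 1e-17;                     /* a probability smaller than MACEPS */
   q = 1 - p;                     /* rounds to exactly 1 in double precision */
   isOne = (q = 1);               /* flag is 1: the value of p is completely lost */
   put maceps= p= q= isOne=;
run;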

Compute extreme quantiles in SAS

The previous paragraph does not mean that you can't compute extreme-right quantiles. It means that you should use a function that is specifically designed to handle probabilities that are close to 1. The special function is the SQUANTILE function, which uses asymptotic expansions and other tricks to ensure that extreme-right quantiles are computed accurately. If x is the value returned by the SQUANTILE function, then 1 – CDF(x) = p. You can, of course, rewrite this expression to see that x satisfies the equation CDF(x) = 1 – p.

The syntax for the SQUANTILE function is almost the same as for the QUANTILE function. For any distribution, QUANTILE("Distrib", 1-p) = SQUANTILE("Distrib", p). That is, you add an 'S' to the function name and replace 1 – p with p. Notice that by specifying p instead of 1 – p, the right-tail probability can be specified more accurately.
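
For example, the following sketch verifies the equivalence for one arbitrarily chosen tail probability:

data CheckEquiv;
   p = 1e-6;                          /* right-tail probability */
   x1 = quantile("Normal", 1 - p);    /* left-tail formulation */
   x2 = squantile("Normal", p);       /* right-tail formulation */
   diff = x1 - x2;                    /* should be tiny for this moderate value of p */
   put x1= x2= diff=;
run;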

The following SAS DATA step is a modification of the previous DATA step. It uses the SQUANTILE function to compute the extreme-right quantiles:

data Want;
do pow = -8 to -15 by -1;
   p = 10**pow;
   Lx = quantile("&Distrib", p);   /* find x such that CDF(x) = p */
   Rp = 1 - p;
   Rx = squantile("&Distrib", p);  /* find x such that 1-CDF(x)=p */
   output;
end;
format Rp 18.15;
drop pow;
run;
 
proc print noobs; run;

The table shows that the SQUANTILE function can compute quantiles that are far in the right tail. The first row of the table shows that the probability is 1E-8 that a lognormal random variate exceeds 273.69. (Equivalently, 1 – 1E-8 is the probability that the variate is less than 273.69.) The last row of the table shows that 1E-15 is the probability that a lognormal random variate exceeds 2811.14.

Compute extreme probabilities in SAS

When p is small, the "loss-of-precision problem" (representing 1 – p in finite precision) also affects the computation of right-tail probabilities. If you compute the right-tail probability as 1 – CDF("Distrib", x), the subtraction loses precision when CDF(x) is close to 1. To counter that problem, SAS provides the SDF function, which computes the survival distribution function directly: for each x, the SDF function returns the right-tail probability SDF(x) = 1 – CDF(x) without performing the explicit subtraction.

The following DATA step computes several one-sided probabilities. For each value of x, the program computes the probability that a random lognormal variate is greater than x:

%let Distrib = lognormal;
data SDF;
do x = 500 to 2500 by 500;              /* extreme quantiles  */
   Rp       = 1 - cdf("&Distrib", x);   /* right-tail probability */
   RpBetter = sdf("&Distrib", x);       /* better computation of right-tail probability */
   output;
end;
run;
 
proc print noobs; run;

The differences between the probabilities in the Rp and RpBetter columns are not as dramatic as the differences for the quantiles in the previous section. Nevertheless, the subtraction 1 – CDF(x) is subject to cancellation when CDF(x) ≈ 1, so you can see small differences between the columns, and the RpBetter column will tend to be more accurate when x is large.
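
If you want to quantify the discrepancy, you can compute the relative difference between the two computations, as in the following sketch (the value x=2500 is an arbitrary choice):

data CheckSDF;
   x = 2500;
   Rp       = 1 - cdf("lognormal", x);    /* subject to cancellation */
   RpBetter = sdf("lognormal", x);        /* no explicit subtraction */
   relDiff  = abs(Rp - RpBetter) / RpBetter;
   put Rp= RpBetter= relDiff=;
run;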

Summary

This article shows two tips for working with extreme quantiles and extreme probabilities. When the probability is close to 1, consider using the SQUANTILE function to compute quantiles. When the quantile is large, consider using the SDF function to compute probabilities.

The post Tips for computing right-tail probabilities and quantiles appeared first on The DO Loop.

June 27, 2022
 

Recently, I needed to solve an optimization problem in which the objective function included a term that involved the quantile function (inverse CDF) of the t distribution, which is shown to the right for DF=5 degrees of freedom. I casually remarked to my colleague that the optimizer would have to use finite-difference derivatives because "this objective function does not have an analytical derivative." But even as I said it, I had a nagging thought in my mind. Is that statement true?

I quickly realized that I was wrong. The quantile function of ANY continuous probability distribution has an analytical derivative as long as the density function (PDF) is nonzero at the point of evaluation. Furthermore, the quantile function has an analytical expression for the second derivative if the density function is differentiable.

This article derives an exact analytical expression for the first and second derivatives of the quantile function of any probability distribution that has a smooth density function. The article demonstrates how to apply the formula to the t distribution and to the standard normal distribution. For the remainder of this article, we assume that the density function is differentiable.

The derivative of a quantile function

It turns out that I had already solved this problem in a previous article in which I looked at a differential equation that all quantile functions satisfy. In the appendix of that article, I derived a relationship between the first derivative of the quantile function and the PDF function for the same distribution. I also showed that the second derivative of the quantile function can be expressed in terms of the PDF and the derivative of the PDF.

Here are the results from the previous article. Let X be a random variable that has a smooth density function, f. Let w = w(p) be the pth quantile. Then the first derivative of the quantile function is
\(\frac{dw}{dp} = \frac{1}{f(w(p))}\)
provided that the denominator is not zero. The second derivative is
\(\frac{d^2w}{dp^2} = \frac{-f^\prime(w)}{f(w)} \left( \frac{dw}{dp} \right)^2 = \frac{-f^\prime(w)}{f(w)^3}\).

Derivatives of the quantile function for the t distribution

Let's confirm the formulas for the quantile function of the t distribution with five degrees of freedom. You can use the PDF function in SAS to evaluate the density function for many common probability distributions, so it is easy to get the first derivative of the quantile function in either the DATA step or in the IML procedure, as follows:

proc iml;
reset fuzz=1e-8;
DF = 5;
/* first derivative of the quantile function w = F^{-1}(p) for p \in (0,1) */
start DTQntl(p) global(DF);
   w = quantile("t", p, DF);  /* w = quantile for p */
   fw = pdf("t", w, DF);      /* density at w */
   return( 1/fw );
finish;
 
/* print the quantile and derivative at several points */
p = {0.1, 0.5, 0.75};
Quantile = quantile("t", p, DF);  /* w = quantile for p */
DQuantile= DTQntl(p);
print p Quantile[F=Best6.] DQuantile[F=Best6.];

This table shows the derivative (third column) for the quantiles of the t distribution. If you look at the graph of the quantile function, these values are the slope of the tangent lines to the curve at the given values of the p variable.

Second derivatives of the quantile function for the t distribution

With a little more work, you can obtain an analytical expression for the second derivative. It is "more work" because you must use the derivative of the PDF, and that formula is not built into most statistical software. However, derivatives are easy (if unwieldy) to compute, so if you can write down the analytical expression for the PDF, you can write down the expression for its derivative.

For example, the following expression is the formula for the PDF of the t distribution with ν degrees of freedom:

\( f(x)={\frac {\Gamma ({\frac {\nu +1}{2}})} {{\sqrt {\nu \pi }}\,\Gamma ({\frac {\nu }{2}})}} \left(1+{\frac {x^{2}}{\nu }}\right)^{-(\nu +1)/2} \)
Here, \(\Gamma\) is the gamma function, which is available in SAS by using the GAMMA function. If you take the derivative of the PDF with respect to x, you obtain the following analytical expression, which you can use to compute the second derivative of the quantile function, as follows:
/* derivative of the PDF for the t distribution */
start Dtpdf(x) global(DF);
   nu = DF;
   pi = constant('pi');
   A = Gamma( (nu +1)/2 ) / (sqrt(nu*pi) * Gamma(nu/2));
   dfdx = A * (-1-nu)/nu # x # (1 + x##2/nu)##(-(nu +1)/2 - 1); 
   return dfdx;
finish;
 
/* second derivative of the quantile function for the t distribution */
start D2TQntl(p) global(df);
   w = quantile("t", p, df);
   fw = pdf("t", w, df);
   dfdx = Dtpdf(w);
   return( -dfdx / fw##3 );
finish;
 
D2Quantile = D2TQntl(p);
print p Quantile[F=Best6.] DQuantile[F=Best6.] D2Quantile[F=Best6.];
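
As a quick sanity check (not part of the original program), you can compare the analytical second derivative to a central-difference approximation of the derivative of the DTQntl function. The step size h is an arbitrary choice:

/* central-difference check: d/dp [ DTQntl(p) ] should match D2TQntl(p) */
h = 1e-5;
D2FD = ( DTQntl(p+h) - DTQntl(p-h) ) / (2*h);
maxDiff = max( abs(D2FD - D2Quantile) );
print maxDiff;    /* should be near zero, up to truncation error */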

Derivatives of the normal quantiles

You can use the same theoretical formulas for the normal quantiles. The second derivative, however, does not require that you compute the derivative of the normal PDF. It turns out that the derivative of the normal PDF can be written as the product of the PDF and the quantile function, which simplifies the computation. Specifically, if f is the standard normal PDF, then \(f^\prime(w) = -f(w) w\),
and so the second derivative of the quantile function is
\(\frac{d^2w}{dp^2} = w \left( \frac{dw}{dp}\right)^2\).
This is shown in the following program:

/* derivatives of the quantile function for the normal distribution */
proc iml;
/* first derivative of the quantile function */
start DNormalQntl(p);
   w = quantile("Normal", p);
   fw = pdf("Normal", w);
   return 1/fw;
finish;
/* second derivative of the quantile function */
start D2NormalQntl(p);
   w = quantile("Normal", p);
   dwdp = DNormalQntl(p);
   return( w # dwdp##2 );
finish;
 
p = {0.1, 0.5, 0.75};
Quantile = quantile("Normal", p);
DQuantile = DNormalQntl(p);
D2Quantile = D2NormalQntl(p);
print p Quantile[F=Best6.] DQuantile[F=Best6.] D2Quantile[F=Best6.];

Summary

In summary, this article shows how to compute the first and second derivatives of the quantile function for any probability distribution (with mild assumptions for the density function). The first derivative of the quantile function can be defined in terms of the quantile function and the density function. For the second derivative, you usually need to compute the derivative of the density function. The process is demonstrated by using the t distribution. For the standard normal distribution, you do not need to compute the derivative of the density function; the derivatives depend only on the quantile function and the density function.

The post The derivative of a quantile function appeared first on The DO Loop.

April 25, 2022
 

To a numerical analyst, numerical integration has two meanings. Numerical integration often means solving a definite integral such as \(\int_{a}^b f(s) \, ds\). Numerical integration is also called quadrature because it computes areas. The other meaning applies to solving ordinary differential equations (ODEs). My article about new methods for solving ODEs in SAS uses the words "integrate" and "integrator" a lot. This article caused a colleague to wonder about the similarities and differences between integration as a method for solving a definite integral and integration as a method for solving an initial-value problem for an ODE. Specifically, he asked whether you could use an ODE solver to solve a definite integral.

It is an intriguing question. This article shows that, yes, you can use an ODE solver to compute some definite integrals. However, just as a builder should use the most appropriate tools to build a house, so should a programmer use the best tools to solve numerical problems. Just because you can solve an integral by using an ODE does not mean that you should discard the traditional tried-and-true methods of solving integrals.

The most famous differential equation in probability and statistics

There are many famous differential equations in physics and engineering because Newton's Second Law is a differential equation. In probability and statistics, the most famous differential equation is the relationship between the two fundamental ways of looking at a continuous distribution. Namely, the derivative of the cumulative distribution function (CDF) is the probability density function (PDF). We usually write the relationship \(F^\prime(x) = f(x)\), where F is the CDF and f is the PDF. Thus, the CDF is the solution of a first-order differential equation that has the PDF on the right-hand side. If you know the PDF and an initial condition, you can solve the differential equation to find the CDF.

However, there is a problem. By integrating both sides of the differential equation, you obtain \(F(x) = \int_{-\infty}^x f(s) \, ds\). Because the lower limit of integration is -∞, you cannot solve the differential equation as an initial-value problem (IVP). An initial-value problem requires that you know the value of the function for some finite value of the lower limit.

However, you can get around this issue for some specific distributions. Some distributions have a lower limit of 0 instead of -∞. If the density at x=0 is finite, you can solve the ODE for those distributions.

You can also use an ODE to solve for the CDF of symmetric distributions. If f is the density of a symmetric distribution, the point of symmetry is the median of the distribution. So, if s0 is the point of symmetry, you can solve the ODE \(F^\prime(x) = f(x)\) for any value x ≥ s0 by using the initial condition \(F(s_0) = 0.5\). Thus, you can use an IVP to solve any integral of the form \(\int_{s_0}^b f(s) \, ds\). By symmetry, you can solve any integral of the form \(\int_a^{s_0} f(s) \, ds\). By combining these results, you can use an IVP to solve the integral on any finite-length interval [a, b].

This symmetry idea is shown in the graph to the right for the case of the standard normal distribution. The normal distribution is symmetric about the median s0=0. This means that the CDF passes through the point (0, 1/2). You can integrate the ODE to obtain the CDF curve for any positive value of x. The computations are shown in the next section.

Solve an integral to find the normal probability

I like to know the correct answer BEFORE I do any computation. The usual way to compute an integral in SAS is to use the QUAD subroutine in SAS/IML software. (QUAD is short for quadrature.) This subroutine is designed to compute one-dimensional integrals. To use the QUAD subroutine, you need to define the integrand and interval of integration. For example, the following statements define the integral for the normal PDF on the interval [0, 3]:

proc iml;
/* Use QUAD subroutine to integrate the standard normal PDF.
   Note: For this integrand, you can call the CDF function to compute the integral:
         CDF = cdf("Normal", 3) - 0.5; */
start NormalPDF(x);
   return pdf("Normal", x);   /* f(x) = 1/sqrt(2*pi) *exp(-x##2/2) */
finish;
 
/* find the area under pdf on [0, 3] */
call quad(Area, "NormalPDF", {0 3});
print Area;

The QUAD subroutine makes it simple to integrate any continuous function on a finite interval. In fact, the QUAD subroutine can also integrate over infinite intervals! And, if you specify the points of discontinuity, it can integrate functions that are not continuous at finitely many points.

Solve an ODE to find the normal probability

Just for fun, let's see if we can get the same answer by using the ODE subroutine to solve a differential equation. We want to compute the integral \(F(a) = \int_0^a \phi(s) \, ds\), where φ is the standard normal PDF. The equivalent formulation as a differential equation is \(F^{\prime}(x) = \phi(x)\), where F(0)=0.5. For this example, we want the solution at x=3.

The module that evaluates the right-hand side of the ODE must take two arguments. The first argument is the independent variable, which is x. The second argument is the state of the system, which is not used for this problem. You also need to specify the initial condition, which is F(0)=0.5. Lastly, you need to specify the limits of integration, which are [0, 3]. The following statements use the ODE subroutine in SAS/IML software to solve the IVP defined by F`(x)=PDF("Normal",x) and F(0)=0.5:

/* ODE is A` = 1/sqrt(2*pi) *exp(-x##2/2) 
   A(0) = 0.5  */
start NormalPDF_ODE(s, y);
   return pdf("Normal", s);
finish;
 
c = 0.5;                     /* initial value at x=0 */
x = {0 3};                  /* integrate from x=0 to x=3 */
h={1.e-7, 1, 1e-4};          /* (min, max, initial) step size for integrator */
call ode(result, "NormalPDF_ODE", c, x, h) eps=1e-7;
A = result - c;
print A;

Success! The ODE gives a similar numerical result for the integral of the PDF, which is the CDF at x=3.

Integrating an ODE is a poor way to compute an integral

Although it is possible to use an ODE to solve some integrals on a finite domain, this result is more of a curiosity than a practical technique. A better choice is to use a routine designed for numerical integration, such as the QUAD subroutine. The QUAD subroutine can integrate on any domain, including infinite domains. For example, the following statements integrate the normal PDF on [-3, 3] and [0, ∞), respectively.

call quad(Area2, "NormalPDF", {-3 3});  /* area on [-3, 3] */
call quad(Area3, "NormalPDF", {0 .P});  /* area on [0, infinity) */
print Area2 Area3;

In addition, the QUAD subroutine can be used to compute double integrals by using iterated integrals. As far as I know, you cannot use an ODE to compute a double integral.

Whereas the ODE subroutine requires knowing an initial value on the curve, the QUAD routine does not. The ODE subroutine also requires separate calls to integrate forward and backward from the initial condition, whereas the QUAD routine does not.

Summary

The phrase "numerical integration" has two meanings in numerical analysis. The usual meaning is "quadrature," which means to evaluate a definite integral. However, it is also used to describe the numerical solution of an ODE. In certain cases, you can use an ODE to solve a definite integral on a bounded domain. However, the ODE requires special additional information such as knowing an "initial condition." So, yes, you can (sometimes) solve integrals by using a differential equation, but the process is more complicated than if you use a standard method for solving integrals.

The post Two perspectives on numerical integration appeared first on The DO Loop.

April 11, 2022
 

Many discussions and articles about SAS Viya emphasize its ability to handle Big Data, perform parallel processing, and integrate open-source languages. These are important issues for some SAS customers. But for customers who program in SAS and do not have Big Data, SAS Viya is attractive because it is the development environment in which SAS R&D adds new features, options, and algorithms. Since 2018, most new development has been in SAS Viya. This article discusses one new feature in the SAS IML product: new state-of-the-art algorithms for solving ordinary differential equations (ODEs) that are formulated as initial value problems (IVPs).

In statistics, differential equations arise in several areas, including compartment models in the field of pharmacokinetics. The example in this article is a compartment model, although it is from epidemiology, not pharmacokinetics.

Solving ODEs in SAS

The SAS/IML product introduced the ODE subroutine way back in SAS 6. The ODE subroutine solves IVPs by using an adaptive, variable-order, variable-step-size integrator. This integrator is very good for a wide assortment of problems. The integrator was state-of-the-art in the 1980s, but researchers have made a lot of progress in solving ODEs since then.

The new ODEQN subroutine was introduced in Release 2021.1.5 of SAS IML software. See the documentation for a full description of the features of the ODEQN subroutine. Briefly, it has a simple calling syntax and supports about 36 different integration methods, including the following:

  • BDF: backward differentiation formulas for stiff problems
  • Adams-Moulton method for non-stiff problems
  • Hybrid (BDF + Adams): Automatic detection of stiffness and switching
  • Explicit Runge-Kutta methods: Popular solvers in the sciences and engineering fields

Each class of solvers supports several "orders," which equate to low- or high-order Taylor series approximations.

The new ODEQN subroutine has several advantages over the previous solver, but the following features are noteworthy:

  • Self-starting: the subroutine automatically chooses an initial step size
  • Variable-order methods use higher-order approximations to take larger steps, when possible
  • Automatically detect stiffness (Hybrid method)
  • Support piecewise-defined vector fields
  • Support backward-in-time integration
  • BLAS-aware linear algebra scales to handle large systems

Example: A basic SIR model in epidemiology

Many introductory courses in differential equations introduce the SIR model in epidemiology. The SIR model is a simple model for the transmission and recovery of a population that is exposed to an infectious pathogen. More sophisticated variations of the basic SIR model have been used by modelers during the COVID-19 pandemic. The SIR model assumes that the population is classified into three groups: Susceptible to the disease, Infected with the disease, and Recovered from the disease.

In the simplest SIR model, there are no births and no deaths. Initially, almost everyone is susceptible except for a small proportion of the population that is infected. The susceptible population can become infected by interacting with a member of the infected population. Everyone in the infected population recovers, at which point the model assumes that they are immune to reinfection. The SIR model has two parameters. One parameter (b) controls the average rate at which a susceptible-infected interaction results in a new infected person. Another parameter (k) represents how quickly an infected person recovers and is no longer infectious.

The following SAS IML module defines the SIR equations, which model the proportion of the population that is susceptible, infected, and recovered. Initially, 99.99% of the population is susceptible, 0.01% is infected, and 0% is recovered. The new ODEQN subroutine in SAS Viya solves the system of equations for 100 units of time. The output of the routine is a 101 x 3 matrix. The ith row is the proportion of the population in each category after i-1 units of time have passed. Notice that the syntax of the ODEQN subroutine is quite simple: You specify the module that defines the equations, the initial conditions, and the time points at which you want a solution.

/* Program requires SAS IML in SAS Viya 2021.1.5 */
proc iml;
/* SIR model: https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology */
start SIR(t, y) global(b, k);
   S = y[1]; I = y[2]; R = y[3];
   dSdt = -b * S*I;
   dIdt =  b * S*I - k*I;
   dRdt =            k*I;
   return ( dSdt || dIdt || dRdt );
finish;
 
/* set parameters and initial conditions */
b = 0.5;      /* b param: transition rate from S to I */
k = 1/14;     /* k param: transition rate from I to R */
IC = {0.9999  0.0001   0};   /* S=99.99% I=0.01%  R=0 */
t = T(0:100);  /* compute spread for each day up to 100 days */
 
call odeqn(soln, "SIR", IC, t);  /* in SAS Viya only */

You can graph the solution of this system by plotting the time-evolution of the three populations:

/* write solution to SAS data set */
S = t || soln;
create SIROut from S[c={'Time' 'Susceptible' 'Infected' 'Recovered'}];
append from S;
close;
QUIT;
 
title "Solution to SIR Model";
title2 "b = 1/2; k = 1/14";
proc sgplot data=SIROut;
   series x=Time y=Susceptible;
   series x=Time y=Infected;
   series x=Time y=Recovered;
   xaxis grid; yaxis grid label="Percent of Group";
run;

For the parameters (b, k) = (0.5, 0.07), the infected population rapidly increases until Day 25 when almost 60% of the population is infected. By this time, the population of susceptible people is only about 10% of the population. For these parameters, nearly everyone in the population becomes infected by Day 60. The highly infectious (but not lethal) epidemic is essentially over by Day 80 when almost everyone has been infected, recovered, and is now immune.

Example: An SIR model for a change in behavior

One of the advantages of mathematical models is that you can simulate "what if" scenarios. For example, what if everyone in the population changes their behavior on Day 20 to reduce the transmission rate? This might include the adoption of non-pharmaceutical interventions such as mask-wearing, social distancing, and quarantining the infected. What if these behaviors change the effective rate of transmission from 0.5 (very high) to 0.1 (moderate)? You can modify the module that defines the ODE to reflect that parameter change at t=20. As mentioned earlier, the ODEQN subroutine can integrate piecewise-defined systems if you specify the time at which the system changes its behavior. The following call to the ODEQN subroutine uses the TDISCONT= option to tell the integrator that the system of ODEs changes at t=20.

%let TChange = 20;
start SIR(t, y) global(k);
   if t < &TChange then b = 0.5;   /* more spread */
   else                 b = 0.1;   /* less spread */
 
   S = y[1]; I = y[2]; R = y[3];
   dSdt = -b * S*I;
   dIdt =  b * S*I - k*I;
   dRdt =            k*I;
   return ( dSdt || dIdt || dRdt );
finish;
 
/* set parameters and initial conditions */
k = 1/14;     /* k param: transition rate from I to R */
IC = {0.9999  0.0001   0};   /* S=99.99% I=0.01%  R=0 */
t = T(0:100);  /* compute spread for each day up to 100 days */
call odeqn(soln, "SIR", IC, t) TDISCONT=&TChange;

If you plot the new solutions, you obtain the following graph:
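
The plotting statements are analogous to the earlier block; the data set name SIROut2 is an arbitrary choice:

/* write the new solution to a data set and plot it, as before */
S = t || soln;
create SIROut2 from S[c={'Time' 'Susceptible' 'Infected' 'Recovered'}];
append from S;
close;
QUIT;
 
title "Solution to SIR Model with Intervention on Day 20";
proc sgplot data=SIROut2;
   series x=Time y=Susceptible;
   series x=Time y=Infected;
   series x=Time y=Recovered;
   refline &TChange / axis=x label="Intervention";
   xaxis grid; yaxis grid label="Percent of Group";
run;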

In the new scenario, the increase in the infected population is reversed by the intervention on Day 20. From there, the proportion of infected persons decreases over time. Eventually, the disease dies out. By Day 100, 75% of the population has been infected and recovered, leaving 25% that is still susceptible to a future outbreak.

Summary

Although moving to SAS Viya offers many advantages for SAS customers who use Big Data and want to leverage parallel processing, Viya also provides new features for traditional SAS programmers. One example is the new ODEQN subroutine in SAS IML, which solves ODEs that are initial value problems by using state-of-the-art numerical algorithms. In addition to computational improvements in accuracy and speed, the ODEQN subroutine has a simple syntax and makes many decisions automatically, which makes the routine easier for non-experts to use.

The post New methods for solving differential equations in SAS appeared first on The DO Loop.

April 6, 2022
 

The graph to the right is the quantile function for the standard normal distribution, which is sometimes called the probit function. Given any probability, p, the quantile function gives the value, x, such that the area under the normal density curve to the left of x is exactly p. This function is the inverse of the normal cumulative distribution function (CDF).

Did you know that this curve is also the solution of a differential equation? You can write a nonlinear second-order differential equation whose solution is exactly the probit function! This means that you can compute quantiles by solving a differential equation.

This article derives the differential equation and solves it by using the built-in ODE subroutine in SAS/IML software.

A differential equation for quantiles

Let w(p) = PROBIT(p) be the standard normal quantile function. The w function satisfies the following differential equation:
\(\frac{d^2w}{dp^2} = w \left( \frac{dw}{dp}\right)^2\)
The derivation of this formula is shown in the appendix.

A differential equation looks scary, but it merely reveals a relationship between the derivatives of a function at every point in the domain. For example: at the point p=1/2, the value of the probit function is w(1/2) = 0, and the slope of the function is w'(1/2) = sqrt(2 π). From those two values, the differential equation tells you that the second derivative at p=1/2 is exactly 0*2π = 0.

The appendix shows how to use calculus to prove the relationship. You can also use the NLPFDD subroutine in SAS/IML to compute finite-difference derivatives at an arbitrary value of p. The derivatives satisfy the differential equation to within the accuracy of the computation.
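
For example, the following sketch (my own check; the value of p is arbitrary) uses the NLPFDD subroutine to approximate the first and second derivatives of the probit function and compares the second derivative to the right-hand side of the differential equation:

proc iml;
start Probit(p);
   return( quantile("Normal", p) );   /* w(p) */
finish;
 
p0 = 0.75;
call nlpfdd(f, grad, hess, "Probit", p0);  /* w, w`, and w`` at p0 by finite differences */
lhs = hess;               /* second derivative */
rhs = f * grad##2;        /* w * (dw/dp)^2, the right-hand side of the ODE */
print lhs rhs (lhs-rhs)[L="diff"];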

Use the differential equation to compute quantiles

An interesting application of the ordinary differential equation (ODE) is that you can use it to compute quantiles. At the point, p=1/2, the probit function has the value w(1/2) = 0, and the slope of the function is w'(1/2) = sqrt(2 π). Therefore, the probit function is the particular solution to the previous ODE for those initial conditions. Consequently, you can obtain the quantiles by solving an initial value problem (IVP).

I previously showed how to use the ODE subroutine in SAS/IML to solve initial value problems. For many IVPs, the independent variable represents time, and the solution represents the evolution of the state of the system, given an initial state. For this IVP, the independent variable represents probability. By knowing the quantile of the standard normal distribution at one location (p=1/2), you can find any larger quantile by integrating forward, or any smaller quantile by integrating backward.

The ODE subroutine solves systems of first-order differential equations, so you must convert the second-order equation into a system that contains two first-order equations. Let v = dw/dp = w`. Then the differential equation can be written as the following first-order system:

   w` = v
   v` = w * v**2

The following SAS/IML statements use the differential equation to find the quantiles of the standard normal distribution for p={0.5, 0.68, 0.75, 0.9, 0.95, 0.99}:

proc iml;
/* Solve first-order system of ODES with IC
   w(1/2) = 0
   v(1/2) = sqrt(2*pi)
*/
start NormQntl(t,y);
   w = y[1]; v = y[2];
   F = v  // w*v**2;
   return(F);
finish;
 
/* initial conditions at p=1/2 */
pi = constant('pi');
c = 0 || sqrt(2*pi);  
 
/* integrate forward in time to find these quantiles */
p = {0.5, 0.68, 0.75, 0.9, 0.95, 0.99};
 
/* ODE call requires a column vector for the IC, and a row vec for indep variable */ 
c = colvec(c);               /* IC */
t = rowvec(p);               /* indep variable */
h={1.e-7, 1, 1e-4};          /* (min, max, initial) step size for integrator */
call ode(result, "NormQntl", c, t, h) eps=1e-6;
Traj  = (c` // result`);     /* solutions in rows. Transpose to columns */
qODE = Traj[,1];             /* first column is quantile; second col is deriv */
 
/* compare with the correct answers */
Correct = quantile("Normal", p);
Diff = Correct - qODE;       /* difference between ODE solution and QUANTILE */
print p Correct qODE Diff;

The output shows the solution to the system of ODEs at the specified values of p. The solution is very close to the more exact quantiles as computed by the QUANTILE function. This is fairly impressive if you think about it. The ODE subroutine has no idea that it is solving for normal quantiles. The subroutine merely uses a general-purpose ODE solver to generate a solution at the specified values of p. In contrast, the QUANTILE routine is a special-purpose routine that is engineered to produce the best-possible approximations of the normal quantiles.

To get quantiles for p < 0.5, you can use the symmetry of the normal distribution. The quantiles satisfy the relation w(1-p) = -w(p).

Summary

A differential equation reveals the relationship between a function and its derivatives. It turns out that there is an ordinary differential equation that is satisfied by the quantile function of any smooth probability distribution. This article shows the ODE for the standard normal distribution. You can solve the ODE in SAS to compute quantiles if you also know the value of the quantile function and its derivative at some point, such as a median at p=1/2.

Appendix: Derive the quantile ODE

For the mathematically inclined, this section derives a differential equation for any smooth probability distribution. The equations are given in a Wikipedia article, but the article does not include a derivation.

Let X be a random variable that has a smooth CDF, F. The density function, f, is the first derivative of F. By definition, the quantile function, w, is the inverse of the CDF function. That means that for any probability, p, the CDF and quantile function satisfy the equation \(F(w(p)) = p\).

Take the derivative of both sides with respect to p and apply the chain rule:
\(F^\prime(w(p)) \frac{dw}{dp} = 1\)
Substitute f for the derivative and solve for dw/dp:
\(\frac{dw}{dp} = \frac{1}{f(w(p))}\)

Again, take the derivative of both sides:
\(\frac{d^2w}{dp^2} = \frac{-1}{f(w(p))^2} f^\prime(w(p)) \frac{dw}{dp}\).
You can use the relationship dw/dp = 1/f(w) to obtain
\(\frac{d^2w}{dp^2} = \frac{-f^\prime(w)}{f(w)} \left( \frac{dw}{dp} \right)^2\).
In this derivation, I have not used any special properties of the distribution. Therefore, it is satisfied by any probability distribution that is sufficiently smooth and for any values of p for which the equation is well-defined.

The ODE simplifies for some probability density functions. For example, the standard normal density is \(f(w) = \frac{1}{\sqrt{2\pi}} \exp(-w^2/2 )\) and thus
\(f^\prime(w) = \frac{1}{\sqrt{2\pi}} \exp(-w^2/2 )(-w) = -f(w) w\).
Consequently, for the standard normal distribution, the ODE simplifies to
\(\frac{d^2w}{dp^2} = w \left( \frac{dw}{dp}\right)^2\).

The post A differential equation for quantiles of a distribution appeared first on The DO Loop.

March 21, 2022
 

Statistical programmers need to access numerical constants that help us to write robust and accurate programs. Specifically, it is necessary to know when it is safe to perform numerical operations such as raising a number to a power without exceeding the largest number that is representable in finite-precision arithmetic. This article discusses five constants that every statistical programmer must know: PI, MACEPS, EXACTINT, BIG, and SMALL. In the SAS language, you can find the values of these constants by using the CONSTANT function in Base SAS.

The following table shows the values of the constants that are discussed in this article:

data Constants;
length Const $10;
format Value Best12.;
input Const @@;
Value = constant(Const);
datalines;
PI MACEPS EXACTINT 
BIG LOGBIG SQRTBIG SMALL
;
 
proc print noobs; run;

Pi and other mathematical constants

The CONSTANT function provides values for various mathematical constants. Of these, the most important constant is π ≈ 3.14159..., but you can also obtain other mathematical constants such as the natural base (e) and the golden ratio (φ). To get an approximation for π, use pi = constant('pi'). The number π is essential for working with angles and trigonometric functions. For an example, see the article, "Polygons, pi, and linear approximations," which uses π to create regular polygons.

Machine epsilon

Arguably the most important constant in computer science is machine epsilon. This number enables you to perform floating-point comparisons such as deciding whether two numbers are equal or determining the rank of a numerical matrix. To get machine epsilon, use eps = constant('maceps').
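
The following DATA step is a minimal sketch of a tolerance-based comparison; the tolerance 10*eps is an arbitrary illustration, and in practice you should scale the tolerance by the magnitude of the numbers:

data CompareFloats;
   eps = constant('maceps');
   x = 0.1 + 0.2;                     /* not exactly 0.3 in binary arithmetic */
   y = 0.3;
   naive  = (x = y);                  /* 0: exact comparison fails */
   robust = (abs(x - y) < 10*eps);    /* 1: comparison up to a small tolerance */
   put naive= robust=;
run;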

The largest representable integer

It is important to know the largest number that can be represented exactly in finite precision on your machine. This number is available as bigInt = constant('exactint'). This number enables you to find the largest factorial that you can compute exactly (18!) and the largest row of Pascal's triangle (row 56) that you can faithfully represent in double precision.
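
For example, the following sketch uses the FACT function to show that 18! is less than EXACTINT, whereas 19! exceeds it and is therefore subject to rounding:

data Factorials;
   bigInt = constant('exactint');   /* 2**53 on IEEE machines */
   f18 = fact(18);                  /* less than EXACTINT: represented exactly */
   f19 = fact(19);                  /* greater than EXACTINT: might not be exact */
   ok18 = (f18 < bigInt);           /* 1 */
   ok19 = (f19 < bigInt);           /* 0 */
   put bigInt= f18= f19= ok18= ok19=;
run;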

The largest representable double

Except for π, the constants I use the most are related to BIG ≈ 1.8E308. I primarily use LOGBIG and SQRTBIG, which you can compute as
logbig = constant('logbig');
sqrtbig = constant('sqrtbig');
These constants are useful for preventing overflow when you perform arithmetic operations on large numbers, as the sketch after this list illustrates:

  • The quantity exp(x) can only be computed when x is less than LOGBIG ≈ 709.78.
  • The LOGBIG option supports the BASE suboption (BASE > 1), which you can use to ensure that raising a number to a power does not overflow. For example, constant('logbig', 2) returns a value slightly less than 1024, which tells you that 2**1023 is the largest power of 2 that does not exceed BIG.
  • The SQRTBIG option tells you whether you can square a number without overflowing.
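
The following sketch guards a call to the EXP function; the test values are arbitrary choices:

data SafeExp;
   logbig = constant('logbig');        /* about 709.78 */
   do x = 700, 710;
      if x < logbig then y = exp(x);   /* safe to exponentiate */
      else y = .;                      /* would overflow; assign a missing value instead */
      output;
   end;
run;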

The smallest representable double

What is the smallest positive floating-point number on your computer? It is given by small = constant('small'). There are also LOGSMALL and SQRTSMALL versions, which you can use to prevent underflow. I don't use these constants as frequently as their 'BIG' counterparts. In my experience, underflow is usually not a problem in SAS.

Summary

This article discusses five constants that every statistical programmer must know: PI, MACEPS, EXACTINT, BIG, and SMALL. Whether you need mathematical constants (such as π) will depend on the programs that you write. The MACEPS constant is used to compare floating-point numbers. The other constants are used by computer scientists and numerical analysts to ensure that programs can correctly compute with very large (or very small) numbers without encountering floating-point overflow (or underflow).

The post Five constants every statistical programmer should know appeared first on The DO Loop.

March 16, 2022
 

A previous article showed how to use SAS to compute finite-difference derivatives of smooth vector-valued multivariate functions. The article uses the NLPFDD subroutine in SAS/IML to compute the finite-difference derivatives. The article states that the third output argument of the NLPFDD subroutine "contains the matrix product J`*J, where J is the Jacobian matrix. This product is used in some numerical methods, such as the Gauss-Newton algorithm, to minimize the value of a vector-valued function."

This article expands on that statement and shows that you can use the SAS/IML matrix language to implement the Gauss-Newton method by writing only a few lines of code.

The NLPFDD subroutine has been in SAS/IML software since SAS version 6. In SAS Viya, you can use the FDDSOLVE subroutine, which is a newer finite-difference function that has a syntax similar to the NLPFDD subroutine.

A review of least-squares minimization

Statisticians often use least-squares minimization in the context of linear or nonlinear least-squares (LS) regression. In LS regression, you have an observed set of response values {Y1, Y2, ..., Yn}, and you fit a parametric model to obtain a set of predicted values {P1, P2, ..., Pn}. These predictions depend on parameters, and you try to find the parameters that minimize the sum of the squared residuals, Σi (Pi - Yi)2.

In this formula, the parameters are not explicitly written, but this formulation is the least-squares minimization of a function that maps each parameter to the sum of squared residuals. Abstractly, if F: Rn → Rm is a smooth vector-valued function with m components, then you can write it as F(x) = (F1(x), F2(x), ..., Fm(x)). In most applications, the dimension of the domain is less than the dimension of the range: n < m. The goal of least-square minimization is to find a value of x that minimizes the sum of the squared components of F(x). In vector form, we search for a value of x that (locally) minimizes || F(x) ||2. Equivalently, the problem is often formulated so that you minimize (1/2) || F(x) ||2. The factor of 1/2 simplifies the expression for the derivative of the objective function.

Visualize a least-squares function

The previous article used the following vector-valued function:

proc iml;
 
q = {0, 2, 1};               /* define a point in R^3 */
start VecF(st) global(q);    /* compute vector from q to a point on a cylinder */
   s = st[1]; t = st[2];
   F = j(3, 1, .);
   F[1] = cos(s) - q[1];
   F[2] = sin(s) - q[2];
   F[3] =     t  - q[3];
   return( F );
finish;

Geometrically, this function represents the vector from q = (0, 2, 1) to a point on a cylinder in R3. Therefore, the vector-valued function is minimized when (s,t) = (pi/2, 1), since that is the point on the cylinder closest to the point q, which is not on the cylinder.

Let's visualize the function (1/2) || F(s,t) ||2 for (s,t) values near (pi/2, 1). You can use the EXPANDGRID function to create a uniform grid of points in the (s,t) domain. For each point on the grid, you can use the following statements to compute (one-half) the sum of the squares of the vector-valued function. To compute the sum of squared components, you can use the SSQ function or the NORM function, but I have decided to use the inner-product F`*F.

/* create heat map of VecF */
s = do(1, 2.1, 0.02);       /* horiz grid points */
t = do(0.5, 1.5, 0.02);     /* vert grid points */
st = expandgrid(s, t);
z = j(nrow(st), 1);
do i = 1 to nrow(st);
   F = VecF( st[i,] ); 
   z[i] = 0.5 * F`*F;       /* 1/2 sum of squares of components */
end;
 
result = st||z;
create Heat from result[c={'s' 't' 'z'}]; append from result; close;
 
submit;
title "1/2 || F(x,y) ||^2";
proc sgplot data=Heat;
   heatmapparm x=s y=t colorresponse=z / colormodel=ThreeColorRamp;
   refline 1.571/axis=x label='pi/2' labelpos=min; refline 1/axis=y;
run;
endsubmit;

The graph shows the values of z = (1/2) ||F(s,t)||2 on a uniform grid. The reference lines intersect at (pi/2, 1), which is the minimum value of z.

The Gauss-Newton method for minimizing least-squares problems

One way to solve a least-squares minimization is to expand the expression (1/2) ||F(s,t)||2 in terms of the component functions. You get a scalar function of (s,t), so you can use a traditional optimization method, such as the Newton-Raphson method, which you can call by using the NLPNRA subroutine in SAS/IML. Alternatively, you can use the NLPLM subroutine to minimize the least-squares problem directly by using the vector-valued function. In this section, I show a third method: using matrix operations in SAS/IML to implement a basic Gauss-Newton method from first principles.

The Gauss-Newton method is an iterative method that does not require using any second derivatives. It begins with an initial guess, then modifies the guess by using information in the Jacobian matrix. Since each row in the Jacobian matrix is a gradient of a component function, the Gauss-Newton method is similar to a gradient descent method for a scalar-valued function. It tries to move "downhill" towards a local minimum.

Rather than derive the formula from first principles, I will merely present the Gauss-Newton algorithm, which you can find in Wikipedia and in course notes for numerical analysis (Cornell, 2017):

  1. Make an initial guess for the minimizer, x0. Let x = x0 be the current guess.
  2. Evaluate the vector F ≡ F(x) and the Jacobian J ≡ [ ∂Fi / ∂xj ] for i=1,2,..,m and j=1,2...,n. The NLPFDD subroutine in SAS/IML enables you to evaluate the function and approximate the Jacobian matrix with a single call.
  3. Find a direction that decreases || F(x) || by solving the linear system (J`J) h = -J`*F for h.
  4. Take a step in the direction of h and update the guess: xnew = x + α h for some α in (0, 1).
  5. Go to Step 2 until the problem converges.

For any point x in the domain of F, the NLPFDD subroutine returns three results: F, J, and the matrix product J`*J. These are exactly the quantities that are needed to perform the Gauss-Newton method! The following statements implement the algorithm for an initial guess x0 = (2, 0.5):

/* Gauss-Newton line-search method for minimizing || VecF(x) || */
x0 = {2  0.5};       /* initial guess for (s,t) */
parm = {3, 0, .};    /* function has 3 components; use forward differences */
nSteps = 10;         /* take 10 steps regardless of convergence */
alpha = 0.6;         /* step size for line search */
 
path = j(nSteps+1, 2);  /* save and plot the approximations */
path[1,] = x0;
/* start the Gauss-Newton method */
x = x0;
do i = 1 to nSteps;
   call nlpfdd(F, J, JtJ, "VecF", x) par=parm; /* get F, J, and J`J at x */
   h = solve(JtJ, -J`*F);   /* h points downhill */
   x = x + alpha*h`;        /* update the current guess */
   path[i+1,] = x;          /* remember this point */
end;
 
/* write the iterates to a data set */
create path from path[c={'px' 'py'}]; append from path; close;
 
submit alpha;
data All; set Heat path; run;
 
title "Path of Gauss-Newton Minimization";
title2 "alpha = &alpha";
proc sgplot data=All noautolegend;
   heatmapparm x=s y=t colorresponse=z / colormodel=ThreeColorRamp;
   refline 1.571/axis=x label='pi/2' labelpos=min; refline 1/axis=y;
   series x=px y=py / markers;
run;
endsubmit;

The graph overlays the path of the Gauss-Newton approximations on the heat map of (1/2) || F(s,t) ||2. For each iterate, you can see that the search direction appears to be "downhill." A more sophisticated algorithm would monitor the norm of h to determine when to stop. You can also use different values of the step length, α, or even vary the step length as you approach the minimum. I leave those interesting variations as an exercise.
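
For reference, here is a minimal variation (an illustration, not production code) that iterates until the norm of h is smaller than a tolerance; the tolerance and iteration limit are arbitrary choices:

/* variation: stop when the step is small */
x = x0;
do iter = 1 to 100 until( norm(h) < 1e-6 );
   call nlpfdd(F, J, JtJ, "VecF", x) par=parm;  /* get F, J, and J`J at x */
   h = solve(JtJ, -J`*F);                       /* search direction */
   x = x + alpha*h`;                            /* update the guess */
end;
print iter, x[c={'s' 't'}];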

The main point I want to emphasize is that the NLPFDD subroutine returns all the information you need to implement the Gauss-Newton method. Several times, I have been asked why the NLPFDD subroutine returns the matrix J`*J, which seems like a strange quantity. Now you know that J`*J is one of the terms in the Gauss-Newton method and other similar least-squares optimization methods.

Verify the Gauss-Newton solution

As I mentioned earlier, SAS/IML software supports two subroutines for solving least-squares minimization problems. The NLPHQN subroutine is a hybrid quasi-Newton algorithm that is similar to (but more sophisticated than) the Gauss-Newton method. Let's use the NLPHQN subroutine to solve the same problem and compare the solutions:

/* solve same problem by using NLPHQN (hybrid quasi-Newton method) */
optn = {3 1};     /* tell NLPHQN that VecF has three components; print some partial results */
call NLPHQN(rc, xSoln, "VecF", x0) OPT=optn; 
print (path[11,])[c={'s' 't'} L="Soln from Gauss-Newton"],
       xSoln[c={'s' 't'} L="Soln from NLPHQN"];

The solution from the hybrid quasi-Newton method is very close to the Gauss-Newton solution. If your goal is to solve a least-squares minimization, use the NLPHQN (or NLPLM) subroutine. But you might want to implement your own minimization method for special problems or for educational purposes.
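
For completeness, here is a sketch of the analogous call to the NLPLM subroutine, which uses the Levenberg-Marquardt method. As with NLPHQN, the first element of the options vector specifies the number of component functions:

/* a sketch: solve the same least-squares problem with NLPLM */
optn = {3 1};     /* VecF has three components; print some results */
call nlplm(rc, xLM, "VecF", x0) opt=optn;
print xLM[c={'s' 't'} L="Soln from NLPLM"];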

Summary

In summary, this article shows three things:

  • For a vector-valued function, the NLPFDD subroutine evaluates the function and uses finite-difference derivatives to approximate the Jacobian (J) at a point in the domain. The subroutine also returns the matrix product J`*J.
  • The three quantities from the NLPFDD subroutine are exactly the three quantities needed to implement the Gauss-Newton method. By using the NLPFDD subroutine and the matrix language, you can implement the Gauss-Newton method in a few lines of code.
  • The answer from the Gauss-Newton method is very similar to the answer from calling a built-in least-squares solver, such as the NLPHQN subroutine.

The post Least-squares optimization and the Gauss-Newton method appeared first on The DO Loop.

March 07, 2022
 

I previously showed how to use SAS to compute finite-difference derivatives of smooth scalar-valued functions of several variables. You can use the NLPFDD subroutine in SAS/IML software to approximate the gradient vector (first derivatives) and the Hessian matrix (second derivatives) at any point in the domain.

The NLPFDD subroutine also supports approximating the first derivatives of a smooth vector-valued function. If F: Rn → Rm is a vector-valued function with m components, then you can write it as F(x) = (F1(x), F2(x), ..., Fm(x)). The Jacobian matrix, J, is the m x n matrix whose (i,j)th entry is J[i,j] = ∂Fi / ∂xj for i=1,2,...,m and j=1,2,...,n. That is, the i_th row of J is the gradient of the i_th component function, Fi. The NLPFDD subroutine enables you to approximate the Jacobian matrix at any point in the domain of F.

An example of a vector-valued function

A simple example of a vector-valued function is the parameterization of a cylinder, which is a mapping from R2 → R3 given by the function
C(s,t) = (cos(s), sin(s), t)
For any specified values of (s,t), the vector C(s,t) is the vector from the origin to a point on the cylinder. It is sometimes useful to consider basing the vectors at a point q, in which case the function F(s,t) = C(s,t) - q is the vector from q to a point on the cylinder. You can define this function in SAS/IML by using the following statements. You can then use the NLPFDD subroutine to compute the Jacobian matrix of the function at any value of (s,t), as follows:

/* Example of a vector-valued function f: R^n -> R^m for n=2 and m=3.
   Map from (s,t) in [0, 2*pi) x R to the cylinder in R^3.
   The vector is from q = (0, 2, 1) to F(s,t)  */
proc iml;
 
q = {0, 2, 1};               /* define a point in R^3 */
/* vector from q to a point on a cylinder */
start VecF(st) global(q);
   s = st[1]; t = st[2];
   F = j(3, 1, .);
   F[1] = cos(s) - q[1];
   F[2] = sin(s) - q[2];
   F[3] =     t  - q[3];
   return( F );
finish;
 
/* || VecF || has local min at (s,t) = (pi/2, 1), so let's evaluate derivatives at (pi/2, 1) */
pi = constant('pi');
x =  pi/2 || 1;              /* value of (s,t) at which to evaluate derivatives */
/* Parameters: the function has 3 components, use forward difference, choose step according to scale of F */
parm = {3, 00, .};           
call nlpfdd(f, J, JtJ, "VecF", x) par=parm;  /* return f(x), J=DF(x), and J`*J */
 
print f[r={'F_1' 'F_2' 'F_3'} L='F(pi/2, 1)'], 
      J[r={'F_1' 'F_2' 'F_3'} c={'d/ds' 'd/dt'} L='DF(pi/2, 1) [Fwd Diff]'];

The program evaluates the function and the Jacobian at the point (s,t)=(π/2, 1). You must use the PAR= keyword when the function is vector-valued. The PAR= argument is a three-element vector that contains the following information. If you specify a missing value, the default value is used.

  • The first element is the number of components (m) for the function. By default, the NLPFDD subroutine expects a scalar-valued function (m=1), so you must specify PARM[1]=3 for this problem.
  • The second element is the finite-difference method. The subroutine supports several schemes, but I always use either 0 (forward difference) or 1 (central difference). By default, PARM[2]=0.
  • The third element provides a way to choose the step size for the finite-difference method. I recommend that you use the default value.

The output shows that the value of the function at (s,t)=(π/2, 1) is {0, -1, 0}. The Jacobian at that point is shown. The gradient of the i_th component function (Fi) is contained in the i_th row of the Jacobian matrix.

  • The gradient of the first component function at (π/2, 1) is {-1 0}.
  • The gradient of the second component function at (π/2, 1) is {0 0}, which means that F2 is at a local extremum. (In this case, F2 = sin(s) - 2 is at a maximum because sin(s) attains its maximum value at s = π/2.)
  • The gradient of the third component function is {0 1}.
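
As a check on the finite-difference approximation, you can compare it with the analytical Jacobian, which is easy to write down for this simple function. The following sketch evaluates the analytical Jacobian at the same point:

/* a sketch: the analytical Jacobian of F; each row is the gradient
   of a component function */
start JacF(st);
   s = st[1];
   J = ( -sin(s) || 0 ) //    /* gradient of F1 = cos(s) - q[1] */
       (  cos(s) || 0 ) //    /* gradient of F2 = sin(s) - q[2] */
       (  0      || 1 );      /* gradient of F3 = t - q[3] */
   return( J );
finish;
JA = JacF(x);
print JA[r={'F_1' 'F_2' 'F_3'} c={'d/ds' 'd/dt'} L='Analytical Jacobian'];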

The NLPFDD subroutine also returns a third result, which is named JtJ in the program. It contains the matrix product J`*J, where J is the Jacobian matrix. This product is used in some numerical methods, such as the Gauss-Newton algorithm, to minimize the value of a vector-valued function. Unless you are writing your own algorithm to perform a least-squares minimization, you probably won't need the third output argument.

Summary

The NLPFDD subroutine in SAS/IML uses finite-difference derivatives to approximate the derivatives of multivariate functions. A previous article shows how to use the NLPFDD subroutine to approximate the gradient and Hessian of a scalar-valued function. This article shows how to approximate the Jacobian of a vector-valued function. Numerical approximations for derivatives are useful in all sorts of numerical routines, including optimization, root-finding, and solving nonlinear least-squares problems.

In SAS Viya, you can use the FDDSOLVE subroutine, which is a newer implementation that has a similar syntax.

The post Finite-difference derivatives of vector-valued functions appeared first on The DO Loop.

March 02, 2022
 

Many applications in mathematics and statistics require the numerical computation of the derivatives of smooth multivariate functions. For simple algebraic and trigonometric functions, you often can write down expressions for the first and second partial derivatives. However, for complicated functions, the formulas can get unwieldy (and some applications do not have explicit analytical derivatives). In those cases, numerical analysts use finite-difference methods to approximate the partial derivatives of a function at a point of its domain. This article is about how to compute first-order and second-order finite-difference derivatives for smooth functions in SAS.

Formulas for finite-difference derivatives

There are many ways to approximate the derivative of a function at a point, but the most common formulas are for the forward-difference method and the central-difference method. In SAS/IML software, you can use the NLPFDD subroutine to compute a finite-difference derivative (FDD).
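
For a univariate function f, the forward-difference approximation to the derivative is \( f'(x) \approx \frac{f(x+h) - f(x)}{h} \), whereas the central-difference approximation is \( f'(x) \approx \frac{f(x+h) - f(x-h)}{2h} \), where h is a small step size. The same formulas apply componentwise to the partial derivatives of a multivariate function.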

This article shows how to compute finite-difference derivatives for a smooth scalar-valued function f: Rn → R at a point x0 in its domain. The first derivatives are the elements of the gradient vector, grad(f). The i_th element of the gradient vector is ∂f / ∂xi for i = 1, 2, ..., n. The second derivatives are the elements of the Hessian matrix, Hes(f). The (i,j)th element of the Hessian matrix is ∂²f / ∂xi∂xj for i,j = 1, 2, ..., n. For both cases, the derivatives are evaluated at x0, so grad(f)(x0) is a numerical row vector and Hes(f)(x0) is a numerical symmetric matrix.

For this article, I will use the following cubic function of two variables:
\(f(x,y) = x^3 - y^2 - x + 0.5 \)
A heat map of the function is shown to the right. The function has two critical points at which the gradient is the zero vector: a local maximum at (-1/sqrt(3), 0) and a saddle point at (+1/sqrt(3), 0). The formula for the gradient is grad(f) = [ \(3 x^2 - 1\), \(-2 y\)] and the elements of the Hessian matrix are H[1,1] = 6 x, H[1,2] = H[2,1] = 0, and H[2,2] = -2. You can evaluate these formulas at a specified point to obtain the analytical derivatives at that point.

Finite-difference derivatives in SAS

In SAS, you can use the NLPFDD subroutine in PROC IML to compute finite difference derivatives. (In SAS Viya, you can use the FDDSOLVE subroutine, which is a newer implementation.) By default, the NLPFDD subroutine uses forward-difference formulas to approximate the derivatives. The subroutine returns three output arguments: the value of the function, gradient, and Hessian, respectively, evaluated at a specified point. The following SAS/IML program defines the cubic function and calls the NLPFDD subroutine to approximate the derivatives:

proc iml;
start Func(xy);
   x = xy[,1]; y = xy[,2];
   z = x##3 - y##2 - x + 0.5;
   return( z );
finish;
 
/* Function has local max at (x_max, 0) where x_max = -1/sqrt(3) = -0.57735 */
x_max =  -1/sqrt(3) || 0;
call nlpfdd(f_max, grad_max, Hes_max, "Func", x_max);
 
reset fuzz=2e-6;       /* print 0 for any number less than FUZZ */
print f_max, grad_max, Hes_max;

The output shows the value of the function and its derivatives at x0 = (-1/sqrt(3), 0), which is a local maximum. The value of the function is about 0.88. The gradient at the local maximum is the zero vector. The Hessian is a diagonal matrix. The forward-difference derivatives differ from the corresponding analytical values by 2E-6 or less.
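
The following sketch evaluates the analytical formulas at the same point so that you can verify the comparison yourself:

/* a sketch: analytical gradient and Hessian at x_max, from the formulas above */
x = x_max[1];  y = x_max[2];
gradA = (3*x##2 - 1) || (-2*y);     /* grad(f) = [3x^2 - 1, -2y] */
HesA  = (6*x || 0) // (0 || -2);    /* H = [6x 0; 0 -2] */
print gradA, HesA;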

Use finite-difference derivatives to classify extrema

One use of the derivatives is to determine whether a point is an extreme value. Points where the gradient of the function vanishes are called critical points. A critical point can be a local extremum or a saddle point. The eigenvalues of the Hessian matrix determine whether a critical point is a local extremum. The eigenvalues are used in the second-derivative test:

  1. If the eigenvalues are all positive, the critical point is a local minimum.
  2. If the eigenvalues are all negative, the critical point is a local maximum.
  3. If some eigenvalues are positive and others are negative, the critical point is a saddle point.
  4. If any eigenvalue is zero, the test is inconclusive.

For the current example, the Hessian is a diagonal matrix. By inspection, the eigenvalues are all negative, which proves that the point x0 = (-1/sqrt(3), 0) is a local maximum. You can repeat the computations for x_s = 1/sqrt(3) || 0, which is a saddle point.
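
For example, the following sketch applies the second-derivative test at the saddle point by computing the eigenvalues of the finite-difference Hessian:

/* a sketch: second-derivative test at the saddle point */
x_s = 1/sqrt(3) || 0;
call nlpfdd(f_s, grad_s, Hes_s, "Func", x_s);
evals = eigval(Hes_s);    /* eigenvalues of the symmetric Hessian */
print evals;              /* one positive, one negative ==> saddle point */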

Forward- versus central-difference derivatives

You can evaluate the derivatives at any point, not merely at the critical points. For example, you can evaluate the derivatives of the function at x1 = (-1, 0.5), which is not an extremum:

x1 = {-1 0.5};
call nlpfdd(FwdD_f, FwdD_g, FwdD_H, "Func", x1);
print FwdD_g, FwdD_H;

By default, the NLPFDD subroutine uses a forward-difference method to approximate the derivatives. You can also use a central difference scheme. The documentation describes six different ways to approximate derivatives, but the most common way is to use the objective function to choose a step size and to use either a forward or central difference. You can use the PAR= keyword to specify the method: PAR[2] = 0 specifies forward difference and PAR[2]=1 specifies central difference:

parm = {.,  00, .};                /* forward diff based on objective function (default) */
call nlpfdd(FwdD_f, FwdD_g, FwdD_H, "Func", x1) par=parm;
parm = {.,  01, .};                /* central diff based on objective function */
call nlpfdd(CenD_f, CenD_g, CenD_H, "Func", x1) par=parm;
 
grad_results = (FwdD_g // CenD_g);
print grad_results[r={"Fwd FD" "Cen FD"} c={"df/dx" "df/dy"}];
HesDiff = FwdD_H - CenD_H;
reset fuzz=2e-16;       /* print small numbers */
print HesDiff;

The first table shows the gradient that is computed by the forward-difference method (first row) and by the central-difference method (second row). The second output shows the difference between the Hessian matrices from the two methods: the matrices are very similar, and the maximum difference between corresponding elements is about 4.5E-7. In general, the forward-difference method is faster but less accurate. The central-difference method is slower but more accurate.

Summary

This article shows how to use the NLPFDD subroutine in SAS/IML to approximate the first-order and second-order partial derivatives of a multivariate function at any point in its domain. The subroutine uses finite-difference methods to approximate the derivatives. You can use a forward-difference method (which is fast but less accurate) or a central-difference method (which is slower but more accurate). The derivatives are useful in many applications, including classifying critical points as minima, maxima, and saddle points.

The post Finite-difference derivatives in SAS appeared first on The DO Loop.