Numerical Analysis

June 27, 2022
 

Recently, I needed to solve an optimization problem in which the objective function included a term that involved the quantile function (inverse CDF) of the t distribution, which is shown to the right for DF=5 degrees of freedom. I casually remarked to my colleague that the optimizer would have to use finite-difference derivatives because "this objective function does not have an analytical derivative." But even as I said it, I had a nagging thought in my mind. Is that statement true?

I quickly realized that I was wrong. The quantile function of ANY continuous probability distribution has an analytical derivative as long as the density function (PDF) is nonzero at the point of evaluation. Furthermore, the quantile function has an analytical expression for the second derivative if the density function is differentiable.

This article derives an exact analytical expression for the first and second derivatives of the quantile function of any probability distribution that has a smooth density function. The article demonstrates how to apply the formula to the t distribution and to the standard normal distribution. For the remainder of this article, we assume that the density function is differentiable.

The derivative of a quantile function

It turns out that I had already solved this problem in a previous article in which I looked at a differential equation that all quantile functions satisfy. In the appendix of that article, I derived a relationship between the first derivative of the quantile function and the PDF function for the same distribution. I also showed that the second derivative of the quantile function can be expressed in terms of the PDF and the derivative of the PDF.

Here are the results from the previous article. Let X be a random variable that has a smooth density function, f. Let w = w(p) be the p_th quantile. Then the first derivative of the quantile function is
\(\frac{dw}{dp} = \frac{1}{f(w(p))}\)
provided that the denominator is not zero. The second derivative is
\(\frac{d^2w}{dp^2} = \frac{-f^\prime(w)}{f(w)} \left( \frac{dw}{dp} \right)^2 = \frac{-f^\prime(w)}{f(w)^3}\).

Derivatives of the quantile function for the t distribution

Let's confirm the formulas for the quantile function of the t distribution with five degrees of freedom. You can use the PDF function in SAS to evaluate the density function for many common probability distributions, so it is easy to compute the first derivative of the quantile function in either the DATA step or the IML procedure, as follows:

proc iml;
reset fuzz=1e-8;
DF = 5;
/* first derivative of the quantile function w = F^{-1}(p) for p \in (0,1) */
start DTQntl(p) global(DF);
   w = quantile("t", p, DF);  /* w = quantile for p */
   fw = pdf("t", w, DF);      /* density at w */
   return( 1/fw );
finish;
 
/* print the quantile and derivative at several points */
p = {0.1, 0.5, 0.75};
Quantile = quantile("t", p, DF);  /* w = quantile for p */
DQuantile= DTQntl(p);
print p Quantile[F=Best6.] DQuantile[F=Best6.];

This table shows the derivative (third column) for the quantiles of the t distribution. If you look at the graph of the quantile function, these values are the slope of the tangent lines to the curve at the given values of the p variable.

Second derivatives of the quantile function for the t distribution

With a little more work, you can obtain an analytical expression for the second derivative. It is "more work" because you must use the derivative of the PDF, and that formula is not built into most statistical software. However, derivatives are easy (if unwieldy) to compute, so if you can write down the analytical expression for the PDF, you can write down the expression for its derivative.

For example, the following expression is the formula for the PDF of the t distribution with ν degrees of freedom:

\( f(x)={\frac {\Gamma ({\frac {\nu +1}{2}})} {{\sqrt {\nu \pi }}\,\Gamma ({\frac {\nu }{2}})}} \left(1+{\frac {x^{2}}{\nu }}\right)^{-(\nu +1)/2} \)
Here, \(\Gamma\) is the gamma function, which is available in SAS by using the GAMMA function. If you take the derivative of the PDF with respect to x, you obtain \( f^\prime(x) = -A \, \frac{\nu+1}{\nu} \, x \left(1+{\frac {x^{2}}{\nu }}\right)^{-(\nu+3)/2} \), where A denotes the constant factor in front of the PDF. You can use this expression to compute the second derivative of the quantile function, as follows:
/* derivative of the PDF for the t distribution */
start Dtpdf(x) global(DF);
   nu = DF;
   pi = constant('pi');
   A = Gamma( (nu +1)/2 ) / (sqrt(nu*pi) * Gamma(nu/2));
   dfdx = A * (-1-nu)/nu # x # (1 + x##2/nu)##(-(nu +1)/2 - 1); 
   return dfdx;
finish;
 
/* second derivative of the quantile function for the t distribution */
start D2TQntl(p) global(df);
   w = quantile("t", p, df);
   fw = pdf("t", w, df);
   dfdx = Dtpdf(w);
   return( -dfdx / fw##3 );
finish;
 
D2Quantile = D2TQntl(p);
print p Quantile[F=Best6.] DQuantile[F=Best6.] D2Quantile[F=Best6.];
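If you want extra reassurance that the formulas are correct, you can compare them with finite-difference approximations. The following sketch (which assumes the modules and the DF global variable defined above are still in memory) uses the NLPFDD subroutine to differentiate the quantile function numerically at p=0.75; the TQntl module is a wrapper that exists only for this check:

/* sketch: compare analytical derivatives with finite differences at p=0.75 */
start TQntl(p) global(DF);
   return( quantile("t", p, DF) );
finish;
call nlpfdd(w, dwdp, d2wdp2, "TQntl", 0.75);  /* value, 1st, and 2nd derivatives */
print dwdp (DTQntl(0.75))[L="Analytical 1st"] d2wdp2 (D2TQntl(0.75))[L="Analytical 2nd"];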

Derivatives of the normal quantiles

You can use the same theoretical formulas for the normal quantiles. The second derivative, however, does not require that you compute the derivative of the normal PDF. It turns out that the derivative of the normal PDF can be written as the product of the PDF and the quantile function, which simplifies the computation. Specifically, if f is the standard normal PDF, then \(f^\prime(w) = -f(w) w\), and so the second derivative of the quantile function is
\(\frac{d^2w}{dp^2} = w \left( \frac{dw}{dp}\right)^2\).
This is shown in the following program:

/* derivatives of the quantile function for the normal distribution */
proc iml;
/* first derivative of the quantile function */
start DNormalQntl(p);
   w = quantile("Normal", p);
   fw = pdf("Normal", w);
   return 1/fw;
finish;
/* second derivative of the quantile function */
start D2NormalQntl(p);
   w = quantile("Normal", p);
   dwdp = DNormalQntl(p);
   return( w # dwdp##2 );
finish;
 
p = {0.1, 0.5, 0.75};
Quantile = quantile("Normal", p);
DQuantile = DNormalQntl(p);
D2Quantile = D2NormalQntl(p);
print p Quantile[F=Best6.] DQuantile[F=Best6.] D2Quantile[F=Best6.];

Summary

In summary, this article shows how to compute the first and second derivatives of the quantile function for any probability distribution (with mild assumptions for the density function). The first derivative of the quantile function can be defined in terms of the quantile function and the density function. For the second derivative, you usually need to compute the derivative of the density function. The process is demonstrated by using the t distribution. For the standard normal distribution, you do not need to compute the derivative of the density function; the derivatives depend only on the quantile function and the density function.

The post The derivative of a quantile function appeared first on The DO Loop.

April 25, 2022
 

To a numerical analyst, numerical integration has two meanings. Numerical integration often means solving a definite integral such as \(\int_{a}^b f(s) \, ds\). Numerical integration is also called quadrature because it computes areas. The other meaning applies to solving ordinary differential equations (ODEs). My article about new methods for solving ODEs in SAS uses the words "integrate" and "integrator" a lot. This article caused a colleague to wonder about the similarities and differences between integration as a method for solving a definite integral and integration as a method for solving an initial-value problem for an ODE. Specifically, he asked whether you can use an ODE solver to solve a definite integral.

It is an intriguing question. This article shows that, yes, you can use an ODE solver to compute some definite integrals. However, just as a builder should use the most appropriate tools to build a house, so should a programmer use the best tools to solve numerical problems. Just because you can solve an integral by using an ODE does not mean that you should discard the traditional tried-and-true methods of solving integrals.

The most famous differential equation in probability and statistics

There are many famous differential equations in physics and engineering because Newton's Second Law is a differential equation. In probability and statistics, the most famous differential equation is the relationship between the two fundamental ways of looking at a continuous distribution. Namely, the derivative of the cumulative distribution function (CDF) is the probability density function (PDF). We usually write the relationship \(F^\prime(x) = f(x)\), where F is the CDF and f is the PDF. Thus, the CDF is the solution of a first-order differential equation that has the PDF on the right-hand side. If you know the PDF and an initial condition, you can solve the differential equation to find the CDF.

However, there is a problem. By integrating both sides of the differential equation, you obtain \(F(x) = \int_{-\infty}^x f(s) \, ds\). Because the lower limit of integration is -∞, you cannot solve the differential equation as an initial-value problem (IVP). An initial-value problem requires that you know the value of the function for some finite value of the lower limit.

However, you can get around this issue for some specific distributions. Some distributions have a lower limit of 0 instead of -∞. If the density at x=0 is finite, you can solve the ODE for those distributions.

You can also use an ODE to solve for the CDF of symmetric distributions. If f is a symmetric density, the point of symmetry is the median of the distribution. So, if s0 is the point of symmetry, you can solve the ODE \(F^\prime(x) = f(x)\) for any value x > s0 by using the initial condition \(F(s_0) = 0.5\). Thus, you can use an IVP to solve any integral of the form \(\int_{s_0}^b f(s) \, ds\). By symmetry, you can solve any integral of the form \(\int_a^{s_0} f(s) \, ds\). By combining these results, you can use an IVP to solve the integral on any finite-length interval [a, b].

This symmetry idea is shown in the graph to the right for the case of the standard normal distribution. The normal distribution is symmetric about the median s0=0. This means that the CDF passes through the point (0, 1/2). You can integrate the ODE to obtain the CDF curve for any positive value of x. The computations are shown in the next section.

Solve an integral to find the normal probability

I like to know the correct answer BEFORE I do any computation. The usual way to compute an integral in SAS is to use the QUAD subroutine in SAS/IML software. (QUAD is short for quadrature.) This subroutine is designed to compute one-dimensional integrals. To use the QUAD subroutine, you need to define the integrand and interval of integration. For example, the following statements define the integral for the normal PDF on the interval [0, 3]:

proc iml;
/* Use QUAD subroutine to integrate the standard normal PDF.
   Note: For this integrand, you can call the CDF function to compute the integral:
         CDF = cdf("Normal", 3) - 0.5; */
start NormalPDF(x);
   return pdf("Normal", x);   /* f(x) = 1/sqrt(2*pi) *exp(-x##2/2) */
finish;
 
/* find the area under pdf on [0, 3] */
call quad(Area, "NormalPDF", {0 3});
print Area;

The QUAD subroutine makes it simple to integrate any continuous function on a finite interval. In fact, the QUAD subroutine can also integrate over infinite intervals! And, if you specify the points of discontinuity, it can integrate functions that are not continuous at finitely many points.

Solve an ODE to find the normal probability

Just for fun, let's see if we can get the same answer by using the ODE subroutine to solve a differential equation. We want to compute the integral \(F(a) = \int_0^a \phi(s) \, ds\), where φ is the standard normal PDF. The equivalent formulation as a differential equation is \(F^{\prime}(x) = \phi(x)\), where F(0)=0.5. For this example, we want the solution at x=3.

The module that evaluates the right-hand side of the ODE must take two arguments. The first argument is the independent variable, which is x. The second argument is the state of the system, which is not used for this problem. You also need to specify the initial condition, which is F(0)=0.5, and the limits of integration, which are [0, 3]. The following statements use the ODE subroutine in SAS/IML software to solve the IVP defined by F`(x)=PDF("Normal",x) and F(0)=0.5:

/* ODE is A` = 1/sqrt(2*pi) *exp(-x##2/2) 
   A(0) = 0.5  */
start NormalPDF_ODE(s, y);
   return pdf("Normal", s);
finish;
 
c = 0.5;                     /* initial value at x=0 */
x = {0 3};                  /* integrate from x=0 to x=3 */
h={1.e-7, 1, 1e-4};          /* (min, max, initial) step size for integrator */
call ode(result, "NormalPDF_ODE", c, x, h) eps=1e-7;
A = result - c;
print A;

Success! The ODE gives a similar numerical result for the integral of the PDF, which is the CDF at x=3.
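Because the ODE subroutine returns the solution at every requested value of the independent variable, you can trace the entire CDF curve in a single call. Here is a minimal sketch that reuses the module, initial condition, and step sizes defined above:

/* sketch: request the solution at several points to trace the CDF on [0,3] */
x = 0 || do(0.5, 3, 0.5);    /* initial point, then the output points */
call ode(result, "NormalPDF_ODE", c, x, h) eps=1e-7;
CDF = c || result;           /* prepend the initial value F(0) = 0.5 */
print (x`)[c={"x"}] (CDF`)[c={"CDF"}];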

Integrating an ODE is a poor way to compute an integral

Although it is possible to use an ODE to solve some integrals on a finite domain, this result is more of a curiosity than a practical technique. A better choice is to use a routine designed for numerical integration, such as the QUAD subroutine. The QUAD subroutine can integrate on any domain, including infinite domains. For example, the following statements integrate the normal PDF on [-3, 3] and [0, ∞), respectively.

call quad(Area2, "NormalPDF", {-3 3});  /* area on [-3, 3] */
call quad(Area3, "NormalPDF", {0 .P});  /* area on [0, infinity) */
print Area2 Area3;

In addition, the QUAD subroutine can be used to compute double integrals by using iterated integrals. As far as I know, you cannot use an ODE to compute a double integral.

Whereas the ODE subroutine requires knowing an initial value on the curve, the QUAD routine does not. The ODE subroutine also requires separate calls to integrate forward and backward from the initial condition, whereas the QUAD routine does not.

Summary

The phrase "numerical integration" has two meanings in numerical analysis. The usual meaning is "quadrature," which means to evaluate a definite integral. However, it is also used to describe the numerical solution of an ODE. In certain cases, you can use an ODE to solve a definite integral on a bounded domain. However, the ODE requires special additional information such as knowing an "initial condition." So, yes, you can (sometimes) solve integrals by using a differential equation, but the process is more complicated than if you use a standard method for solving integrals.

The post Two perspectives on numerical integration appeared first on The DO Loop.

April 11, 2022
 

Many discussions and articles about SAS Viya emphasize its ability to handle Big Data, perform parallel processing, and integrate open-source languages. These are important issues for some SAS customers. But for customers who program in SAS and do not have Big Data, SAS Viya is attractive because it is the development environment in which SAS R&D adds new features, options, and algorithms. Since 2018, most new development has been in SAS Viya. This article discusses one new feature in the SAS IML product: new state-of-the-art algorithms for solving ordinary differential equations (ODEs) that are formulated as initial value problems (IVPs).

In statistics, differential equations arise in several areas, including compartment models in the field of pharmacokinetics. The example in this article is a compartment model, although it is from epidemiology, not pharmacokinetics.

Solving ODEs in SAS

The SAS/IML product introduced the ODE subroutine way back in SAS 6. The ODE subroutine solves IVPs by using an adaptive, variable-order, variable-step-size integrator. This integrator is very good for a wide assortment of problems. The integrator was state-of-the-art in the 1980s, but researchers have made a lot of progress in solving ODEs since then.

The new ODEQN subroutine was introduced in Release 2021.1.5 of SAS IML software. See the documentation for a full description of the features of the ODEQN subroutine. Briefly, it has a simple calling syntax and supports about 36 different integration methods, including the following:

  • BDF: backward differentiation formulas for stiff problems
  • Adams-Moulton method for non-stiff problems
  • Hybrid (BDF + Adams): Automatic detection of stiffness and switching
  • Explicit Runge-Kutta methods: Popular solvers in the sciences and engineering fields

Each class of solvers supports several "orders," which equate to low- or high-order Taylor series approximations.

The new ODEQN subroutine has several advantages over the previous solver, but the following features are noteworthy:

  • Self-starting: the subroutine automatically chooses an initial step size
  • Variable-order methods use higher-order approximations to take larger steps, when possible
  • Automatic detection of stiffness (Hybrid method)
  • Support for piecewise-defined vector fields
  • Support for backward-in-time integration
  • BLAS-aware linear algebra scales to handle large systems

Example: A basic SIR model in epidemiology

Many introductory courses in differential equations introduce the SIR model in epidemiology. The SIR model is a simple model for the transmission and recovery of a population that is exposed to an infectious pathogen. More sophisticated variations of the basic SIR model have been used by modelers during the COVID-19 pandemic. The SIR model assumes that the population is divided into three compartments: Susceptible to the disease, Infected with the disease, and Recovered from the disease.

In the simplest SIR model, there are no births and no deaths. Initially, almost everyone is susceptible except for a small proportion of the population that is infected. The susceptible population can become infected by interacting with a member of the infected population. Everyone in the infected population recovers, at which point the model assumes that they are immune to reinfection. The SIR model has two parameters. One parameter (b) controls the average rate at which a susceptible-infected interaction results in a new infected person. Another parameter (k) represents how quickly an infected person recovers and is no longer infectious.

The following SAS IML module defines the SIR equations, which model the proportion of the population that is susceptible, infected, and recovered. Initially, 99.99% of the population is susceptible, 0.01% is infected, and 0% is recovered. The new ODEQN subroutine in SAS Viya solves the system of equations for 100 units of time. The output of the routine is a 101 x 3 matrix: the i_th row contains the proportions of the population in the three categories at the i_th requested time point. Notice that the syntax of the ODEQN subroutine is quite simple: You specify the module that defines the equations, the initial conditions, and the time points at which you want a solution.

/* Program requires SAS IML in SAS Viya 2021.1.5 */
proc iml;
/* SIR model: https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology */
start SIR(t, y) global(b, k);
   S = y[1]; I = y[2]; R = y[3];
   dSdt = -b * S*I;
   dIdt =  b * S*I - k*I;
   dRdt =            k*I;
   return ( dSdt || dIdt || dRdt );
finish;
 
/* set parameters and initial conditions */
b = 0.5;      /* b param: transition rate from S to I */
k = 1/14;     /* k param: transition rate from I to R */
IC = {0.9999  0.0001   0};   /* S=99.99% I=0.01%  R=0 */
t = T(0:100);  /* compute spread for each day up to 100 days */
 
call odeqn(soln, "SIR", IC, t);  /* in SAS Viya only */
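Before leaving PROC IML, you can also query the solution matrix directly. For example, the following sketch finds the day on which the infected proportion peaks; it assumes the soln and t matrices from the previous call are still in memory:

/* sketch: find the day and size of the peak infection */
infected = soln[,2];          /* infected proportion over time */
idx = infected[<:>];          /* index at which the maximum occurs */
print (t[idx])[L="Peak day"] (infected[idx])[L="Peak infected proportion"];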

You can graph the solution of this system by plotting the time-evolution of the three populations:

/* write solution to SAS data set */
S = t || soln;
create SIROut from S[c={'Time' 'Susceptible' 'Infected' 'Recovered'}];
append from S;
close;
QUIT;
 
title "Solution to SIR Model";
title2 "b = 1/2; k = 1/14";
proc sgplot data=SIROut;
   series x=Time y=Susceptible;
   series x=Time y=Infected;
   series x=Time y=Recovered;
   xaxis grid; yaxis grid label="Percent of Group";
run;

For the parameters (b, k) = (0.5, 0.07), the infected population rapidly increases until Day 25 when almost 60% of the population is infected. By this time, the population of susceptible people is only about 10% of the population. For these parameters, nearly everyone in the population becomes infected by Day 60. The highly infectious (but not lethal) epidemic is essentially over by Day 80 when almost everyone has been infected, recovered, and is now immune.

Example: An SIR model for a change in behavior

One of the advantages of mathematical models is that you can simulate "what if" scenarios. For example, what if everyone in the population changes their behavior on Day 20 to reduce the transmission rate? This might include the adoption of non-pharmaceutical interventions such as mask-wearing, social distancing, and quarantining the infected. What if these behaviors change the effective rate of transmission from 0.5 (very high) to 0.1 (moderate)? You can modify the module that defines the ODE to reflect that parameter change at t=20. As mentioned earlier, the ODEQN subroutine can integrate piecewise-defined systems if you specify the time at which the system changes its behavior. The following call to the ODEQN subroutine uses the TDISCONT= option to tell the integrator that the system of ODEs changes at t=20.

%let TChange = 20;
start SIR(t, y) global(k);
   if t < &TChange then b = 0.5;   /* more spread */
   else                 b = 0.1;   /* less spread */
 
   S = y[1]; I = y[2]; R = y[3];
   dSdt = -b * S*I;
   dIdt =  b * S*I - k*I;
   dRdt =            k*I;
   return ( dSdt || dIdt || dRdt );
finish;
 
/* set parameters and initial conditions */
k = 1/14;     /* k param: transition rate from I to R */
IC = {0.9999  0.0001   0};   /* S=99.99% I=0.01%  R=0 */
t = T(0:100);  /* compute spread for each day up to 100 days */
call odeqn(soln, "SIR", IC, t) TDISCONT=&TChange;

If you plot the new solutions, you obtain the following graph:

In the new scenario, the increase in the infected population is reversed by the intervention on Day 20. From there, the proportion of infected persons decreases over time. Eventually, the disease dies out. By Day 100, 75% of the population has been infected and recovered, leaving 25% that are still susceptible to a future outbreak.

Summary

Although moving to SAS Viya offers many advantages for SAS customers who use Big Data and want to leverage parallel processing, Viya also provides new features for traditional SAS programmers. One example is the new ODEQN subroutine in SAS IML, which solves ODEs that are initial value problems by using state-of-the-art numerical algorithms. In addition to computational improvements in accuracy and speed, the ODEQN subroutine has a simple syntax and makes many decisions automatically, which makes the routine easier for non-experts to use.

The post New methods for solving differential equations in SAS appeared first on The DO Loop.

April 6, 2022
 

The graph to the right is the quantile function for the standard normal distribution, which is sometimes called the probit function. Given any probability, p, the quantile function gives the value, x, such that the area under the normal density curve to the left of x is exactly p. This function is the inverse of the normal cumulative distribution function (CDF).

Did you know that this curve is also the solution of a differential equation? You can write a nonlinear second-order differential equation whose solution is exactly the probit function! This means that you can compute quantiles by solving a differential equation.

This article derives the differential equation and solves it by using the built-in ODE subroutine in SAS/IML software.

A differential equation for quantiles

Let w(p) = PROBIT(p) be the standard normal quantile function. The w function satisfies the following differential equation:
\(\frac{d^2w}{dp^2} = w \left( \frac{dw}{dp}\right)^2\)
The derivation of this formula is shown in the appendix.

A differential equation looks scary, but it merely reveals a relationship between the derivatives of a function at every point in the domain. For example: at the point p=1/2, the value of the probit function is w(1/2) = 0, and the slope of the function is w'(1/2) = sqrt(2π). From those two values, the differential equation tells you that the second derivative at p=1/2 is exactly 0*2π = 0.

The appendix shows how to use calculus to prove the relationship. You can also use the NLPFDD subroutine in SAS/IML to compute finite-difference derivatives at an arbitrary value of p. The derivatives satisfy the differential equation to within the accuracy of the computation.
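For example, the following sketch verifies the equation at p=0.75. The ProbitFun module is a wrapper that exists only so that the NLPFDD subroutine can differentiate the quantile function:

proc iml;
/* sketch: verify w'' = w (w')^2 at p=0.75 by using finite differences */
start ProbitFun(p);
   return( quantile("Normal", p) );
finish;

call nlpfdd(w, dwdp, d2wdp2, "ProbitFun", 0.75);  /* value, 1st, and 2nd derivs */
LHS = d2wdp2;              /* second derivative from finite differences */
RHS = w * dwdp**2;         /* right-hand side of the differential equation */
print LHS RHS (LHS-RHS)[L="Difference"];
quit;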

Use the differential equation to compute quantiles

An interesting application of the ordinary differential equation (ODE) is that you can use it to compute quantiles. At the point p=1/2, the probit function has the value w(1/2) = 0, and the slope of the function is w'(1/2) = sqrt(2 π). Therefore, the probit function is the particular solution to the previous ODE for those initial conditions. Consequently, you can obtain the quantiles by solving an initial value problem (IVP).

I previously showed how to use the ODE subroutine in SAS/IML to solve initial value problems. For many IVPs, the independent variable represents time, and the solution represents the evolution of the state of the system, given an initial state. For this IVP, the independent variable represents probability. By knowing the quantile of the standard normal distribution at one location (p=1/2), you can find any larger quantile by integrating forward, or any smaller quantile by integrating backward.

The ODE subroutine solves systems of first-order differential equations, so you must convert the second-order equation into a system that contains two first-order equations. Let v = dw/dp = w`. Then the differential equation can be written as the following first-order system:

   w` = v
   v` = w * v**2

The following SAS/IML statement use the differential equation to find the quantiles of the standard normal distribution for p={0.5, 0.68, 0.75, 0.9, 0.95, 0.99}:

proc iml;
/* Solve first-order system of ODES with IC
   w(1/2) = 0
   v(1/2) = sqrt(2*pi)
*/
start NormQntl(t,y);
   w = y[1]; v = y[2];
   F = v  // w*v**2;
   return(F);
finish;
 
/* initial conditions at p=1/2 */
pi = constant('pi');
c = 0 || sqrt(2*pi);  
 
/* integrate forward in time to find these quantiles */
p = {0.5, 0.68, 0.75, 0.9, 0.95, 0.99};
 
/* ODE call requires a column vector for the IC, and a row vec for indep variable */ 
c = colvec(c);               /* IC */
t = rowvec(p);               /* indep variable */
h={1.e-7, 1, 1e-4};          /* (min, max, initial) step size for integrator */
call ode(result, "NormQntl", c, t, h) eps=1e-6;
Traj  = (c` // result`);     /* solutions in rows. Transpose to columns */
qODE = Traj[,1];             /* first column is quantile; second col is deriv */
 
/* compare with the correct answers */
Correct = quantile("Normal", p);
Diff = Correct - qODE;       /* difference between ODE solution and QUANTILE */
print p Correct qODE Diff;

The output shows the solution to the system of ODEs at the specified values of p. The solution is very close to the more exact quantiles as computed by the QUANTILE function. This is fairly impressive if you think about it. The ODE subroutine has no idea that it is solving for normal quantiles. The subroutine merely uses a general-purpose ODE solver to generate a solution at the specified values of p. In contrast, the QUANTILE routine is a special-purpose routine that is engineered to produce the best-possible approximations of the normal quantiles.

To get quantiles for p < 0.5, you can use the symmetry of the normal distribution. The quantiles satisfy the relation w(1-p) = -w(p).
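For example, a short sketch that reuses the p and qODE vectors from the previous program:

/* sketch: quantiles for small probabilities via the symmetry w(1-p) = -w(p) */
pSmall = 1 - p;             /* {0.5, 0.32, 0.25, 0.1, 0.05, 0.01} */
qSmall = -qODE;             /* negate the quantiles found by the ODE */
print pSmall qSmall (quantile("Normal", pSmall))[L="Correct"];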

Summary

A differential equation reveals the relationship between a function and its derivatives. It turns out that there is an ordinary differential equation that is satisfied by the quantile function of any smooth probability distribution. This article shows the ODE for the standard normal distribution. You can solve the ODE in SAS to compute quantiles if you also know the value of the quantile function and its derivative at some point, such as a median at p=1/2.

Appendix: Derive the quantile ODE

For the mathematically inclined, this section derives a differential equation for any smooth probability distribution. The equations are given in a Wikipedia article, but the article does not include a derivation.

Let X be a random variable that has a smooth CDF, F. The density function, f, is the first derivative of F. By definition, the quantile function, w, is the inverse of the CDF function. That means that for any probability, p, the CDF and quantile function satisfy the equation \(F(w(p)) = p\).

Take the derivative of both sides with respect to p and apply the chain rule:
\(F^\prime(w(p)) \frac{dw}{dp} = 1\)
Substitute f for the derivative and solve for dw/dp:
\(\frac{dw}{dp} = \frac{1}{f(w(p))}\)

Again, take the derivative of both sides:
\(\frac{d^2w}{dp^2} = \frac{-1}{f(w(p))^2} \frac{df}{dw} \frac{dw}{dp}\).
You can use the relationship dw/dp = 1/f to obtain
\(\frac{d^2w}{dp^2} = \frac{-f^\prime(w)}{f(w)} \left( \frac{dw}{dp} \right)^2\).
In this derivation, I have not used any special properties of the distribution. Therefore, it is satisfied by any probability distribution that is sufficiently smooth and for any values of p for which the equation is well-defined.

The ODE simplifies for some probability density functions. For example, the standard normal density is \(f(w) = \frac{1}{\sqrt{2\pi}} \exp(-w^2/2 )\) and thus
\(f^\prime(w) = \frac{1}{\sqrt{2\pi}} \exp(-w^2/2 )(-w) = -f(w) w\).
Consequently, for the standard normal distribution, the ODE simplifies to
\(\frac{d^2w}{dp^2} = w \left( \frac{dw}{dp}\right)^2\).

The post A differential equation for quantiles of a distribution appeared first on The DO Loop.

March 21, 2022
 

Statistical programmers need to access numerical constants that help us to write robust and accurate programs. Specifically, it is necessary to know when it is safe to perform numerical operations such as raising a number to a power without exceeding the largest number that is representable in finite-precision arithmetic. This article discusses five constants that every statistical programmer must know: PI, MACEPS, EXACTINT, BIG, and SMALL. In the SAS language, you can find the values of these constants by using the CONSTANT function in Base SAS.

The following table shows the values of the constants that are discussed in this article:

data Constants;
length Const $10;
format Value Best12.;
input Const @@;
Value = constant(Const);
datalines;
PI MACEPS EXACTINT 
BIG LOGBIG SQRTBIG SMALL
;
 
proc print data=Constants noobs; run;

Pi and other mathematical constants

The CONSTANT function provides values for various mathematical constants. Of these, the most important constant is π ≈ 3.14159..., but you can also obtain other mathematical constants such as the natural base (e) and the golden ratio (φ). To get an approximation for π, use pi = constant('pi'). The number π is essential for working with angles and trigonometric functions. For an example, see the article, "Polygons, pi, and linear approximations," which uses π to create regular polygons.

Machine epsilon

Arguably the most important constant in computer science is machine epsilon. This number enables you to perform floating-point comparisons such as deciding whether two numbers are equal or determining the rank of a numerical matrix. To get machine epsilon, use eps = constant('maceps').
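For example, here is a minimal DATA step sketch of a fuzzy comparison. The tolerance sqrt(ε) is a common choice, although the best tolerance depends on the application:

data _null_;
   eps = constant('maceps');
   a = 0.1 + 0.2;                      /* stored as 0.30000000000000004... */
   b = 0.3;
   strict = (a = b);                   /* 0: the strict comparison fails */
   fuzzy  = (abs(a - b) < sqrt(eps));  /* 1: the fuzzy comparison succeeds */
   put strict= fuzzy=;
run;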

The largest representable integer

It is important to know the largest number that can be represented accurately in finite precision on your machine. This number is available as bigInt = constant('exactint'). This number enables you to find the largest factorial that you can compute exactly (18!) and the largest row of Pascal's triangle (row 56) that you can faithfully represent in double precision.
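A quick sketch illustrates the claim about factorials: 18! is less than EXACTINT, whereas 19! is larger, so integers of that size are no longer contiguous in double precision:

data _null_;
   bigInt = constant('exactint');   /* 2**53 = 9,007,199,254,740,992 */
   f18 = fact(18);                  /* 18! < bigInt, so it is represented exactly */
   f19 = fact(19);                  /* 19! > bigInt: not all integers this large are representable */
   put bigInt= f18= f19=;
run;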

The largest representable double

Except for π, the constants I use the most are related to BIG ≈ 1.8E308. I primarily use LOGBIG and SQRTBIG, which you can compute as
logbig = constant('logbig');
sqrtbig = constant('sqrtbig');
These constants are useful for preventing overflow when you perform arithmetic operations on large numbers:

  • The quantity exp(x) can only be computed when x is less than LOGBIG ≈ 709.78, as shown in the sketch after this list.
  • The LOGBIG option supports the BASE suboption (BASE > 1), which you can use to ensure that raising a number to a power does not overflow. For example, constant('logbig', 2) returns approximately 1024 because log2(BIG) ≈ 1024; the power 2**x overflows when x ≥ 1024.
  • The SQRTBIG option tells you whether you can square a number without overflowing.
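Here is the sketch promised above: a DATA step guard that evaluates EXP only when the argument is safely below LOGBIG:

data _null_;
   logbig = constant('logbig');     /* approximately 709.78 */
   do x = 709, 710;
      if x < logbig then do;
         y = exp(x);                /* safe: no overflow */
         put x= y=;
      end;
      else put x= 'would overflow the EXP function';
   end;
run;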

The smallest representable double

What is the smallest positive floating-point number on your computer? It is given by small = constant('small'). There are also LOGSMALL and SQRTSMALL versions, which you can use to prevent underflow. I don't use these constants as frequently as their 'BIG' counterparts. In my experience, underflow is usually not a problem in SAS.

Summary

This article discusses five constants that every statistical programmer must know: PI, MACEPS, EXACTINT, BIG, and SMALL. Whether you need mathematical constants (such as π) will depend on the programs that you write. The MACEPS constant is used to compare floating-point numbers. The other constants are used by computer scientists and numerical analysts to ensure that programs can correctly compute with very large (or very small) numbers without encountering floating-point overflow (or underflow).

The post Five constants every statistical programmer should know appeared first on The DO Loop.

March 16, 2022
 

A previous article showed how to use SAS to compute finite-difference derivatives of smooth vector-valued multivariate functions. The article uses the NLPFDD subroutine in SAS/IML to compute the finite-difference derivatives. The article states that the third output argument of the NLPFDD subroutine "contains the matrix product J`*J, where J is the Jacobian matrix. This product is used in some numerical methods, such as the Gauss-Newton algorithm, to minimize the value of a vector-valued function."

This article expands on that statement and shows that you can use the SAS/IML matrix language to implement the Gauss-Newton method by writing only a few lines of code.

The NLPFDD subroutine has been in SAS/IML software since SAS version 6. In SAS Viya, you can use the FDDSOLVE subroutine, which is a newer finite-difference function that has a syntax similar to the NLPFDD subroutine.

A review of least-squares minimization

Statisticians often use least-squares minimization in the context of linear or nonlinear least-squares (LS) regression. In LS regression, you have an observed set of response values {Y1, Y2, ..., Yn}, and you fit a parametric model to obtain a set of predicted values {P1, P2, ..., Pn}. These predictions depend on parameters, and you try to find the parameters that minimize the sum of the squared residuals, Σ (Pi − Yi)², for i=1, 2, ..., n.

In this formula, the parameters are not explicitly written, but this formulation is the least-squares minimization of a function that maps each parameter to the sum of squared residuals. Abstractly, if F: Rⁿ → Rᵐ is a smooth vector-valued function with m components, then you can write it as F(x) = (F1(x), F2(x), ..., Fm(x)). In most applications, the dimension of the domain is less than the dimension of the range: n < m. The goal of least-squares minimization is to find a value of x that minimizes the sum of the squared components of F(x). In vector form, we search for a value of x that (locally) minimizes || F(x) ||². Equivalently, the problem is often formulated so that you minimize (1/2) || F(x) ||². The factor of 1/2 simplifies the expression for the derivative of the objective function.

Visualize a least-squares function

The previous article used the following vector-valued function:

proc iml;
 
q = {0, 2, 1};               /* define a point in R^3 */
start VecF(st) global(q);    /* compute vector from q to a point on a cylinder */
   s = st[1]; t = st[2];
   F = j(3, 1, .);
   F[1] = cos(s) - q[1];
   F[2] = sin(s) - q[2];
   F[3] =     t  - q[3];
   return( F );
finish;

Geometrically, this function represents the vector from q = (0, 2, 1) to a point on a cylinder in R³. Therefore, the norm of the vector-valued function is minimized when (s,t) = (pi/2, 1), since that is the point on the cylinder closest to the point q, which is not on the cylinder.

Let's visualize the function (1/2) || F(s,t) ||² for (s,t) values near (pi/2, 1). You can use the EXPANDGRID function to create a uniform grid of points in the (s,t) domain. For each point on the grid, you can use the following statements to compute (one-half) the sum of the squares of the vector-valued function. To compute the sum of squared components, you can use the SSQ function or the NORM function, but I have decided to use the inner product F`*F.

/* create heat map of VecF */
s = do(1, 2.1, 0.02);       /* horiz grid points */
t = do(0.5, 1.5, 0.02);     /* vert grid points */
st = expandgrid(s, t);
z = j(nrow(st), 1);
do i = 1 to nrow(st);
   F = VecF( st[i,] ); 
   z[i] = 0.5 * F`*F;       /* 1/2 sum of squares of components */
end;
 
result = st||z;
create Heat from result[c={'s' 't' 'z'}]; append from result; close;
 
submit;
title "1/2 || F(x,y) ||^2";
proc sgplot data=Heat;
   heatmapparm x=s y=t colorresponse=z / colormodel=ThreeColorRamp;
   refline 1.571/axis=x label='pi/2' labelpos=min; refline 1/axis=y;
run;
endsubmit;

The graph shows the values of z = (1/2) ||F(s,t)||² on a uniform grid. The reference lines intersect at (pi/2, 1), which is the location of the minimum value of z.

The Gauss-Newton method for minimizing least-squares problems

One way to solve a least-squares minimization is to expand the expression (1/2) ||F(s,t)||² in terms of the component functions. You get a scalar function of (s,t), so you can use a traditional optimization method, such as the Newton-Raphson method, which you can call by using the NLPNRA subroutine in SAS/IML. Alternatively, you can use the NLPLM subroutine to minimize the least-squares problem directly by using the vector-valued function. In this section, I show a third method: using matrix operations in SAS/IML to implement a basic Gauss-Newton method from first principles.

The Gauss-Newton method is an iterative method that does not require using any second derivatives. It begins with an initial guess, then modifies the guess by using information in the Jacobian matrix. Since each row in the Jacobian matrix is a gradient of a component function, the Gauss-Newton method is similar to a gradient descent method for a scalar-valued function. It tries to move "downhill" towards a local minimum.

Rather than derive the formula from first principles, I will merely present the Gauss-Newton algorithm, which you can find in Wikipedia and in course notes for numerical analysis (Cornell, 2017):

  1. Make an initial guess for the minimizer, x0. Let x = x0 be the current guess.
  2. Evaluate the vector F ≡ F(x) and the Jacobian J ≡ [ ∂Fi / ∂xj ] for i=1,2,...,m and j=1,2,...,n. The NLPFDD subroutine in SAS/IML enables you to evaluate the function and approximate the Jacobian matrix with a single call.
  3. Find a direction that decreases || F(x) || by solving the linear system (J`J) h = -J`*F for h.
  4. Take a step in the direction of h and update the guess: x_new = x + α h for some α in (0, 1).
  5. Go to Step 2 until the problem converges.

For any point x in the domain of F, the NLPFDD subroutine returns three results: F, J, and the matrix product J`*J. These are exactly the quantities that are needed to perform the Gauss-Newton method! The following statements implement the algorithm for an initial guess x0 = (2, 0.5):

/* Gauss-Newton line-search method for minimizing || VecF(x) || */
x0 = {2  0.5};       /* initial guess for (s,t) */
parm = {3, 00, .};   /* function has 3 components, use forward difference */
nSteps = 10;         /* take 10 steps regardless of convergence */
alpha = 0.6;         /* step size for line search */
 
path = j(nSteps+1, 2);  /* save and plot the approximations */
path[1,] = x0;
/* start the Gauss-Newton method */
x = x0;
do i = 1 to nSteps;
   call nlpfdd(F, J, JtJ, "VecF", x) par=parm; /* get F, J, and J`J at x */
   h = solve(JtJ, -J`*F);   /* h points downhill */
   x = x + alpha*h`;        /* update the current guess */
   path[i+1,] = x;          /* remember this point */
end;
 
/* write the iterates to a data set */
create path from path[c={'px' 'py'}]; append from path; close;
 
submit alpha;
data All; set Heat path; run;
 
title "Path of Gauss-Newton Minimization";
title2 "alpha = &alpha";
proc sgplot data=All noautolegend;
   heatmapparm x=s y=t colorresponse=z / colormodel=ThreeColorRamp;
   refline 1.571/axis=x label='pi/2' labelpos=min; refline 1/axis=y;
   series x=px y=py / markers;
run;
endsubmit;

The graph overlays the path of the Gauss-Newton approximations on the heat map of (1/2) || F(s,t) ||². For each iterate, you can see that the search direction appears to be "downhill." A more sophisticated algorithm would monitor the norm of h to determine when to stop. You can also use different values of the step length, α, or even vary the step length as you approach the minimum. I leave those interesting variations as an exercise.
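For example, here is a minimal sketch of a stopping rule based on the step size; it reuses the VecF module and the parm, alpha, and x0 values defined above:

/* sketch: iterate until the Gauss-Newton step is small */
x = x0;  done = 0;
do iter = 1 to 25 until(done);
   call nlpfdd(F, J, JtJ, "VecF", x) par=parm;
   h = solve(JtJ, -J`*F);            /* Gauss-Newton direction */
   x = x + alpha*h`;
   done = ( sqrt(ssq(h)) < 1e-6 );   /* stop when || h || is tiny */
end;
print iter (x)[c={'s' 't'}];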

The main point I want to emphasize is that the NLPFDD subroutine returns all the information you need to implement the Gauss-Newton method. Several times, I have been asked why the NLPFDD subroutine returns the matrix J`*J, which seems like a strange quantity. Now you know that J`*J is one of the terms in the Gauss-Newton method and other similar least-squares optimization methods.

Verify the Gauss-Newton Solution

As I mentioned earlier, SAS/IML software supports two subroutines for solving least-squares minimization problems. The NLPHQN subroutine is a hybrid quasi-Newton algorithm that is similar to (but more sophisticated than) the Gauss-Newton method. Let's use the NLPHQN subroutine to solve the same problem and compare the solutions:

/* solve same problem by using NLPHQN (hybrid quasi-Newton method) */
optn = {3 1};     /* tell NLPHQN that VecF has three components; print some partial results */
call NLPHQN(rc, xSoln, "VecF", x0) OPT=optn; 
print (path[11,])[c={'s' 't'} L="Soln from Gauss-Newton"],
       xSoln[c={'s' 't'} L="Soln from NLPHQN"];

The solution from the hybrid quasi-Newton method is very close to the Gauss-Newton solution. If your goal is to solve a least-squares minimization, use the NLPHQN (or NLPLM) subroutine. But you might want to implement your own minimization method for special problems or for educational purposes.

Summary

In summary, this article shows three things:

  • For a vector-valued function, the NLPFDD subroutine evaluates the function and uses finite-difference derivatives to approximate the Jacobian (J) at a point in the domain. The subroutine also returns the matrix product J`*J.
  • The three quantities from the NLPFDD subroutine are exactly the three quantities needed to implement the Gauss-Newton method. By using the NLPFDD subroutine and the matrix language, you can implement the Gauss-Newton method in a few lines of code.
  • The answer from the Gauss-Newton method is very similar to the answer from calling a built-in least-squares solver, such as the NLPHQN subroutine.

The post Least-squares optimization and the Gauss-Newton method appeared first on The DO Loop.

March 7, 2022
 

I previously showed how to use SAS to compute finite-difference derivatives for smooth scalar-valued functions of several variables. You can use the NLPFDD subroutine in SAS/IML software to approximate the gradient vector (first derivatives) and the Hessian matrix (second derivatives). The computation uses finite-difference derivatives to approximate the derivatives.

The NLPFDD subroutine also supports approximating the first derivatives of a smooth vector-valued function. If F: Rⁿ → Rᵐ is a vector-valued function with m components, then you can write it as F(x) = (F1(x), F2(x), ..., Fm(x)). The Jacobian matrix, J, is the matrix whose (i,j)th entry is J[i,j] = ∂Fi / ∂xj for i=1,2,...,m and j=1,2,...,n. That is, the i_th row of J is the gradient of the i_th component function, Fi. The NLPFDD subroutine enables you to approximate the Jacobian matrix at any point in the domain of F.

An example of a vector-valued function

A simple example of a vector-valued function is the parameterization of a cylinder, which is a mapping from R² → R³ given by the function
C(s,t) = (cos(s), sin(s), t)
For any specified values of (s,t), the vector C(s,t) is the vector from the origin to a point on the cylinder. It is sometimes useful to consider basing the vectors at a point q, in which case the function F(s,t) = C(s,t) - q is the vector from q to a point on the cylinder. You can define this function in SAS/IML by using the following statements. You can then use the NLPFDD subroutine to compute the Jacobian matrix of the function at any value of (s,t), as follows:

/* Example of a vector-valued function f: R^n -> R^m for n=2 and m=3.
   Map from (s,t) in [0, 2*pi) x R to the cylinder in R^3.
   The vector is from q = (0, 2, 1) to F(s,t)  */
proc iml;
 
q = {0, 2, 1};               /* define a point in R^3 */
/* vector from q to a point on a cylinder */
start VecF(st) global(q);
   s = st[1]; t = st[2];
   F = j(3, 1, .);
   F[1] = cos(s) - q[1];
   F[2] = sin(s) - q[2];
   F[3] =     t  - q[3];
   return( F );
finish;
 
/* || VecF || has local min at (s,t) = (pi/2, 1), so let's evaluate derivatives at (pi/2, 1) */
pi = constant('pi');
x =  pi/2 || 1;              /* value of (s,t) at which to evaluate derivatives */
/* Parameters: the function has 3 components, use forward difference, choose step according to scale of F */
parm = {3, 00, .};           
call nlpfdd(f, J, JtJ, "VecF", x) par=parm;  /* return f(x), J=DF(x), and J`*J */
 
print f[r={'F_1' 'F_2' 'F_3'} L='F(pi/2, 1)'], 
      J[r={'F_1' 'F_2' 'F_3'} c={'d/ds' 'd/dt'} L='DF(pi/2, 1) [Fwd Diff]'];

The program evaluates the function and the Jacobian at the point (s,t)=(π/2, 1). You must use the PAR= keyword when the function is vector-valued. The PAR= argument is a three-element vector that contains the following information. If you specify a missing value, the default value is used.

  • The first element is the number of components (m) for the function. By default, the NLPFDD subroutine expects a scalar-valued function (m=1), so you must specify PARM[1]=3 for this problem.
  • The second element is the finite-difference method. The function supports several schemes, but I always use either 0 (forward difference) or 1 (central difference). By default, PARM[2]=0.
  • The third element provides a way to choose the step size for the finite-difference method. I recommend that you use the default value.

The output shows that the value of the function at (s,t)=(π/2, 1) is {0, -1, 0}. The Jacobian at that point is shown. The gradient of the i_th component function (Fi) is contained in the i_th row of the Jacobian matrix.

  • The gradient of the first component function at (π/2, 1) is {-1 0}.
  • The gradient of the second component function at (π/2, 1) is {0 0}, which means that F2 is at a local extremum. (In this case, F2 = sin(s) − 2 is at a maximum.)
  • The gradient of the third component function is {0 1}.
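You can double-check these values against the analytical Jacobian of the cylinder parameterization. A sketch (the JacF module is mine, not part of the original program):

/* sketch: analytical Jacobian of F(s,t) = (cos(s)-q1, sin(s)-q2, t-q3) */
start JacF(st);
   s = st[1];
   return( (-sin(s) || 0) //
           ( cos(s) || 0) //
           (    0   || 1) );
finish;
print (JacF(x))[r={'F_1' 'F_2' 'F_3'} c={'d/ds' 'd/dt'} L='Analytical Jacobian'];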

The NLPFDD subroutine also returns a third result, which is named JtJ in the program. It contains the matrix product J`*J, where J is the Jacobian matrix. This product is used in some numerical methods, such as the Gauss-Newton algorithm, to minimize the value of a vector-valued function. Unless you are writing your own algorithm to perform a least-squares minimization, you probably won't need the third output argument.

Summary

The NLPFDD subroutine in SAS/IML uses finite-difference derivatives to approximate the derivatives of multivariate functions. A previous article shows how to use the NLPFDD subroutine to approximate the gradient and Hessian of a scalar-valued function. This article shows how to approximate the Jacobian of a vector-valued function. Numerical approximations for derivatives are useful in all sorts of numerical routines, including optimization, root-finding, and solving nonlinear least-squares problems.

In SAS Viya, you can use the FDDSOLVE subroutine, which is a newer implementation that has a similar syntax.

The post Finite-difference derivatives of vector-valued functions appeared first on The DO Loop.

March 2, 2022
 

Many applications in mathematics and statistics require the numerical computation of the derivatives of smooth multivariate functions. For simple algebraic and trigonometric functions, you often can write down expressions for the first and second partial derivatives. However, for complicated functions, the formulas can get unwieldy (and some applications do not have explicit analytical derivatives). In those cases, numerical analysts use finite-difference methods to approximate the partial derivatives of a function at a point of its domain. This article is about how to compute first-order and second-order finite-difference derivatives for smooth functions in SAS.

Formulas for finite-difference derivatives

There are many ways to approximate the derivative of a function at a point, but the most common formulas are for the forward-difference method and the central-difference method. In SAS/IML software, you can use the NLPFDD subroutine to compute a finite-difference derivative (FDD).

This article shows how to compute finite-difference derivatives for a smooth scalar-valued function f: Rⁿ → R at a point x0 in its domain. The first derivatives are the elements of the gradient vector, grad(f). The i_th element of the gradient vector is ∂f / ∂xi for i = 1, 2, ..., n. The second derivatives are the elements of the Hessian matrix, Hes(f). The (i,j)th element of the Hessian matrix is ∂²f / (∂xi ∂xj) for i,j = 1, 2, ..., n. In both cases, the derivatives are evaluated at x0, so grad(f)(x0) is a numerical row vector and Hes(f)(x0) is a numerical symmetric matrix.

For this article, I will use the following cubic function of two variables:
\(f(x,y) = x^3 - y^2 - x + 0.5 \)
A heat map of the function is shown to the right. The function has two critical points at which the gradient is the zero vector: a local maximum at (-1/sqrt(3), 0) and a saddle point at (+1/sqrt(3), 0). The formula for the gradient is grad(f) = [ \(3 x^2 - 1\), \(-2 y\)] and the elements of the Hessian matrix are H[1,1] = 6 x, H[1,2] = H[2,1] = 0, and H[2,2] = -2. You can evaluate these formulas at a specified point to obtain the analytical derivatives at that point.
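Because the analytical formulas are available for this example, it is easy to encode them in SAS/IML so that you can compare the finite-difference approximations in the next section against exact values. A minimal sketch:

proc iml;
/* analytical gradient and Hessian of f(x,y) = x##3 - y##2 - x + 0.5 */
start GradF(xy);
   x = xy[1]; y = xy[2];
   return( (3*x##2 - 1) || (-2*y) );
finish;
start HesF(xy);
   x = xy[1];
   return( (6*x || 0) // (0 || -2) );
finish;

x0 = -1/sqrt(3) || 0;            /* the local maximum */
print (GradF(x0))[L="Analytical gradient"], (HesF(x0))[L="Analytical Hessian"];
quit;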

Finite-difference derivatives in SAS

In SAS, you can use the NLPFDD subroutine in PROC IML to compute finite difference derivatives. (In SAS Viya, you can use the FDDSOLVE subroutine, which is a newer implementation.) By default, the NLPFDD subroutine uses forward-difference formulas to approximate the derivatives. The subroutine returns three output arguments: the value of the function, gradient, and Hessian, respectively, evaluated at a specified point. The following SAS/IML program defines the cubic function and calls the NLPFDD subroutine to approximate the derivatives:

proc iml;
start Func(xy);
   x = xy[,1]; y = xy[,2];
   z = x##3 - y##2 - x + 0.5;
   return( z );
finish;
 
/* Function has local max at (x_max, 0) where x_max = -1/sqrt(3) = -0.57735 */
x_max =  -1/sqrt(3) || 0;
call nlpfdd(f_max, grad_max, Hes_max, "Func", x_max);
 
reset fuzz=2e-6;       /* print 0 for any number less than FUZZ */
print f_max, grad_max, Hes_max;

The output shows the value of the function and its derivatives at x0 = (-1/sqrt(3), 0), which is a local maximum. The value of the function is 0.88. The gradient at the local maximum is the zero vector. The Hessian is a diagonal matrix. The forward-difference derivatives differ from the corresponding analytical values by 2E-6 or less.

Use finite-difference derivatives to classify extrema

One use of the derivatives is to determine whether a point is an extreme value. Points where the gradient of the function vanishes are called critical points. A critical point can be a local extremum or a saddle point. The eigenvalues of the Hessian matrix determine whether a critical point is a local extremum. The eigenvalues are used in the second-derivative test:

  1. If the eigenvalues are all positive, the critical point is a local minimum.
  2. If the eigenvalues are all negative, the critical point is a local maximum.
  3. If some eigenvalues are positive and others are negative, the critical point is a saddle point.
  4. If any eigenvalue is zero, the test is inconclusive.
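The following sketch applies the second-derivative test programmatically; it assumes the Hes_max matrix from the earlier NLPFDD call is still in memory:

/* sketch: classify a critical point from the eigenvalues of the Hessian */
evals = eigval( (Hes_max + Hes_max`)/2 );  /* symmetrize to guard against tiny asymmetries */
if      all(evals < 0) then msg = "local maximum";
else if all(evals > 0) then msg = "local minimum";
else if all(evals ^= 0) then msg = "saddle point";
else                        msg = "inconclusive";
print evals[L="Eigenvalues"], msg;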

For the current example, the Hessian is a diagonal matrix. By inspection, the eigenvalues are all negative, which proves that the point x0 = (-1/sqrt(3), 0) is a local maximum. You can repeat the computations for x_s = 1/sqrt(3) || 0, which is a saddle point.

Forward- versus central-difference derivatives

You can evaluate the derivatives at any point, not merely at the critical points. For example, you can evaluate the derivatives of the function at x1 = (-1, 0.5), which is not an extremum:

x1 = {-1 0.5};
call nlpfdd(FwdD_f, FwdD_g, FwdD_H, "Func", x1);
print FwdD_g, FwdD_H;

By default, the NLPFDD subroutine uses a forward-difference method to approximate the derivatives. You can also use a central difference scheme. The documentation describes six different ways to approximate derivatives, but the most common way is to use the objective function to choose a step size and to use either a forward or central difference. You can use the PAR= keyword to specify the method: PAR[2] = 0 specifies forward difference and PAR[2]=1 specifies central difference:

parm = {.,  00, .};                /* forward diff based on objective function (default) */
call nlpfdd(FwdD_f, FwdD_g, FwdD_H, "Func", x1) par=parm;
parm = {.,  01, .};                /* central diff based on objective function */
call nlpfdd(CenD_f, CenD_g, CenD_H, "Func", x1) par=parm;
 
grad_results = (FwdD_g // CenD_g);
print grad_results[r={"Fwd FD" "Cen FD"} c={"df/dx" "df/dy"}];
HesDiff = FwdD_H - CenD_H;
reset fuzz=2e-16;       /* print small numbers */
print HesDiff;

The first table shows the results for the forward-difference method (first row) and the central-difference method (second row). The Hessian matrices are very similar. The output shows that the maximum difference between the Hessians is about 4.5E-7. In general, the forward-difference method is faster but less accurate. The central-difference method is slower but more accurate.

Summary

This article shows how to use the NLPFDD subroutine in SAS/IML to approximate the first-order and second-order partial derivatives of a multivariate function at any point in its domain. The subroutine uses finite-difference methods to approximate the derivatives. You can use a forward-difference method (which is fast but less accurate) or a central-difference method (which is slower but more accurate). The derivatives are useful in many applications, including classifying critical points as minima, maxima, and saddle points.

The post Finite-difference derivatives in SAS appeared first on The DO Loop.

February 23, 2022
 

It is important to be able to detect whether a numerical matrix is symmetric. Some operations in linear algebra require symmetric matrices. Sometimes, you can use special algorithms to factor a symmetric matrix. In both cases, you need to test a matrix for symmetry.

A symmetric matrix must be square. Mathematically, we say that an n x n matrix, A, is symmetric if A = A^T. In terms of the elements of A, the matrix is symmetric if A[i,j] = A[j,i] for all 1 ≤ i < j ≤ n.

How to test a numerical matrix for symmetry

In numerical linear algebra, you should usually avoid testing floating-point numbers for equality. Instead, ask whether the difference | A[i,j] - A[j,i] | is smaller than some value, δ. But what small value should you choose for δ? Although a value such as δ = 1E-14 is suitable for many matrices, symmetry should not depend on the scaling of the matrix. In other words, the test for the matrices A and k*A should give the same results for any nonzero scalar, k.

One way to ensure that the scale of the matrix does not affect the result is to operate on a standardized matrix. You can use the elements of A to determine the scale factor. Let c = max(|A[i,j]|) for 1 ≤ i, j ≤ n. Define δ = c*sqrt(ε), where ε is machine precision. In SAS, you can use the CONSTANT('MACEPS') and CONSTANT('SQRTMACEPS') function calls to obtain the value of ε and sqrt(ε), respectively. SAS uses 2.22E-16 for machine precision, so sqrt(ε) ≈ 1.5e-8.

In summary, a good test for symmetry is to test whether all pairwise differences satisfy the inequality | A[i,j] - A[j,i] | < c*sqrt(ε), where c = max(|A[i,j]|). This is equivalent to scaling A so that the largest element has unit magnitude and comparing the scaled differences with sqrt(ε).

In a matrix language such as SAS/IML, you can easily write a function that tests for symmetry, as follows:

proc iml;
 
start TestSym(A);
   if nrow(A) ^= ncol(A) then return(0);    /* A is not square */
   c = max(abs(A));
   sqrteps = constant('SqrtMacEps');
   return( all( abs(A-A`) < c*sqrteps ) );
finish;

Examples of testing a matrix for symmetry

With that definition, let's see whether the function works on a few example matrices. The first example is a symmetric matrix. Let's see whether the test returns 1 (true) for the matrix and for various scalings of the matrix:

/* A is symmetric. TestSym(A) should return 1 (true). */
A = {1  2  3 -4  5,
     2  3  2  3  3,
     3  2  5 -1  0,
    -4  3 -1  4 -2, 
     5  3  0 -2  1 };
b1 = TestSym(A);
b2 = TestSym(1E6 * A);
b3 = TestSym(1E-8 * A);
print b1 b2 b3;

Success. The TestSym function correctly detects that the matrix is symmetric. Various scalings of the matrix are also deemed symmetric.

What happens if we break the symmetry by making a tiny change to the (3,5) element?

/* break the symmetry by changing an off-diagonal element */
A[3,5] = 1e-6;
d1 = TestSym(A);
d2 = TestSym(1E6 * A);
d3 = TestSym(1E-8 * A);
print d1 d2 d3;

For this matrix, the magnitude of the largest element is c=5. The perturbation to the matrix is 1e-6, which is greater than c*sqrt(ε) ≈ 7.5E-8. Accordingly, the test detects that the perturbed matrix and its scalings are not symmetric. If you rerun the previous statements but use a smaller perturbation, such as A[3,5] = 1e-9, the test declares that the matrix is "numerically symmetric." Tiny differences between A[i,j] and A[j,i] are ignored, where "tiny" is relative to the magnitude of the largest element of A.
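For example, the following statements (a quick check that reuses the matrix from the previous example) verify the claim for the smaller perturbation:

A[3,5] = 1e-9;      /* perturbation is less than c*sqrt(eps), which is about 7.5E-8 */
e1 = TestSym(A);
print e1;           /* 1: the matrix is numerically symmetric */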

Make a symmetric matrix

There are times when you might want to form a symmetric matrix from a matrix that is nearly symmetric. An easy way to do that is to use the matrix B = (A + A^T)/2, as follows:

B = (A + A`)/2;      /* B is always symmetric */
isSym = TestSym(B);
print isSym;

For any square matrix, A, the matrix B = (A + A^T)/2 is symmetric. I call the matrix B the symmetricization of A. Each off-diagonal element B[i,j] is the average of the corresponding elements A[i,j] and A[j,i].
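As an aside, the matrix K = (A - A^T)/2 is always skew-symmetric, and every square matrix is the sum of its symmetric and skew-symmetric parts. The following statements (a small sketch that reuses the matrix A from the previous examples) verify the decomposition:

S = (A + A`)/2;                   /* symmetric part of A */
K = (A - A`)/2;                   /* skew-symmetric part of A */
maxDiff = max(abs(A - (S + K)));  /* 0 because A = S + K */
print maxDiff;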

Summary

This article shows how to test a matrix for symmetry in numerical linear algebra. It uses the largest value of the matrix as a scale factor to standardize the computation. This technique is used in several SAS/IML functions that need to determine whether a matrix is symmetric.

Test for a skew-symmetric matrix

Here is an exercise for the motivated reader. A square matrix, M, is skew-symmetric if it satisfies the equation M = -M`. In other words, M[i,j] = -M[j,i] for all 1 ≤ i, j ≤ n. Can you modify the TestSym function to test a matrix for skew-symmetry to within some numerical precision? Hint: You only need to change one line! One possible solution appears below.
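If you want to check your answer, here is one possible solution (a sketch; skip it if you prefer to solve the exercise yourself). Because M = -M` is equivalent to M + M` = 0, you can test the pairwise sums instead of the pairwise differences:

start TestSkewSym(M);
   if nrow(M) ^= ncol(M) then return(0);    /* M is not square */
   c = max(abs(M));
   sqrteps = constant('SqrtMacEps');
   return( all( abs(M+M`) < c*sqrteps ) );  /* the changed line: M+M` instead of M-M` */
finish;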

The post Test for a symmetric matrix appeared first on The DO Loop.
