I previously wrote about using SAS/IML for nonlinear optimization, and demonstrated optimization by maximizing a likelihood function. Many well-known optimization algorithms require derivative information during the optimization, including the conjugate gradient method (implemented in the NLPCG subroutine) and the Newton-Raphson method (implemented in the NLPNRA method).

You should specify analytic derivatives whenever you can, because they are more accurate and more efficient than numerical derivatives. Computing a derivative is not hard, but it is important to get it right. A mistake in the coding of derivative information can difficult to find, and can lead to wrong answers in the optimization.

That's why I suggest two techniques to make sure that derivatives are correct:

- Use symbolic derivative software to compute complex derivatives.
- Use finite difference computations to check that the derivatives are correct. In SAS/IML software, you can use the NLPFDD subroutine to approximate derivatives with finite-differences.

### Example: Derivatives in MLE estimation

Suppose that you have a real-valued function of several variables. To optimize the function, you might want to compute the gradient function, which is the vector of first derivatives. (The gradient is identically zero at local optima.)

To be concrete, let's reuse the log-likelihood example from my previous blog post. The following SAS/IML module defines the log-likelihood function for fitting a normal density curve to data. Given a data vector (*x*_{1}, *x*_{2}, ..., *x*_{n}), the function can be used to find parameters (μ, σ) such that the data are most likely to be a sample from a normal distribution with those parameters.

proc iml; /* the log-likelihood function for the normal distribution N(mu, sigma) */ start LogLik(param) global (x); mu = param[1]; sigma2 = param[2]##2; n = nrow(x); c = x - mu; f = -n/2*log(sigma2) - 1/2/sigma2*sum(c##2); return ( f ); finish;

### Tip 1: Compute symbolic derivatives

The previous log-likelihood function is simple enough that you can manually compute the derivatives of the function with respect to the parameters `mu` and `sigma`. However, you might be interested in knowing that there are online tools that you can use to compute symbolic derivatives.
The one I use is Wolfram Alpha, which is powered by the same software as *Mathematica*^{®}. In *Mathematica*, the D operator is used to compute derivatives. The same operator works in Wolfram Alpha. So, for example, to compute the derivative with respect to sigma, you can type

`D[ -n/2*log(sigma^2) - 1/2/sigma^2*Sum[(x_i-mu)^2], sigma]`

and Wolfram Alpha will return the derivative

`-n/sigma + Sum[(x_i-mu)^2] / sigma^3`

I was less successful computing the derivative with respect to mu. Perhaps the Sum[] function confused the software. However, you can take the derivative of the quantity inside the summation:

`D[ -1/2/sigma^2*(x_i-mu)^2, mu]`

and Wolfram Alpha will return the derivative

`(x_i-mu) / sigma^2`

The gradient of the log-likelihood function is therefore as follows:

start GradLogLik(param) global (x); mu = param[1]; sigma = param[2]; sigma2 = sigma##2; n = nrow(x); c = x - mu; dfdmu = sum(c) / sigma2; dfdsigma = -n/sigma + sum(c##2)/sigma##3; return ( dfdmu || dfdsigma ); finish;

### Tip 2: Use the NLPFDD subroutine to check derivatives

Is the gradient function programmed correctly? Let's find out by calling the GradLogLik function and also the NLPFDD subroutine, which will compute a finite difference approximation to the derivatives of the LogLik function:

/* Use the NLPFDD subroutine to check analytic derivatives */ use Sashelp.Iris; read all var {SepalWidth} into x; close Sashelp.Iris; p = {30 4}; /* evaluate derivative at this (mu, sigma) */ grad = GradLogLik(p); call nlpfdd(func, FDDGrad, hessian, "LogLik", p); diff = max(abs(grad - FDDGrad)); print grad, FDDGrad, diff;

Success! The gradient from the GradLogLik function and the finite difference approximation (in the FDDGrad vector) are essentially the same. Of course, we might have gotten lucky; we evaluated the derivative only at a single set of parameter values. If you want to be extra cautious, the following statements use the Define2DGrid module to generate a whole grid of parameter values. The exact gradient and the finite difference derivatives are evaluated at each of these parameter values.

/* optional: evaluate derivatives on a grid */ params = Define2DGrid( 27, 36, 11, 3, 6, 11 ); exactgrad = j(nrow(params),2); fddgrad = j(nrow(params),2); do i = 1 to nrow(params); exactgrad[i,] = GradLogLik( params[i,] ); call nlpfdd(func, grad, hessian, "LogLik", params[i,] ); fddgrad[i,] = grad; end; diff = max(abs( exactgrad - fddgrad )); print diff;

This was probably more work than was necessary, but it shows that the analytical derivatives and the finite difference derivatives agree to five decimal places. I am now confident that my analytical derivatives are correct.