A popular use of SAS/IML software is to optimize functions of several variables. One statistical application of optimization is estimating parameters that optimize the maximum likelihood function. This post gives a simple example for maximum likelihood estimation (MLE): fitting a parametric density estimate to data.

### Which density curve fits the data?

If you plot a histogram for the SepalWidth variable in the famous Fisher's iris data, the data look normally distributed. The normal distribution has two parameters: μ is the mean and σ is the standard deviation. For each possible choice of (μ, σ), you can ask the question, "If the true population is N(μ, σ), how likely is it that I would get the SepalWidth sample?" Maximum likelihood estimation is a technique that enables you to estimate the "most likely" parameters. This is commonly referred to as fitting a parametric density estimate to data.

Visually, you can think of overlaying a bunch of normal curves on the histogram and choosing the parameters for the best-fitting curve. For example, the following graph shows four normal curves overlaid on the histogram of the SepalWidth variable:

proc sgplot data=Sashelp.Iris;
   histogram SepalWidth;
   density SepalWidth / type=normal(mu=35   sigma=5.5);
   density SepalWidth / type=normal(mu=32.6 sigma=4.2);
   density SepalWidth / type=normal(mu=30.1 sigma=3.8);
   density SepalWidth / type=normal(mu=30.5 sigma=4.3);
run;

It is clear that the first curve, N(35, 5.5), does not fit the data as well as the other three. The second curve, N(32.6, 4.2), fits a little better, but mu=32.6 seems too large. The remaining curves fit the data better, but it is hard to determine which is the best fit. If I had to guess, I'd choose the brown curve, N(30.5, 4.3), as the best fit among the four.

The method of maximum likelihood provides an algorithm for choosing the best set of parameters: choose the parameters that maximize the likelihood function for the data. Parameters that maximize the log-likelihood also maximize the likelihood function (because the log function is monotone increasing), and the log-likelihood turns out to be easier to work with. (For the normal distribution, you can explicitly solve the likelihood equations. This provides an analytical solution against which to check the numerical solution.)
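Concretely, for a sample x₁, …, xₙ from N(μ, σ), the log-likelihood and its maximizers have the following standard closed form (stated here for reference; the IML module later in this post drops the constant term, which does not affect the location of the optimum):

```latex
\ell(\mu,\sigma) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2
                   - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2,
\qquad
\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2 .
```

The formulas for μ̂ and σ̂² follow from setting the partial derivatives of ℓ with respect to μ and σ to zero.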

### The likelihood and log-likelihood functions

The UNIVARIATE procedure uses maximum likelihood estimation to fit parametric distributions to data. It supports fitting about a dozen common distributions, but you can use SAS/IML software to fit any parametric density to data. There are three steps:

- Write a SAS/IML module that computes the log-likelihood function. Optionally, since many numerical optimization techniques use derivative information, you can provide a module for the gradient of the log-likelihood function. If you do not supply a gradient module, SAS/IML software automatically uses finite differences to approximate the derivatives.
- Set up any constraints for the parameters. For example, the scale parameters for many distributions are restricted to be positive values.
- Call one of the SAS/IML nonlinear optimization routines. For this example, I call the NLPNRA subroutine, which uses a Newton-Raphson algorithm to optimize the log-likelihood.

#### Step 1: Write a module that computes the log-likelihood

A general discussion of log-likelihood functions, including formulas for some common distributions, is available in the documentation for the GENMOD procedure. The following module computes the log-likelihood for the normal distribution:

proc iml;
/* write the log-likelihood function for Normal dist */
start LogLik(param) global (x);
   mu = param[1];
   sigma2 = param[2]##2;
   n = nrow(x);
   c = x - mu;
   f = -n/2*log(sigma2) - 1/2/sigma2*sum(c##2);
   return ( f );
finish;

Notice that the argument to this module is the vector of parameters to estimate (`mu` and `sigma`), not the data. The data vector (which is constant during the optimization) is specified as the global variable `x`. *When you use an optimization method, SAS/IML software requires that the arguments to the function be the quantities to optimize.* Pass in other quantities by using the GLOBAL parameter list.

It's always a good idea to test your module. Remember those four curves that were overlaid on the histogram of the SepalWidth data? Let's evaluate the log-likelihood function at those same parameters. We expect that log-likelihood should be low for parameter values that do not fit the data well, and high for parameter values that do fit the data.

use Sashelp.Iris;
read all var {SepalWidth} into x;
close Sashelp.Iris;

/* optional: test the module */
params = {35   5.5,
          32.6 4.2,
          30.1 3.8,
          30.5 4.3};
LogLik = j(nrow(params),1);
do i = 1 to nrow(params);
   p = params[i,];
   LogLik[i] = LogLik(p);
end;
print Params[c={"Mu" "Sigma"} label=""] LogLik;

The log-likelihood values confirm our expectations. A normal density curve with parameters (35, 5.5) does not fit the data as well as the other parameters, and the curve with parameters (30.5, 4.3) fits the data best because its log-likelihood is largest.
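If you don't have SAS at hand, the same sanity check is easy to sketch in Python with NumPy. This is an illustration, not the author's code: the data here are synthetic draws (the `loglik` function, seed, and sample are my own stand-ins, not the iris sample), but the pattern is the same: parameters near the truth produce the largest log-likelihood.

```python
import numpy as np

def loglik(param, x):
    """Normal log-likelihood with the constant -n/2*log(2*pi) dropped,
       mirroring the SAS/IML LogLik module."""
    mu, sigma = param
    sigma2 = sigma**2
    n = x.size
    return -n/2*np.log(sigma2) - np.sum((x - mu)**2)/(2*sigma2)

# synthetic stand-in for the SepalWidth sample
rng = np.random.default_rng(0)
x = rng.normal(loc=30.5, scale=4.3, size=150)

# the same four (mu, sigma) pairs that were overlaid on the histogram
params = [(35, 5.5), (32.6, 4.2), (30.1, 3.8), (30.5, 4.3)]
vals = [loglik(p, x) for p in params]
```

For this synthetic sample, `vals[0]` (the poorly fitting curve) is the smallest value, and no parameter pair can beat the log-likelihood evaluated at the sample mean and unadjusted standard deviation, which are the exact MLEs.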

#### Step 2: Set up constraints

The *SAS/IML User's Guide* describes how to specify constraints for nonlinear optimization. For this problem, specify a matrix with two rows and *k* columns, where *k* is the number of parameters in the problem. For this example, *k=2*.

- The first row of the matrix specifies the lower bounds on the parameters. Use a missing value (.) if the parameter is not bounded from below.
- The second row of the matrix specifies the upper bounds on the parameters. Use a missing value (.) if the parameter is not bounded from above.

The only constraint on the parameters for the normal distribution is that `sigma` is positive. Therefore, the following statement specifies the constraint matrix:

/* mu-sigma constraint matrix */
con = { .  0,    /* lower bounds: -infty < mu; 0 < sigma */
        .  .};   /* upper bounds: mu < infty; sigma < infty */

#### Step 3: Call an optimization routine

You can now call an optimization routine to find the maximum likelihood estimates for the data. You need to provide an initial guess to the optimization routine, and you need to tell it whether you are finding a maximum or a minimum. There are other options that you can specify, such as how much printed output you want.

p = {35 5.5};    /* initial guess for solution */
opt = {1,        /* find maximum of function   */
       4};       /* print a LOT of output      */
call nlpnra(rc, result, "LogLik", p, opt, con);

The following table and graph summarize the optimization process. Starting from the initial condition, the Newton-Raphson algorithm takes six steps to reach a maximum of the log-likelihood function. (The exact numbers might vary in different SAS releases.) At each step of the process, the log-likelihood function increases. The process stops when the log-likelihood stops increasing, or when the gradient of the log-likelihood is zero, which indicates a stationary point (here, a maximum) of the function.

You can summarize the process graphically by plotting the path of the optimization on a contour plot of the log-likelihood function. You can see that the optimization travels "uphill" until it reaches a maximum.
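The same three steps (objective, constraints, optimizer call) map onto any general-purpose optimizer. As a hedged sketch outside SAS, here is how the workflow might look with SciPy, minimizing the negative log-likelihood with a lower bound on sigma, which plays the role of the `con` matrix. The data are synthetic stand-ins, and I use L-BFGS-B rather than Newton-Raphson, so this illustrates the workflow, not the NLPNRA call itself:

```python
import numpy as np
from scipy.optimize import minimize

# synthetic stand-in for the data
rng = np.random.default_rng(1)
x = rng.normal(loc=30.5, scale=4.3, size=200)

def negloglik(param):
    """Negative normal log-likelihood (constant term dropped);
       most optimizers minimize, so we flip the sign."""
    mu, sigma = param
    n = x.size
    return n/2*np.log(sigma**2) + np.sum((x - mu)**2)/(2*sigma**2)

# bounds: mu unbounded, sigma > 0 (analogue of the IML constraint matrix)
res = minimize(negloglik, x0=[35.0, 5.5], method="L-BFGS-B",
               bounds=[(None, None), (1e-6, None)])
mu_hat, sigma_hat = res.x
```

Because the optimizer only sees the objective and the bounds, the solution should agree with the closed-form MLEs: the sample mean and the unadjusted standard deviation.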

### Check the optimization results

As I mentioned earlier, you can explicitly solve for the parameters that maximize the likelihood equations for the normal distribution. The optimal value of the `mu` parameter is the sample mean of the data. The optimal value of the `sigma` parameter is the unadjusted standard deviation of the sample. The following statements compute these quantities for the SepalWidth data:

OptMu = x[:];
OptSigma = sqrt( ssq(x-OptMu)/nrow(x) );

The values found by the NLPNRA subroutine agree with these exact values to seven decimal places.
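The same analytical check can be sketched in Python. SciPy's `norm.fit` computes the normal MLEs in closed form, so on any sample (synthetic here, since this is an illustration rather than the iris data) it should return exactly the sample mean and the unadjusted standard deviation:

```python
import numpy as np
from scipy.stats import norm

# synthetic stand-in for the data
rng = np.random.default_rng(2)
x = rng.normal(loc=30.5, scale=4.3, size=200)

opt_mu = x.mean()          # exact MLE of mu: the sample mean
opt_sigma = x.std(ddof=0)  # exact MLE of sigma: unadjusted standard deviation

# scipy's norm.fit returns the same closed-form MLEs (loc, scale)
loc, scale = norm.fit(x)
```

Note the `ddof=0`: the MLE of sigma divides by n, not n-1, which is why the text calls it the *unadjusted* standard deviation.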