In categorical data analysis, it is common to analyze tables of counts. For example, a researcher might gather data for 18 boys and 12 girls who apply for a summer enrichment program. The researcher might be interested in whether the proportion of boys that are admitted is different from the proportion of girls. For this analysis, the data that you type into statistical software (such as SAS) does not typically consist of 30 observations. Rather, the data often consist of four observations and a frequency variable (also called a count variable). The first two observations record the number of boys that are admitted and not admitted, respectively. The next two observations record the corresponding frequencies for the girls.

This representation of the data is sometimes called frequency data. It is used when you can group individual subjects into categories such as boy/girl and admitted/not admitted. Despite its convenience, there are times when it is useful to expand or "unroll" the data to enumerate the individual subjects. Examples include bootstrap estimates and analyzing the data in SAS procedures that do not support a FREQ statement.

This article shows how to use the SAS DATA step to unroll frequency data. In addition, this article shows how to expand data that are in events/trials format.

### Data in frequency form

The following example is from the documentation of the FREQ procedure. The goal is to compute some statistics for a 2 x 2 frequency table for 23 patients. Eight patients were on a low-cholesterol diet and 15 were on a high-cholesterol diet. The table shows how many subjects in each group developed heart disease. The data are used to estimate the relative risk of developing heart disease in the two groups:

/* Doc Example: PROC FREQ analysis of a 2x2 Contingency Table */
proc format;
value ExpFmt 1='High Cholesterol Diet'
             0='Low Cholesterol Diet';
value RspFmt 1='Yes'
             0='No';
run;

data FatComp;
format Exposure ExpFmt. Response RspFmt.;
input Exposure Response Count;
label Response='Heart Disease';
datalines;
0 0 6
0 1 2
1 0 4
1 1 11
;

title 'Case-Control Study of High Fat/Cholesterol Diet';
proc freq data=FatComp;
tables Exposure*Response / relrisk norow nocol nopercent;
weight Count;               /* each row represents COUNT patients */
run;

Notice that the data are input in terms of group frequencies. The output is shown. The analysis estimates the risk of heart disease for a subject on a high-cholesterol diet as 2.8 times greater than the risk for a subject on a low-cholesterol diet.

### How to unroll frequency data in SAS

You can use the DATA step in SAS to unroll the frequency data. The COUNT variable tells you how many patients are in each group. You can therefore write a loop that expands the data: iterate from 1 to the COUNT value and put an OUTPUT statement in the body of the loop. If desired, you can include an ID variable to enumerate the subjects in each group, as follows:

data RawFat;
set FatComp;
Group + 1;                  /* identify each category */
do ID = 1 to Count;         /* identify subjects in each category */
   output;
end;
drop Count;
run;

proc print data=RawFat(obs=10); run;

Whereas the original data set has four rows, the RawFat data has 23 rows. The first 10 rows are shown. The first six observations are the expansion of the subjects in the first group (Exposure=Low; Response=No). The next two observations are the subjects in the second group (Exposure=Low; Response=Yes), and so forth. The GROUP variable identifies the original groups. The ID variable enumerates patients within each group.

The second data set can be analyzed by any SAS procedure. In this form, the data do not require using a FREQ statement (or the WEIGHT statement in PROC FREQ), as shown by the following call to PROC FREQ:

/* run the same analysis on the expanded data */
proc freq data=RawFat;
tables Exposure*Response / relrisk norow nocol nopercent;
/* no WEIGHT statement because there is one row for each patient */
run;

The output (not shown) is the same as for the frequency data. Both data sets contain the same information, but the second might be useful for certain tasks such as a bootstrap analysis.

### Data in events/trials form

A related data format is called the events/trials format. In the events/trials format, each observation has two columns that specify counts. One column specifies the number of times that a certain event occurs, and the other column specifies the number of trials (or attempts) that were conducted. This format is used for logistic regression and for other generalized linear regression models that predict the proportion of events.

The following example is from the documentation for the LOGISTIC procedure. The data describe the results for 19 batches of ingots in a designed experiment. Each batch was soaked and heated for specified combinations of times. The ingots were then inspected to determine which were not ready for rolling. The following DATA step records the results in the events/trials syntax. The R variable records the number of ingots that are not ready for rolling, and the N variable contains the number of ingots in the batch at the given values of the HEAT and SOAK variables. Consequently, R/N is the proportion of ingots that were not ready. For example, the values R=4 and N=44 mean that 4 ingots (9%) were not ready for rolling in a batch that contained 44. Regression procedures such as PROC LOGISTIC enable you to analyze data by using an events/trials syntax:

/* events/trials syntax */
data ingots;
input Heat Soak r n @@;
datalines;
7 1.0 0 10  14 1.0 0 31  27 1.0 1 56  51 1.0 3 13
7 1.7 0 17  14 1.7 0 43  27 1.7 4 44  51 1.7 0  1
7 2.2 0  7  14 2.2 2 33  27 2.2 0 21  51 2.2 0  1
7 2.8 0 12  14 2.8 0 31  27 2.8 1 22  51 4.0 0  1
7 4.0 0  9  14 4.0 0 19  27 4.0 1 16
;

proc logistic data=ingots;
model r/n = Heat Soak;
ods select NObs ResponseProfile ParameterEstimates;
run;

The "NObs" table shows that there are 19 rows of data, which represent 387 units. The "Response Profile" table counts and displays the number of events and the number of trials. For these data, 12 units were not ready for rolling. The "ParameterEstimates" table provides estimates for the Intercept and the coefficients of the effects in the logistic regression model.

### How to expand event/trials data

You can expand these data by creating 387 observations, one for each unit. The following DATA step creates a variable named BATCH that enumerates the original 19 batches of ingots. The ID variable enumerates the units in each batch. Some of the units were ready for rolling and some were not. You can create an indicator variable (NOTREADY) that arbitrarily sets the first R ingots as "not ready" (NOTREADY=1) and the remaining ingots as ready (NOTREADY=0).

/* expand or unroll the events/trials data */
data RawIngots;
set ingots;
BatchID + 1;
do ID = 1 to n;
   NotReady = (ID <= r);
   output;
end;
run;

/* analyze the binary variable NOTREADY */
proc logistic data=RawIngots;
model NotReady(event='1') = Heat Soak;
ods select NObs ResponseProfile ParameterEstimates;
run;

The "NObs" and "Response Profile" tables are different from before, but they contain the same information. The "ParameterEstimates" table is the same and is not shown.

### Summary

In categorical data analysis, it is common to analyze data that are in frequency form. Each observation represents some number of subjects, and the number is given by the value of a Count variable. In some situations, it is useful to "unroll" or expand the data. If an observation has frequency C, it results in C observations in the expanded data set. This article shows how to use the SAS DATA step to expand frequency data and data that are in events/trials syntax.

The post How to unroll frequency data appeared first on The DO Loop.

Recently, I wrote about Bartlett's test for sphericity. The purpose of this hypothesis test is to determine whether the variables in the data are uncorrelated. It works by testing whether the sample correlation matrix is close to the identity matrix.

Often statistics textbooks or articles include a statement such as "under the null hypothesis, the test statistic is distributed as a chi-square statistic with DF degrees of freedom." That sentence contains a lot of information! Using algebraic formulas and probability theory to describe a sampling distribution can be very complicated. But it is straightforward to describe a sampling distribution by using simulation and statistical graphics.

This article discusses how to use simulation to understand the sampling distribution of a statistic under a null hypothesis. You can find many univariate examples of simulating a sampling distribution for univariate data, but this article shows the sampling distribution of a statistic for a hypothesis test on multivariate data. The hypothesis test in this article is Bartlett's sphericity test, but the ideas apply to other statistics. It also shows a useful trick for simulating correlation matrices for multivariate normal (MVN) data.

### The simulation approach to null distributions

Simulation is well-suited for explaining the sampling distribution of a statistic under the null hypothesis. For Bartlett's sphericity test, the null hypothesis is that the data are a random sample of size N from a multivariate normal population of p uncorrelated variables. Equivalently, the correlation matrix for the population is the p x p identity matrix. Under this hypothesis, the test statistic
$T = -\log(\det(R)) (N-1-(2p+5)/6)$
has a chi-square distribution with p(p–1)/2 degrees of freedom, where R is the sample correlation matrix. In other words, if we randomly sample many times from an uncorrelated MVN distribution, the statistics for each sample will follow a chi-square distribution. Let's use simulation to visualize the null distribution for Bartlett's test.

Here is a useful fact: You do not have to generate MVN data if the statistic is related to the covariance or correlation of the data. Instead, you can DIRECTLY simulate a correlation matrix for MVN data by using the Wishart distribution. The following SAS/IML statements use the Wishart distribution to simulate 10,000 correlation matrices for MVN(0, Σ) samples, where Σ is a diagonal covariance matrix:

proc iml;
call randseed(12345);
p = 3;                    /* number of MVN variables */
N = 50;                   /* MVN sample size */
Sigma = diag(1:p);        /* population covariance */
NumSamples = 10000;       /* number of samples in simulation */

/* Simulate correlation matrices for MVN data by using the Wishart distribution.
   Each row of A is a scatter matrix; each row of B is a covariance matrix */
A = RandWishart(NumSamples, N-1, Sigma);   /* N-1 degrees of freedom */
B = A / (N-1);                             /* rescale to form covariance matrix */

/* print one row to show an example */
C = shape(B[1,], p);      /* reshape 1st row to p x p matrix */
R = cov2corr(C);          /* convert covariance to correlation */
print C, R;

The output shows one random covariance matrix (C) and its associated correlation matrix (R) from among the 10,000 random matrices. The B matrix is 10000 x 9, and each row is a sample covariance matrix for a MVN(0, Σ) sample that has N=50 observations.

Recall that the determinant of a correlation matrix is always in [0,1]. The determinant equals 1 only when R=I(p). Therefore, the expression -log(det(R)) is close to 0 when R is close to the identity matrix and gets more and more positive as R gets farther and farther from the identity matrix. (It is undefined if R is singular.) So if we apply Bartlett's formula to each of the random matrices, we expect to get a lot of statistics that are close to 0 (because R should be close to the identity) and a few that are far away. The following SAS/IML program carries out this method and plots the 10,000 statistics that result:

T = j(NumSamples, 1);
do i = 1 to NumSamples;
   cov = shape(B[i,], p, p);   /* convert each row to square covariance matrix */
   R = cov2corr(cov);          /* convert covariance to correlation */
   T[i] = -log(det(R)) * (N-1-(2*p+5)/6);
end;

title "Bartlett's Sphericity Statistic";
title2 "&numSamples MVN(0, Sigma) Samples for Diagonal Covariance";
call histogram(T) rebin={0.25 0.5};

The histogram shows the null distribution, which is the distribution of Bartlett's statistic under the null hypothesis. As expected, most statistics are close to 0. Only a few are far from 0, where "far" is a relative term that depends on the dimension of the data (p).

Knowing that the null distribution is a chi-square distribution with DF=p(p-1)/2 degrees of freedom helps to provide a quantitative value to "far from 0." You can use the 95th percentile of the chi-square(DF) distribution to decide whether a sample correlation matrix is "far from 0":

/* critical value of chi-square(3) */
DF = p*(p-1)/2;
crit = quantile("ChiSq", 0.95, DF);   /* one-sided: Pr(chiSq < crit) = 0.95 */
print crit;

You can overlay the chi-square(3) distribution on the null distribution and add the critical value to obtain the following figure:

This graph summarizes the Bartlett sphericity test. The histogram approximates the null distribution by computing Bartlett's statistic on 10,000 random samples from an uncorrelated trivariate normal distribution. The curve is the asymptotic chi-square(DF=3) distribution. The vertical line is the critical value for testing the hypothesis at the 5% significance level. The next section uses this graph and Bartlett's test to determine whether real data are a sample from an uncorrelated MVN distribution.

### Bartlett's test on real data

The previous graph shows the null distribution. If you run the test on a sample that contains three variables and 50 observations, you will get a value of the test statistic. If the value is greater than 7.8, it is unlikely that the data are from an uncorrelated multivariate normal distribution.

A previous article showed how to use PROC FACTOR to run Bartlett's test in SAS. Let's run PROC FACTOR on 50 observations and three variables of Fisher's Iris data:

proc factor data=sashelp.iris(obs=50) method=ML heywood;
var SepalLength SepalWidth PetalLength;
ods select SignifTests;   /* output only Bartlett's test */
run;

The value of Bartlett's statistic on these data is 41.35. The X axis for the null distribution only goes up to 20, so this value is literally "off the chart"! You would reject the null hypothesis for these data and conclude that these data do not come from the null distribution.

When researchers see this result, they often assume that one or more variables in the data are correlated. However, it could also be the case that the data are not multivariate normal since normality is an assumption that was used to generate the null distribution.

### Summary

This article shows how to simulate the null distribution for Bartlett's sphericity test. The null distribution is obtained by simulating many data samples from an uncorrelated multivariate normal distribution and graphing the distribution of the test statistics. For Bartlett's test, you get a chi-square distribution.

The ideas in this article apply more generally to other hypothesis tests. An important use of simulation is to approximate the null distribution for tests when an exact form of the distribution is not known.

This article also shows that you can use the Wishart distribution to avoid having to simulate MVN data. If the goal of the simulation is to obtain a covariance or correlation matrix for MVN data, you can use the Wishart distribution to simulate the matrix directly.

The post Simulate the null distribution for a hypothesis test appeared first on The DO Loop.

Recently, I showed how to use a heat map to visualize measurements over time for a set of patients in a longitudinal study. The visualization is sometimes called a lasagna plot because it presents an alternative to the usual spaghetti plot. A reader asked whether a similar visualization can be created for subjects if the response is an ordinal variable, such as a count. Yes! And the heat map approach is a substantial improvement over a spaghetti plot in this situation.

This article uses the following data, which represent the counts of malaria cases at five clinics over a 14-week time period:

data Clinical;
input SiteID @;
do Week = 1 to 14;
   input Count @;
   output;
end;
/*  ID  Wk1 Wk2 Wk3 Wk4 ... Wk14 */
datalines;
001  1 0 0 0 0 0 0 3 1 3 3 0 3 0
002  0 0 0 1 1 2 1 2 2 1 1 0 2 2
003  1 . . 1 0 1 0 3 . 1 0 3 2 1
004  1 1 . 1 0 1 2 2 3 2 1 0 . 0
005  1 1 1 . 0 0 0 1 0 1 2 4 5 1
;

title "Spaghetti Plot of Counts for Five Clinics";
proc sgplot data=Clinical;
series x=Week y=Count / group=SiteID;
xaxis integer values=(1 to 14) valueshint;
run;

The line plot is not an effective way to visualize these data. In fact, it is almost useless. Because the counts are discrete integer values, and most counts are in the range [0, 3], the graph cannot clearly show the weekly values for any one clinic. The following sections develop a heat map that visualizes these data better.

### Format the raw values

The following call to PROC FORMAT defines a format that associates character strings with values of the COUNT variable:

proc format;
value CountFmt
   .        = "Not Counted"
   0        = "None"
   1        = "1"
   2        = "2"
   3        = "3"
   4 - high = "4+";
run;

You can use this format to encode values and display them in a legend. Notice that you could also use a format to combine counts, such as using the word "Few" to describe 2 or 3 counts.
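For example, a format such as the following (a hypothetical variant of CountFmt, not from the original article) merges the counts 2 and 3 into a single "Few" category:

```sas
/* sketch: a format that combines counts into coarser categories */
proc format;
value CountFewFmt
   .        = "Not Counted"
   0        = "None"
   1        = "1"
   2 - 3    = "Few"       /* merge two counts into one category */
   4 - high = "4+";
run;
```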

### Associate colors to formatted values

A discrete attribute map ensures that the colors will not change if the data change. For ordinal data, it also ensures that the legend will be in ordinal order, as opposed to "data order" or alphabetical order.

You can use a discrete data map to associate graphical attributes with the formatted value of a variable. Examples of graphical attributes include marker colors, marker symbols, line colors, and line patterns. For a heat map, you want to associate the "fill color" of each cell with a formatted value. The following DATA step creates the mapping between values and colors. Notice that I use the PUTN function to apply the format to raw data values. This ensures that the mapping correctly associates formatted values with colors. The raw values are stored in an array (VAL) as are the colors (COLOR). This makes it easy to modify the map in the future or to adapt it to other situations.
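A minimal sketch of such a DATA step follows. The variable names, colors, and the attribute-map ID ('CountAttr') are my own choices, not necessarily those in the original article; a discrete attribute map requires an ID column, a Value column that contains the formatted values, and a FillColor column:

```sas
/* sketch: build a discrete attribute map for the formatted COUNT values */
data AttrMap;
length Value $12 FillColor $20;
retain ID 'CountAttr';                    /* name used by ATTRID= in PROC SGPLOT */
array val[6]       _temporary_ (. 0 1 2 3 4);
array color[6] $20 _temporary_
      ('Black' 'White' 'CXFFFFB2' 'CXFECC5C' 'CXFD8D3C' 'CXE31A1C');
do i = 1 to dim(val);
   Value = putn(val[i], 'CountFmt.');     /* map the FORMATTED value to a color */
   FillColor = color[i];
   output;
end;
drop i;
run;

/* use the map to create the lasagna plot (heat map) */
proc sgplot data=Clinical dattrmap=AttrMap;
format Count CountFmt.;
heatmapparm x=Week y=SiteID colorgroup=Count / attrid=CountAttr;
run;
```

Because the map is keyed on formatted values, the colors and the legend order stay fixed even if some categories are absent from a particular data set.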

### Pi and other mathematical constants

The CONSTANT function provides values for various mathematical constants. Of these, the most important constant is π ≈ 3.14159..., but you can also obtain other mathematical constants such as the natural base (e) and the golden ratio (φ). To get an approximation for π, use pi = constant('pi'). The number π is essential for working with angles and trigonometric functions. For an example, see the article, "Polygons, pi, and linear approximations," which uses π to create regular polygons.

### Machine epsilon

Arguably the most important constant in computer science is machine epsilon. This number enables you to perform floating-point comparisons such as deciding whether two numbers are equal or determining the rank of a numerical matrix. To get machine epsilon, use eps = constant('maceps').
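As a sketch of a typical use, the following DATA step compares two floating-point numbers to within a tolerance based on machine epsilon (the tolerance sqrt(eps) is a common heuristic, not a universal rule):

```sas
/* sketch: floating-point comparison by using machine epsilon */
data _null_;
eps = constant('maceps');          /* about 2.2E-16 */
x = 0.1 + 0.2;                     /* not exactly 0.3 in binary arithmetic */
y = 0.3;
equal = ( abs(x - y) < sqrt(eps) );  /* 1 if equal to within the tolerance */
put eps= x= best32. y= best32. equal=;
run;
```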

### The largest representable integer

It is important to know the largest number that can be represented accurately in finite precision on your machine. This number is available as bigInt = constant('exactint'). This number enables you to find the largest factorial that you can compute exactly (18!) and the largest row of Pascal's triangle (row 56) that you can faithfully represent in double precision.
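The following sketch uses the FACT function to verify the claim about factorials: 18! is less than constant('exactint'), whereas 19! exceeds it and therefore cannot be represented exactly:

```sas
/* sketch: the largest factorial that fits in the exact-integer range */
data _null_;
bigInt = constant('exactint');   /* 2**53 = 9,007,199,254,740,992 */
fact18 = fact(18);               /* less than bigInt, so exact */
fact19 = fact(19);               /* greater than bigInt, so inexact */
put bigInt= comma24. fact18= comma24. fact19= comma24.;
run;
```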

### The largest representable double

Except for π, the constants I use the most are related to BIG ≈ 1.8E308. I primarily use LOGBIG and SQRTBIG, which you can compute as
logbig = constant('logbig');
sqrtbig = constant('sqrtbig');
These constants are useful for preventing overflow when you perform arithmetic operations on large numbers:

• The quantity exp(x) can only be computed when x is less than LOGBIG ≈ 709.78.
• The LOGBIG option supports a BASE suboption (BASE > 1), which you can use to ensure that raising a number to a power does not overflow. For example, constant('logbig', 2) is approximately 1024, which tells you that 2**1023 is the largest integer power of 2 that you can compute without overflowing.
• The SQRTBIG option tells you whether you can square a number without overflowing.
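The guards described above can be sketched as follows:

```sas
/* sketch: guard against overflow before calling EXP or squaring a number */
data _null_;
logbig  = constant('logbig');    /* about 709.78 */
sqrtbig = constant('sqrtbig');   /* about 1.34E154 */

x = 750;
if x < logbig then y = exp(x);
else put 'exp of ' x 'would overflow';

a = 1E200;
if abs(a) < sqrtbig then b = a**2;
else put 'Squaring ' a 'would overflow';
run;
```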

### The smallest representable double

What is the smallest positive floating-point number on your computer? It is given by small = constant('small'). There are also LOGSMALL and SQRTSMALL versions, which you can use to prevent underflow. I don't use these constants as frequently as their 'BIG' counterparts. In my experience, underflow is usually not a problem in SAS.

### Summary

This article discusses five constants that every statistical programmer must know: PI, MACEPS, EXACTINT, BIG, and SMALL. Whether you need mathematical constants (such as π) will depend on the programs that you write. The MACEPS constant is used to compare floating-point numbers. The other constants are used by computer scientists and numerical analysts to ensure that programs can correctly compute with very large (or very small) numbers without encountering floating-point overflow (or underflow).

The post Five constants every statistical programmer should know appeared first on The DO Loop.

A previous article showed how to use SAS to compute finite-difference derivatives of smooth vector-valued multivariate functions. The article uses the NLPFDD subroutine in SAS/IML to compute the finite-difference derivatives. The article states that the third output argument of the NLPFDD subroutine "contains the matrix product J`*J, where J is the Jacobian matrix. This product is used in some numerical methods, such as the Gauss-Newton algorithm, to minimize the value of a vector-valued function."

This article expands on that statement and shows that you can use the SAS/IML matrix language to implement the Gauss-Newton method by writing only a few lines of code.

The NLPFDD subroutine has been in SAS/IML software since SAS version 6. In SAS Viya, you can use the FDDSOLVE subroutine, which is a newer finite-difference function that has a syntax similar to the NLPFDD subroutine.

### A review of least-squares minimization

Statisticians often use least-squares minimization in the context of linear or nonlinear least-squares (LS) regression. In LS regression, you have an observed set of response values {Y1, Y2, ..., Yn}, and you fit a parametric model to obtain a set of predicted values {P1, P2, ..., Pn}. These predictions depend on parameters, and you try to find the parameters that minimize the sum of the squared residuals, Σ (Pi - Yi)^2 for i=1, 2, ..., n.

In this formula, the parameters are not explicitly written, but this formulation is the least-squares minimization of a function that maps each parameter to the sum of squared residuals. Abstractly, if F: R^n → R^m is a smooth vector-valued function with m components, then you can write it as F(x) = (F1(x), F2(x), ..., Fm(x)). In most applications, the dimension of the domain is less than the dimension of the range: n < m. The goal of least-squares minimization is to find a value of x that minimizes the sum of the squared components of F(x). In vector form, we search for a value of x that (locally) minimizes || F(x) ||^2. Equivalently, the problem is often formulated so that you minimize (1/2) || F(x) ||^2. The factor of 1/2 simplifies the expression for the derivative of the objective function.

### Visualize a least-squares function

The previous article used the following vector-valued function:

proc iml;
q = {0, 2, 1};              /* define a point in R^3 */
start VecF(st) global(q);   /* compute vector from q to a point on a cylinder */
   s = st[1]; t = st[2];
   F = j(3, 1, .);
   F[1] = cos(s) - q[1];
   F[2] = sin(s) - q[2];
   F[3] = t      - q[3];
   return( F );
finish;

Geometrically, this function represents the vector from q = (0, 2, 1) to a point on a cylinder in R^3. Therefore, the norm of the vector-valued function is minimized when (s,t) = (pi/2, 1), since that is the point on the cylinder closest to the point q, which is not on the cylinder.

Let's visualize the function (1/2) || F(s,t) ||^2 for (s,t) values near (pi/2, 1). You can use the EXPANDGRID function to create a uniform grid of points in the (s,t) domain. For each point on the grid, you can use the following statements to compute (one-half) the sum of the squares of the vector-valued function. To compute the sum of squared components, you can use the SSQ function or the NORM function, but I have decided to use the inner product F`*F.

/* create heat map of VecF */
s = do(1, 2.1, 0.02);        /* horiz grid points */
t = do(0.5, 1.5, 0.02);      /* vert grid points */
st = expandgrid(s, t);
z = j(nrow(st), 1);
do i = 1 to nrow(st);
   F = VecF( st[i,] );
   z[i] = 0.5 * F`*F;        /* 1/2 sum of squares of components */
end;

result = st||z;
create Heat from result[c={'s' 't' 'z'}];
append from result;
close;

submit;
title "1/2 || F(x,y) ||^2";
proc sgplot data=Heat;
heatmapparm x=s y=t colorresponse=z / colormodel=ThreeColorRamp;
refline 1.571 / axis=x label='pi/2' labelpos=min;
refline 1     / axis=y;
run;
endsubmit;

The graph shows the values of z = (1/2) ||F(s,t)||^2 on a uniform grid. The reference lines intersect at (pi/2, 1), which is the location of the minimum value of z.

### The Gauss-Newton method for minimizing least-squares problems

One way to solve a least-squares minimization is to expand the expression (1/2) ||F(s,t)||2 in terms of the component functions. You get a scalar function of (s,t), so you can use a traditional optimization method, such as the Newton-Raphson method, which you can call by using the NLPNRA subroutine in SAS/IML. Alternatively, you can use the NLPLM subroutine to minimize the least-squares problem directly by using the vector-valued function. In this section, I show a third method: using matrix operations in SAS/IML to implement a basic Gauss-Newton method from first principles.

The Gauss-Newton method is an iterative method that does not require using any second derivatives. It begins with an initial guess, then modifies the guess by using information in the Jacobian matrix. Since each row in the Jacobian matrix is a gradient of a component function, the Gauss-Newton method is similar to a gradient descent method for a scalar-valued function. It tries to move "downhill" towards a local minimum.

Rather than derive the formula from first principles, I will merely present the Gauss-Newton algorithm, which you can find in Wikipedia and in course notes for numerical analysis (Cornell, 2017):

1. Make an initial guess for the minimizer, x0. Let x = x0 be the current guess.
2. Evaluate the vector F ≡ F(x) and the Jacobian J ≡ [ ∂Fi / ∂xj ] for i=1,2,...,m and j=1,2,...,n. The NLPFDD subroutine in SAS/IML enables you to evaluate the function and approximate the Jacobian matrix with a single call.
3. Find a direction that decreases || F(x) || by solving the linear system (J`J) h = -J`F for h.
4. Take a step in the direction of h and update the guess: xnew = x + α h for some α in (0, 1).
5. Go to Step 2 until the problem converges.

For any point x in the domain of F, the NLPFDD subroutine returns three results: F, J, and the matrix product J`*J. These are exactly the quantities that are needed to perform the Gauss-Newton method! The following statements implement the algorithm for an initial guess x0 = (2, 0.5):

/* Gauss-Newton line-search method for minimizing || VecF(x) || */
x0 = {2 0.5};        /* initial guess for (s,t) */
parm = {3, 00, .};   /* function has 3 components; use forward difference */
nSteps = 10;         /* take 10 steps regardless of convergence */
alpha = 0.6;         /* step size for line search */

path = j(nSteps+1, 2);   /* save and plot the approximations */
path[1,] = x0;
x = x0;                  /* start the Gauss-Newton method */
do i = 1 to nSteps;
   call nlpfdd(F, J, JtJ, "VecF", x) par=parm;   /* get F, J, and J`*J at x */
   h = solve(JtJ, -J`*F);    /* h points downhill */
   x = x + alpha*h;          /* update the current guess */
   path[i+1,] = x;           /* remember this point */
end;

/* write the iterates to a data set */
create path from path[c={'px' 'py'}];
append from path;
close;

submit alpha;
data All;
set Heat path;
run;

title "Path of Gauss-Newton Minimization";
title2 "alpha = &alpha";
proc sgplot data=All noautolegend;
heatmapparm x=s y=t colorresponse=z / colormodel=ThreeColorRamp;
refline 1.571 / axis=x label='pi/2' labelpos=min;
refline 1     / axis=y;
series x=px y=py / markers;
run;
endsubmit;

The graph overlays the path of the Gauss-Newton approximations on the heat map of (1/2) || F(s,t) ||^2. For each iterate, you can see that the search direction appears to be "downhill." A more sophisticated algorithm would monitor the norm of h to determine when to stop. You can also use different values of the step length, α, or even vary the step length as you approach the minimum. I leave those interesting variations as an exercise.

The main point I want to emphasize is that the NLPFDD subroutine returns all the information you need to implement the Gauss-Newton method. Several times, I have been asked why the NLPFDD subroutine returns the matrix J`*J, which seems like a strange quantity. Now you know that J`*J is one of the terms in the Gauss-Newton method and other similar least-squares optimization methods.

### Verify the Gauss-Newton Solution

As I mentioned earlier, SAS/IML software supports two subroutines for solving least-squares minimization problems. The NLPHQN subroutine is a hybrid quasi-Newton algorithm that is similar to (but more sophisticated than) the Gauss-Newton method. Let's use the NLPHQN subroutine to solve the same problem and compare the solutions:

/* solve the same problem by using NLPHQN (hybrid quasi-Newton method) */
optn = {3 1};   /* tell NLPHQN that VecF has three components; print some partial results */
call nlphqn(rc, xSoln, "VecF", x0) opt=optn;
print (path[11,])[c={'s' 't'} L="Soln from Gauss-Newton"],
      xSoln[c={'s' 't'} L="Soln from NLPHQN"];

The solution from the hybrid quasi-Newton method is very close to the Gauss-Newton solution. If your goal is to solve a least-squares minimization, use the NLPHQN (or NLPLM) subroutine. But you might want to implement your own minimization method for special problems or for educational purposes.

### Summary

• For a vector-valued function, the NLPFDD subroutine evaluates the function and uses finite-difference derivatives to approximate the Jacobian (J) at a point in the domain. The subroutine also returns the matrix product J`*J.
• The three quantities from the NLPFDD subroutine are exactly the three quantities needed to implement the Gauss-Newton method. By using the NLPFDD subroutine and the matrix language, you can implement the Gauss-Newton method in a few lines of code.
• The answer from the Gauss-Newton method is very similar to the answer from calling a built-in least-squares solver, such as the NLPHQN subroutine.

The post Least-squares optimization and the Gauss-Newton method appeared first on The DO Loop.

I previously showed how to use SAS to compute finite-difference derivatives for smooth scalar-valued functions of several variables. You can use the NLPFDD subroutine in SAS/IML software to approximate the gradient vector (first derivatives) and the Hessian matrix (second derivatives). The computation uses finite-difference derivatives to approximate the derivatives.

The NLPFDD subroutine also supports approximating the first derivatives of a smooth vector-valued function. If F: R^n → R^m is a vector-valued function with m components, then you can write it as F(x) = (F1(x), F2(x), ..., Fm(x)). The Jacobian matrix, J, is the matrix whose (i,j)th entry is J[i,j] = ∂Fi / ∂xj. That is, the ith row is the gradient of the ith component function, Fi, for i=1,2,...,m and j=1,2,...,n. The NLPFDD subroutine enables you to approximate the Jacobian matrix at any point in the domain of F.

### An example of a vector-valued function

A simple example of a vector-valued function is the parameterization of a cylinder, which is a mapping from R2 → R3 given by the function
C(s,t) = (cos(s), sin(s), t)
For any specified values of (s,t), the vector C(s,t) is the vector from the origin to a point on the cylinder. It is sometimes useful to consider basing the vectors at a point q, in which case the function F(s,t) = C(s,t) - q is the vector from q to a point on the cylinder. You can define this function in SAS/IML by using the following statements. You can then use the NLPFDD subroutine to compute the Jacobian matrix of the function at any value of (s,t), as follows:

/* Example of a vector-valued function f: R^n -> R^m for n=2 and m=3.
   Map from (s,t) in [0, 2*pi) x R to the cylinder in R^3.
   The vector is from q = (0, 2, 1) to F(s,t). */
proc iml;
q = {0, 2, 1};                /* define a point in R^3 */
/* vector from q to a point on a cylinder */
start VecF(st) global(q);
   s = st[1]; t = st[2];
   F = j(3, 1, .);
   F[1] = cos(s) - q[1];
   F[2] = sin(s) - q[2];
   F[3] = t      - q[3];
   return( F );
finish;

/* || VecF || has a local min at (s,t) = (pi/2, 1), so evaluate derivatives there */
pi = constant('pi');
x = pi/2 || 1;                /* value of (s,t) at which to evaluate derivatives */
/* Parameters: the function has 3 components; use forward differences;
   choose the step according to the scale of F */
parm = {3, 0, .};
call nlpfdd(f, J, JtJ, "VecF", x) par=parm;  /* return F(x), J=DF(x), and J`*J */

print f[r={'F_1' 'F_2' 'F_3'} L='F(pi/2, 1)'],
      J[r={'F_1' 'F_2' 'F_3'} c={'d/ds' 'd/dt'} L='DF(pi/2, 1) [Fwd Diff]'];

The program evaluates the function and the Jacobian at the point (s,t)=(π/2, 1). You must use the PAR= keyword when the function is vector-valued. The PAR= argument is a three-element vector that contains the following information. If you specify a missing value, the default value is used.

• The first element is the number of components (m) for the function. By default, the NLPFDD subroutine expects a scalar-valued function (m=1), so you must specify PARM[1]=3 for this problem.
• The second element is the finite-difference method. The function supports several schemes, but I always use either 0 (forward difference) or 1 (central difference). By default, PARM[2]=0.
• The third element provides a way to choose the step size for the finite-difference method. I recommend that you use the default value.

The output shows that the value of the function at (s,t)=(π/2, 1) is {0, -1, 0}. The Jacobian at that point is also shown. The gradient of the i-th component function (Fi) is contained in the i-th row of the Jacobian matrix.

• The gradient of the first component function at (π/2, 1) is {-1 0}.
• The gradient of the second component function at (π/2, 1) is {0 0}, which means that F2 is at a local extremum. (In this case, F2 = sin(s) - 2 attains a local maximum value of -1.)
• The gradient of the third component function is {0 1}.

The NLPFDD subroutine also returns a third result, which is named JtJ in the program. It contains the crossproduct JᵀJ, where J is the Jacobian matrix. This product appears in numerical methods, such as the Gauss-Newton algorithm, that minimize the sum of squares of the components of a vector-valued function. Unless you are writing your own algorithm to perform a least-squares minimization, you probably won't need the third output argument.
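As a quick numerical check of this product, the following Python/NumPy snippet (an illustration; the SAS subroutine computes the same quantity internally) forms JᵀJ from the analytic Jacobian of the cylinder map at (π/2, 1). For this particular function, the crossproduct is the 2 x 2 identity matrix:

```python
import numpy as np

# Analytic Jacobian of the cylinder map at (s, t) = (pi/2, 1)
J = np.array([[-1.0, 0.0],
              [ 0.0, 0.0],
              [ 0.0, 1.0]])
JtJ = J.T @ J      # the 2x2 crossproduct returned as the third argument
print(JtJ)         # [[1, 0], [0, 1]]
```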

### Summary

The NLPFDD subroutine in SAS/IML uses finite-difference derivatives to approximate the derivatives of multivariate functions. A previous article shows how to use the NLPFDD subroutine to approximate the gradient and Hessian of a scalar-valued function. This article shows how to approximate the Jacobian of a vector-valued function. Numerical approximations for derivatives are useful in all sorts of numerical routines, including optimization, root-finding, and solving nonlinear least-squares problems.

In SAS Viya, you can use the FDDSOLVE subroutine, which is a newer implementation that has a similar syntax.

The post Finite-difference derivatives of vector-valued functions appeared first on The DO Loop.

Many applications in mathematics and statistics require the numerical computation of the derivatives of smooth multivariate functions. For simple algebraic and trigonometric functions, you often can write down expressions for the first and second partial derivatives. However, for complicated functions, the formulas can get unwieldy (and some applications do not have explicit analytical derivatives). In those cases, numerical analysts use finite-difference methods to approximate the partial derivatives of a function at a point of its domain. This article is about how to compute first-order and second-order finite-difference derivatives for smooth functions in SAS.

### Formulas for finite-difference derivatives

There are many ways to approximate the derivative of a function at a point, but the most common formulas are for the forward-difference method and the central-difference method. In SAS/IML software, you can use the NLPFDD subroutine to compute a finite-difference derivative (FDD).

This article shows how to compute finite-difference derivatives for a smooth scalar-valued function f: Rn → R at a point x0 in its domain. The first derivatives are the elements of the gradient vector, grad(f). The i-th element of the gradient vector is ∂f / ∂xi for i = 1, 2, ..., n. The second derivatives are the elements of the Hessian matrix, Hes(f). The (i,j)th element of the Hessian matrix is ∂²f / (∂xi ∂xj) for i,j = 1, 2, ..., n. In both cases, the derivatives are evaluated at x0, so grad(f)(x0) is a numerical row vector and Hes(f)(x0) is a numerical symmetric matrix.

For this article, I will use the following cubic function of two variables:
f(x,y) = x^3 - y^2 - x + 0.5
A heat map of the function is shown to the right. The function has two critical points at which the gradient is the zero vector: a local maximum at (-1/sqrt(3), 0) and a saddle point at (+1/sqrt(3), 0). The formula for the gradient is grad(f) = [3x^2 - 1, -2y], and the elements of the Hessian matrix are H[1,1] = 6x, H[1,2] = H[2,1] = 0, and H[2,2] = -2. You can evaluate these formulas at a specified point to obtain the analytical derivatives at that point.

### Finite-difference derivatives in SAS

In SAS, you can use the NLPFDD subroutine in PROC IML to compute finite difference derivatives. (In SAS Viya, you can use the FDDSOLVE subroutine, which is a newer implementation.) By default, the NLPFDD subroutine uses forward-difference formulas to approximate the derivatives. The subroutine returns three output arguments: the value of the function, gradient, and Hessian, respectively, evaluated at a specified point. The following SAS/IML program defines the cubic function and calls the NLPFDD subroutine to approximate the derivatives:

proc iml;
start Func(xy);
   x = xy[,1]; y = xy[,2];
   z = x##3 - y##2 - x + 0.5;
   return( z );
finish;

/* Function has local max at (x_max, 0) where x_max = -1/sqrt(3) = -0.57735 */
x_max = -1/sqrt(3) || 0;
call nlpfdd(f_max, grad_max, Hes_max, "Func", x_max);

reset fuzz=2e-6;       /* print 0 for any number less than FUZZ */
print f_max, grad_max, Hes_max;

The output shows the value of the function and its derivatives at x0 = (-1/sqrt(3), 0), which is a local maximum. The value of the function is 0.88. The gradient at the local maximum is the zero vector. The Hessian is a diagonal matrix. The forward-difference derivatives differ from the corresponding analytical values by 2E-6 or less.

### Forward- versus central-difference derivatives

You can evaluate the derivatives at any point, not merely at the critical points. For example, you can evaluate the derivatives of the function at x1 = (-1, 0.5), which is not an extremum:

x1 = {-1 0.5};
call nlpfdd(FwdD_f, FwdD_g, FwdD_H, "Func", x1);
print FwdD_g, FwdD_H;

By default, the NLPFDD subroutine uses a forward-difference method to approximate the derivatives. You can also use a central difference scheme. The documentation describes six different ways to approximate derivatives, but the most common way is to use the objective function to choose a step size and to use either a forward or central difference. You can use the PAR= keyword to specify the method: PAR[2] = 0 specifies forward difference and PAR[2]=1 specifies central difference:

parm = {., 0, .};      /* forward difference based on objective function (default) */
call nlpfdd(FwdD_f, FwdD_g, FwdD_H, "Func", x1) par=parm;
parm = {., 1, .};      /* central difference based on objective function */
call nlpfdd(CenD_f, CenD_g, CenD_H, "Func", x1) par=parm;

grad_results = (FwdD_g // CenD_g);
print grad_results[r={"Fwd FD" "Cen FD"} c={"df/dx" "df/dy"}];
HesDiff = FwdD_H - CenD_H;
reset fuzz=2e-16;      /* print small numbers */
print HesDiff;

The first table shows the gradient computed by the forward-difference method (first row) and by the central-difference method (second row). The Hessian matrices are very similar: the second table shows that the maximum difference between the two approximations is about 4.5E-7. In general, the forward-difference method is faster but less accurate. The central-difference method is slower but more accurate.
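The accuracy claim is easy to verify directly: the forward-difference error is O(h) while the central-difference error is O(h^2). The following Python sketch (step size h is an illustrative choice) compares the two schemes for ∂f/∂x of the cubic at (-1, 0.5), where the exact value is 3x^2 - 1 = 2:

```python
f = lambda v: v[0]**3 - v[1]**2 - v[0] + 0.5

def dfdx_forward(x, y, h):
    """One-sided difference: error is O(h)."""
    return (f([x + h, y]) - f([x, y])) / h

def dfdx_central(x, y, h):
    """Two-sided difference: error is O(h^2)."""
    return (f([x + h, y]) - f([x - h, y])) / (2*h)

exact = 3*(-1)**2 - 1          # df/dx = 3x^2 - 1 = 2 at x = -1
h = 1e-4
err_fwd = abs(dfdx_forward(-1.0, 0.5, h) - exact)
err_cen = abs(dfdx_central(-1.0, 0.5, h) - exact)
print(err_fwd, err_cen)        # the central-difference error is much smaller
```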

### Summary

This article shows how to use the NLPFDD subroutine in SAS/IML to approximate the first-order and second-order partial derivatives of a multivariate function at any point in its domain. The subroutine uses finite-difference methods to approximate the derivatives. You can use a forward-difference method (which is fast but less accurate) or a central-difference method (which is slower but more accurate). The derivatives are useful in many applications, including classifying critical points as minima, maxima, and saddle points.

The post Finite-difference derivatives in SAS appeared first on The DO Loop.