Markov Chain Monte Carlo

10月 042011

Rounding off our reports on major new developments in SAS 9.3, today we'll talk about proc mcmc and the random statement.

Stand-alone packages for fitting very general Bayesian models using Markov chain Monte Carlo (MCMC) methods have been available for quite some time now. The best known of these are BUGS and its derivatives WinBUGS (last updated in 2007) and OpenBUGS . There are also some packages available that call these tools from R.

Today we'll consider a relatively simple model: Clustered Poisson data where cluster means are a constant plus a cluster-specific exponentially-distributed random effect. To be clear:
y_ij ~ Poisson(mu_i)
log(mu_i) = B_0 + r_i
r_i ~ Exponential(lambda)
Of course in Bayesian thinking all effects are random-- here we use the term in the sense of cluster-specific effects.

Several SAS procedures have a bayes statement that allow some specific models to be fit. For example, in Section 6.6 and example 8.17, we show Bayesian Poisson and logistic regression, respectively, using proc genmod. But our example today is a little unusual, and we could not find a canned procedure for it. For these more general problems, SAS has proc mcmc, which in SAS 9.3 allows random effects to be easily modeled.

We begin by generating the data, and fitting the naive (unclustered) model. We set B_0 = 1 and lambda = 0.4. There are 200 clusters of 10 observations each, which we might imagine represent 10 students from each of 200 classrooms.

data test2;
truebeta0 = 1;
randscale = .4;
call streaminit(1944);
do i = 1 to 200;
randint = rand("EXPONENTIAL") * randscale;
do ni = 1 to 10;
mu = exp(truebeta0 + randint);
y = rand("POISSON", mu);

proc genmod data = test2;
model y = / dist=poisson;

Standard Wald 95%
Parameter Estimate Error Confidence Limits

Intercept 1.4983 0.0106 1.4776 1.5190

Note the inelegant SAS syntax for fitting an intercept-only model. The result is pretty awful-- 50% bias with respect to the global mean. Perhaps we'll do better by acknowledging the clustering. We might try that with normally distributed random effects in proc glimmix.

proc glimmix data = test2 method=laplace;
class i;
model y = / dist = poisson solution;
random int / subject = i type = un;

Cov Standard
Parm Subject Estimate Error
UN(1,1) i 0.1682 0.01841

Effect Estimate Error t Value Pr > |t|
Intercept 1.3805 0.03124 44.20 <.0001

No joy-- still a 40% bias in the estimated mean. And the variance of the random effects is biased by more than 50%! Let's try fitting the model that generated the data.

proc mcmc data=test2 nmc=10000 thin=10 seed=2011;
parms fixedint 1 gscale 0.4;

prior fixedint ~ normal(0, var = 10000);
prior gscale ~ igamma(.01 , scale = .01 ) ;

random rint ~ gamma(shape=1, scale=gscale) subject = i initial=0.0001;
mu = exp(fixedint + rint);
model y ~ poisson(mu);

The key points of the proc mcmc statement are nmc, the total number of Monte Carlo iterations to perform, and thin, which includes only every nth sample for inference. The prior and model statements are fairly obvious; we note that in more complex models, parameters that are listed within a single prior statement are sampled as a block. We're placing priors on the fixed (shared) intercept and the scale of the exponential. The mu line is actually just a programming statement-- it uses the same syntax as data step programming.
The newly available statement is random. The syntax here is similar to those for the other priors, with the addition of the subject option, which generates a unique parameter for each level of the subject variable. The random effects themselves can be used in later statements, as shown, to enter into data distributions. A final note here is that the exponential distribution isn't explicitly available, but since the gamma distribution with shape fixed at 1 defines the exponential, this is not a problem. Here are the key results.

Posterior Summaries

Parameter N Mean Deviation
fixedint 1000 1.0346 0.0244
gscale 1000 0.3541 0.0314

Posterior Intervals

Parameter Alpha HPD Interval
fixedint 0.050 0.9834 1.0791
gscale 0.050 0.2937 0.4163

The 95% HPD regions include the true values of the parameters and the posterior means are much less biased than in the model assuming normal random effects.

As usual, MCMC models should be evaluated carefully for convergence and coverage. In this example, I have some concerns (see default diagnostic figure above) and if it were real data I would want to do more.

The CRAN task view on Bayesian Inference includes a summary of tools for general and model-specific MCMC tools. However, there is nothing like proc mcmc in terms of being a general and easy to use tool that is native to R. The nearest options are to use R front ends to WinBUGS/OpenBUGS (R2WinBUGS) or JAGS (rjags). (A brief worked example of using rjags was posted last year by John Myles White.) Alternatively, with some math and a little sweat, the mcmc package would also work. We'll explore an approach through one or more of these packages in a later entry, and would welcome a collaboration from anyone who would like to take that on.
12月 132010
In recent weeks, we've explored methods to fit logistic regression models when a state of quasi-complete separation exists. We considered Firth's penalized likelihood approach, exact logistic regression, and Bayesian models using Markov chain Monte Carlo (MCMC).

Today we'll show how to build a Monte Carlo experiment to compare these approaches. Suppose we have 100 observations with x=0 and 100 with x=1, and suppose that the Pr(Y=1|X=0) = 0.001, while the Pr(Y=1|X=1) = 0.05. Thus the true odds ratio is (0.05/0.95)/(0.001/0.999) = 52.8 and the log odds ratio we want to find is 3.96. But we will rarely observe any y=1 when x=0. Which of these approaches is most likely to give us acceptable results?

Note that in all of the MCMC analyses we use only 6000 iterations, which is likely too few to trust in practice.

The code is long enough here that we annotate within rather than write much text around the code.


All the SAS procedures used accept the events/trials syntax (section 4.1.1), so we'll generate example data sets as two observations of binomial random variates with the probabilities noted above. We also make extensive use of the ODS system to suppress all printed output (section A.7.1) and to save desired pieces of output as SAS data sets. The latter usage requires first using the ods trace on/listing statement to find the name of the output before saving it. Finally, we use the by statement (section A.6.2) to replicate the analysis for each simulated data set.

data rlog;
do trial = 1 to 100;
/* each "trial" is a simulated data set with two observations
containing the observed number of events with x=0 or x=1 */
x=0; events = ranbin(0,100,.001); n=100; output;
x=1; events = ranbin(0,100,.05); n=100; output;

ods select none; /* omit _all_ printed output */

ods output parameterestimates=glm; /* save the estimated betas */
proc logist data = rlog;
by trial;
model events / n=x; /* ordinary logistic regression */

ods output parameterestimates=firth; /* save the estimated betas */
/* note the output data set has the same name
as in the uncorrected glm */
proc logist data = rlog;
by trial;
model events / n = x / firth; /* do the firth bias correction */

ods output exactparmest=exact;
/* the exact estimates have a different name under ODS */
proc logist data=rlog;
by trial;
model events / n = x;
exact x / estimate; /* do the exact estimation */

data prior;
input _type_ $ Intercept x;
Var 25 25
Mean 0 0

ods output postsummaries=mcmc;
proc genmod data = rlog;
by trial;
model events / n = x / dist=bin;
bayes nbi=1000 nmc=6000
coeffprior=normal(input=prior) diagnostics=none
/* do the Bayes regression, using the prior made in the
previous data step */

Now I have four data sets with parameter estimates in them. I could use them separately, but I'd like to merge them together. I can do this with the merge statement (section 1.5.7) in a data step. I also need to drop the lines with the estimated intercepts and rename the variables that hold the parameter estimates. The latter is necessary because the names are duplicated across the output data sets and desirable in that it allows names that are meaningful. In any event, I can use the where and rename data set options to include these modifications as I do the merge. I'll also add the number of events when x=0 and when x=1, which requires merging in the original data twice.

data lregsep;
glm (where = (variable = "x") rename = (estimate = glm))
firth (where = (variable = "x") rename = (estimate = firth))
exact (rename = (estimate = exact))
mcmc (where = (parameter = "x") rename = (mean=mcmc))
rlog (where = (x = 1) rename = (events = events1))
rlog (where = (x = 0) rename = (events = events0));
by trial;

ods select all; /* now I want to see the output! */
/* check to make sure the output dataset looks right */
proc print data = lregsep (obs = 5) ;
var trial glm firth exact mcmc;

/* what do the estimates look like? */
proc means data=lregsep;
var glm firth exact mcmc;

With the following output.

Obs trial glm firth exact mcmc

1 1 12.7866 2.7803 2.3186 3.9635
2 2 12.8287 3.1494 2.7223 4.0304
3 3 10.7192 1.6296 0.8885 2.5613
4 4 11.7458 2.2378 1.6906 3.3409
5 5 10.7192 1.6296 0.8885 2.5115

Variable Mean Std Dev
glm 10.6971252 3.4362801
firth 2.2666700 0.5716097
exact 1.8237047 0.5646224
mcmc 3.1388274 0.9620103

The ordinary logistic estimates are entirely implausible, while the three alternate approaches are more acceptable. The MCMC result has the least bias, but it's unclear to what degree this is a happy coincidence between the odds ratio and the prior precision. The Firth approach appears to be less biased than the exact logistic regression

The R version is roughly analogous to the SAS version. The notable differences are that 1) I want the "weights" version of the data (see example 8.15) for the glm() and logistf() functions and need the events/trials syntax for the elrm() function and the expanded (one row per observation) version for the MCMClogit() funtion. The sapply() function (section B.5.3) serves a similar function to the by statement in SAS. Finally, rather than spelunking through the ods trace output to find the parameter estimates, I used the str() function (section 1.3.2) to figure out where they are stored in the output objects and indexes (rather than data set options) to pull out the one estimate I need.

# make sure the needed packages are present
# the runlogist() function generates a dataset and runs each analysis
# the parameter "trial" keeps track of which time we're calling runlogist()
runlogist = function(trial) {
# the result vector will hold the estimates temporarily
result = matrix(0,4)
# generate the number of events once
events.0 =rbinom(1,100, .001) # for x = 0
events.1 = rbinom(1,100, .05) # for x = 1
# following for glm and logistf "weights" format
xw = c(0,0,1,1)
yw = c(0,1,0,1)
ww = c(100 - events.0, events.0, 100 - events.1,events.1)
# run the glm and logistf, grab the estimates, and stick
# them into the results vector
result[1] =
glm(yw ~ xw, weights=ww, binomial)$coefficients[2]
result[2] = logistf(yw ~ xw, weights=ww)$coefficients[2]
# elrm() needs a data frame in the events/trials syntax
elrmdata = data.frame(events=c(events.0,events.1), x =c(0,1),
trials = c(100,100))
# run it and grab the estimate
result[3]=elrm(events/trials ~ x, interest = ~ x, iter = 6000,
burnIn = 1000, data = elrmdata, r = 2)$coeffs
# MCMClogit() needs expanded data
x = c(rep(0,100), rep(1,100))
y = c(rep(0,100-events.0), rep(1,events.0),
rep(0, 100-events.1), rep(1, events.1))
# run it and grab the mean of the MCMC posteriors
result[4] = summary(MCMClogit(y~as.factor(x), burnin=1000,
mcmc=6000, b0=0, B0=.04,
seed = list(c(781306, 78632467, 364981736, 6545634, 7654654,
# send back the four estimates, plus the number of events
# when x=0 and x=1
return(c(trial, events.0, events.1, result))

Note the construction of the seed= option to the MCMClogit() function. This allows a different seed in every call without actually using sequential seeds.

Now we're ready to call the function repeatedly. We'll do that with the sapply() function, but we need to nest that inside a t() function call to get the estimates to appear as columns rather than rows, and we'll also make it a data frame in the same command. Note that the parameters we change within the sapply() function are merely a list of trial numbers. Finally, we'll add descriptive names for the columns with the names() function (section 1.3.4).

res2 =, runlogist)))
names(res2) <- c("trial","events.0","events.1", "glm",
"firth", "exact-ish", "MCMC")
mean(res2[,4:7], na.rm=TRUE)

trial events.0 events.1 glm firth exact-ish MCMC
1 1 0 6 18.559624 2.6265073 2.6269087 3.643560
2 2 1 3 1.119021 0.8676031 1.1822296 1.036173
3 3 0 5 18.366720 2.4489268 2.1308186 3.555314
4 4 0 5 18.366720 2.4489268 2.0452446 3.513743
5 5 0 2 17.419339 1.6295391 0.9021854 2.629160
6 6 0 9 17.997524 3.0382577 2.1573979 4.017105

glm firth exact-ish MCMC
17.333356 2.278344 1.813203 3.268243

The results are notably similar to SAS, except for the unacceptable glm() results.

In most Monte Carlo experimental settings, one would also be interested in examining the confidence limits for the parameter estimates. Notes and code for doing this can be found here. In a later entry we'll consider plots for the results generated above. As a final note, there are few combinations of event numbers with any mass worth considering. One could compute the probability of each of these and the associated parameter estimates, deriving a more analytic answer to the question. However, this would be difficult to replicate for arbitrary event probabilities and Ns, and very awkward for continuous covariates, while the above approach could be extended with trivial ease.
12月 072010

In examples 8.15 and 8.16 we considered Firth logistic regression and exact logistic regression as ways around the problem of separation, often encountered in logistic regression. (Re-cap: Separation happens when all the observations in a category share a result, or when a continuous covariate predicts the outcome too well. It results in a likelihood maximized when a parameter is extremely large, and causes trouble with ordinary maximum likelihood approached.) Another option is to use Bayesian methods. Here we focus on Markov chain Monte Carlo (MCMC) approaches to Bayesian analysis.


SAS access to MCMC for logistic regression is provided through the bayes statement in proc genmod. There are several default priors available. The normal prior is the most flexible (in the software), allowing different prior means and variances for the regression parameters. The prior is specified through a separate data set. We begin by setting up the data in the events/trials syntax. Then we define a fairly vague prior for the intercept and the effect of the covariate: uncorrelated, and each with a mean of zero and a variance of 1000 (or a precision of 0.001). Finally, we call proc genmod to implement the analysis.

data testmcmc;
x=0; count=0; n=100; output;
x=1; count=5; n=100; output;

data prior;
input _type_ $ Intercept x;
Var 1000 1000
Mean 0 0

title "Bayes with normal prior";
proc genmod descending data=testmcmc;
model count/n = x / dist=binomial link=logit;
bayes seed=10231995 nbi=1000 nmc=21000
coeffprior=normal(input=prior) diagnostics=all

In the forgoing, nbi is the length of the burn-in and nmc is the total number of Monte Carlo iterations. The remaining options define the prior and request certain output. The diagnostics=all option generates many results, including posterior autocorrelations, Gelman-Rubin, Geweke, Raftery-Lewis, and Heidelberger-Welch diagnostics. The summary statistics are presented below; the diagnostics are not especially encouraging.

Posterior Summaries

Parameter N Mean Deviation

Intercept 21000 -20.3301 10.3277
x 21000 17.2857 10.3368

Posterior Summaries

Parameter 25% 50% 75%

Intercept -27.6173 -18.5558 -11.9025
x 8.8534 15.5267 24.6024

It seems that perhaps this prior is too vague. Perhaps we can make it a little more precise. A log odds ratio of 10 implies an odds ratio > 22,000, so perhaps we can accept a prior variance of 25, with about 95% of the prior weight between -10 and 10.

data prior;
input _type_ $ Intercept x;
Var 25 25
Mean 0 0

ods graphics on;
title "Bayes with normal prior";
proc genmod descending data=testmcmc;
model count/n = x / dist=binomial link=logit;
bayes seed=10231995 nbi=1000 nmc=21000
coeffprior=normal(input=prior) diagnostics=all
statistics=(summary interval) plot=all;

Posterior Summaries

Parameter N Mean Deviation

Intercept 21000 -6.5924 1.7958
x 21000 3.5169 1.8431

Posterior Summaries

Parameter 25% 50% 75%

Intercept -7.6347 -6.3150 -5.2802
x 2.1790 3.2684 4.5929

Posterior Intervals

Parameter Alpha Equal-Tail Interval HPD Interval

Intercept 0.050 -10.8101 -3.8560 -10.2652 -3.5788
x 0.050 0.5981 7.7935 0.3997 7.4201

These are more plausible values, and the diagnostics are more favorable.
In the above, we added the keyword interval to generate confidence regions, and used the ods graphics on statement to enable ODS graphics and the plot=all option to generate the graphical output shown above.

There are several packages in R that include MCMC approaches. Here we use the MCMCpack package, which include the MCMClogit() function. It appears not to accept the weights option mentioned previously, so we generate data at the observation level to begin. Then we run the MCMC.

events.0=0 # for X = 0
events.1=5 # for X = 1
x = c(rep(0,100), rep(1,100))
y = c(rep(0,100-events.0), rep(1,events.0),
rep(0, 100-events.1), rep(1, events.1))

logmcmc = MCMClogit(y~as.factor(x), burnin=1000, mcmc=21000, b0=0, B0=.04)

The MCMClogit() accepts a formula object and allows the burn-in and number of Monte Carlo iterations to be specified. The prior mean b0 can be specified as a vector if different for each parameter, or as a scalar, as show. Similarly, the prior precision B0 can be a matrix or a scalar; if scalar, the parameters are uncorrelated in the prior.

> summary(logmcmc)

Iterations = 1001:22000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 21000

1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:

Mean SD Naive SE Time-series SE
(Intercept) -6.570 1.816 0.01253 0.03139
as.factor(x)1 3.513 1.859 0.01283 0.03363

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%
(Intercept) -10.6591 -7.634 -6.299 -5.229 -3.831
as.factor(x)1 0.6399 2.147 3.292 4.599 7.698


The result of the plot() is shown below. These results and those from SAS are reassuringly similar. Many diagnostics are available through the coda package. The codamenu() function allows simple menu-based access to its tools.