This note is a brief introduction to the stan_glmer() function in rstanarm, which allows you to fit Bayesian hierarchical models. For the most up-to-date information on hierarchical models in rstanarm, consult the vignettes here and here.
You can consider using a hierarchical model if you believe that there is some sort of nested group structure in your data. For example, let \(\mathbf{y}\) and \(\mathbf{X}\) refer to the pooled outcome and predictor of your model, respectively. Now suppose that you can identify \(J\) groups within your model so that \(\{\mathbf{y}_j, \mathbf{X}_j\}\) refers to the data associated with the \(j\)th group. You have several options in terms of modeling, and we will discuss them below, after simulating the data.
Below we simulate the data that will be used in the examples. For simplicity we consider only three groups (i.e. \(J = 3\)) and one predictor.
# inverse logit function
invlogit <- function(x) {
return(exp(x)/(1+exp(x)))
}
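# as an aside, base R's plogis() computes the same quantity, since the
# logistic distribution function is the inverse logit; quick sanity check:
all.equal(invlogit(c(-2, 0, 2)), plogis(c(-2, 0, 2)))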
# coefficients generated from prior distributions
b10 <- rnorm(1,0,1) # group 1 intercept
b11 <- rnorm(1,1,1) # group 1 slope
b20 <- rnorm(1,0,1) # group 2 intercept
b21 <- rnorm(1,1,1) # group 2 slope
b30 <- rnorm(1,0,1) # group 3 intercept
b31 <- rnorm(1,1,1) # group 3 slope
# for ease of comparison post-estimation
parameters <- rbind("(Intercept)" = c(b10, b20, b30),
"x1" = c(b11, b21, b31))
colnames(parameters) <- c("model1", "model2", "model3")
# generate data
n <- 900
X <- data.frame(matrix(c(c(rep(1,n/3),rep(2,n/3),rep(3,n/3)),rnorm(n,0,1)),ncol = 2))
names(X) <- c("group","x1")
y1 <- invlogit(b10+b11*X[which(X$group==1),2])
y2 <- invlogit(b20+b21*X[which(X$group==2),2])
y3 <- invlogit(b30+b31*X[which(X$group==3),2])
y <- c(y1,y2,y3)
dat <- data.frame(cbind(rbinom(n,1,y),X))
names(dat)[1] <- "y"
head(dat)
## y group x1
## 1 0 1 -0.1278102
## 2 1 1 -0.6042008
## 3 1 1 0.7835122
## 4 1 1 1.3770432
## 5 1 1 0.0153970
## 6 0 1 -0.8372943
If we completely pool the data then we are not taking into account the different groups. So we are modeling, \[ \begin{aligned} \mathbf{y} &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_0 + \mathbf{x}\beta]) \\ &\mbox{... include priors} \end{aligned} \]
The problem here is that we are not accounting for group-specific variability. (However, if there is no identifiable group structure then this may be an appropriate model.) In other words, our model has low variability and high bias.
If there is no pooling of the data then we are modeling each group separately. This means that we will have \(J\) separate regression equations, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}_1\beta_1]) \\ \mathbf{y}_2 &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}_2\beta_2]) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}_J\beta_J]) \\ &\mbox{... include priors} \end{aligned} \]
The problem here is that we are not allowing information from one group to inform the fit of another group. Here our model has high variability and low bias.
Using the data generated above we can estimate these two models using stan_glm(). The completely pooled model would be,
fit1 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
coef(fit1)
## (Intercept) x1
## 0.5553091 0.4919180
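Point estimates alone do not convey the posterior uncertainty. If we also want credible intervals for the coefficients we can use posterior_interval(), for example (output not shown),
# 90% posterior credible intervals for the pooled model's coefficients
posterior_interval(fit1, prob = 0.9)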
The model (or models) with no pooling would be,
fit2_1 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 1, cores = 4, iter = 500)
fit2_2 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 2, cores = 4, iter = 500)
fit2_3 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 3, cores = 4, iter = 500)
# compare parameter estimates
cbind("model1" = coef(fit2_1),
"model2" = coef(fit2_2),
"model3" = coef(fit2_3))
## model1 model2 model3
## (Intercept) 0.902855 1.415597 -0.1270562
## x1 1.902605 -0.510234 0.4742643
parameters
## model1 model2 model3
## (Intercept) 0.9832633 1.5129626 -0.2624571
## x1 1.7452563 -0.2348312 0.5261211
Bayesian hierarchical models provide an intermediate solution to the two extremes above. Most importantly, these hierarchical models allow you to model group-specific behavior while still sharing information across the groups (partial pooling).
First, a quick summary of the formula syntax for stan_glmer models:
- Including -1 anywhere in the formula will drop the intercept.
- Otherwise the formula syntax is the same as for stan_glm (e.g. y ~ x1 + x2 ...), but now we have the option to nest observations. We denote a nested linear predictor within parentheses and a vertical bar: (|). The formula for the linear predictor goes to the left of the vertical bar and the group structure is identified to the right of the vertical bar.
- Consider, for example, y ~ x1 + (x1 - 1 | group) (a concrete call is sketched just after this list):
  1. x1 says to model an overall intercept and a slope parameter for the variable x1.
  2. (x1 - 1 | group) says that, within each group, we model a slope parameter that feeds into the overall slope parameter in (1).
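To make the syntax concrete, this example formula could be passed to stan_glmer() as follows (a sketch only; the model we actually fit below uses a different group structure),
# sketch: group-specific slopes for x1 (no group-specific intercepts),
# plus an overall intercept and overall slope
fit_slopes <- stan_glmer(y ~ x1 + (x1 - 1 | group), data = dat,
                         family = binomial(link = "logit"),
                         cores = 4, iter = 500)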
Here we want the intercept for each group to vary but the slopes will remain constant across the groups. Our model might look like, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}\beta]\bigg) \\ \mathbf{y}_2 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}\beta]\bigg) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}\beta]\bigg) \\ &\mbox{... include priors} \end{aligned} \]
where \(\beta_{0_j}\) refers to the intercept of the \(j\)th group and \(\mu_0\) is the overall intercept around which the group-specific intercepts vary.
We can fit the model using stan_glmer() as follows,
fit3 <- stan_glmer(y ~ x1 + (1 | group), data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
summary(fit3)
##
## Model Info:
##
## function: stan_glmer
## family: binomial [logit]
## formula: y ~ x1 + (1 | group)
## algorithm: sampling
## priors: see help('prior_summary')
## sample: 1000 (posterior sample size)
## num obs: 900
## groups: group (3)
##
## Estimates:
## mean sd 2.5% 25% 50%
## (Intercept) 0.6 0.6 -0.7 0.3 0.6
## x1 0.5 0.1 0.4 0.4 0.5
## b[(Intercept) group:1] 0.0 0.6 -1.4 -0.3 0.0
## b[(Intercept) group:2] 0.7 0.6 -0.6 0.4 0.7
## b[(Intercept) group:3] -0.7 0.6 -2.1 -1.0 -0.7
## Sigma[group:(Intercept),(Intercept)] 1.4 2.0 0.2 0.4 0.8
## mean_PPD 0.6 0.0 0.6 0.6 0.6
## log-posterior -548.3 1.8 -552.8 -549.3 -547.9
## 75% 97.5%
## (Intercept) 0.9 1.9
## x1 0.6 0.6
## b[(Intercept) group:1] 0.3 1.3
## b[(Intercept) group:2] 1.0 2.1
## b[(Intercept) group:3] -0.4 0.7
## Sigma[group:(Intercept),(Intercept)] 1.5 6.5
## mean_PPD 0.6 0.7
## log-posterior -546.9 -545.6
##
## Diagnostics:
## mcse Rhat n_eff
## (Intercept) 0.0 1.0 261
## x1 0.0 1.0 782
## b[(Intercept) group:1] 0.0 1.0 263
## b[(Intercept) group:2] 0.0 1.0 270
## b[(Intercept) group:3] 0.0 1.0 275
## Sigma[group:(Intercept),(Intercept)] 0.1 1.0 302
## mean_PPD 0.0 1.0 1000
## log-posterior 0.1 1.0 295
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
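Note that the b[(Intercept) group:j] terms are group-level deviations from the overall (Intercept), so the implied intercept for group j is their sum. A quick sketch using the fixef() and ranef() accessors (output not shown),
# implied group-specific intercepts: overall intercept + group-level deviation
fixef(fit3)["(Intercept)"] + ranef(fit3)$group[, "(Intercept)"]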
We should take a look at the traceplots to confirm that our chains have mixed well.
plot(fit3, plotfun = "trace")
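Finally, as with the no-pooling fits, we can compare the partially pooled estimates with the true values used in the simulation. coef() combines the overall and group-level estimates (as in lme4), so a quick check might be (output not shown),
# group-specific intercepts and slopes, transposed to match the layout of
# the parameters object created earlier
t(coef(fit3)$group)
# true parameter values simulated earlier
parameters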