## Introduction

This note is a brief introduction to the stan_glmer() function in rstanarm which allows you to fit Bayesian hierarchical models. For the most up-to-date information on hierarchical models in rstanarm consult the vignettes here and here.

You can consider using a hierarchical model if you believe that there is some sort of nested group structure in your data. For example, let $$\mathbf{y}$$ and $$\mathbf{X}$$ refer to the pooled outcome and predictor of your model, respectively. Now suppose that you can identify $$J$$ groups within your model so that $$\{\mathbf{y}_j, \mathbf{X}_j\}$$ refers to the data associated with the $$j$$th group. You have several options in terms of modeling, and we will discuss them below, after simulating the data.

## Simulating data

Below we simulate the data that will be used in the examples. For simplicity we consider only three groups (i.e. $$J = 3$$) and one predictor.

# inverse logit function
invlogit <- function(x) {
return(exp(x)/(1+exp(x)))
}
# coefficients generated from prior distributions
b10 <- rnorm(1,0,1)  # group 1 intercept
b11 <- rnorm(1,1,1)  # group 1 slope

b20 <- rnorm(1,0,1)  # group 2 intercept
b21 <- rnorm(1,1,1)  # group 2 slope

b30 <- rnorm(1,0,1)  # group 2 intercept
b31 <- rnorm(1,1,1)  # group 2 slope
# for ease of comparison post-estimation
parameters <- rbind("(Intercept)" = c(b10, b20, b30),
"x1" = c(b11, b21, b31))
colnames(parameters) <- c("model1", "model2", "model3")
# generate data
n <- 900
X <- data.frame(matrix(c(c(rep(1,n/3),rep(2,n/3),rep(3,n/3)),rnorm(n,0,1)),ncol = 2))
names(X) <- c("group","x1")
y1 <- invlogit(b10+b11*X[which(X$group==1),2]) y2 <- invlogit(b20+b21*X[which(X$group==2),2])
y3 <- invlogit(b30+b31*X[which(X\$group==3),2])
y <- c(y1,y2,y3)
dat <- data.frame(cbind(rbinom(n,1,y),X))
names(dat)[1] <- "y"
head(dat)
##   y group         x1
## 1 0     1 -0.1278102
## 2 1     1 -0.6042008
## 3 1     1  0.7835122
## 4 1     1  1.3770432
## 5 1     1  0.0153970
## 6 0     1 -0.8372943

## Complete Pooling and No Pooling

If we completely pool the data then we are not taking into account the different groups. So we are modeling, \begin{aligned} \mathbf{y} &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_0 + \mathbf{x}\beta]) \\ &\mbox{... include priors} \end{aligned}

The problem here is that we are not accounting for group-specific variability. (However, if there is no identifiable group structure then this may be an appropriate model.) In other words, our model has low variability and high bias.

If there is no pooling of the data then we are modeling each group in separately. This means that we will have $$J$$ separate regression equations, \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}_1\beta_1]) \\ \mathbf{y}_2 &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}_2\beta_2]) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}_J\beta_J]) \\ &\mbox{... include priors} \end{aligned}

The problem here is that we are not allowing the information used in one group to explain the model of another group. Here our model has high variability and low bias.

Using the data generated above we can estimate these two models using stan_glm(). The completely pooled model would be,

fit1 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
coef(fit1)
## (Intercept)          x1
##   0.5553091   0.4919180

The model (or models) with no pooling would be,

fit2_1 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 1, cores = 4, iter = 500)
fit2_2 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 2, cores = 4, iter = 500)
fit2_3 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 3, cores = 4, iter = 500)
# compare parameter estimates
cbind("model1" = coef(fit2_1),
"model2" = coef(fit2_2),
"model3" = coef(fit2_3))
##               model1    model2     model3
## (Intercept) 0.902855  1.415597 -0.1270562
## x1          1.902605 -0.510234  0.4742643
parameters
##                model1     model2     model3
## (Intercept) 0.9832633  1.5129626 -0.2624571
## x1          1.7452563 -0.2348312  0.5261211

## Hierarchical Models

Bayesian hierarchical models provide an intermediate solution to the two extremes above. Most importantly, these hierarchical models allow you to model group specific behavior while allowing interactions to exist across the groups.

First a quick summary of the formula syntax for stan_glmer models:

• Recall that an intercept is included in the model by default so including -1 anywhere in the formula will drop the intercept.
• The formula will consist of the standard expression you are familiar with in stan_glm (e.g. y ~ x1 + x2 ...), but now we have the option to nest observations. We denote a nested linear predictor within parentheses and a vertical bar: (|). The formula for the linear predictor will be to the left of the vertical bar and the group structure will be identified to the right of the vertical bar.
• Consider the following formula: y ~ x1 + (x1 - 1| group)
1. x1 says to model an overall intercept and a slope parameter for the variable x1
2. (x1 - 1| group) says that within each group model a slope parameter that feeds into the overall slope paremeter in (1).

### Varying Intercept

Here we want the intercept for each group to vary but the slopes will remain constant across the groups. Our model might look like, \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}\beta]\bigg) \\ \mathbf{y}_2 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}\beta]\bigg) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}\beta]\bigg) \\ &\mbox{... include priors} \end{aligned}

where $$\beta_{0j}$$ refers to the intercept of the $$j$$th group and $$\mu_0$$ is the overall intercept.

We can fit the model using stan_glmer() as follows,

fit3 <- stan_glmer(y ~ x1 + (1 | group), data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
summary(fit3)
##
## Model Info:
##
##  function:  stan_glmer
##  family:    binomial [logit]
##  formula:   y ~ x1 + (1 | group)
##  algorithm: sampling
##  priors:    see help('prior_summary')
##  sample:    1000 (posterior sample size)
##  num obs:   900
##  groups:    group (3)
##
## Estimates:
##                                        mean   sd     2.5%   25%    50%
## (Intercept)                             0.6    0.6   -0.7    0.3    0.6
## x1                                      0.5    0.1    0.4    0.4    0.5
## b[(Intercept) group:1]                  0.0    0.6   -1.4   -0.3    0.0
## b[(Intercept) group:2]                  0.7    0.6   -0.6    0.4    0.7
## b[(Intercept) group:3]                 -0.7    0.6   -2.1   -1.0   -0.7
## Sigma[group:(Intercept),(Intercept)]    1.4    2.0    0.2    0.4    0.8
## mean_PPD                                0.6    0.0    0.6    0.6    0.6
## log-posterior                        -548.3    1.8 -552.8 -549.3 -547.9
##                                        75%    97.5%
## (Intercept)                             0.9    1.9
## x1                                      0.6    0.6
## b[(Intercept) group:1]                  0.3    1.3
## b[(Intercept) group:2]                  1.0    2.1
## b[(Intercept) group:3]                 -0.4    0.7
## Sigma[group:(Intercept),(Intercept)]    1.5    6.5
## mean_PPD                                0.6    0.7
## log-posterior                        -546.9 -545.6
##
## Diagnostics:
##                                      mcse Rhat n_eff
## (Intercept)                          0.0  1.0   261
## x1                                   0.0  1.0   782
## b[(Intercept) group:1]               0.0  1.0   263
## b[(Intercept) group:2]               0.0  1.0   270
## b[(Intercept) group:3]               0.0  1.0   275
## Sigma[group:(Intercept),(Intercept)] 0.1  1.0   302
## mean_PPD                             0.0  1.0  1000
## log-posterior                        0.1  1.0   295
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

We should take a look at the traceplots to confirm that our chains have mixed well.

plot(fit3, plotfun = "trace")