This note is a brief introduction to the `stan_glmer()`

function in rstanarm which allows you to fit Bayesian hierarchical models. For the most up-to-date information on hierarchical models in rstanarm consult the vignettes here and here.

You can consider using a hierarchical model if you believe that there is some sort of nested group structure in your data. For example, let \(\mathbf{y}\) and \(\mathbf{X}\) refer to the pooled outcome and predictor of your model, respectively. Now suppose that you can identify \(J\) groups within your model so that \(\{\mathbf{y}_j, \mathbf{X}_j\}\) refers to the data associated with the \(j\)th group. You have several options in terms of modeling, and we will discuss them below, after simulating the data.

Below we simulate the data that will be used in the examples. For simplicity we consider only three groups (i.e. \(J = 3\)) and one predictor.

```
# inverse logit function
invlogit <- function(x) {
return(exp(x)/(1+exp(x)))
}
# coefficients generated from prior distributions
b10 <- rnorm(1,0,1) # group 1 intercept
b11 <- rnorm(1,1,1) # group 1 slope
b20 <- rnorm(1,0,1) # group 2 intercept
b21 <- rnorm(1,1,1) # group 2 slope
b30 <- rnorm(1,0,1) # group 2 intercept
b31 <- rnorm(1,1,1) # group 2 slope
# for ease of comparison post-estimation
parameters <- rbind("(Intercept)" = c(b10, b20, b30),
"x1" = c(b11, b21, b31))
colnames(parameters) <- c("model1", "model2", "model3")
# generate data
n <- 900
X <- data.frame(matrix(c(c(rep(1,n/3),rep(2,n/3),rep(3,n/3)),rnorm(n,0,1)),ncol = 2))
names(X) <- c("group","x1")
y1 <- invlogit(b10+b11*X[which(X$group==1),2])
y2 <- invlogit(b20+b21*X[which(X$group==2),2])
y3 <- invlogit(b30+b31*X[which(X$group==3),2])
y <- c(y1,y2,y3)
dat <- data.frame(cbind(rbinom(n,1,y),X))
names(dat)[1] <- "y"
head(dat)
```

```
## y group x1
## 1 0 1 -0.1278102
## 2 1 1 -0.6042008
## 3 1 1 0.7835122
## 4 1 1 1.3770432
## 5 1 1 0.0153970
## 6 0 1 -0.8372943
```

If we **completely pool** the data then we are not taking into account the different groups. So we are modeling, \[
\begin{aligned}
\mathbf{y} &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_0 + \mathbf{x}\beta]) \\
&\mbox{... include priors}
\end{aligned}
\]

The problem here is that we are *not accounting for group-specific variability*. (However, if there is no identifiable group structure then this may be an appropriate model.) In other words, our model has low variability and high bias.

If there is **no pooling** of the data then we are modeling each group in separately. This means that we will have \(J\) separate regression equations, \[
\begin{aligned}
\mathbf{y}_1 &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}_1\beta_1]) \\
\mathbf{y}_2 &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}_2\beta_2]) \\
&\vdots \\
\mathbf{y}_J &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}_J\beta_J]) \\
&\mbox{... include priors}
\end{aligned}
\]

The problem here is that we are *not allowing the information used in one group to explain the model of another group*. Here our model has high variability and low bias.

Using the data generated above we can estimate these two models using `stan_glm()`

. The completely pooled model would be,

```
fit1 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
coef(fit1)
```

```
## (Intercept) x1
## 0.5553091 0.4919180
```

The model (or models) with no pooling would be,

```
fit2_1 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 1, cores = 4, iter = 500)
fit2_2 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 2, cores = 4, iter = 500)
fit2_3 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 3, cores = 4, iter = 500)
# compare parameter estimates
cbind("model1" = coef(fit2_1),
"model2" = coef(fit2_2),
"model3" = coef(fit2_3))
```

```
## model1 model2 model3
## (Intercept) 0.902855 1.415597 -0.1270562
## x1 1.902605 -0.510234 0.4742643
```

`parameters`

```
## model1 model2 model3
## (Intercept) 0.9832633 1.5129626 -0.2624571
## x1 1.7452563 -0.2348312 0.5261211
```

Bayesian hierarchical models provide an intermediate solution to the two extremes above. Most importantly, *these hierarchical models allow you to model group specific behavior while allowing interactions to exist across the groups*.

First a quick summary of the formula syntax for `stan_glmer`

models:

- Recall that an intercept is included in the model by default so including
`-1`

anywhere in the`formula`

will drop the intercept. - The formula will consist of the standard expression you are familiar with in
`stan_glm`

(e.g.`y ~ x1 + x2 ...`

), but now we have the option to nest observations. We denote a nested linear predictor within parentheses and a vertical bar:`(|)`

. The formula for the linear predictor will be to the left of the vertical bar and the group structure will be identified to the right of the vertical bar. - Consider the following formula:
`y ~ x1 + (x1 - 1| group)`

`x1`

says to model an overall intercept and a slope parameter for the variable`x1`

`(x1 - 1| group)`

says that within each group model a slope parameter that feeds into the overall slope paremeter in (1).

Here we want the intercept for each group to vary but the slopes will remain constant across the groups. Our model might look like, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}\beta]\bigg) \\ \mathbf{y}_2 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}\beta]\bigg) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}\beta]\bigg) \\ &\mbox{... include priors} \end{aligned} \]

where \(\beta_{0j}\) refers to the intercept of the \(j\)th group and \(\mu_0\) is the overall intercept.

We can fit the model using `stan_glmer()`

as follows,

```
fit3 <- stan_glmer(y ~ x1 + (1 | group), data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
summary(fit3)
```

```
##
## Model Info:
##
## function: stan_glmer
## family: binomial [logit]
## formula: y ~ x1 + (1 | group)
## algorithm: sampling
## priors: see help('prior_summary')
## sample: 1000 (posterior sample size)
## num obs: 900
## groups: group (3)
##
## Estimates:
## mean sd 2.5% 25% 50%
## (Intercept) 0.6 0.6 -0.7 0.3 0.6
## x1 0.5 0.1 0.4 0.4 0.5
## b[(Intercept) group:1] 0.0 0.6 -1.4 -0.3 0.0
## b[(Intercept) group:2] 0.7 0.6 -0.6 0.4 0.7
## b[(Intercept) group:3] -0.7 0.6 -2.1 -1.0 -0.7
## Sigma[group:(Intercept),(Intercept)] 1.4 2.0 0.2 0.4 0.8
## mean_PPD 0.6 0.0 0.6 0.6 0.6
## log-posterior -548.3 1.8 -552.8 -549.3 -547.9
## 75% 97.5%
## (Intercept) 0.9 1.9
## x1 0.6 0.6
## b[(Intercept) group:1] 0.3 1.3
## b[(Intercept) group:2] 1.0 2.1
## b[(Intercept) group:3] -0.4 0.7
## Sigma[group:(Intercept),(Intercept)] 1.5 6.5
## mean_PPD 0.6 0.7
## log-posterior -546.9 -545.6
##
## Diagnostics:
## mcse Rhat n_eff
## (Intercept) 0.0 1.0 261
## x1 0.0 1.0 782
## b[(Intercept) group:1] 0.0 1.0 263
## b[(Intercept) group:2] 0.0 1.0 270
## b[(Intercept) group:3] 0.0 1.0 275
## Sigma[group:(Intercept),(Intercept)] 0.1 1.0 302
## mean_PPD 0.0 1.0 1000
## log-posterior 0.1 1.0 295
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
```

We should take a look at the traceplots to confirm that our chains have mixed well.

`plot(fit3, plotfun = "trace")`