Introduction

This note is a brief introduction to the stan_glmer() function in rstanarm which allows you to fit Bayesian hierarchical models. For the most up-to-date information on hierarchical models in rstanarm consult the vignettes here and here.

You can consider using a hierarchical model if you believe that there is some sort of nested group structure in your data. For example, let \(\mathbf{y}\) and \(\mathbf{X}\) refer to the pooled outcome and predictor of your model, respectively. Now suppose that you can identify \(J\) groups within your model so that \(\{\mathbf{y}_j, \mathbf{X}_j\}\) refers to the data associated with the \(j\)th group. You have several options in terms of modeling, and we will discuss them below, after simulating the data.

Simulating data

Below we simulate the data that will be used in the examples. For simplicity we consider only three groups (i.e. \(J = 3\)) and one predictor.

# inverse logit function
invlogit <- function(x) {
  return(exp(x)/(1+exp(x)))
}
# coefficients generated from prior distributions
b10 <- rnorm(1,0,1)  # group 1 intercept
b11 <- rnorm(1,1,1)  # group 1 slope

b20 <- rnorm(1,0,1)  # group 2 intercept
b21 <- rnorm(1,1,1)  # group 2 slope

b30 <- rnorm(1,0,1)  # group 2 intercept
b31 <- rnorm(1,1,1)  # group 2 slope
# for ease of comparison post-estimation
parameters <- rbind("(Intercept)" = c(b10, b20, b30),
                    "x1" = c(b11, b21, b31))
colnames(parameters) <- c("model1", "model2", "model3")
# generate data
n <- 900
X <- data.frame(matrix(c(c(rep(1,n/3),rep(2,n/3),rep(3,n/3)),rnorm(n,0,1)),ncol = 2))
names(X) <- c("group","x1")
y1 <- invlogit(b10+b11*X[which(X$group==1),2])
y2 <- invlogit(b20+b21*X[which(X$group==2),2])
y3 <- invlogit(b30+b31*X[which(X$group==3),2])
y <- c(y1,y2,y3)
dat <- data.frame(cbind(rbinom(n,1,y),X))
names(dat)[1] <- "y"
head(dat)
##   y group         x1
## 1 0     1 -0.1278102
## 2 1     1 -0.6042008
## 3 1     1  0.7835122
## 4 1     1  1.3770432
## 5 1     1  0.0153970
## 6 0     1 -0.8372943

Complete Pooling and No Pooling

If we completely pool the data then we are not taking into account the different groups. So we are modeling, \[ \begin{aligned} \mathbf{y} &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_0 + \mathbf{x}\beta]) \\ &\mbox{... include priors} \end{aligned} \]

The problem here is that we are not accounting for group-specific variability. (However, if there is no identifiable group structure then this may be an appropriate model.) In other words, our model has low variability and high bias.

If there is no pooling of the data then we are modeling each group in separately. This means that we will have \(J\) separate regression equations, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}_1\beta_1]) \\ \mathbf{y}_2 &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}_2\beta_2]) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}_J\beta_J]) \\ &\mbox{... include priors} \end{aligned} \]

The problem here is that we are not allowing the information used in one group to explain the model of another group. Here our model has high variability and low bias.

Using the data generated above we can estimate these two models using stan_glm(). The completely pooled model would be,

fit1 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
                 cores = 4, iter = 500)
coef(fit1)
## (Intercept)          x1 
##   0.5553091   0.4919180

The model (or models) with no pooling would be,

fit2_1 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
                   subset = group == 1, cores = 4, iter = 500)
fit2_2 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
                   subset = group == 2, cores = 4, iter = 500)
fit2_3 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
                   subset = group == 3, cores = 4, iter = 500)
# compare parameter estimates
cbind("model1" = coef(fit2_1),
      "model2" = coef(fit2_2),
      "model3" = coef(fit2_3))
##               model1    model2     model3
## (Intercept) 0.902855  1.415597 -0.1270562
## x1          1.902605 -0.510234  0.4742643
parameters
##                model1     model2     model3
## (Intercept) 0.9832633  1.5129626 -0.2624571
## x1          1.7452563 -0.2348312  0.5261211

Hierarchical Models

Bayesian hierarchical models provide an intermediate solution to the two extremes above. Most importantly, these hierarchical models allow you to model group specific behavior while allowing interactions to exist across the groups.

First a quick summary of the formula syntax for stan_glmer models:

Varying Intercept

Here we want the intercept for each group to vary but the slopes will remain constant across the groups. Our model might look like, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}\beta]\bigg) \\ \mathbf{y}_2 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}\beta]\bigg) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}\beta]\bigg) \\ &\mbox{... include priors} \end{aligned} \]

where \(\beta_{0j}\) refers to the intercept of the \(j\)th group and \(\mu_0\) is the overall intercept.

We can fit the model using stan_glmer() as follows,

fit3 <- stan_glmer(y ~ x1 + (1 | group), data = dat, family = binomial(link = "logit"),
                   cores = 4, iter = 500)
summary(fit3)
## 
## Model Info:
## 
##  function:  stan_glmer
##  family:    binomial [logit]
##  formula:   y ~ x1 + (1 | group)
##  algorithm: sampling
##  priors:    see help('prior_summary')
##  sample:    1000 (posterior sample size)
##  num obs:   900
##  groups:    group (3)
## 
## Estimates:
##                                        mean   sd     2.5%   25%    50% 
## (Intercept)                             0.6    0.6   -0.7    0.3    0.6
## x1                                      0.5    0.1    0.4    0.4    0.5
## b[(Intercept) group:1]                  0.0    0.6   -1.4   -0.3    0.0
## b[(Intercept) group:2]                  0.7    0.6   -0.6    0.4    0.7
## b[(Intercept) group:3]                 -0.7    0.6   -2.1   -1.0   -0.7
## Sigma[group:(Intercept),(Intercept)]    1.4    2.0    0.2    0.4    0.8
## mean_PPD                                0.6    0.0    0.6    0.6    0.6
## log-posterior                        -548.3    1.8 -552.8 -549.3 -547.9
##                                        75%    97.5%
## (Intercept)                             0.9    1.9 
## x1                                      0.6    0.6 
## b[(Intercept) group:1]                  0.3    1.3 
## b[(Intercept) group:2]                  1.0    2.1 
## b[(Intercept) group:3]                 -0.4    0.7 
## Sigma[group:(Intercept),(Intercept)]    1.5    6.5 
## mean_PPD                                0.6    0.7 
## log-posterior                        -546.9 -545.6 
## 
## Diagnostics:
##                                      mcse Rhat n_eff
## (Intercept)                          0.0  1.0   261 
## x1                                   0.0  1.0   782 
## b[(Intercept) group:1]               0.0  1.0   263 
## b[(Intercept) group:2]               0.0  1.0   270 
## b[(Intercept) group:3]               0.0  1.0   275 
## Sigma[group:(Intercept),(Intercept)] 0.1  1.0   302 
## mean_PPD                             0.0  1.0  1000 
## log-posterior                        0.1  1.0   295 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

We should take a look at the traceplots to confirm that our chains have mixed well.

plot(fit3, plotfun = "trace")

We can have a look at the group-specific deviations (ranef()) from the overall parameter values (fixef()).

ranef(fit3)
## $group
##   (Intercept)
## 1  0.03368735
## 2  0.73108443
## 3 -0.69122956
fixef(fit3)
## (Intercept)          x1 
##   0.5733216   0.4971398

We can also look at the group-specific parameters (coef())

coef(fit3)
## $group
##   (Intercept)        x1
## 1   0.6070090 0.4971398
## 2   1.3044061 0.4971398
## 3  -0.1179079 0.4971398
## 
## attr(,"class")
## [1] "coef.mer"

Note that coef() is the group-wise sum of ranef() and fixef()

ranef(fit3)$group["(Intercept)"] + fixef(fit3)["(Intercept)"]
##   (Intercept)
## 1   0.6070090
## 2   1.3044061
## 3  -0.1179079

Varying Slope

Here we want the slope to vary for each group while keeping the group-specific intercepts constant. Our model might look like, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0} + \mathbf{x}_{1}\beta_{1}]\bigg) \\ \mathbf{y}_2 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0} + \mathbf{x}_{2}\beta_{2}]\bigg) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0} + \mathbf{x}_{J}\beta_{J}]\bigg) \\ &\mbox{... include priors} \end{aligned} \]

where \(\mu\) is the overall slope parameter.

We can fit the model using stan_glmer() as follows,

fit4 <- stan_glmer(y ~ x1 + (x1 - 1 | group), data = dat, family = binomial(link = "logit"),
                   cores = 4, iter = 500)
summary(fit4)
## 
## Model Info:
## 
##  function:  stan_glmer
##  family:    binomial [logit]
##  formula:   y ~ x1 + (x1 - 1 | group)
##  algorithm: sampling
##  priors:    see help('prior_summary')
##  sample:    1000 (posterior sample size)
##  num obs:   900
##  groups:    group (3)
## 
## Estimates:
##                      mean   sd     2.5%   25%    50%    75%    97.5%
## (Intercept)           0.6    0.1    0.5    0.6    0.6    0.7    0.8 
## x1                    0.7    0.7   -0.9    0.3    0.7    1.1    2.0 
## b[x1 group:1]         1.1    0.7   -0.2    0.6    1.1    1.5    2.6 
## b[x1 group:2]        -1.0    0.7   -2.4   -1.4   -1.0   -0.6    0.5 
## b[x1 group:3]        -0.1    0.7   -1.5   -0.5   -0.1    0.3    1.4 
## Sigma[group:x1,x1]    1.7    1.8    0.3    0.6    1.1    2.1    6.7 
## mean_PPD              0.6    0.0    0.6    0.6    0.6    0.6    0.7 
## log-posterior      -532.5    2.0 -537.2 -533.7 -532.2 -531.0 -529.6 
## 
## Diagnostics:
##                    mcse Rhat n_eff
## (Intercept)        0.0  1.0  878  
## x1                 0.0  1.0  262  
## b[x1 group:1]      0.0  1.0  250  
## b[x1 group:2]      0.0  1.0  268  
## b[x1 group:3]      0.0  1.0  268  
## Sigma[group:x1,x1] 0.1  1.0  438  
## mean_PPD           0.0  1.0  838  
## log-posterior      0.1  1.0  330  
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Examine the traceplots.

plot(fit4, plotfun = "trace")

Have a look at the overall and group-specific parameters

fixef(fit4)
## (Intercept)          x1 
##   0.6400170   0.6670047
coef(fit4)
## $group
##   (Intercept)         x1
## 1    0.640017  1.7361177
## 2    0.640017 -0.3376267
## 3    0.640017  0.5712161
## 
## attr(,"class")
## [1] "coef.mer"

Varying Intercept and Slope

Here we want the intercept and the slope parameters to vary among groups, so our model might look like, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin} \bigg( 1\ , \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}_{1}\beta_{1}] \bigg) \\ \mathbf{y}_2 &\sim \mbox{Bin} \bigg( 1\ , \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}_{2}\beta_{2}] \bigg) \\ &\vdots\\ \mathbf{y}_J &\sim \mbox{Bin} \bigg( 1\ , \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}_{J}\beta_{J}] \bigg) \\ &\mbox{... include priors} \end{aligned} \]

We can fit the model using stan_glmer() as follows,

fit5 <- stan_glmer(y ~ x1 + (x1 | group), data = dat, family = binomial(link = "logit"),
                   cores = 4, iter = 500)

Examine the traceplots.

plot(fit5, plotfun = "trace")

Have a look at the overall and group-specific parameters

fixef(fit5)
## (Intercept)          x1 
##   0.7288699   0.6385118
coef(fit5)
## $group
##   (Intercept)         x1
## 1   0.8747940  1.8235750
## 2   1.3807621 -0.4841929
## 3  -0.1132491  0.4959946
## 
## attr(,"class")
## [1] "coef.mer"

Deeper Hierarchial Models

By no means are the following models deep, however, introducing a new grouping variable (despite being spurious) helps us identify more models that we can specify with stan_glmer().

First we indtroduce a second grouping variable group2 which includes two groups within each group of group.

dat$group2 <- c(rep(1,150),rep(2,150),rep(3,150),rep(4,150),rep(5,150),rep(6,150))
dat[140:160,]
##     y group          x1 group2
## 140 1     1  1.73661726      1
## 141 1     1 -0.25149303      1
## 142 1     1  1.54106624      1
## 143 0     1 -1.98128265      1
## 144 0     1  0.32841672      1
## 145 1     1  0.89554566      1
## 146 0     1 -1.15139924      1
## 147 1     1  1.86872587      1
## 148 0     1 -0.93532497      1
## 149 1     1  0.54238369      1
## 150 0     1  1.03696006      1
## 151 1     1  0.79495724      2
## 152 1     1 -0.06089017      2
## 153 1     1  0.40786948      2
## 154 0     1 -0.11964061      2
## 155 1     1  1.80909155      2
## 156 1     1  0.64158977      2
## 157 1     1  0.73768366      2
## 158 0     1  1.28069633      2
## 159 1     1  1.72004086      2
## 160 0     1 -1.07970858      2

In the model we fit below, the new term group:group2 is saying to estimate group2 specific parameters where the subgroups of group are group2.

Identifying the number of parameters we are estimating helps us understand the type of model we are fitting. We can decompose the formula in stan_glmer() below into three parts:

  1. x1
    • This is saying to estimate an overall intercept and slope (associated with the x1 variable).
    • \(2 \mbox{ parameters}\).
  2. (x1 | group)
    • This is saying to estimate an intercept and slope for each group of in group.
    • \(2 \mbox{ parameters} \cdot 3 \mbox{ groups} = 6 \mbox{ parameters total}\).
  3. (x1 | group:group2)
    • This is saying to estimate an intercept and slope for each group in group2 within group.
    • \(2 \mbox{ parameters} \cdot 3 \mbox{ groups} \cdot \mbox{ groups2} = 12 \mbox{ parameters total}\)

So we are estimating a total of \(2 + 6 + 12 = 20\) parameters.

fit6 <- stan_glmer(y ~ x1 + (x1 | group) + (x1 | group:group2), data = dat, family = binomial(link = "logit"),
                   cores = 4, iter = 500)
summary(fit6)
## 
## Model Info:
## 
##  function:  stan_glmer
##  family:    binomial [logit]
##  formula:   y ~ x1 + (x1 | group) + (x1 | group:group2)
##  algorithm: sampling
##  priors:    see help('prior_summary')
##  sample:    1000 (posterior sample size)
##  num obs:   900
##  groups:    group:group2 (6), group (3)
## 
## Estimates:
##                                               mean   sd     2.5%   25% 
## (Intercept)                                    0.7    0.8   -1.0    0.3
## x1                                             0.5    0.9   -1.5    0.1
## b[(Intercept) group:group2:1:1]                0.0    0.2   -0.3    0.0
## b[x1 group:group2:1:1]                         0.0    0.2   -0.5   -0.1
## b[(Intercept) group:group2:1:2]                0.0    0.2   -0.3    0.0
## b[x1 group:group2:1:2]                         0.1    0.2   -0.2    0.0
## b[(Intercept) group:group2:2:3]                0.0    0.2   -0.2    0.0
## b[x1 group:group2:2:3]                        -0.1    0.2   -0.5   -0.1
## b[(Intercept) group:group2:2:4]                0.0    0.2   -0.3    0.0
## b[x1 group:group2:2:4]                         0.0    0.2   -0.4    0.0
## b[(Intercept) group:group2:3:5]                0.0    0.1   -0.3    0.0
## b[x1 group:group2:3:5]                         0.1    0.2   -0.2    0.0
## b[(Intercept) group:group2:3:6]                0.0    0.2   -0.5   -0.1
## b[x1 group:group2:3:6]                        -0.1    0.2   -0.6   -0.1
## b[(Intercept) group:1]                         0.2    0.8   -1.1   -0.2
## b[x1 group:1]                                  1.3    1.0   -0.5    0.7
## b[(Intercept) group:2]                         0.7    0.8   -0.6    0.3
## b[x1 group:2]                                 -1.0    0.9   -3.0   -1.5
## b[(Intercept) group:3]                        -0.7    0.8   -2.1   -1.2
## b[x1 group:3]                                 -0.1    0.9   -1.9   -0.6
## Sigma[group:group2:(Intercept),(Intercept)]    0.0    0.1    0.0    0.0
## Sigma[group:group2:x1,(Intercept)]             0.0    0.1   -0.1    0.0
## Sigma[group:group2:x1,x1]                      0.1    0.1    0.0    0.0
## Sigma[group:(Intercept),(Intercept)]           1.7    2.6    0.2    0.5
## Sigma[group:x1,(Intercept)]                   -0.2    1.1   -2.5   -0.5
## Sigma[group:x1,x1]                             2.5    3.0    0.4    0.9
## mean_PPD                                       0.6    0.0    0.6    0.6
## log-posterior                               -534.0    4.2 -543.0 -536.6
##                                               50%    75%    97.5%
## (Intercept)                                    0.7    1.1    2.0 
## x1                                             0.6    1.1    2.4 
## b[(Intercept) group:group2:1:1]                0.0    0.1    0.3 
## b[x1 group:group2:1:1]                         0.0    0.0    0.4 
## b[(Intercept) group:group2:1:2]                0.0    0.0    0.3 
## b[x1 group:group2:1:2]                         0.0    0.1    0.7 
## b[(Intercept) group:group2:2:3]                0.0    0.1    0.5 
## b[x1 group:group2:2:3]                         0.0    0.0    0.3 
## b[(Intercept) group:group2:2:4]                0.0    0.0    0.4 
## b[x1 group:group2:2:4]                         0.0    0.1    0.4 
## b[(Intercept) group:group2:3:5]                0.0    0.0    0.3 
## b[x1 group:group2:3:5]                         0.0    0.1    0.5 
## b[(Intercept) group:group2:3:6]                0.0    0.0    0.2 
## b[x1 group:group2:3:6]                         0.0    0.0    0.2 
## b[(Intercept) group:1]                         0.2    0.6    1.8 
## b[x1 group:1]                                  1.2    1.7    3.4 
## b[(Intercept) group:2]                         0.7    1.1    2.4 
## b[x1 group:2]                                 -1.0   -0.5    0.9 
## b[(Intercept) group:3]                        -0.8   -0.4    0.9 
## b[x1 group:3]                                 -0.1    0.4    1.9 
## Sigma[group:group2:(Intercept),(Intercept)]    0.0    0.0    0.4 
## Sigma[group:group2:x1,(Intercept)]             0.0    0.0    0.1 
## Sigma[group:group2:x1,x1]                      0.0    0.0    0.5 
## Sigma[group:(Intercept),(Intercept)]           0.9    1.9    8.5 
## Sigma[group:x1,(Intercept)]                   -0.1    0.3    2.0 
## Sigma[group:x1,x1]                             1.5    2.9   10.1 
## mean_PPD                                       0.6    0.6    0.7 
## log-posterior                               -533.8 -531.0 -526.3 
## 
## Diagnostics:
##                                             mcse Rhat n_eff
## (Intercept)                                 0.0  1.0   253 
## x1                                          0.0  1.0   357 
## b[(Intercept) group:group2:1:1]             0.0  1.0   739 
## b[x1 group:group2:1:1]                      0.0  1.0   847 
## b[(Intercept) group:group2:1:2]             0.0  1.0   838 
## b[x1 group:group2:1:2]                      0.0  1.0   538 
## b[(Intercept) group:group2:2:3]             0.0  1.0   457 
## b[x1 group:group2:2:3]                      0.0  1.0   559 
## b[(Intercept) group:group2:2:4]             0.0  1.0   491 
## b[x1 group:group2:2:4]                      0.0  1.0   682 
## b[(Intercept) group:group2:3:5]             0.0  1.0   577 
## b[x1 group:group2:3:5]                      0.0  1.0   876 
## b[(Intercept) group:group2:3:6]             0.0  1.0   645 
## b[x1 group:group2:3:6]                      0.0  1.0  1000 
## b[(Intercept) group:1]                      0.0  1.0   250 
## b[x1 group:1]                               0.1  1.0   360 
## b[(Intercept) group:2]                      0.0  1.0   260 
## b[x1 group:2]                               0.0  1.0   384 
## b[(Intercept) group:3]                      0.1  1.0   251 
## b[x1 group:3]                               0.0  1.0   371 
## Sigma[group:group2:(Intercept),(Intercept)] 0.0  1.0   417 
## Sigma[group:group2:x1,(Intercept)]          0.0  1.0   578 
## Sigma[group:group2:x1,x1]                   0.0  1.0   493 
## Sigma[group:(Intercept),(Intercept)]        0.1  1.0   433 
## Sigma[group:x1,(Intercept)]                 0.0  1.0   745 
## Sigma[group:x1,x1]                          0.2  1.0   352 
## mean_PPD                                    0.0  1.0   948 
## log-posterior                               0.2  1.0   380 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
fixef(fit6)
## (Intercept)          x1 
##   0.7092476   0.5590312
ranef(fit6)
## $`group:group2`
##       (Intercept)           x1
## 1:1  0.0040864017 -0.004233194
## 1:2  0.0008784613  0.021137828
## 2:3  0.0037894002 -0.010469049
## 2:4 -0.0001487665  0.002474700
## 3:5  0.0002567095  0.021699918
## 3:6 -0.0119381073 -0.022397066
## 
## $group
##   (Intercept)          x1
## 1   0.1622064  1.24514980
## 2   0.6616946 -1.01873322
## 3  -0.7743640 -0.08026877
coef(fit6)
## $`group:group2`
##     (Intercept)        x1
## 1:1   0.7133340 0.5547980
## 1:2   0.7101260 0.5801690
## 2:3   0.7130370 0.5485622
## 2:4   0.7090988 0.5615059
## 3:5   0.7095043 0.5807311
## 3:6   0.6973095 0.5366342
## 
## $group
##   (Intercept)         x1
## 1   0.8714539  1.8041810
## 2   1.3709422 -0.4597020
## 3  -0.0651164  0.4787625
## 
## attr(,"class")
## [1] "coef.mer"