This note is a brief introduction to the stan_glmer() function in rstanarm, which allows you to fit Bayesian hierarchical models. For the most up-to-date information on hierarchical models in rstanarm, consult the package vignettes.
You should consider a hierarchical model if you believe there is some sort of nested group structure in your data. For example, let \(\mathbf{y}\) and \(\mathbf{X}\) refer to the pooled outcome and predictor data, respectively. Now suppose that you can identify \(J\) groups within your data, so that \(\{\mathbf{y}_j, \mathbf{X}_j\}\) refers to the data associated with the \(j\)th group. You have several modeling options, which we discuss below after simulating the data.
Below we simulate the data that will be used in the examples. For simplicity we consider only three groups (i.e. \(J = 3\)) and one predictor. (No seed is set, so rerunning this code will produce different draws than the output shown below.)
# inverse logit function
invlogit <- function(x) {
  exp(x) / (1 + exp(x))
}
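(As an aside, base R's stats::plogis() computes the same function; we define invlogit() by hand here only for transparency.)
# built-in equivalent of invlogit()
all.equal(invlogit(2), plogis(2)) # TRUE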
# coefficients generated from prior distributions
b10 <- rnorm(1,0,1) # group 1 intercept
b11 <- rnorm(1,1,1) # group 1 slope
b20 <- rnorm(1,0,1) # group 2 intercept
b21 <- rnorm(1,1,1) # group 2 slope
b30 <- rnorm(1,0,1) # group 3 intercept
b31 <- rnorm(1,1,1) # group 3 slope
# for ease of comparison post-estimation
parameters <- rbind("(Intercept)" = c(b10, b20, b30),
"x1" = c(b11, b21, b31))
colnames(parameters) <- c("model1", "model2", "model3")
# generate data
n <- 900
# group labels (1, 2, 3) and predictor values
X <- data.frame(group = rep(1:3, each = n/3), x1 = rnorm(n, 0, 1))
# group-specific success probabilities
p1 <- invlogit(b10 + b11*X$x1[X$group == 1])
p2 <- invlogit(b20 + b21*X$x1[X$group == 2])
p3 <- invlogit(b30 + b31*X$x1[X$group == 3])
p <- c(p1, p2, p3)
# binary outcomes drawn from the group-specific probabilities
dat <- data.frame(cbind(rbinom(n, 1, p), X))
names(dat)[1] <- "y"
head(dat)
## y group x1
## 1 0 1 -0.1278102
## 2 1 1 -0.6042008
## 3 1 1 0.7835122
## 4 1 1 1.3770432
## 5 1 1 0.0153970
## 6 0 1 -0.8372943
If we completely pool the data then we are not taking into account the different groups. So we are modeling, \[ \begin{aligned} \mathbf{y} &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_0 + \mathbf{x}\beta]) \\ &\mbox{... include priors} \end{aligned} \]
The problem here is that we are not accounting for group-specific variation. (However, if there is no identifiable group structure then this may be an appropriate model.) In other words, our model has low variance and high bias.
If there is no pooling of the data then we are modeling each group separately. This means that we will have \(J\) separate regression equations, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}_1\beta_1]) \\ \mathbf{y}_2 &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}_2\beta_2]) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}(1,\ \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}_J\beta_J]) \\ &\mbox{... include priors} \end{aligned} \]
The problem here is that information from one group is not allowed to inform the model of any other group. Here our model has high variance and low bias.
Using the data generated above we can estimate these two models with stan_glm(). The completely pooled model would be,
fit1 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
coef(fit1)
## (Intercept) x1
## 0.5553091 0.4919180
The model (or models) with no pooling would be,
fit2_1 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 1, cores = 4, iter = 500)
fit2_2 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 2, cores = 4, iter = 500)
fit2_3 <- stan_glm(y ~ x1, data = dat, family = binomial(link = "logit"),
subset = group == 3, cores = 4, iter = 500)
# compare parameter estimates
cbind("model1" = coef(fit2_1),
"model2" = coef(fit2_2),
"model3" = coef(fit2_3))
## model1 model2 model3
## (Intercept) 0.902855 1.415597 -0.1270562
## x1 1.902605 -0.510234 0.4742643
parameters
## model1 model2 model3
## (Intercept) 0.9832633 1.5129626 -0.2624571
## x1 1.7452563 -0.2348312 0.5261211
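The point estimates above do not convey posterior uncertainty. For interval summaries of any of these fits, rstanarm provides posterior_interval(); as a sketch, for the group 1 model:
# 90% posterior credible intervals for the group 1 model
posterior_interval(fit2_1, prob = 0.9)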
Bayesian hierarchical models provide an intermediate solution to the two extremes above. Most importantly, these models let you capture group-specific behavior while still sharing information across the groups (partial pooling).
First, a quick summary of the formula syntax for stan_glmer models:

- A -1 anywhere in the formula will drop the intercept.
- The syntax is similar to stan_glm (e.g. y ~ x1 + x2 ...), but now we have the option to nest observations. We denote a nested linear predictor within parentheses and a vertical bar: (|). The formula for the linear predictor goes to the left of the vertical bar and the grouping variable is identified to the right of the vertical bar.
- As an example, consider the formula y ~ x1 + (x1 - 1 | group):
  1. y ~ x1 says to model an overall intercept and a slope parameter for the variable x1.
  2. (x1 - 1 | group) says to model, within each group, a slope parameter that deviates from the overall slope parameter in (1).

Here we want the intercept for each group to vary but the slopes to remain constant across the groups. Our model might look like, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}\beta]\bigg) \\ \mathbf{y}_2 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}\beta]\bigg) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}\beta]\bigg) \\ &\mbox{... include priors} \end{aligned} \]
where \(\beta_{0_j}\) refers to the intercept of the \(j\)th group. The group-specific intercepts are modeled as draws from a common distribution centered at the overall intercept \(\mu_0\).
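A minimal sketch of that prior structure (the actual defaults rstanarm uses can be inspected with prior_summary()) is, \[ \beta_{0_j} \sim \mathcal{N}(\mu_0,\ \sigma^2_0), \quad j = 1, \dots, J \]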
We can fit the model using stan_glmer() as follows,
fit3 <- stan_glmer(y ~ x1 + (1 | group), data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
summary(fit3)
##
## Model Info:
##
## function: stan_glmer
## family: binomial [logit]
## formula: y ~ x1 + (1 | group)
## algorithm: sampling
## priors: see help('prior_summary')
## sample: 1000 (posterior sample size)
## num obs: 900
## groups: group (3)
##
## Estimates:
## mean sd 2.5% 25% 50%
## (Intercept) 0.6 0.6 -0.7 0.3 0.6
## x1 0.5 0.1 0.4 0.4 0.5
## b[(Intercept) group:1] 0.0 0.6 -1.4 -0.3 0.0
## b[(Intercept) group:2] 0.7 0.6 -0.6 0.4 0.7
## b[(Intercept) group:3] -0.7 0.6 -2.1 -1.0 -0.7
## Sigma[group:(Intercept),(Intercept)] 1.4 2.0 0.2 0.4 0.8
## mean_PPD 0.6 0.0 0.6 0.6 0.6
## log-posterior -548.3 1.8 -552.8 -549.3 -547.9
## 75% 97.5%
## (Intercept) 0.9 1.9
## x1 0.6 0.6
## b[(Intercept) group:1] 0.3 1.3
## b[(Intercept) group:2] 1.0 2.1
## b[(Intercept) group:3] -0.4 0.7
## Sigma[group:(Intercept),(Intercept)] 1.5 6.5
## mean_PPD 0.6 0.7
## log-posterior -546.9 -545.6
##
## Diagnostics:
## mcse Rhat n_eff
## (Intercept) 0.0 1.0 261
## x1 0.0 1.0 782
## b[(Intercept) group:1] 0.0 1.0 263
## b[(Intercept) group:2] 0.0 1.0 270
## b[(Intercept) group:3] 0.0 1.0 275
## Sigma[group:(Intercept),(Intercept)] 0.1 1.0 302
## mean_PPD 0.0 1.0 1000
## log-posterior 0.1 1.0 295
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
We should take a look at the traceplots to confirm that our chains have mixed well.
plot(fit3, plotfun = "trace")
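Trace plots are not the only option here; the plotfun argument accepts the name of any bayesplot MCMC plotting function (without the mcmc_ prefix). For example, to view the \(\hat{R}\) values graphically:
# potential scale reduction factors for all parameters
plot(fit3, plotfun = "rhat")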
We can have a look at the group-specific deviations (ranef()) from the overall parameter values (fixef()).
ranef(fit3)
## $group
## (Intercept)
## 1 0.03368735
## 2 0.73108443
## 3 -0.69122956
fixef(fit3)
## (Intercept) x1
## 0.5733216 0.4971398
We can also look at the group-specific parameters (coef()).
coef(fit3)
## $group
## (Intercept) x1
## 1 0.6070090 0.4971398
## 2 1.3044061 0.4971398
## 3 -0.1179079 0.4971398
##
## attr(,"class")
## [1] "coef.mer"
Note that coef() is the group-wise sum of ranef() and fixef():
ranef(fit3)$group["(Intercept)"] + fixef(fit3)["(Intercept)"]
## (Intercept)
## 1 0.6070090
## 2 1.3044061
## 3 -0.1179079
Here we want the slope to vary for each group while keeping the intercept constant across the groups. Our model might look like, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0} + \mathbf{x}_{1}\beta_{1}]\bigg) \\ \mathbf{y}_2 &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0} + \mathbf{x}_{2}\beta_{2}]\bigg) \\ &\vdots \\ \mathbf{y}_J &\sim \mbox{Bin}\bigg(1,\ \mbox{logit}^{-1}[\beta_{0} + \mathbf{x}_{J}\beta_{J}]\bigg) \\ &\mbox{... include priors} \end{aligned} \]
where \(\mu\) is the overall slope parameter; the group-specific slopes are modeled as draws from a common distribution centered at \(\mu\).
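Again, a minimal sketch of the prior structure (see prior_summary() for the defaults actually used) is, \[ \beta_j \sim \mathcal{N}(\mu,\ \sigma^2), \quad j = 1, \dots, J \]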
We can fit the model using stan_glmer() as follows,
fit4 <- stan_glmer(y ~ x1 + (x1 - 1 | group), data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
summary(fit4)
##
## Model Info:
##
## function: stan_glmer
## family: binomial [logit]
## formula: y ~ x1 + (x1 - 1 | group)
## algorithm: sampling
## priors: see help('prior_summary')
## sample: 1000 (posterior sample size)
## num obs: 900
## groups: group (3)
##
## Estimates:
## mean sd 2.5% 25% 50% 75% 97.5%
## (Intercept) 0.6 0.1 0.5 0.6 0.6 0.7 0.8
## x1 0.7 0.7 -0.9 0.3 0.7 1.1 2.0
## b[x1 group:1] 1.1 0.7 -0.2 0.6 1.1 1.5 2.6
## b[x1 group:2] -1.0 0.7 -2.4 -1.4 -1.0 -0.6 0.5
## b[x1 group:3] -0.1 0.7 -1.5 -0.5 -0.1 0.3 1.4
## Sigma[group:x1,x1] 1.7 1.8 0.3 0.6 1.1 2.1 6.7
## mean_PPD 0.6 0.0 0.6 0.6 0.6 0.6 0.7
## log-posterior -532.5 2.0 -537.2 -533.7 -532.2 -531.0 -529.6
##
## Diagnostics:
## mcse Rhat n_eff
## (Intercept) 0.0 1.0 878
## x1 0.0 1.0 262
## b[x1 group:1] 0.0 1.0 250
## b[x1 group:2] 0.0 1.0 268
## b[x1 group:3] 0.0 1.0 268
## Sigma[group:x1,x1] 0.1 1.0 438
## mean_PPD 0.0 1.0 838
## log-posterior 0.1 1.0 330
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Examine the traceplots.
plot(fit4, plotfun = "trace")
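A posterior predictive check is another quick diagnostic; pp_check() compares the distribution of outcomes simulated from the posterior to the observed outcome distribution:
# overlay replicated outcome distributions on the observed data
pp_check(fit4)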
Have a look at the overall and group-specific parameters:
fixef(fit4)
## (Intercept) x1
## 0.6400170 0.6670047
coef(fit4)
## $group
## (Intercept) x1
## 1 0.640017 1.7361177
## 2 0.640017 -0.3376267
## 3 0.640017 0.5712161
##
## attr(,"class")
## [1] "coef.mer"
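As with fit3, the group-specific slopes reported by coef() are the group-wise sum of the overall slope and the group-level deviations:
ranef(fit4)$group["x1"] + fixef(fit4)["x1"]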
Here we want the intercept and the slope parameters to vary among groups, so our model might look like, \[ \begin{aligned} \mathbf{y}_1 &\sim \mbox{Bin} \bigg( 1\ , \mbox{logit}^{-1}[\beta_{0_1} + \mathbf{x}_{1}\beta_{1}] \bigg) \\ \mathbf{y}_2 &\sim \mbox{Bin} \bigg( 1\ , \mbox{logit}^{-1}[\beta_{0_2} + \mathbf{x}_{2}\beta_{2}] \bigg) \\ &\vdots\\ \mathbf{y}_J &\sim \mbox{Bin} \bigg( 1\ , \mbox{logit}^{-1}[\beta_{0_J} + \mathbf{x}_{J}\beta_{J}] \bigg) \\ &\mbox{... include priors} \end{aligned} \]
We can fit the model using stan_glmer() as follows,
fit5 <- stan_glmer(y ~ x1 + (x1 | group), data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
Examine the traceplots.
plot(fit5, plotfun = "trace")
Have a look at the overall and group-specific parameters:
fixef(fit5)
## (Intercept) x1
## 0.7288699 0.6385118
coef(fit5)
## $group
## (Intercept) x1
## 1 0.8747940 1.8235750
## 2 1.3807621 -0.4841929
## 3 -0.1132491 0.4959946
##
## attr(,"class")
## [1] "coef.mer"
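Since this varying intercept and slope model matches the process that actually generated the data, it is worth comparing the group-specific estimates above with the parameter values used in the simulation:
# estimated group-specific parameters, transposed to align with `parameters`
t(coef(fit5)$group)
parameters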
The following models are by no means deep; however, introducing a new (admittedly spurious) grouping variable helps us illustrate more of the models that we can specify with stan_glmer().
First we introduce a second grouping variable, group2, which defines two subgroups within each level of group.
dat$group2 <- rep(1:6, each = 150) # two subgroups per group
dat[140:160,]
## y group x1 group2
## 140 1 1 1.73661726 1
## 141 1 1 -0.25149303 1
## 142 1 1 1.54106624 1
## 143 0 1 -1.98128265 1
## 144 0 1 0.32841672 1
## 145 1 1 0.89554566 1
## 146 0 1 -1.15139924 1
## 147 1 1 1.86872587 1
## 148 0 1 -0.93532497 1
## 149 1 1 0.54238369 1
## 150 0 1 1.03696006 1
## 151 1 1 0.79495724 2
## 152 1 1 -0.06089017 2
## 153 1 1 0.40786948 2
## 154 0 1 -0.11964061 2
## 155 1 1 1.80909155 2
## 156 1 1 0.64158977 2
## 157 1 1 0.73768366 2
## 158 0 1 1.28069633 2
## 159 1 1 1.72004086 2
## 160 0 1 -1.07970858 2
In the model we fit below, the new term group:group2 says to estimate parameters specific to each combination of group and group2, i.e. to each subgroup of group defined by group2.
Identifying the number of parameters we are estimating helps us understand the type of model we are fitting. We can decompose the formula passed to stan_glmer() below into three parts:

1. x1 says to model an overall intercept and a slope parameter (for the x1 variable), giving 2 parameters.
2. (x1 | group) says to model an intercept and slope deviation for each of the 3 levels of group, giving 6 parameters.
3. (x1 | group:group2) says to model an intercept and slope deviation for each of the 6 subgroups defined by group2 within group, giving 12 parameters.

So we are estimating a total of \(2 + 6 + 12 = 20\) parameters (a count we verify at the end of this section).
fit6 <- stan_glmer(y ~ x1 + (x1 | group) + (x1 | group:group2), data = dat, family = binomial(link = "logit"),
cores = 4, iter = 500)
summary(fit6)
##
## Model Info:
##
## function: stan_glmer
## family: binomial [logit]
## formula: y ~ x1 + (x1 | group) + (x1 | group:group2)
## algorithm: sampling
## priors: see help('prior_summary')
## sample: 1000 (posterior sample size)
## num obs: 900
## groups: group:group2 (6), group (3)
##
## Estimates:
## mean sd 2.5% 25%
## (Intercept) 0.7 0.8 -1.0 0.3
## x1 0.5 0.9 -1.5 0.1
## b[(Intercept) group:group2:1:1] 0.0 0.2 -0.3 0.0
## b[x1 group:group2:1:1] 0.0 0.2 -0.5 -0.1
## b[(Intercept) group:group2:1:2] 0.0 0.2 -0.3 0.0
## b[x1 group:group2:1:2] 0.1 0.2 -0.2 0.0
## b[(Intercept) group:group2:2:3] 0.0 0.2 -0.2 0.0
## b[x1 group:group2:2:3] -0.1 0.2 -0.5 -0.1
## b[(Intercept) group:group2:2:4] 0.0 0.2 -0.3 0.0
## b[x1 group:group2:2:4] 0.0 0.2 -0.4 0.0
## b[(Intercept) group:group2:3:5] 0.0 0.1 -0.3 0.0
## b[x1 group:group2:3:5] 0.1 0.2 -0.2 0.0
## b[(Intercept) group:group2:3:6] 0.0 0.2 -0.5 -0.1
## b[x1 group:group2:3:6] -0.1 0.2 -0.6 -0.1
## b[(Intercept) group:1] 0.2 0.8 -1.1 -0.2
## b[x1 group:1] 1.3 1.0 -0.5 0.7
## b[(Intercept) group:2] 0.7 0.8 -0.6 0.3
## b[x1 group:2] -1.0 0.9 -3.0 -1.5
## b[(Intercept) group:3] -0.7 0.8 -2.1 -1.2
## b[x1 group:3] -0.1 0.9 -1.9 -0.6
## Sigma[group:group2:(Intercept),(Intercept)] 0.0 0.1 0.0 0.0
## Sigma[group:group2:x1,(Intercept)] 0.0 0.1 -0.1 0.0
## Sigma[group:group2:x1,x1] 0.1 0.1 0.0 0.0
## Sigma[group:(Intercept),(Intercept)] 1.7 2.6 0.2 0.5
## Sigma[group:x1,(Intercept)] -0.2 1.1 -2.5 -0.5
## Sigma[group:x1,x1] 2.5 3.0 0.4 0.9
## mean_PPD 0.6 0.0 0.6 0.6
## log-posterior -534.0 4.2 -543.0 -536.6
## 50% 75% 97.5%
## (Intercept) 0.7 1.1 2.0
## x1 0.6 1.1 2.4
## b[(Intercept) group:group2:1:1] 0.0 0.1 0.3
## b[x1 group:group2:1:1] 0.0 0.0 0.4
## b[(Intercept) group:group2:1:2] 0.0 0.0 0.3
## b[x1 group:group2:1:2] 0.0 0.1 0.7
## b[(Intercept) group:group2:2:3] 0.0 0.1 0.5
## b[x1 group:group2:2:3] 0.0 0.0 0.3
## b[(Intercept) group:group2:2:4] 0.0 0.0 0.4
## b[x1 group:group2:2:4] 0.0 0.1 0.4
## b[(Intercept) group:group2:3:5] 0.0 0.0 0.3
## b[x1 group:group2:3:5] 0.0 0.1 0.5
## b[(Intercept) group:group2:3:6] 0.0 0.0 0.2
## b[x1 group:group2:3:6] 0.0 0.0 0.2
## b[(Intercept) group:1] 0.2 0.6 1.8
## b[x1 group:1] 1.2 1.7 3.4
## b[(Intercept) group:2] 0.7 1.1 2.4
## b[x1 group:2] -1.0 -0.5 0.9
## b[(Intercept) group:3] -0.8 -0.4 0.9
## b[x1 group:3] -0.1 0.4 1.9
## Sigma[group:group2:(Intercept),(Intercept)] 0.0 0.0 0.4
## Sigma[group:group2:x1,(Intercept)] 0.0 0.0 0.1
## Sigma[group:group2:x1,x1] 0.0 0.0 0.5
## Sigma[group:(Intercept),(Intercept)] 0.9 1.9 8.5
## Sigma[group:x1,(Intercept)] -0.1 0.3 2.0
## Sigma[group:x1,x1] 1.5 2.9 10.1
## mean_PPD 0.6 0.6 0.7
## log-posterior -533.8 -531.0 -526.3
##
## Diagnostics:
## mcse Rhat n_eff
## (Intercept) 0.0 1.0 253
## x1 0.0 1.0 357
## b[(Intercept) group:group2:1:1] 0.0 1.0 739
## b[x1 group:group2:1:1] 0.0 1.0 847
## b[(Intercept) group:group2:1:2] 0.0 1.0 838
## b[x1 group:group2:1:2] 0.0 1.0 538
## b[(Intercept) group:group2:2:3] 0.0 1.0 457
## b[x1 group:group2:2:3] 0.0 1.0 559
## b[(Intercept) group:group2:2:4] 0.0 1.0 491
## b[x1 group:group2:2:4] 0.0 1.0 682
## b[(Intercept) group:group2:3:5] 0.0 1.0 577
## b[x1 group:group2:3:5] 0.0 1.0 876
## b[(Intercept) group:group2:3:6] 0.0 1.0 645
## b[x1 group:group2:3:6] 0.0 1.0 1000
## b[(Intercept) group:1] 0.0 1.0 250
## b[x1 group:1] 0.1 1.0 360
## b[(Intercept) group:2] 0.0 1.0 260
## b[x1 group:2] 0.0 1.0 384
## b[(Intercept) group:3] 0.1 1.0 251
## b[x1 group:3] 0.0 1.0 371
## Sigma[group:group2:(Intercept),(Intercept)] 0.0 1.0 417
## Sigma[group:group2:x1,(Intercept)] 0.0 1.0 578
## Sigma[group:group2:x1,x1] 0.0 1.0 493
## Sigma[group:(Intercept),(Intercept)] 0.1 1.0 433
## Sigma[group:x1,(Intercept)] 0.0 1.0 745
## Sigma[group:x1,x1] 0.2 1.0 352
## mean_PPD 0.0 1.0 948
## log-posterior 0.2 1.0 380
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
fixef(fit6)
## (Intercept) x1
## 0.7092476 0.5590312
ranef(fit6)
## $`group:group2`
## (Intercept) x1
## 1:1 0.0040864017 -0.004233194
## 1:2 0.0008784613 0.021137828
## 2:3 0.0037894002 -0.010469049
## 2:4 -0.0001487665 0.002474700
## 3:5 0.0002567095 0.021699918
## 3:6 -0.0119381073 -0.022397066
##
## $group
## (Intercept) x1
## 1 0.1622064 1.24514980
## 2 0.6616946 -1.01873322
## 3 -0.7743640 -0.08026877
coef(fit6)
## $`group:group2`
## (Intercept) x1
## 1:1 0.7133340 0.5547980
## 1:2 0.7101260 0.5801690
## 2:3 0.7130370 0.5485622
## 2:4 0.7090988 0.5615059
## 3:5 0.7095043 0.5807311
## 3:6 0.6973095 0.5366342
##
## $group
## (Intercept) x1
## 1 0.8714539 1.8041810
## 2 1.3709422 -0.4597020
## 3 -0.0651164 0.4787625
##
## attr(,"class")
## [1] "coef.mer"
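Finally, as a quick check on the parameter count from earlier (2 overall parameters, 6 group-level deviations, and 12 subgroup-level deviations), we can count the point estimates in the fitted object; note this sketch deliberately ignores the variance-covariance (Sigma) terms:
# 2 fixed effects + (3 x 2) group deviations + (6 x 2) subgroup deviations = 20
length(fixef(fit6)) + sum(sapply(ranef(fit6), function(r) prod(dim(r))))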