-
Notifications
You must be signed in to change notification settings - Fork 19
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
010ddd4
commit 1edd9bd
Showing
3 changed files
with
386 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,232 @@ | ||
# Binary response GLMM using Frequentist Methods | ||
[Julian Faraway](https://julianfaraway.github.io/) | ||
2024-10-14 | ||
|
||
- [Data and Model](#data-and-model) | ||
- [LME4](#lme4) | ||
- [NLME](#nlme) | ||
- [MMRM](#mmrm) | ||
- [GLMMTMB](#glmmtmb) | ||
- [Discussion](#discussion) | ||
- [Package version info](#package-version-info) | ||
|
||
See the [introduction](../index.md) for an overview. | ||
|
||
See a [mostly Bayesian analysis](ohio.md) analysis of the same data. | ||
|
||
This example is discussed in more detail in the book [Bayesian | ||
Regression Modeling with | ||
INLA](https://julianfaraway.github.io/brinlabook/chaglmm.html) | ||
|
||
Packages used: | ||
|
||
``` r | ||
library(ggplot2) | ||
library(knitr) | ||
``` | ||
|
||
# Data and Model | ||
|
||
In [Fitzmaurice and Laird, | ||
1993](https://doi.org/10.1093/biomet/80.1.141), data on 537 children | ||
aged 7–10 in six Ohio cities are reported. The response is binary — does | ||
the child suffer from wheezing (indication of a pulmonary problem) where | ||
one indicates yes and zero no. This status is reported for each of four | ||
years at ages 7, 8, 9 and 10. There is also an indicator variable for | ||
whether the mother of the child is a smoker. Because we have four binary | ||
responses for each child, we expect these to be correlated and our model | ||
needs to reflect this. | ||
|
||
We sum the number of smoking and non-smoking mothers: | ||
|
||
``` r | ||
data(ohio, package="brinla") | ||
table(ohio$smoke)/4 | ||
``` | ||
|
||
|
||
0 1 | ||
350 187 | ||
|
||
We use this to produce the proportion of wheezing children classified by | ||
age and maternal smoking status: | ||
|
||
``` r | ||
xtabs(resp ~ smoke + age, ohio)/c(350,187) | ||
``` | ||
|
||
age | ||
smoke -2 -1 0 1 | ||
0 0.16000 0.14857 0.14286 0.10571 | ||
1 0.16578 0.20856 0.18717 0.13904 | ||
|
||
Age has been adjusted so that nine years old is zero. We see that | ||
wheezing appears to decline with age and that there may be more wheezing | ||
in children with mothers who smoke. But the effects are not clear and we | ||
need modeling to be sure about these conclusions. | ||
|
||
A plausible model uses a logit link with a linear predictor of the form: | ||
|
||
``` math | ||
\eta_{ij} = \beta_0 + \beta_1 age_j + \beta_2 smoke_i + u_i, \quad i=1, \dots ,537, \quad j=1,2,3,4, | ||
``` | ||
|
||
with | ||
|
||
``` math | ||
P(Y_{ij} = 1) = {\exp(\eta_{ij}) \over 1+\exp(\eta_{ij})}. | ||
``` | ||
|
||
The random effect $u_i$ models the propensity of child $i$ to wheeze. | ||
Children are likely to vary in their health condition and this effect | ||
enables us to include this unknown variation in the model. Because $u_i$ | ||
is added to all four observations for a child, we induce a positive | ||
correlation among the four responses as we might naturally expect. The | ||
response is Bernoulli or, in other words, binomial with trial size one. | ||
|
||
# LME4 | ||
|
||
Here is the model fit penalized quasi-likelihood using the `lme4` | ||
package: | ||
|
||
``` r | ||
library(lme4) | ||
modagh <- glmer(resp ~ age + smoke + (1|id), | ||
family=binomial, data=ohio) | ||
summary(modagh, correlation = FALSE) | ||
``` | ||
|
||
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod'] | ||
Family: binomial ( logit ) | ||
Formula: resp ~ age + smoke + (1 | id) | ||
Data: ohio | ||
|
||
AIC BIC logLik deviance df.resid | ||
1597.9 1620.6 -794.9 1589.9 2144 | ||
|
||
Scaled residuals: | ||
Min 1Q Median 3Q Max | ||
-1.403 -0.180 -0.158 -0.132 2.518 | ||
|
||
Random effects: | ||
Groups Name Variance Std.Dev. | ||
id (Intercept) 5.49 2.34 | ||
Number of obs: 2148, groups: id, 537 | ||
|
||
Fixed effects: | ||
Estimate Std. Error z value Pr(>|z|) | ||
(Intercept) -3.374 0.275 -12.27 <2e-16 | ||
age -0.177 0.068 -2.60 0.0093 | ||
smoke 0.415 0.287 1.45 0.1485 | ||
|
||
We see that there is no significant effect due to maternal smoking. | ||
|
||
Suppose you do not take into account the correlated response within the | ||
individuals and fit a GLM ignoring the ID random effect: | ||
|
||
``` r | ||
modglm <- glm(resp ~ age + smoke, family=binomial, data=ohio) | ||
faraway::sumary(modglm) | ||
``` | ||
|
||
Estimate Std. Error z value Pr(>|z|) | ||
(Intercept) -1.8837 0.0838 -22.5 <2e-16 | ||
age -0.1134 0.0541 -2.1 0.036 | ||
smoke 0.2721 0.1235 2.2 0.028 | ||
|
||
n = 2148 p = 3 | ||
Deviance = 1819.889 Null Deviance = 1829.089 (Difference = 9.199) | ||
|
||
We see that the effect of maternal smoking is significant (but this | ||
would be the incorrect conclusion). | ||
|
||
# NLME | ||
|
||
Cannot fit GLMMs (other than normal response). | ||
|
||
# MMRM | ||
|
||
Cannot fit GLMMs (other than normal response). | ||
|
||
# GLMMTMB | ||
|
||
See the discussion for the [single random effect | ||
example](pulpfreq.md#GLMMTMB) for some introduction. | ||
|
||
``` r | ||
library(glmmTMB) | ||
``` | ||
|
||
We can fit the model with: | ||
|
||
``` r | ||
gtmod <- glmmTMB(resp ~ age + smoke + (1|id), | ||
family=binomial, data=ohio) | ||
summary(gtmod) | ||
``` | ||
|
||
Family: binomial ( logit ) | ||
Formula: resp ~ age + smoke + (1 | id) | ||
Data: ohio | ||
|
||
AIC BIC logLik deviance df.resid | ||
1597.9 1620.6 -794.9 1589.9 2144 | ||
|
||
Random effects: | ||
|
||
Conditional model: | ||
Groups Name Variance Std.Dev. | ||
id (Intercept) 5.49 2.34 | ||
Number of obs: 2148, groups: id, 537 | ||
|
||
Conditional model: | ||
Estimate Std. Error z value Pr(>|z|) | ||
(Intercept) -3.374 0.275 -12.27 <2e-16 | ||
age -0.177 0.068 -2.60 0.0093 | ||
smoke 0.415 0.287 1.45 0.1484 | ||
|
||
Same as `lme4`. | ||
|
||
# Discussion | ||
|
||
- In both cases, we do not find evidence of an effect for maternal | ||
smoking. | ||
|
||
# Package version info | ||
|
||
``` r | ||
sessionInfo() | ||
``` | ||
|
||
R version 4.4.1 (2024-06-14) | ||
Platform: x86_64-apple-darwin20 | ||
Running under: macOS Sonoma 14.7 | ||
|
||
Matrix products: default | ||
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib | ||
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 | ||
|
||
locale: | ||
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 | ||
|
||
time zone: Europe/London | ||
tzcode source: internal | ||
|
||
attached base packages: | ||
[1] stats graphics grDevices utils datasets methods base | ||
|
||
other attached packages: | ||
[1] glmmTMB_1.1.9 lme4_1.1-35.5 Matrix_1.7-0 knitr_1.48 ggplot2_3.5.1 | ||
|
||
loaded via a namespace (and not attached): | ||
[1] gtable_0.3.5 jsonlite_1.8.8 dplyr_1.1.4 compiler_4.4.1 tidyselect_1.2.1 | ||
[6] Rcpp_1.0.13 splines_4.4.1 systemfonts_1.1.0 scales_1.3.0 boot_1.3-31 | ||
[11] yaml_2.3.10 fastmap_1.2.0 coda_0.19-4.1 lattice_0.22-6 R6_2.5.1 | ||
[16] generics_0.1.3 MASS_7.3-61 tibble_3.2.1 nloptr_2.1.1 munsell_0.5.1 | ||
[21] minqa_1.2.8 svglite_2.1.3 TMB_1.9.14 pillar_1.9.0 rlang_1.1.4 | ||
[26] utf8_1.2.4 xfun_0.47 estimability_1.5.1 cli_3.6.3 mgcv_1.9-1 | ||
[31] withr_3.0.1 magrittr_2.0.3 emmeans_1.10.4 digest_0.6.37 grid_4.4.1 | ||
[36] xtable_1.8-4 mvtnorm_1.2-6 faraway_1.0.8 rstudioapi_0.16.0 lifecycle_1.0.4 | ||
[41] nlme_3.1-166 vctrs_0.6.5 evaluate_0.24.0 glue_1.8.0 numDeriv_2016.8-1.1 | ||
[46] fansi_1.0.6 colorspace_2.1-1 rmarkdown_2.28 tools_4.4.1 pkgconfig_2.0.3 | ||
[51] htmltools_0.5.8.1 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,152 @@ | ||
--- | ||
title: Binary response GLMM using Frequentist Methods | ||
author: "[Julian Faraway](https://julianfaraway.github.io/)" | ||
date: "`r format(Sys.time(), '%d %B %Y')`" | ||
format: | ||
gfm: | ||
toc: true | ||
--- | ||
|
||
```{r global_options, include=FALSE} | ||
knitr::opts_chunk$set(comment=NA, | ||
echo = TRUE, | ||
fig.path="figs/", | ||
dev = 'svglite', | ||
fig.ext = ".svg", | ||
warning=FALSE, | ||
message=FALSE) | ||
knitr::opts_knit$set(global.par = TRUE) | ||
``` | ||
|
||
```{r graphopts, include=FALSE} | ||
par(mgp=c(1.5,0.5,0), mar=c(3.1,3.1,0.1,0), pch=20) | ||
ggplot2::theme_set(ggplot2::theme_bw()) | ||
``` | ||
|
||
|
||
See the [introduction](../index.md) for an overview. | ||
|
||
See a [mostly Bayesian analysis](ohio.md) analysis of the same | ||
data. | ||
|
||
This example is discussed in more detail in the book | ||
[Bayesian Regression Modeling with INLA](https://julianfaraway.github.io/brinlabook/chaglmm.html) | ||
|
||
Packages used: | ||
|
||
```{r} | ||
library(ggplot2) | ||
library(knitr) | ||
``` | ||
|
||
|
||
# Data and Model | ||
|
||
In [Fitzmaurice and Laird, 1993](https://doi.org/10.1093/biomet/80.1.141), data on 537 children aged 7--10 in six | ||
Ohio cities are reported. The response is binary --- does the child | ||
suffer from wheezing (indication of a pulmonary problem) where one | ||
indicates yes and zero no. This status is reported for each of four | ||
years at ages 7, 8, 9 and 10. There is also an indicator variable for | ||
whether the mother of the child is a smoker. Because we have four binary | ||
responses for each child, we expect these to be correlated and our model | ||
needs to reflect this. | ||
|
||
We sum the number of smoking and non-smoking mothers: | ||
```{r} | ||
#| label: ohiodat | ||
data(ohio, package="brinla") | ||
table(ohio$smoke)/4 | ||
``` | ||
We use this to produce the proportion of wheezing children classified by | ||
age and maternal smoking status: | ||
```{r} | ||
xtabs(resp ~ smoke + age, ohio)/c(350,187) | ||
``` | ||
|
||
Age has been adjusted so that nine years old is zero. We see that | ||
wheezing appears to decline with age and that there may be more wheezing | ||
in children with mothers who smoke. But the effects are not clear and we | ||
need modeling to be sure about these conclusions. | ||
|
||
A plausible model uses a logit link with a linear predictor of the form: | ||
```{math} | ||
\eta_{ij} = \beta_0 + \beta_1 age_j + \beta_2 smoke_i + u_i, \quad i=1, \dots ,537, \quad j=1,2,3,4, | ||
``` | ||
with | ||
```{math} | ||
P(Y_{ij} = 1) = {\exp(\eta_{ij}) \over 1+\exp(\eta_{ij})}. | ||
``` | ||
The | ||
random effect $u_i$ models the propensity of child $i$ to wheeze. | ||
Children are likely to vary in their health condition and this effect | ||
enables us to include this unknown variation in the model. Because $u_i$ | ||
is added to all four observations for a child, we induce a positive | ||
correlation among the four responses as we might naturally expect. The | ||
response is Bernoulli or, in other words, binomial with trial size one. | ||
|
||
|
||
# LME4 | ||
|
||
Here is the model fit penalized quasi-likelihood using the `lme4` package: | ||
|
||
```{r} | ||
#| label: ohiolmerfit | ||
library(lme4) | ||
modagh <- glmer(resp ~ age + smoke + (1|id), | ||
family=binomial, data=ohio) | ||
summary(modagh, correlation = FALSE) | ||
``` | ||
|
||
We see that there is no significant effect due to maternal smoking. | ||
|
||
Suppose you do not take into account the correlated response | ||
within the individuals and fit a GLM ignoring the ID random effect: | ||
|
||
```{r} | ||
modglm <- glm(resp ~ age + smoke, family=binomial, data=ohio) | ||
faraway::sumary(modglm) | ||
``` | ||
|
||
We see that the effect of maternal smoking is significant (but this | ||
would be the incorrect conclusion). | ||
|
||
# NLME | ||
|
||
Cannot fit GLMMs (other than normal response). | ||
|
||
# MMRM | ||
|
||
Cannot fit GLMMs (other than normal response). | ||
|
||
# GLMMTMB | ||
|
||
See the discussion for the [single random effect example](pulpfreq.md#GLMMTMB) | ||
for some introduction. | ||
|
||
```{r} | ||
library(glmmTMB) | ||
``` | ||
|
||
We can fit the model with: | ||
|
||
```{r} | ||
gtmod <- glmmTMB(resp ~ age + smoke + (1|id), | ||
family=binomial, data=ohio) | ||
summary(gtmod) | ||
``` | ||
|
||
Same as `lme4`. | ||
|
||
|
||
# Discussion | ||
|
||
- In both cases, we do not find evidence of an effect for maternal smoking. | ||
|
||
|
||
# Package version info | ||
|
||
|
||
```{r} | ||
sessionInfo() | ||
``` | ||
|