freq version

julianfaraway · Oct 14, 2024 · 1edd9bd · 1edd9bd
1 parent 010ddd4
commit 1edd9bd
Show file tree

Hide file tree

Showing 3 changed files with 386 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -53,4 +53,5 @@ we explore below using the following packages:
 - [Longitudinal Models](mixed/longitfreq.md) - the `psid` data
 - [Repeated Measures](mixed/visionfreq.md) - the `vision` data
 - [Multiple Response Models](mixed/jspmultiplefreq.md) - the `jsp` data
-- [Poisson reponse model](mixed/nitrofreq.md) - the `nitrofen` data
+- [Poisson reponse model](mixed/nitrofreq.md) - the `nitrofen` data
+- [Binary response model](mixed/ohiofreq.md) - the `ohio` data
diff --git a/mixed/ohiofreq.md b/mixed/ohiofreq.md
@@ -0,0 +1,232 @@
+# Binary response GLMM using Frequentist Methods
+[Julian Faraway](https://julianfaraway.github.io/)
+2024-10-14
+
+- [Data and Model](#data-and-model)
+- [LME4](#lme4)
+- [NLME](#nlme)
+- [MMRM](#mmrm)
+- [GLMMTMB](#glmmtmb)
+- [Discussion](#discussion)
+- [Package version info](#package-version-info)
+
+See the [introduction](../index.md) for an overview.
+
+See a [mostly Bayesian analysis](ohio.md) analysis of the same data.
+
+This example is discussed in more detail in the book [Bayesian
+Regression Modeling with
+INLA](https://julianfaraway.github.io/brinlabook/chaglmm.html)
+
+Packages used:
+
+``` r
+library(ggplot2)
+library(knitr)
+```
+
+# Data and Model
+
+In [Fitzmaurice and Laird,
+1993](https://doi.org/10.1093/biomet/80.1.141), data on 537 children
+aged 7–10 in six Ohio cities are reported. The response is binary — does
+the child suffer from wheezing (indication of a pulmonary problem) where
+one indicates yes and zero no. This status is reported for each of four
+years at ages 7, 8, 9 and 10. There is also an indicator variable for
+whether the mother of the child is a smoker. Because we have four binary
+responses for each child, we expect these to be correlated and our model
+needs to reflect this.
+
+We sum the number of smoking and non-smoking mothers:
+
+``` r
+data(ohio, package="brinla")
+table(ohio$smoke)/4
+```
+
+
+      0   1 
+    350 187 
+
+We use this to produce the proportion of wheezing children classified by
+age and maternal smoking status:
+
+``` r
+xtabs(resp ~ smoke + age, ohio)/c(350,187)
+```
+
+         age
+    smoke      -2      -1       0       1
+        0 0.16000 0.14857 0.14286 0.10571
+        1 0.16578 0.20856 0.18717 0.13904
+
+Age has been adjusted so that nine years old is zero. We see that
+wheezing appears to decline with age and that there may be more wheezing
+in children with mothers who smoke. But the effects are not clear and we
+need modeling to be sure about these conclusions.
+
+A plausible model uses a logit link with a linear predictor of the form:
+
+``` math
+\eta_{ij} = \beta_0 + \beta_1 age_j + \beta_2 smoke_i + u_i, \quad i=1, \dots ,537, \quad j=1,2,3,4,
+```
+
+with
+
+``` math
+P(Y_{ij} = 1) = {\exp(\eta_{ij}) \over 1+\exp(\eta_{ij})}.
+```
+
+The random effect $u_i$ models the propensity of child $i$ to wheeze.
+Children are likely to vary in their health condition and this effect
+enables us to include this unknown variation in the model. Because $u_i$
+is added to all four observations for a child, we induce a positive
+correlation among the four responses as we might naturally expect. The
+response is Bernoulli or, in other words, binomial with trial size one.
+
+# LME4
+
+Here is the model fit penalized quasi-likelihood using the `lme4`
+package:
+
+``` r
+library(lme4)
+modagh <- glmer(resp ~ age + smoke + (1|id), 
+              family=binomial, data=ohio)
+summary(modagh, correlation = FALSE)
+```
+
+    Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
+     Family: binomial  ( logit )
+    Formula: resp ~ age + smoke + (1 | id)
+       Data: ohio
+
+         AIC      BIC   logLik deviance df.resid 
+      1597.9   1620.6   -794.9   1589.9     2144 
+
+    Scaled residuals: 
+       Min     1Q Median     3Q    Max 
+    -1.403 -0.180 -0.158 -0.132  2.518 
+
+    Random effects:
+     Groups Name        Variance Std.Dev.
+     id     (Intercept) 5.49     2.34    
+    Number of obs: 2148, groups:  id, 537
+
+    Fixed effects:
+                Estimate Std. Error z value Pr(>|z|)
+    (Intercept)   -3.374      0.275  -12.27   <2e-16
+    age           -0.177      0.068   -2.60   0.0093
+    smoke          0.415      0.287    1.45   0.1485
+
+We see that there is no significant effect due to maternal smoking.
+
+Suppose you do not take into account the correlated response within the
+individuals and fit a GLM ignoring the ID random effect:
+
+``` r
+modglm <- glm(resp ~ age + smoke, family=binomial, data=ohio)
+faraway::sumary(modglm)
+```
+
+                Estimate Std. Error z value Pr(>|z|)
+    (Intercept)  -1.8837     0.0838   -22.5   <2e-16
+    age          -0.1134     0.0541    -2.1    0.036
+    smoke         0.2721     0.1235     2.2    0.028
+
+    n = 2148 p = 3
+    Deviance = 1819.889 Null Deviance = 1829.089 (Difference = 9.199) 
+
+We see that the effect of maternal smoking is significant (but this
+would be the incorrect conclusion).
+
+# NLME
+
+Cannot fit GLMMs (other than normal response).
+
+# MMRM
+
+Cannot fit GLMMs (other than normal response).
+
+# GLMMTMB
+
+See the discussion for the [single random effect
+example](pulpfreq.md#GLMMTMB) for some introduction.
+
+``` r
+library(glmmTMB)
+```
+
+We can fit the model with:
+
+``` r
+gtmod <- glmmTMB(resp ~ age + smoke + (1|id), 
+             family=binomial, data=ohio)
+summary(gtmod)
+```
+
+     Family: binomial  ( logit )
+    Formula:          resp ~ age + smoke + (1 | id)
+    Data: ohio
+
+         AIC      BIC   logLik deviance df.resid 
+      1597.9   1620.6   -794.9   1589.9     2144 
+
+    Random effects:
+
+    Conditional model:
+     Groups Name        Variance Std.Dev.
+     id     (Intercept) 5.49     2.34    
+    Number of obs: 2148, groups:  id, 537
+
+    Conditional model:
+                Estimate Std. Error z value Pr(>|z|)
+    (Intercept)   -3.374      0.275  -12.27   <2e-16
+    age           -0.177      0.068   -2.60   0.0093
+    smoke          0.415      0.287    1.45   0.1484
+
+Same as `lme4`.
+
+# Discussion
+
+- In both cases, we do not find evidence of an effect for maternal
+  smoking.
+
+# Package version info
+
+``` r
+sessionInfo()
+```
+
+    R version 4.4.1 (2024-06-14)
+    Platform: x86_64-apple-darwin20
+    Running under: macOS Sonoma 14.7
+
+    Matrix products: default
+    BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
+    LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
+
+    locale:
+    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
+
+    time zone: Europe/London
+    tzcode source: internal
+
+    attached base packages:
+    [1] stats     graphics  grDevices utils     datasets  methods   base     
+
+    other attached packages:
+    [1] glmmTMB_1.1.9 lme4_1.1-35.5 Matrix_1.7-0  knitr_1.48    ggplot2_3.5.1
+
+    loaded via a namespace (and not attached):
+     [1] gtable_0.3.5        jsonlite_1.8.8      dplyr_1.1.4         compiler_4.4.1      tidyselect_1.2.1   
+     [6] Rcpp_1.0.13         splines_4.4.1       systemfonts_1.1.0   scales_1.3.0        boot_1.3-31        
+    [11] yaml_2.3.10         fastmap_1.2.0       coda_0.19-4.1       lattice_0.22-6      R6_2.5.1           
+    [16] generics_0.1.3      MASS_7.3-61         tibble_3.2.1        nloptr_2.1.1        munsell_0.5.1      
+    [21] minqa_1.2.8         svglite_2.1.3       TMB_1.9.14          pillar_1.9.0        rlang_1.1.4        
+    [26] utf8_1.2.4          xfun_0.47           estimability_1.5.1  cli_3.6.3           mgcv_1.9-1         
+    [31] withr_3.0.1         magrittr_2.0.3      emmeans_1.10.4      digest_0.6.37       grid_4.4.1         
+    [36] xtable_1.8-4        mvtnorm_1.2-6       faraway_1.0.8       rstudioapi_0.16.0   lifecycle_1.0.4    
+    [41] nlme_3.1-166        vctrs_0.6.5         evaluate_0.24.0     glue_1.8.0          numDeriv_2016.8-1.1
+    [46] fansi_1.0.6         colorspace_2.1-1    rmarkdown_2.28      tools_4.4.1         pkgconfig_2.0.3    
+    [51] htmltools_0.5.8.1  
diff --git a/mixed/ohiofreq.qmd b/mixed/ohiofreq.qmd
@@ -0,0 +1,152 @@
+---
+title: Binary response GLMM using Frequentist Methods
+author: "[Julian Faraway](https://julianfaraway.github.io/)"
+date: "`r format(Sys.time(), '%d %B %Y')`"
+format: 
+  gfm:
+    toc: true
+---
+
+```{r global_options, include=FALSE}
+knitr::opts_chunk$set(comment=NA, 
+                      echo = TRUE,
+                      fig.path="figs/",
+                      dev = 'svglite',  
+                      fig.ext = ".svg",
+                      warning=FALSE, 
+                      message=FALSE)
+knitr::opts_knit$set(global.par = TRUE)
+```
+
+```{r graphopts, include=FALSE}
+par(mgp=c(1.5,0.5,0), mar=c(3.1,3.1,0.1,0), pch=20)
+ggplot2::theme_set(ggplot2::theme_bw())
+```
+
+
+See the [introduction](../index.md) for an overview. 
+
+See a [mostly Bayesian analysis](ohio.md) analysis of the same
+data. 
+
+This example is discussed in more detail in the book
+[Bayesian Regression Modeling with INLA](https://julianfaraway.github.io/brinlabook/chaglmm.html)
+
+Packages used:
+
+```{r}
+library(ggplot2)
+library(knitr)
+```
+
+
+# Data and Model
+
+In [Fitzmaurice and Laird, 1993](https://doi.org/10.1093/biomet/80.1.141), data on 537 children aged 7--10 in six
+Ohio cities are reported. The response is binary --- does the child
+suffer from wheezing (indication of a pulmonary problem) where one
+indicates yes and zero no. This status is reported for each of four
+years at ages 7, 8, 9 and 10. There is also an indicator variable for
+whether the mother of the child is a smoker. Because we have four binary
+responses for each child, we expect these to be correlated and our model
+needs to reflect this.
+
+We sum the number of smoking and non-smoking mothers:
+```{r}
+#| label: ohiodat
+data(ohio, package="brinla")
+table(ohio$smoke)/4
+```
+We use this to produce the proportion of wheezing children classified by
+age and maternal smoking status:
+```{r}
+xtabs(resp ~ smoke + age, ohio)/c(350,187)
+```
+
+Age has been adjusted so that nine years old is zero. We see that
+wheezing appears to decline with age and that there may be more wheezing
+in children with mothers who smoke. But the effects are not clear and we
+need modeling to be sure about these conclusions.
+
+A plausible model uses a logit link with a linear predictor of the form:
+```{math}
+\eta_{ij} = \beta_0 + \beta_1 age_j + \beta_2 smoke_i + u_i, \quad i=1, \dots ,537, \quad j=1,2,3,4,
+```
+with 
+```{math}
+P(Y_{ij} = 1) = {\exp(\eta_{ij}) \over 1+\exp(\eta_{ij})}.
+```
+The
+random effect $u_i$ models the propensity of child $i$ to wheeze.
+Children are likely to vary in their health condition and this effect
+enables us to include this unknown variation in the model. Because $u_i$
+is added to all four observations for a child, we induce a positive
+correlation among the four responses as we might naturally expect. The
+response is Bernoulli or, in other words, binomial with trial size one.
+
+
+# LME4
+
+Here is the model fit penalized quasi-likelihood using the `lme4` package:
+
+```{r}
+#| label: ohiolmerfit
+library(lme4)
+modagh <- glmer(resp ~ age + smoke + (1|id), 
+              family=binomial, data=ohio)
+summary(modagh, correlation = FALSE)
+```
+
+We see that there is no significant effect due to maternal smoking.
+
+Suppose you do not take into account the correlated response
+within the individuals and fit a GLM ignoring the ID random effect:
+
+```{r}
+modglm <- glm(resp ~ age + smoke, family=binomial, data=ohio)
+faraway::sumary(modglm)
+```
+
+We see that the effect of maternal smoking is significant (but this
+would be the incorrect conclusion).
+
+# NLME
+
+Cannot fit GLMMs (other than normal response).
+
+# MMRM
+
+Cannot fit GLMMs (other than normal response).
+
+# GLMMTMB
+
+See the discussion for the [single random effect example](pulpfreq.md#GLMMTMB)
+for some introduction.
+
+```{r}
+library(glmmTMB)
+```
+
+We can fit the model with:
+
+```{r}
+gtmod <- glmmTMB(resp ~ age + smoke + (1|id), 
+             family=binomial, data=ohio)
+summary(gtmod)
+```
+
+Same as `lme4`.
+
+
+# Discussion
+
+-  In both cases, we do not find evidence of an effect for maternal smoking.
+
+
+# Package version info
+
+
+```{r}
+sessionInfo()
+```
+