Skip to content

Commit

Permalink
freq version
Browse files Browse the repository at this point in the history
  • Loading branch information
julianfaraway committed Oct 14, 2024
1 parent 010ddd4 commit 1edd9bd
Show file tree
Hide file tree
Showing 3 changed files with 386 additions and 1 deletion.
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,4 +53,5 @@ we explore below using the following packages:
- [Longitudinal Models](mixed/longitfreq.md) - the `psid` data
- [Repeated Measures](mixed/visionfreq.md) - the `vision` data
- [Multiple Response Models](mixed/jspmultiplefreq.md) - the `jsp` data
- [Poisson reponse model](mixed/nitrofreq.md) - the `nitrofen` data
- [Poisson reponse model](mixed/nitrofreq.md) - the `nitrofen` data
- [Binary response model](mixed/ohiofreq.md) - the `ohio` data
232 changes: 232 additions & 0 deletions mixed/ohiofreq.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,232 @@
# Binary response GLMM using Frequentist Methods
[Julian Faraway](https://julianfaraway.github.io/)
2024-10-14

- [Data and Model](#data-and-model)
- [LME4](#lme4)
- [NLME](#nlme)
- [MMRM](#mmrm)
- [GLMMTMB](#glmmtmb)
- [Discussion](#discussion)
- [Package version info](#package-version-info)

See the [introduction](../index.md) for an overview.

See a [mostly Bayesian analysis](ohio.md) analysis of the same data.

This example is discussed in more detail in the book [Bayesian
Regression Modeling with
INLA](https://julianfaraway.github.io/brinlabook/chaglmm.html)

Packages used:

``` r
library(ggplot2)
library(knitr)
```

# Data and Model

In [Fitzmaurice and Laird,
1993](https://doi.org/10.1093/biomet/80.1.141), data on 537 children
aged 7–10 in six Ohio cities are reported. The response is binary — does
the child suffer from wheezing (indication of a pulmonary problem) where
one indicates yes and zero no. This status is reported for each of four
years at ages 7, 8, 9 and 10. There is also an indicator variable for
whether the mother of the child is a smoker. Because we have four binary
responses for each child, we expect these to be correlated and our model
needs to reflect this.

We sum the number of smoking and non-smoking mothers:

``` r
data(ohio, package="brinla")
table(ohio$smoke)/4
```


0 1
350 187

We use this to produce the proportion of wheezing children classified by
age and maternal smoking status:

``` r
xtabs(resp ~ smoke + age, ohio)/c(350,187)
```

age
smoke -2 -1 0 1
0 0.16000 0.14857 0.14286 0.10571
1 0.16578 0.20856 0.18717 0.13904

Age has been adjusted so that nine years old is zero. We see that
wheezing appears to decline with age and that there may be more wheezing
in children with mothers who smoke. But the effects are not clear and we
need modeling to be sure about these conclusions.

A plausible model uses a logit link with a linear predictor of the form:

``` math
\eta_{ij} = \beta_0 + \beta_1 age_j + \beta_2 smoke_i + u_i, \quad i=1, \dots ,537, \quad j=1,2,3,4,
```

with

``` math
P(Y_{ij} = 1) = {\exp(\eta_{ij}) \over 1+\exp(\eta_{ij})}.
```

The random effect $u_i$ models the propensity of child $i$ to wheeze.
Children are likely to vary in their health condition and this effect
enables us to include this unknown variation in the model. Because $u_i$
is added to all four observations for a child, we induce a positive
correlation among the four responses as we might naturally expect. The
response is Bernoulli or, in other words, binomial with trial size one.

# LME4

Here is the model fit penalized quasi-likelihood using the `lme4`
package:

``` r
library(lme4)
modagh <- glmer(resp ~ age + smoke + (1|id),
family=binomial, data=ohio)
summary(modagh, correlation = FALSE)
```

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: resp ~ age + smoke + (1 | id)
Data: ohio

AIC BIC logLik deviance df.resid
1597.9 1620.6 -794.9 1589.9 2144

Scaled residuals:
Min 1Q Median 3Q Max
-1.403 -0.180 -0.158 -0.132 2.518

Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 5.49 2.34
Number of obs: 2148, groups: id, 537

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.374 0.275 -12.27 <2e-16
age -0.177 0.068 -2.60 0.0093
smoke 0.415 0.287 1.45 0.1485

We see that there is no significant effect due to maternal smoking.

Suppose you do not take into account the correlated response within the
individuals and fit a GLM ignoring the ID random effect:

``` r
modglm <- glm(resp ~ age + smoke, family=binomial, data=ohio)
faraway::sumary(modglm)
```

Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.8837 0.0838 -22.5 <2e-16
age -0.1134 0.0541 -2.1 0.036
smoke 0.2721 0.1235 2.2 0.028

n = 2148 p = 3
Deviance = 1819.889 Null Deviance = 1829.089 (Difference = 9.199)

We see that the effect of maternal smoking is significant (but this
would be the incorrect conclusion).

# NLME

Cannot fit GLMMs (other than normal response).

# MMRM

Cannot fit GLMMs (other than normal response).

# GLMMTMB

See the discussion for the [single random effect
example](pulpfreq.md#GLMMTMB) for some introduction.

``` r
library(glmmTMB)
```

We can fit the model with:

``` r
gtmod <- glmmTMB(resp ~ age + smoke + (1|id),
family=binomial, data=ohio)
summary(gtmod)
```

Family: binomial ( logit )
Formula: resp ~ age + smoke + (1 | id)
Data: ohio

AIC BIC logLik deviance df.resid
1597.9 1620.6 -794.9 1589.9 2144

Random effects:

Conditional model:
Groups Name Variance Std.Dev.
id (Intercept) 5.49 2.34
Number of obs: 2148, groups: id, 537

Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.374 0.275 -12.27 <2e-16
age -0.177 0.068 -2.60 0.0093
smoke 0.415 0.287 1.45 0.1484

Same as `lme4`.

# Discussion

- In both cases, we do not find evidence of an effect for maternal
smoking.

# Package version info

``` r
sessionInfo()
```

R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.7

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] glmmTMB_1.1.9 lme4_1.1-35.5 Matrix_1.7-0 knitr_1.48 ggplot2_3.5.1

loaded via a namespace (and not attached):
[1] gtable_0.3.5 jsonlite_1.8.8 dplyr_1.1.4 compiler_4.4.1 tidyselect_1.2.1
[6] Rcpp_1.0.13 splines_4.4.1 systemfonts_1.1.0 scales_1.3.0 boot_1.3-31
[11] yaml_2.3.10 fastmap_1.2.0 coda_0.19-4.1 lattice_0.22-6 R6_2.5.1
[16] generics_0.1.3 MASS_7.3-61 tibble_3.2.1 nloptr_2.1.1 munsell_0.5.1
[21] minqa_1.2.8 svglite_2.1.3 TMB_1.9.14 pillar_1.9.0 rlang_1.1.4
[26] utf8_1.2.4 xfun_0.47 estimability_1.5.1 cli_3.6.3 mgcv_1.9-1
[31] withr_3.0.1 magrittr_2.0.3 emmeans_1.10.4 digest_0.6.37 grid_4.4.1
[36] xtable_1.8-4 mvtnorm_1.2-6 faraway_1.0.8 rstudioapi_0.16.0 lifecycle_1.0.4
[41] nlme_3.1-166 vctrs_0.6.5 evaluate_0.24.0 glue_1.8.0 numDeriv_2016.8-1.1
[46] fansi_1.0.6 colorspace_2.1-1 rmarkdown_2.28 tools_4.4.1 pkgconfig_2.0.3
[51] htmltools_0.5.8.1
152 changes: 152 additions & 0 deletions mixed/ohiofreq.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,152 @@
---
title: Binary response GLMM using Frequentist Methods
author: "[Julian Faraway](https://julianfaraway.github.io/)"
date: "`r format(Sys.time(), '%d %B %Y')`"
format:
gfm:
toc: true
---

```{r global_options, include=FALSE}
knitr::opts_chunk$set(comment=NA,
echo = TRUE,
fig.path="figs/",
dev = 'svglite',
fig.ext = ".svg",
warning=FALSE,
message=FALSE)
knitr::opts_knit$set(global.par = TRUE)
```

```{r graphopts, include=FALSE}
par(mgp=c(1.5,0.5,0), mar=c(3.1,3.1,0.1,0), pch=20)
ggplot2::theme_set(ggplot2::theme_bw())
```


See the [introduction](../index.md) for an overview.

See a [mostly Bayesian analysis](ohio.md) analysis of the same
data.

This example is discussed in more detail in the book
[Bayesian Regression Modeling with INLA](https://julianfaraway.github.io/brinlabook/chaglmm.html)

Packages used:

```{r}
library(ggplot2)
library(knitr)
```


# Data and Model

In [Fitzmaurice and Laird, 1993](https://doi.org/10.1093/biomet/80.1.141), data on 537 children aged 7--10 in six
Ohio cities are reported. The response is binary --- does the child
suffer from wheezing (indication of a pulmonary problem) where one
indicates yes and zero no. This status is reported for each of four
years at ages 7, 8, 9 and 10. There is also an indicator variable for
whether the mother of the child is a smoker. Because we have four binary
responses for each child, we expect these to be correlated and our model
needs to reflect this.

We sum the number of smoking and non-smoking mothers:
```{r}
#| label: ohiodat
data(ohio, package="brinla")
table(ohio$smoke)/4
```
We use this to produce the proportion of wheezing children classified by
age and maternal smoking status:
```{r}
xtabs(resp ~ smoke + age, ohio)/c(350,187)
```

Age has been adjusted so that nine years old is zero. We see that
wheezing appears to decline with age and that there may be more wheezing
in children with mothers who smoke. But the effects are not clear and we
need modeling to be sure about these conclusions.

A plausible model uses a logit link with a linear predictor of the form:
```{math}
\eta_{ij} = \beta_0 + \beta_1 age_j + \beta_2 smoke_i + u_i, \quad i=1, \dots ,537, \quad j=1,2,3,4,
```
with
```{math}
P(Y_{ij} = 1) = {\exp(\eta_{ij}) \over 1+\exp(\eta_{ij})}.
```
The
random effect $u_i$ models the propensity of child $i$ to wheeze.
Children are likely to vary in their health condition and this effect
enables us to include this unknown variation in the model. Because $u_i$
is added to all four observations for a child, we induce a positive
correlation among the four responses as we might naturally expect. The
response is Bernoulli or, in other words, binomial with trial size one.


# LME4

Here is the model fit penalized quasi-likelihood using the `lme4` package:

```{r}
#| label: ohiolmerfit
library(lme4)
modagh <- glmer(resp ~ age + smoke + (1|id),
family=binomial, data=ohio)
summary(modagh, correlation = FALSE)
```

We see that there is no significant effect due to maternal smoking.

Suppose you do not take into account the correlated response
within the individuals and fit a GLM ignoring the ID random effect:

```{r}
modglm <- glm(resp ~ age + smoke, family=binomial, data=ohio)
faraway::sumary(modglm)
```

We see that the effect of maternal smoking is significant (but this
would be the incorrect conclusion).

# NLME

Cannot fit GLMMs (other than normal response).

# MMRM

Cannot fit GLMMs (other than normal response).

# GLMMTMB

See the discussion for the [single random effect example](pulpfreq.md#GLMMTMB)
for some introduction.

```{r}
library(glmmTMB)
```

We can fit the model with:

```{r}
gtmod <- glmmTMB(resp ~ age + smoke + (1|id),
family=binomial, data=ohio)
summary(gtmod)
```

Same as `lme4`.


# Discussion

- In both cases, we do not find evidence of an effect for maternal smoking.


# Package version info


```{r}
sessionInfo()
```

0 comments on commit 1edd9bd

Please sign in to comment.