--- title: Poisson GLMM using frequentist methods author: "[Julian Faraway](https://julianfaraway.github.io/)" date: "`r format(Sys.time(), '%d %B %Y')`" format: gfm: toc: true --- ```{r global_options, include=FALSE} knitr::opts_chunk$set(comment=NA, echo = TRUE, fig.path="figs/", dev = 'svglite', fig.ext = ".svg", warning=FALSE, message=FALSE) knitr::opts_knit$set(global.par = TRUE) ``` ```{r graphopts, include=FALSE} par(mgp=c(1.5,0.5,0), mar=c(3.1,3.1,0.1,0), pch=20) ggplot2::theme_set(ggplot2::theme_bw()) ``` See the [introduction](../index.md) for an overview. See a [mostly Bayesian analysis](nitrofen.md) analysis of the same data. This example is discussed in more detail in the book [Bayesian Regression Modeling with INLA](https://julianfaraway.github.io/brinlabook/chaglmm.html#sec:poissonglmm) Packages used: ```{r} library(ggplot2) ``` # Data In [Davison and Hinkley, 1997]( https://doi.org/10.1017/CBO9780511802843), the results of a study on Nitrofen, a herbicide, are reported. Due to concern regarding the effect on animal life, 50 female water fleas were divided into five groups of ten each and treated with different concentrations of the herbicide. The number of offspring in three subsequent broods for each flea was recorded. We start by loading the data from the `boot` package: (the `boot` package comes with base R distribution so there is no need to download this) ```{r} #| label: nitrofendat data(nitrofen, package="boot") head(nitrofen) ``` We need to rearrange the data to have one response value per line: ```{r} lnitrofen = data.frame(conc = rep(nitrofen$conc,each=3), live = as.numeric(t(as.matrix(nitrofen[,2:4]))), id = rep(1:50,each=3), brood = rep(1:3,50)) head(lnitrofen) ``` Make a plot of the data: ```{r} #| label: fig-nitrodat #| fig-cap: "The number of live offspring varies with the concentration of Nitrofen and the brood number." lnitrofen$jconc <- lnitrofen$conc + rep(c(-10,0,10),50) lnitrofen$fbrood = factor(lnitrofen$brood) ggplot(lnitrofen, aes(x=jconc,y=live, shape=fbrood, color=fbrood)) + geom_point(position = position_jitter(w = 0, h = 0.5)) + xlab("Concentration") + labs(shape = "Brood") ``` # Model Since the response is a small count, a Poisson model is a natural choice. We expect the rate of the response to vary with the brood and concentration level. The plot of the data suggests these two predictors may have an interaction. The three observations for a single flea are likely to be correlated. We might expect a given flea to tend to produce more, or less, offspring over a lifetime. We can model this with an additive random effect. The linear predictor is: ```{math} \eta_i = x_i^T \beta + u_{j(i)}, \quad i=1, \dots, 150. \quad j=1, \dots 50, ``` where $x_i$ is a vector from the design matrix encoding the information about the $i^{th}$ observation and $u_j$ is the random affect associated with the $j^{th}$ flea. The response has distribution $Y_i \sim Poisson(\exp(\eta_i))$. # LME4 We fit a model using penalized quasi-likelihood (PQL) using the `lme4` package: ```{r} library(lme4) lnitrofen$brood = factor(lnitrofen$brood) glmod <- glmer(live ~ I(conc/300)*brood + (1|id), nAGQ=25, family=poisson, data=lnitrofen) summary(glmod, correlation = FALSE) ``` We scaled the concentration by dividing by 300 (the maximum value is 310) to avoid scaling problems encountered with `glmer()`. This is helpful in any case since it puts all the parameter estimates on a similar scale. The first brood is the reference level so the slope for this group is estimated as $-0.0437$ and is not statistically significant, confirming the impression from the plot. We can see that numbers of offspring in the second and third broods start out significantly higher for zero concentration of the herbicide, with estimates of $1.1688$ and $1.3512$. But as concentration increases, we see that the numbers decrease significantly, with slopes of $-1.6730$ and $-1.8312$ relative to the first brood. The individual SD is estimated at $0.302$ which is noticeably smaller than the estimates above, indicating that the brood and concentration effects outweigh the individual variation. We would need to work harder to test the significance of the effects as `lmerTest` is of no help for GLMMs. We do have the z-tests but in this example, they do not give us the tests we want. # NLME Cannot fit GLMMs (other than normal response). # MMRM Cannot fit GLMMs (other than normal response). # GLMMTMB See the discussion for the [single random effect example](pulpfreq.md#GLMMTMB) for some introduction. ```{r} library(glmmTMB) ``` We can fit the model with: ```{r} gtmod <- glmmTMB(live ~ I(conc/300)*brood + (1|id), family=poisson, data=lnitrofen) summary(gtmod) ``` Almost the same output as with `lme4` although a different optimizer has been used for the fit. # Discussion Only `lme4` and `glmmTMB` can manage this. # Package version info ```{r} sessionInfo() ```