mixed/nitrofreq.qmd

---
title: Poisson GLMM using frequentist methods
author: "[Julian Faraway](https://julianfaraway.github.io/)"
date: "`r format(Sys.time(), '%d %B %Y')`"
format: 
  gfm:
    toc: true
---

```{r global_options, include=FALSE}
knitr::opts_chunk$set(comment=NA, 
                      echo = TRUE,
                      fig.path="figs/",
                      dev = 'svglite',  
                      fig.ext = ".svg",
                      warning=FALSE, 
                      message=FALSE)
knitr::opts_knit$set(global.par = TRUE)
```

```{r graphopts, include=FALSE}
par(mgp=c(1.5,0.5,0), mar=c(3.1,3.1,0.1,0), pch=20)
ggplot2::theme_set(ggplot2::theme_bw())
```


See the [introduction](../index.md) for an overview. 

See a [mostly Bayesian analysis](nitrofen.md) analysis of the same
data. 


This example is discussed in more detail in the book
[Bayesian Regression Modeling with INLA](https://julianfaraway.github.io/brinlabook/chaglmm.html#sec:poissonglmm)

Packages used:

```{r}
library(ggplot2)
```


# Data

In [Davison and Hinkley, 1997]( https://doi.org/10.1017/CBO9780511802843), the results of a study on Nitrofen, a herbicide, are reported. Due to concern regarding the effect on animal life, 50 female water fleas were divided into five groups of ten each and treated with different concentrations of the herbicide. The number of offspring in three subsequent broods for each flea was recorded. We start by loading the data from the `boot` package: (the `boot` package comes with base R distribution  so there is no need to download this)

```{r}
#| label: nitrofendat
data(nitrofen, package="boot")
head(nitrofen)
```

We need to rearrange the data to have one response value per line:

```{r}
lnitrofen = data.frame(conc = rep(nitrofen$conc,each=3),
  live = as.numeric(t(as.matrix(nitrofen[,2:4]))),
  id = rep(1:50,each=3),
  brood = rep(1:3,50))
head(lnitrofen)
```

Make a plot of the data:

```{r}
#| label: fig-nitrodat
#| fig-cap: "The number of live offspring varies with the concentration of Nitrofen and the brood number."
lnitrofen$jconc <- lnitrofen$conc + rep(c(-10,0,10),50)
lnitrofen$fbrood = factor(lnitrofen$brood)
ggplot(lnitrofen, aes(x=jconc,y=live, shape=fbrood, color=fbrood)) + 
       geom_point(position = position_jitter(w = 0, h = 0.5)) + 
       xlab("Concentration") + labs(shape = "Brood")
```

# Model

Since the response is a small count, a Poisson model is a natural
choice. We expect the rate of the response to vary with the brood and
concentration level. The plot of the data suggests these two predictors
may have an interaction. The three observations for a single flea are
likely to be correlated. We might expect a given flea to tend to produce
more, or less, offspring over a lifetime. We can model this with an
additive random effect. The linear predictor is:

```{math}
\eta_i = x_i^T \beta + u_{j(i)}, \quad i=1, \dots, 150. \quad j=1, \dots 50,
```

where $x_i$ is a vector from the design matrix encoding the information
about the $i^{th}$ observation and $u_j$ is the random affect associated
with the $j^{th}$ flea. The response has distribution
$Y_i \sim Poisson(\exp(\eta_i))$.

# LME4

We fit a model using penalized quasi-likelihood
(PQL) using the `lme4` package:
```{r}
library(lme4)
lnitrofen$brood = factor(lnitrofen$brood)
glmod <- glmer(live ~ I(conc/300)*brood + (1|id), nAGQ=25, 
             family=poisson, data=lnitrofen)
summary(glmod, correlation = FALSE)
```
We scaled the concentration by dividing by 300 (the maximum value is
310) to avoid scaling problems encountered with `glmer()`. This is
helpful in any case since it puts all the parameter estimates on a
similar scale. The first brood is the reference level so the slope for
this group is estimated as $-0.0437$ and is not statistically
significant, confirming the impression from the plot. We can see that
numbers of offspring in the second and third broods start out
significantly higher for zero concentration of the herbicide, with
estimates of $1.1688$ and $1.3512$. But as concentration increases, we
see that the numbers decrease significantly, with slopes of $-1.6730$
and $-1.8312$ relative to the first brood. The individual SD is
estimated at $0.302$ which is noticeably smaller than the estimates
above, indicating that the brood and concentration effects outweigh the
individual variation.

We would need to work harder to test the significance of the effects as
`lmerTest` is of no help for GLMMs. We do have the z-tests but in this
example, they do not give us the tests we want.

# NLME

Cannot fit GLMMs (other than normal response).

# MMRM

Cannot fit GLMMs (other than normal response).

# GLMMTMB

See the discussion for the [single random effect example](pulpfreq.md#GLMMTMB)
for some introduction.

```{r}
library(glmmTMB)
```

We can fit the model with:

```{r}
gtmod <- glmmTMB(live ~ I(conc/300)*brood + (1|id), 
             family=poisson, data=lnitrofen)
summary(gtmod)
```

Almost the same output as with `lme4` although a different optimizer
has been used for the fit.


# Discussion

Only `lme4` and `glmmTMB` can manage this.

# Package version info


```{r}
sessionInfo()
```