summary(glmerfit)
etc.? Are they
reliable?lme4
display denominator degrees of freedom/p
values? What other options do I have?This is an informal FAQ list for the r-sig-mixed-models
mailing list.
The most commonly used functions for mixed modeling in R are
aov()
,
nlme::lme
1, lme4::lmer
;
brms::brm
MASS::glmmPQL
, lme4::glmer
;
glmmTMB
MCMCglmm::MCMCglmm
;
brms::brm
nlme::nlme
,
lme4::nlmer
; brms::brm
brms::brm
Another quick-and-dirty way to search for mixed-model related packages on CRAN:
grep("l.?m[me][^t]",rownames(available.packages()),value=TRUE)
## [1] "blmeco" "buildmer" "cellVolumeDist"
## [4] "climenv" "climextRemes" "curtailment"
## [7] "glmertree" "glmm.hp" "glmmEP"
## [10] "glmmfields" "glmmLasso" "glmmML"
## [13] "glmmPen" "glmmrBase" "glmmrOptim"
## [16] "glmmSeq" "glmmTMB" "jlmerclusterperm"
## [19] "lamme" "limexhub" "lme4"
## [22] "lmeInfo" "lmeresampler" "lmerPerm"
## [25] "lmerTest" "lmeSplines" "lmmot"
## [28] "lmmpar" "lrmest" "lsmeans"
## [31] "mailmerge" "mlmm.gwas" "multilevelmediation"
## [34] "mvglmmRank" "nlmeU" "nlmeVPC"
## [37] "palmerpenguins" "plsmmLasso" "SherlockHolmes"
## [40] "tglkmeans" "trouBBlme4SolveR" "vagalumeR"
## [43] "vglmer"
There are some false positives here
(e.g. palmerpenguins
); see here if you’re interested in “regex
golf”.
[email protected]
[lme4]
tag).DISCLAIMERS:
The following formula extensions for specifying random-effects structures in R are used by
lme4
nlme
(nested effects only, although crossed effects can
be specified with more work)glmmADMB
and glmmTMB
MCMCglmm
uses a different specification, inherited from
AS-REML.
(Modified from Robin Jeffries, UCLA:)
formula | meaning |
---|---|
(1|group) |
random group intercept |
(x|group) =
(1+x|group) |
random slope of x within group with correlated intercept |
(0+x|group) =
(-1+x|group) |
random slope of x within group: no variation in intercept |
(1|group) + (0+x|group) |
uncorrelated random intercept and random slope within group |
(1|site/block) =
(1|site)+(1|site:block) |
intercept varying among sites and among blocks within sites (nested random effects) |
site+(1|site:block) |
fixed effect of sites plus random variation in intercept among blocks within sites |
(x|site/block) =
(x|site)+(x|site:block) =
(1 + x|site)+(1+x|site:block) |
slope and intercept varying among sites and among blocks within sites |
(x1|site)+(x2|block) |
two different effects, varying at different levels |
x*site+(x|site:block) |
fixed effect variation of slope and intercept varying among sites and random variation of slope and intercept among blocks within sites |
(1|group1)+(1|group2) |
intercept varying among crossed random effects (e.g. site, year) |
Or in a little more detail:
equation | formula |
---|---|
\(β_0 + β_{1}X_{i} + e_{si}\) | n/a (Not a mixed-effects model) |
\((β_0 + b_{S,0s}) + β_{1}X_i + e_{si}\) | ∼ X + (1∣Subject) |
\((β_0 + b_{S,0s}) + (β_{1} + b_{S,1s}) X_i + e_{si}\) | ~ X + (1 + X∣Subject) |
\((β_0 + b_{S,0s} + b_{I,0i}) + (β_{1} + b_{S,1s}) X_i + e_{si}\) | ∼ X + (1 + X∣Subject) + (1∣Item) |
As above, but \(S_{0s}\), \(S_{1s}\) independent | ∼ X + (1∣Subject) + (0 + X∣ Subject) + (1∣Item) |
\((β_0 + b_{S,0s} + b_{I,0i}) + β_{1}X_i + e_{si}\) | ∼ X + (1∣Subject) + (1∣Item) |
\((β_0 + b_{I,0i}) + (β_{1} + b_{S,1s})X_i + e_{si}\) | ∼ X + (0 + X∣Subject) + (1∣Item) |
Modified from: http://stats.stackexchange.com/questions/13166/rs-lmer-cheat-sheet?lq=1 (Livius)
The magic development version of the equatiomatic
package can handle mixed models
(remotes::install_github("datalorax/equatiomatic")
),
e.g.
library(lme4)
library(equatiomatic)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
equatiomatic::extract_eq(fm1)
\[ \begin{aligned} \operatorname{Reaction}_{i} &\sim N \left(\alpha_{j[i]} + \beta_{1j[i]}(\operatorname{Days}), \sigma^2 \right) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\mu_{\alpha_{j}} \\ &\mu_{\beta_{1j}} \end{aligned} \end{array} \right) , \left( \begin{array}{cc} \sigma^2_{\alpha_{j}} & \rho_{\alpha_{j}\beta_{1j}} \\ \rho_{\beta_{1j}\alpha_{j}} & \sigma^2_{\beta_{1j}} \end{array} \right) \right) \text{, for Subject j = 1,} \dots \text{,J} \end{aligned} \]
It doesn’t handle GLMMs (yet), but you could fit two fake models — one LMM like your GLMM but with a Gaussian response, and one GLM with the same family/link function as your GLMM but without the random effects — and put the pieces together.
More possibly useful links:
This is in general a far more difficult question than it seems on the surface. There are many competing philosophies and definitions. For example, from Gelman (2005):
Before discussing the technical issues, we briefly review what is meant by fixed and random effects. It turns out that different—in fact, incompatible—definitions are used in different contexts. [See also Kreft and de Leeuw (1998), Section 1.3.3, for a discussion of the multiplicity of definitions of fixed and random effects and coefficients, and Robinson (1998) for a historical overview.] Here we outline five definitions that we have seen: 1. Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts αi and fixed slope β corresponds to parallel lines for different individuals i, or the model yit = αi + βt. Kreft and de Leeuw [(1998), page 12] thus distinguish between fixed and random coefficients. 2. Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella and McCulloch [(1992), Section 1.4] explore this distinction in depth. 3. “When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random” [Green and Tukey (1960)]. 4. “If an effect is assumed to be a realized value of a random variable, it is called a random effect” [LaMotte (1983)]. 5. Fixed effects are estimated using least squares (or, more generally, maximum likelihood) and random effects are estimated with shrinkage [“linear unbiased prediction” in the terminology of Robinson (1991)]. This definition is standard in the multilevel modeling literature [see, e.g., Snijders and Bosker (1999), Section 4.2] and in econometrics.
Another useful comment (via Kevin Wright) reinforcing the idea that “random vs. fixed” is not a simple, cut-and-dried decision: from Schabenberger and Pierce (2001), p. 627:
Before proceeding further with random field linear models we need to remind the reader of the adage that one modeler’s random effect is another modeler’s fixed effect.
Clark and Linzer (2015) address this question from a mostly econometric perspective, focusing mostly on practical variance/bias/RMSE criteria.
One point of particular relevance to ‘modern’ mixed model estimation (rather than ‘classical’ method-of-moments estimation) is that, for practical purposes, there must be a reasonable number of random-effects levels (e.g. blocks) – more than 5 or 6 at a minimum. This is not surprising if you consider that random effects estimation is trying to estimate an among-block variance. For example, from Crawley (2002) p. 670:
Are there enough levels of the factor in the data on which to base an estimate of the variance of the population of effects? No, means [you should probably treat the variable as] fixed effects.
Some researchers (who treat fixed vs random as a philosophical rather than a pragmatic decision) object to this approach.
Also see a very thoughtful chapter in Hodges (2016).
Treating factors with small numbers of levels as random will in the best case lead to very small and/or imprecise estimates of random effects; in the worst case it will lead to various numerical difficulties such as lack of convergence, zero variance estimates, etc.. (A small simulation exercise shows that at least the estimates of the standard deviation are downwardly biased in this case; it’s not clear whether/how this bias would affect the point estimates of fixed effects or their estimated confidence intervals.) In the classical method-of-moments approach these problems may not arise (because the sums of squares are always well defined as long as there are at least two units), but the underlying problems of lack of power are there nevertheless.
Thierry Onkelinx has a blog post with some simulations on the impact of the number of levels and concludes with a few recommendations for the number of levels of the grouping variable \(n_s\): > - get \(n_s > 1000\) levels when an accurate estimate of the random effect variance is crucial. E.g. when a single number will be use for power calculations. > - get \(n_s > 100\) levels when a reasonable estimate of the random effect variance is sufficient. E.g. power calculations with sensitivity analysis of the random effect variance. > - get \(n_s > 20\) levels for an experimental study > - in case \(10 < n_s <20\) you should validate the model very cautious before using the output > - in case \(n_s < 10\) it is safer to use the variable as a fixed effect.
Oberpriller, Leite, and Pichler (2021) also performed a simulation study and found that while the estimates are similar for treating a variable with a small number of levels as fixed or random are similar, there was an impact on Type 1 and Type 2 error rates. They also found that the precise random effects structure (e.g., inclusion of random slopes) had a large impact on these properties.
Also see this thread on the r-sig-mixed-models mailing list and this question on CrossValidated.
lme4
does handled crossed effects, efficientlynlme
offers (e.g. heteroscedasticity of
residuals via weights
/varStruct
, correlation
of residuals via correlation
/corStruct
, or if
you want to used crossed REs with the gamlss
package, see
p. 163ff of Pinheiro and Bates (2000)
(section 4.2.2: Google books
link). I give a worked example here.
As far as I can tell, a couple of hacks are necessary to get this to
work: (1) the data must be expressed as a groupedData
object (at least, I haven’t managed to get it to work in any other way);
(2) the crossed effects must be nested within another grouping
factor - in the example here I define a dummy group, which is
awkward (it makes the variance component for this group and the residual
variance jointly unidentifiable), but otherwise seems to work OK.response~treatment+(1|block)
in lme4
. Also, in
the case of fixed effects, crossed and nested specifications change the
parameterization of the model, but not anything else (e.g. the number of
parameters estimated, log-likelihood, model predictions are all
identical). That is, in R’s model.matrix
function (which
implements a version of Wilkinson-Rogers notation) a*b
and
a/b
(which expand to 1+a+b+a:b
and
1+a+a:b
respectively) give model matrices with the same
number of columns.(1|a/b)
(or (1|a)+(1|a:b)
) and
(1|a)+(1|b)
are equivalent. If the lower-level random
effect has the same labels within each larger group (e.g. blocks 1, 2,
3, 4 within sites A, B, and C) then the explicit nesting
(1|a/b)
is required. It seems to be considered best
practice to code the nested level uniquely (e.g. A1, A2, …, B1, B2, …)
so that confusion between nested and crossed effects is less
likely.glmmTMB
(e.g. zero-inflation/variable dispersion), where it’s less clear what to
measure to estimate overdispersion.The following function should work for a variety of model types (at
least glmmADMB
, glmmTMB
, lme4
,
…).
overdisp_fun <- function(model) {
rdf <- df.residual(model)
rp <- residuals(model,type="pearson")
Pearson.chisq <- sum(rp^2)
prat <- Pearson.chisq/rdf
pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}
Example:
library(lme4)
library(glmmTMB)
set.seed(101)
d <- data.frame(x=runif(1000),
f=factor(sample(1:10,size=1000,replace=TRUE)))
suppressMessages(d$y <- simulate(~x+(1|f), family=poisson,
newdata=d,
newparams=list(theta=1,beta=c(0,2)))[[1]])
m1 <- glmer(y~x+(1|f),data=d,family=poisson)
overdisp_fun(m1)
## chisq ratio rdf p
## 1035.9966326 1.0391140 997.0000000 0.1902294
m2 <- glmmTMB(y~x+(1|f),data=d,family="poisson")
overdisp_fun(m2)
## chisq ratio rdf p
## 1035.9961394 1.0391135 997.0000000 0.1902323
The gof
function in the aods3
provides
similar functionality (it reports both deviance- and \(\chi^2\)-based estimates of overdispersion
and tests).
quasilikelihood estimation: MASS::glmmPQL.
Quasi- was deemed unreliable in lme4
, and is no longer
available. (Part of the problem was questionable numerical results in
some cases; the other problem was that DB felt that he did not have a
sufficiently good understanding of the theoretical framework that would
explain what the algorithm was actually estimating in this case.) geepack::geelgm
may be workable (haven’t tried it)
If you really want quasi-likelihood analysis for glmer
fits, you can do it yourself by adjusting the coefficient table - i.e.,
by multiplying the standard error by the square root of the dispersion
factor 2
and recomputing the \(Z\)- and \(p\)-values accordingly, as
follows:
## extract summary table; you may also be able to do this via
## broom::tidy or broom.mixed::tidy
quasi_table <- function(model,ctab=coef(summary(model)),
phi=overdisp_fun(model)["ratio"]) {
qctab <- within(as.data.frame(ctab),
{ `Std. Error` <- `Std. Error`*sqrt(phi)
`z value` <- Estimate/`Std. Error`
`Pr(>|z|)` <- 2*pnorm(abs(`z value`), lower.tail=FALSE)
})
return(qctab)
}
printCoefmat(quasi_table(m1),digits=3)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2277 0.2700 0.84 0.4
## x 2.0640 0.0528 39.11 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## to use this with glmmTMB, we need to separate out the
## conditional component of the summary
printCoefmat(quasi_table(m2,
ctab=coef(summary(m2))[["cond"]]),
digits=3)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2277 0.2700 0.84 0.4
## x 2.0640 0.0528 39.09 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Another version, this one tidyverse-centric:
library(broom.mixed)
library(dplyr)
tidy_quasi <- function(model, phi=overdisp_fun(model)["ratio"],
conf.level=0.95) {
tt <- (tidy(model, effects="fixed")
%>% mutate(std.error=std.error*sqrt(phi),
statistic=estimate/std.error,
p.value=2*pnorm(abs(statistic), lower.tail=FALSE))
)
return(tt)
}
tidy_quasi(m1)
## # A tibble: 2 × 6
## effect term estimate std.error statistic p.value
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 fixed (Intercept) 0.228 0.270 0.843 0.399
## 2 fixed x 2.06 0.0528 39.1 0
tidy_quasi(m2)
## # A tibble: 2 × 7
## effect component term estimate std.error statistic p.value
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 fixed cond (Intercept) 0.228 0.270 0.843 0.399
## 2 fixed cond x 2.06 0.0528 39.1 0
These functions make some simplifying assumptions: (1) this overdispersion computation is approximate (based on Pearson \(\chi^2\), see caveats above); (2) considers Gaussian sampling distributions only (i.e. no denominator-degree-of-freedom/\(t\) corrections).
In this case using quasi-likelihood doesn’t make much difference, since the data we simulated in the first place were Poisson.) Keep in mind that once you switch to quasi-likelihood you will either have to eschew inferential methods such as the likelihood ratio test, profile confidence intervals, AIC, etc., or make more heroic assumptions to compute “quasi-” analogs of all of the above (such as QAIC).
lme4::glmer.nb()
should fit a negative binomial,
although it is somewhat slow and fragile compared to some of the other
methods suggested here. lme4
cannot fit beta-binomial
models (these cannot be formulated as a part of the exponential family
of distributions)family="nbinom2"
gives the classic parameterization with
\(\sigma^2=\mu(1+\mu/k)\) (“NB2” in
Hardin and Hilbe’s terminology) while family="nbinom1"
gives a parameterization with \(\sigma^2=\phi
\mu\), \(\phi>1\) (“NB1” to
Hardin and Hilbe). The latter might also be called a “quasi-Poisson”
parameterization because it matches the mean-variance relationship
assumed by quasi-Poisson models, i.e. the variance is strictly
proportional to the mean (although the proportionality constant must be
>1, a limitation that does not apply to quasi-likelihood approaches).
(glmmADMB will also
fit these models, with family="nbinom"
for NB2, but is
deprecated in favour of glmmTMB.)glmmTMB
allows beta-binomial models ((Harrison 2015) suggests comparing beta-binomial
with OLRE models to assess reliability)brms
package has a negbinomial
family
(no beta-binomial, but it does have a wide range of other families)Negative binomial models in glmmTMB
and
lognormal-Poisson models in glmer
(or
MCMCglmm
) are probably the best quick alternatives for
overdispersed count data. If you need to explore alternatives (different
variance-mean relationships, different distributions), then
ADMB
, TMB
, WinBUGS
,
Stan
, NIMBLE
are the most flexible
alternatives.
Underdispersion (much less variability than expected) is a less common problem than overdispersion.
ordinal
package, or MCMCglmm
or
brms
for Bayesian solutions)glmmTMB
, can handle underdispersion (J.
Hilbe recommends the latter in this
CrossValidated answer). (VGAM
has a generalized Poisson
distribution, but doesn’t handle random effects.)While one (well, OK I) would naively think that GLMMs with Gamma
distributions would be just as easy (or hard) as any other sort of
GLMMs, it seems that they are in fact harder to implement. Basic
simulated examples of Gamma GLMMs can fail in lme4 despite analogous
problems with Poisson, binomial, etc. distributions. Solutions: - the
default inverse link seems particularly problematic; try other links
(especially family=Gamma(link="log")
) if that is
possible/makes sense - consider whether a lognormal model (i.e. a
regular LMM on logged data) would work/makes sense. - Lo and Andrews (2015) argue that the Gamma
family with an identity link is superior to lognormal models
for reaction-time data. I (BMB) don’t find their argument particularly
convincing, but lots of people want to do this. Unfortunately this is
technically challenging (see here), because it is
likely that some “illegal” values (predicted responses \(\le 0\)) will occur while fitting the
model, even if the final fitted model makes no impossible predictions.
Thus something has to be done to make the model-fitting machinery
tolerant of such values (i.e. returning NA
for these model
evaluations, or clamping illegal values to the constrained space with an
appropriate smooth penalty function).
Gamma models can be fitted by a wide variety of platforms
(lme4::glmer
, MASS::glmmPQL
,
glmmADMB
, glmmTMB
,
MixedModels.jl
, MCMCglmm
, brms
…
not sure about others.
Proportion data where the denominator (e.g. maximum possible number
of successes for a given observation) is not known can be modeled using
a Beta distribution. Smithson and Verkuilen
(2006) is a good introduction for non-statisticians (not
in the mixed-model case), and the betareg
package (Cribari-Neto and Zeileis 2009) handles
non-mixed Beta regressions. The glmmTMB
and
brms
packages handle Beta mixed models (brms
also handles zero-inflated and zero-one inflated models).
See e.g. Martin et al. (2005) or Warton (2005) (“many zeros does not mean zero inflation”) or Zuur et al. (2009a) for general information on zero-inflation.
MCMCglmm
handles zero-truncated, zero-inflated, and
zero-altered models, although specifying the models is a little bit
tricky: see Sections 5.3 to 5.5 of the CourseNotes
vignetteglmmADMB
handles
glmmTMB
handles a variety of Z-I and Z-T models (allows
covariates, and random effects, in the zero-alteration model)brms
does tooGLMMadaptive
mgcv::gam()
can do simple mixed
models (Poisson, not NB) with zero-inflation, and comparing
mgcv
with glmmTMB
resultsgamlssNP
in the gamlss.mx
package should
handle zero-inflation, and the gamlss.tr
package should
handle truncated (i.e. hurdle) models – but I haven’t tried themContinuous data are a special case where the mixture model for zero-inflated data is less relevant, because observations that are exactly zero occur with probability (but not probability density) zero. There are two cases of interest:
In this case zero is a problematic observation for the distribution; it’s either impossible or infinitely (locally) likely. Some examples:
The best solution depends very much on the data-generating mechanism.
brms
and
GLMMadaptive
both provide support for censored data in
mixed models.cplm
and glmmTMB
packages handles
‘Tweedie compound Poisson linear models’, which in a particular range of
parameters allows for skewed continuous responses with a spike at
zeroIn this case (e.g. a spike of zeros in the center of an otherwise continuous distribution), the hurdle model probably makes the most sense.
simulate()
method if it exists to construct a
simulated distribution of the proportion of zeros expected overall from
your model, and compare it to the observed proportion of zeros in the
data setperformance
package compares expected
vs. observed to see if they are within some specified threshold (by
default 0.05 - i.e. the function returns under- or overinflation if the
observed number of zeros is more than 5% different from the expected
number based on the predicted probabilities of zero for each
observation. It does not try to give confidence
intervals or a p-value for these probabilities (this would be possible
if we assume there is no uncertainty in the estimated parameters, but
would be harder otherwise …)In nlme
these so-called R-side (R for
“residual”) structures are accessible via the
weights
/VarStruct
(heteroscedasticity) and
correlation
/corStruct
(spatial or temporal
correlation) arguments and data structures. This extension is a bit
harder than it might seem. In LMMs it is a natural extension to allow
the residual error terms to be components of a single multivariate
normal draw; if that MVN distribution is uncorrelated and homoscedastic
(i.e. proportional to an identity matrix) we get the classic model, but
we can in principle allow it to be correlated and/or
heteroscedastic.
It is not too hard to define marginal correlation structures that don’t make sense. One class of reasonably sensible models is to always assume an observation-level random effect (as MCMCglmm does for computational reasons) and to allow that random effect to be MVN on the link scale (so that the full model is lognormal-Poisson, logit-normal binomial, etc., depending on the link function and family).
For example, a relatively simple Poisson model with spatially correlated errors might look like this:
\[ \begin{split} \eta & \sim \textrm{MVN}(a + b x, \Sigma) \\ \Sigma_{ij} & = \sigma^2 \exp(-d_{ij}/s) \\ y_i & \sim \textrm{Poisson}(\lambda=\exp(\eta_i)) \end{split} \]
That is, the marginal distributions of the response values are Poisson-lognormal, but on the link (log) scale the latent Normal variables underlying the response are multivariate normal, with a variance-covariance matrix described by an exponential spatial correlation function with scale parameter \(s\).
How can one achieve this?
lme4
, for
either LMMs or GLMMs; they are fairly low priority, and it is hard to
see how they could be implemented for GLMMs (the equivalent for LMMs is
tedious but should be straightforward to implement).library(sos)
findFn("corStruct")
finds additional possibilities in the ramps
(extended
geostatistical) and ape
(phylogenetic) packages.
MASS::glmmPQL
(see Dormann et al.)glmmADMB
package (and may not be for a while!), but it would be possible to run
AD Model Builder via the R2admb package (requires installing – and
learning! ADMB)Complete separation occurs in a binary-response model when there is some linear combination of the parameters that perfectly separates failures from successes - for example, when all of the observations are zero for some particular combination of categories. The symptoms of this problem are unrealistically large parameter estimates; ridiculously large Wald standard errors (the Hauck-Donner effect); and various warnings.
In particular, binomial glmer()
models with complete
separation can lead to “Downdated VtV is not positive definite”
(e.g. see here) or
“PIRLS step-halvings failed to reduce deviance in pwrssUpdate” errors
(e.g. see here).
Roughly speaking, the complete separation is likely to appear even if
one considers only the fixed effects part of the model (counterarguments
or counterexamples welcome!), suggesting two quick-and-dirty diagnostic
methods. If fixed_form
is the formula including only the
fixed effects:
summary(g1 <- glm(fixed_form, family=binomial, data=...))
will show one or more of the following symptoms:
glm.fit: fitted probabilities numerically 0 or 1 occurred
any(abs(g1$coefficients)>8)
, assuming that
predictors are either categorical or scaled to have standard deviations
of \(\approx 1\))detectseparation
package has a method for detecting
complete separation:
library("detectseparation"); update(g1,method="detect_separation")
.
This should say whether complete separation occurs, and in which
(combinations of) variables, e.g.Separation: TRUE
Existence of maximum likelihood estimates
(Intercept) height
Inf Inf
0: finite value, Inf: infinity, -Inf: -infinity
If complete separation is occurring between categories of a single categorical fixed-effect predictor with a large number of levels, one option would be to treat this fixed effect as a random effect, which will allow some degree of shrinkage to the mean. (It might be reasonable to specify the variance of this term a priori to a large value [minimal shrinkage], rather than trying to estimate it from the data.)
(TODO: worked example)
The general approach to handling complete separation in logistic
regression is called penalized regression; it’s available in
the brglm
, brglm2
, logistf
, and
rms
packages. However, these packages don’t handle mixed
models, so the best available general approach is to use a
Bayesian method that allows you to set a prior on the fixed effects,
e.g. a Gaussian with standard deviation of 3; this can be done in any of
the Bayesian GLMM packages (e.g. blme
,
MCMCglmm
, brms
, …) (See supplementary
material for Fox et al. 2016 for a worked example.)
I’m not aware of easy ways to fit mixed models with non-Gaussian
random effects distributions in R (i.e., convenient, flexible,
well-tested implementations). McCulloch and
Neuhaus (2011) discusses when this misspecification may be
important. This
presentation discusses various approaches to solving the problem
(e.g. using a Gamma rather than a Normal distribution of REs in log-link
models). The spaMM
package implements H-likelihood models
(Lee, Nelder, and Pawitan 2017), and
claims to allow a range of random-effects distributions (perhaps not
well tested though …)
In principle you can implement any random-effects distribution you
want in a fully capable Bayesian modeling language
(e.g. JAGS/Stan/PyMC/etc.); see e.g. this
StackOverflow answer, which uses the rethinking
package’s interface to Stan.
(adapted from Bolker et al TREE 2009)
Method | Advantages | Disadvantages | Packages |
---|---|---|---|
Penalized quasi-likelihood | Flexible, widely implemented | Likelihood inference may be inappropriate; biased for large variance or small means | PROC GLIMMIX (SAS), GLMM (GenStat), glmmPQL (R:MASS), ASREML-R |
Laplace approximation | More accurate than PQL | Slower and less flexible than PQL | glmer (R:lme4,lme4a), glmm.admb (R:glmmADMB), INLA, glmmTMB, AD Model Builder, HLM |
Gauss-Hermite quadrature | More accurate than Laplace | Slower than Laplace; limited to 2‑3 random effects | PROC NLMIXED (SAS), glmer (R:lme4, lme4a), glmmML (R:glmmML), xtlogit (Stata) |
Markov chain Monte Carlo | Highly flexible, arbitrary number of random effects; accurate | Slow, technically challenging, Bayesian framework | MCMCglmm (R:MCMCglmm), rstanarm (R), brms (R), MCMCpack (R), WinBUGS/OpenBUGS (R interface: BRugs/R2WinBUGS), JAGS (R interface: rjags/R2jags), AD Model Builder (R interface: R2admb), glmm.admb (post hoc MCMC after Laplace fit) (R:glmmADMB) |
These approaches (PQL, Laplace approximation, GHQ, MCMC) are most
common. Other less-common methods include Monte Carlo EM (Booth and Hobert 1999; Knudson et al. 2021)
(glmm
package), hierarchical GLMs (which use an entirely
different inference framework: (Jin and Lee 2021;
Meng 2009, 2011)). Also see the Mixed
Models Task View.
scale()
)optim
,
nlminb()
, …). While this will of course be slow for large
fits, we consider it the gold standard; if all optimizers converge to
values that are practically equivalent (it’s up to the user to decide
what “practically equivalent means for their case”), then we would
consider the model fit to be good enough. For example:modelfit.all <- lme4::allFit(model)
ss <- summary(modelfit.all)
Most of the current advice about troubleshooting lme4
convergence problems can be found in the help page
?convergence
. That page explains that the convergence tests
in the current version of lme4
(1.1-11, February 2016)
generate lots of false positives. We are considering raising the
gradient warning threshold to 0.01 in future releases of
lme4
. In addition to the general troubleshooting tips
above:
getME(model,c("theta","beta"))
should retrieve the
parameters in a form suitable to be used as the start
parameter)offset(effort)
rather than
offset(log(effort))
. While the intention is to fit a model
where \(\textrm{counts} \propto
\textrm{effort}\), specifying offset(effort)
leads
to a model where \(\textrm{counts} \propto
\exp(\textrm{effort})\) instead; exp(effort)
is
often a huge (and model-destabilizing) number.It is very common for overfitted mixed models to result in singular fits. Technically, singularity means that the random effects variance-covariance matrix is of less than full rank. There are various ways to describe this, from more to less technical:
some of the eigenvalues of the covariance matrix are zero, or effectively zero;
some combinations of the elements of the random-effects vector are perfectly multicollinear;
some linear combinations of elements of the random-effects vector have zero variance;
an \(n \times n\) covariance matrix corresponds to an \(n\)-dimensional ellipsoid where the lengths of the major axes are proportional to the eigenvalues; the ellipsoid is “flat” in some directions, e.g. an ellipse has collapsed to a line segment
In simple cases where a random effect term is represented by a
single variance (scalar random effects), this is reflected in a
variance estimate that is zero or near zero. Functions such as
nlme::lme()
or glmmTMB()
that estimate
variances on the log scale will often not report a singular
fit, but will instead return a very small value (1e-6 or less) for the
random-effects variance; on the log scale, this will correspond to a
parameter estimate that is a large negative number — and, usually,
warnings about non-positive-definite Hessians or (in the case of
lme()
) ridiculously large Wald confidence intervals
returned by intervals()
.
In the case of a two-dimensional random effect (such as a random-slopes model), this typically corresponds to a perfect (+/- 1) correlation between the slope and intercept
in higher-dimensional random effects (such as the random effect
of a categorical variable with more than two levels, or a random-slopes
model with more than one covariate), it’s pretty much impossible to see
at a glance that the covariance matrix is singular. Extracting the RE
covariance matrix and computing its eigenvalues (this is what
rePCA
in the lme4
package does) will tell you.
In the particular case of lme4
, singularity is detectable
by seeing if any of the elements of the \(\boldsymbol \theta\) (variance-covariance
Cholesky decomposition) vector corresponding to diagonal elements are
(near) zero; this is what ?isSingular
does.
Singular fits commonly occur in two scenarios:
small numbers of random-effect levels (e.g. <5), as illustrated in these simulations and discussed (in a somewhat different, Bayesian context) by Gelman (2006).
complex random-effects models, e.g. models of the form
(f|g)
where f
is a categorical variable with a
relatively large number of levels, or models with several different
random-slopes terms.
In MCMCglmm
, singular or near-singular fits will
provoke an error and a requirement to specify a stronger prior.
At present there are a variety of strong opinions about how to resolve such problems, which are sometimes conflated with the general problem of how to decide on the appropriate complexity of the random-effects component of a model. Briefly:
blme
package (Chung et al.
2013) provides a wrapper for the lme4
machinery that
adds a particular form of weak prior to get an approximate a Bayesian
maximum a posteriori estimate that avoids singularity.MCMCglmm
package allows for priors on the
variance-covariance matrixrstanarm
and brms
packages provide
wrappers for the Stan Hamiltonian MCMC engine that fit GLMMs via
lme4
syntax, again allowing a variety of priors to be
set.For some problems it would be convenient to be able to set the
residual variance term to zero, or a fixed value. This is difficult in
lme4
, because the model is parameterized internally in such
a way that the residual variance is profiled out (i.e., calculated
directly from a residual deviance term) and the random-effects variances
are scaled by the residual variance.
Searching the r-sig-mixed-models list for “fix residual variance”
metafor
package, for meta-analytic
modelsblme
package to fix the residual
variance: from Vincent Dorie,library(blme)
blmer(formula = y ~ 1 + (1 | group), weights = V,
resid.prior = point(1.0), cov.prior = NULL)
This sets the residual variance to 1.0. You cannot use this
to make it exactly zero, but you can make it very small (and experiment
with setting it to different small values, e.g. 0.001 vs 0.0001, to see
how sensitive the results are). - Similarly, you can fix the residual
variance to a small positive value in [n]lme
via the
control()
argument (Heisterkamp et
al. 2017):
nlme::lme(Reaction~Days,random=~1|Subject,
data=lme4::sleepstudy,
control=list(sigma=1e-8))
glmmTMB
package can set the residual variance to
(approximately) zero, by specifying dispformula = ~0
(in
fact the value can be set via
glmmTMBControl(zerodisp_val=...)
; the default value is
log(sqrt(.Machine$double.eps))
)lme4
’s internal devfun2()
function (used in
the likelihood profiling computation for LMMs), which uses a specified
value of the residual standard deviation in computing likelihood, but as
Bates, Mächler, et al. (2015) say:The resulting function is not useful for general nonlinear optimization — one can easily wander into parameter regimes corresponding to infeasible (non-positive semidefinite) variance-covariance matrices — but it serves for likelihood profiling, where one focal parameter is varied at a time and the optimization over the other parameters is likely to start close to an optimum.
lme4
error messagesMost of the following error messages are relatively unusual, and happen mostly with complex/large/unstable models. There is often no simple fix; the standard suggestions for troubleshooting are (1) try rescaling and/or centering predictors; (2) see if a simpler model can be made to work; (3) look for severe lack of balance and/or complete separation in the data set.
PIRLS step-halvings failed to reduce deviance in pwrssUpdate
lme4
to fit GLMMs with link functions that
do not automatically constrain the response to the allowable range of
the distributional family (e.g. binomial models with a log link, where
the estimated probability can be >1, or inverse-Gamma models, where
the estimated mean can be negative), it is not unusual to get this
error. This occurs because lme4
doesn’t do anything to
constrain the predicted values, so NaN
values pop up, which
aren’t handled gracefully. If possible, switch to a link function to one
that constrains the response (e.g. logit link for binomial or log link
for Gamma).Downdated VtV is not positive definite
: no specific
advice, see general suggestions aboveconvergence code 3 from bobyqa: bobyqa -- a trust region step failed to reduce q
:
again no specific advice about fixing this, although there is a useful
discussion of the meaning of the error message on
CrossValidatedREML=TRUE
with glmer
will produce the warning
extra argument(s) ‘REML’ disregarded
glmmTMB
allows REML=TRUE
for GLMMs (it
uses the Laplace approximation to integrate over the fixed effect
parameters), since version 0.2.2summary(glmerfit)
etc.?
Are they reliable?By default, in keeping with the tradition in analysis of generalized
linear models, lme4
and similar packages display the Wald
Z-statistics for each parameter in the model summary. These have one big
advantage: they’re convenient to compute. However, they are asymptotic
approximations, assuming both that (1) the sampling distributions of the
parameters are multivariate normal (or equivalently that the
log-likelihood surface is quadratic) and that (2) the sampling
distribution of the log-likelihood is (proportional to) \(\chi^2\). The second approximation is
discussed further under “Degrees of freedom”. The first assumption
usually requires an even greater leap of faith, and is known to cause
problems in some contexts (for binomial models failures of this
assumption are called the Hauck-Donner effect), especially with
extreme-valued parameters.
From worst to best:
anova
or
drop1
, or via computing likelihood profilesFrom worst to best:
car::Anova
)anova
or
drop1
)pbkrtest
package: see notes on K-R
etc below.lme4
display denominator degrees of
freedom/p values? What other options do I have?There is an R FAQ entry on this topic, which links to a mailing list post by Doug Bates (there is also a voluminous mailing list thread reproduced on the R wiki). The bottom line is
lmerTest
and
Kenward-Roger in pbkrtest
(as KRmodcomp
)
(various packages such as lmerTest
, emmeans
,
car
, etc., import pbkrtest::get_Lb_ddf
).
K-R is probably the most reliable option (Schaalje, McBride, and Fellingham 2002), although it may be prohibitively computationally expensive for large data sets.
K-R was derived for LMMs (and for REML?) in particular, it isn’t clear how it would apply to GLMMs. Walter W. Stroup (2014) states (referencing W. W. Stroup (2013)) that K-R actually works reasonably well for GLMMs (K-R is not implemented in R for GLMMs; Stroup suggests that a pseudo-likelihood (Wolfinger and O’Connell 1993) approach is necessary in order to implement K-R for GLMMs):
Notice the non-integer values of the denominator df. They, and the \(F\) and \(p\) values, reflect the procedure developed by Kenward and Roger (2009) to account for the effect of the covariance structure on degrees of freedom and standard errors. Although the Kenward–Roger adjustment was derived for the LMM with normally distributed data and is an ad hoc procedure for GLMMs with non-normal data, informal simulation studies consistently have suggested that the adjustment is accurate. The Kenward-Roger adjustment requires that the SAS GLIMMIX default computing algorithm, pseudo-likelihood, be used rather than the Laplace algorithm used to obtain AICC statistics. Stroup (2013b) found that for binomial and Poisson GLMMs, pseudo-likelihood with the Kenward–Roger adjustment yields better Type I error control than Laplace while preserving the GLMM’s advantage with respect to power and accuracy in estimating treatment means.
cond
package (on CRAN) [which works
with GLMs, not GLMMs]), but it’s rarely used. (The bias correction/Firth
approach implemented in the brglm
package attempts to
address the problem of finite-size bias, not finite-size
non-chi-squaredness of the deviance differences.)anova(...,test="F")
for
quasi-likelihood models)nlme
rules approximately
equivalent to SAS ‘inner-outer’/‘within-between’ rules) for GLMMs, or
(n)lme
for LMMslme
(if possible) and use the
denominator df reported there (which follow a simple ‘inner-outer’ rule
which should correspond to the canonical answer for simple/orthogonal
designs), applied to \(t\) or \(F\) tests. For the explicit specification
of the rules that lme
uses, see page 91 of Pinheiro and
Bates (this page was previously available on Google Books, but the link is no
longer useful, so here are the relevant paragraphs):These conditional tests for fixed-effects terms require denominator degrees of freedom. In the case of the conditional \(F\)-tests, the numerator degrees of freedom are also required, being determined by the term itself. The denominator degrees of freedom are determined by the grouping level at which the term is estimated. A term is called inner relative to a factor if its value can change within a given level of the grouping factor. A term is outer to a grouping factor if its value does not changes within levels of the grouping factor. A term is said to be estimated at level \(i\), if it is inner to the \(i-1\)st grouping factor and outer to the \(i\)th grouping factor. For example, the term
Machine
in thefm2Machine
model is outer toMachine %in% Worker
and inner toWorker
, so it is estimated at level 2 (Machine %in% Worker
). If a term is inner to all \(Q\) grouping factors in a model, it is estimated at the level of the within-group errors, which we denote as the \(Q+1\)st level.The intercept, which is the parameter corresponding to the column of all 1’s in the model matrices \(X_i\), is treated differently from all the other parameters, when it is present. As a parameter it is regarded as being estimated at level 0 because it is outer to all the grouping factors. However, its denominator degrees of freedom are calculated as if it were estimated at level \(Q+1\). This is because the intercept is the one parameter that pools information from all the observations at a level even when the corresponding column in \(X_i\) doesn’t change with the level.
Letting \(m_i\) denote the total number of groups in level \(i\) (with the convention that \(m_0=1\) when the fixed effects model includes an intercept and 0 otherwise, and \(m_{Q+1}=N\)) and \(p_i\) denote the sum of the degrees of freedom corresponding to the terms estimated at level \(i\), the \(i\)th level denominator degrees of freedom is defined as
\[ \mathrm{denDF}_i = m_i - (m_{i-1} + p_i), i = 1, \dots, Q \]
This definition coincides with the classical decomposition of degrees of freedom in balanced, multilevel ANOVA designs and gives a reasonable approximation for more general mixed-effects models.
Note that the implementation used in lme
gets
the wrong answer for random-slopes models:
library(nlme)
lmeDF <- function(formula=distance~age,random=~1|Subject) {
mod <- lme(formula,random,data=Orthodont)
aa <- anova(mod)
return(setNames(aa[,"denDF"],rownames(aa)))
}
lmeDF()
## (Intercept) age
## 80 80
lmeDF(random=~age|Subject) ## wrong!
## (Intercept) age
## 80 80
I (BB) have re-implemented this algorithm in a way that does slightly better for random-slopes models (but may still get confused!), see here.
source("R/calcDenDF.R")
calcDenDF(~age,"Subject",nlme::Orthodont)
## (Intercept) age
## 80 80
calcDenDF(~age,data=nlme::Orthodont,random=~1|Subject)
## (Intercept) age
## 80 80
calcDenDF(~age,data=nlme::Orthodont,random=~age|Subject) ## off by 1
## (Intercept) age
## 81 25
library(lme4)
m2 <- lmer(Reaction~Days+(1|Subject)+(0+Days|Subject),sleepstudy,REML=FALSE)
m1 <- update(m2,.~Days+(1|Subject))
m0 <- lm(Reaction~Days,sleepstudy)
anova(m2,m1,m0) ## two sequential tests
## Data: sleepstudy
## Models:
## m0: Reaction ~ Days
## m1: Reaction ~ Days + (1 | Subject)
## m2: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## m0 3 1906.3 1915.9 -950.15 1900.3
## m1 4 1802.1 1814.8 -897.04 1794.1 106.214 1 < 2.2e-16 ***
## m2 5 1762.0 1778.0 -876.00 1752.0 42.075 1 8.782e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With recent versions of lme4
, goodness-of-fit (deviance)
can be compared between (g)lmer
and (g)lm
models, although anova()
must be called with the mixed
((g)lmer
) model listed first. Keep in mind that LRT-based
null hypothesis tests are conservative when the null value (such as
\(\sigma^2=0\)) is on the boundary of
the feasible space (Self and Liang 1987; Stram
and Lee 1994; Goldman and Whelan 2000); in the simplest case
(single random effect variance), the p-value is approximately twice as
large as it should be (Pinheiro and Bates
2000).
RLRsim
package, which has a fast
implementation of simulation-based tests of null hypotheses about zero
variances, for simple tests. (However, it only applies to
lmer
models, and is a bit tricky to use for more complex
models.)library(RLRsim)
## compare m0 and m1
exactLRT(m1,m0)
##
## simulated finite sample distribution of LRT. (p-value based on 10000
## simulated values)
##
## data:
## LRT = 106.21, p-value < 2.2e-16
## compare m1 and m2
mA <- update(m2,REML=TRUE)
m0B <- update(mA, . ~ . - (0 + Days|Subject))
m.slope <- update(mA, . ~ . - (1|Subject))
exactRLRT(m0=m0B,m=m.slope,mA=mA)
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 42.796, p-value < 2.2e-16
pbkrtest
package (messages and warnings
suppressed).(pb <- pbkrtest::PBmodcomp(m2,m1,seed=101))
## Bootstrap test; time: 14.57 sec; samples: 1000; extremes: 0;
## Requested samples: 1000 Used samples: 501 Extremes: 0
## large : Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
## Reaction ~ Days + (1 | Subject)
## stat df p.value
## LRT 42.075 1 8.782e-11 ***
## PBtest 42.075 0.001992 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lme4
allows for computing likelihood profiles of
variances and computing confidence intervals on their basis; these
likelihood profile confidence intervals are subject to the usual caveats
about the LRT with finite sample sizes.MCMCglmm
package, although its mode
specifications are not identical to those of lme4) will provide
posterior distributions of the variance parameters: quantiles or
credible intervals (HPDinterval()
in the coda
package) will characterize the uncertainty.[n]lme
fits contain an
element called apVar
which contains the approximate
variance-covariance matrix (derived from the Hessian, the matrix of
(numerically approximated) second derivatives of the likelihood (REML?)
at the maximum (restricted?) likelihood values): you can derive the
standard errors from this list element via
sqrt(diag(lme.obj$apVar))
. For whatever it’s worth, though,
these
estimates might not match the estimates
that SAS gives which are supposedly derived in the same way.Abandoning the approximate \(F\)/\(t\)-statistic route, one ends up with the more general problem of estimating \(p\)-values. There is a wider range of options here, although many of them are computationally intensive …
mcmcsamp
(if available for your problem: i.e. LMMs
with simple random effects – not GLMMs or complex random effects)pvals.fnc
in the languageR
package, a
wrapper for mcmcsamp)glmmADMB
package
(use the mcmc=TRUE
option) or the R2admb
package (write your own model definition in AD Model Builder), or
outside of Rsim
function from the arm
package
(simulates the posterior only for the beta (fixed-effect) coefficients;
not yet working with development lme4; would like a better formal
description of the algorithm …?)MCMCglmm
packageglmmBUGS
(a WinBUGS wrapper/R interface)rjags
/r2jags
/R2WinBUGS
/BRugs
packagesmcmcsamp
is a function for lme4 that is supposed to
sample from the posterior distribution of the parameters, based on
flat/improper priors for the parameters [ed: I believe, but am not sure,
that these priors are flat on the scale of the theta
(Cholesky-factor) parameters]. At present, in the CRAN version
(lme4 0.999999-0) and the R-forge “stable” version (lme4.0 0.999999-1),
this covers only linear mixed models with uncorrelated random
effects.
As has been discussed in a variety of places (e.g. on
r-sig-mixed models, and on
the r-forge bug tracker, it is challenging to come up with a sampler
that accounts properly for the possibility that the posterior
distributions for some of the variance components may be mixtures of
point masses at zero and continuous distributions. Naive samplers are
likely to get stuck at or near zero. Doug Bates has always been a bit
unsure that mcmcsamp
is really performing as intended, even
in the limited cases it now handles.
Given this uncertainty about how even the basic version works, the
lme4
developers have been reluctant to make the effort to
extend it to GLMMs or more complex LMMs, or to implement it for the
development version of lme4 … so unless something miraculous happens, it
will not be implemented for the new version of lme4
. As
always, users are encouraged to write and share their own code that
implements these capabilities …
The idea here is that in order to do inference on the effect of (a)
predictor(s), you (1) fit the reduced model (without the predictors) to
the data; (2) many times, (2a) simulate data from the reduced model;
(2b) fit both the reduced and the full model to the simulated (null)
data; (2c) compute some statistic(s) [e.g. t-statistic of the focal
parameter, or the log-likelihood or deviance difference between the
models]; (3) compare the observed values of the statistic from fitting
your full model to the data to the null distribution generated in step
2. - PBmodcomp
in the pbkrtest
package - see
the example in help("simulate-mer")
in the
lme4
package to roll your own, using a combination of
simulate()
and refit()
. - bootMer
in lme4
version >1.0.0 - a presentation at UseR! 2009
(abstract,
slides)
went into detail about a proposed bootMer
package and
suggested it could work for GLMMs too – but it does not seem to be
active.
Note that none of the following approaches takes the uncertainty of the random effects parameters into account … if you want to take RE parameter uncertainty into account, a Bayesian approach is probably the easiest way to do it.
The general recipe for computing predictions from a linear or generalized linear model is to
library(nlme)
fm1 <- lme(distance ~ age*Sex, random = ~ 1 + age | Subject,
data = Orthodont)
plot(Orthodont,asp="fill") ## plot responses by individual
## note that expand.grid() orders factor levels by *order of
## appearance* -- must match levels(Orthodont$Sex)
newdat <- expand.grid(age=c(8,10,12,14), Sex=c("Female","Male"))
newdat$pred <- predict(fm1, newdat, level = 0)
## [-2] drops response from formula
Designmat <- model.matrix(formula(fm1)[-2], newdat)
predvar <- diag(Designmat %*% vcov(fm1) %*% t(Designmat))
newdat$SE <- sqrt(predvar)
newdat$SE2 <- sqrt(predvar+fm1$sigma^2)
library(ggplot2)
pd <- position_dodge(width=0.4)
g0 <- ggplot(newdat,aes(x=age,y=pred,colour=Sex))+
geom_point(position=pd)
cmult <- 2 ## could use 1.96 instead
g0 + geom_linerange(aes(ymin=pred-cmult*SE,ymax=pred+cmult*SE), position=pd)
## prediction intervals
g0 + geom_linerange(aes(ymin=pred-cmult*SE2,ymax=pred+cmult*SE2), position=pd)
A similar answer is laid out in the responses to this StackOverflow question.
Current versions of lme4 have a predict
method.
library(lme4)
library(ggplot2)
data("Orthodont",package="MEMSS")
fm1 <- lmer(
formula = distance ~ age*Sex + (age|Subject)
, data = Orthodont
)
newdat <- expand.grid(
age=c(8,10,12,14)
, Sex=c("Female","Male")
, distance = 0
)
newdat$distance <- predict(fm1,newdat,re.form=NA)
mm <- model.matrix(terms(fm1),newdat)
## or newdat$distance <- mm %*% fixef(fm1)
pvar1 <- diag(mm %*% tcrossprod(vcov(fm1),mm))
tvar1 <- pvar1+VarCorr(fm1)$Subject[1] ## must be adapted for more complex models
cmult <- 2 ## could use 1.96
newdat <- data.frame(
newdat
, plo = newdat$distance-cmult*sqrt(pvar1)
, phi = newdat$distance+cmult*sqrt(pvar1)
, tlo = newdat$distance-cmult*sqrt(tvar1)
, thi = newdat$distance+cmult*sqrt(tvar1)
)
#plot confidence
g0 <- ggplot(newdat, aes(x=age, y=distance, colour=Sex))+geom_point()
g0 + geom_pointrange(aes(ymin = plo, ymax = phi))+
labs(title="CI based on fixed-effects uncertainty ONLY")
#plot prediction
g0 + geom_pointrange(aes(ymin = tlo, ymax = thi))+
labs(title="CI based on FE uncertainty + RE variance")
rm("Orthodont") ## clean up
library(glmmTMB)
data(Orthodont,package="nlme")
fm2 <- glmmTMB(distance ~ age*Sex + (age | Subject),
data = Orthodont,
family="gaussian")
## make prediction data frame
newdat <- expand.grid(age=c(8,10,12,14), Sex=c("Female","Male"))
## design matrix (fixed effects)
mm <- model.matrix(delete.response(terms(fm2)),newdat)
## linear predictor (for GLMMs, back-transform this with the
## inverse link function (e.g. plogis() for binomial, beta;
## exp() for Poisson, negative binomial
newdat$distance <- drop(mm %*% fixef(fm2)[["cond"]])
predvar <- diag(mm %*% vcov(fm2)[["cond"]] %*% t(mm))
newdat$SE <- sqrt(predvar)
newdat$SE2 <- sqrt(predvar+sigma(fm2)^2)
(Probably overly complicated) ggplot
code:
library(ggplot2); theme_set(theme_bw())
pd <- position_dodge(width=0.4)
g0 <- ggplot(Orthodont,aes(x=age,y=distance,colour=Sex))+
stat_sum(alpha=0.2,aes(size=..n..))+
scale_size_continuous(breaks=1:4,range=c(2,5))
g1 <- g0+geom_line(data=newdat,position=pd)+
geom_point(data=newdat,shape=17,size=3,position=pd)
## confidence intervals
g2 <- g1 + geom_linerange(data=newdat,
aes(ymin=distance-2*SE,ymax=distance+2*SE),
lwd=2, position=pd)
## prediction intervals
g2 + geom_linerange(data=newdat,
aes(ymin=distance-2*SE2,ymax=distance+2*SE2), position=pd)
## Warning: The dot-dot notation (`..n..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(n)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
The effects
, emmeans
, and
sjPlot
packages are also useful here.
(From Harold Doran, updated by Assaf Oron Nov. 2013:)
If you want the standard errors of the conditional means, you can extract them as follows:
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
cV <- ranef(fm1, condVar = TRUE)
cV
is a list; each element is a data frame containing
the conditional modes for a particular grouping factor. If you use
scalar random effects (typically random intercepts), then specifying
ranef(...,drop=TRUE)
will return the conditional modes as a
single named vector instead.
The conditional variances are returned as an attribute of the
conditional modes. It may be easiest to use
as.data.frame(cV)
, or
broom.mixed::tidy(fm1, effects="ran_vals")
, to extract all
of the conditional means and standard deviations.
Or, digging in to the data structure by hand: if we set
ranvar <- attr(cV[[1]], "postVar")
then ranvar
is a 3-D array (the attribute is still
called postVar
, rather than condVar
, for
historical reasons/backward compatibility). Individual-level covariance
matrices of the conditional modes will sit on the [,,i]
faces. For example, ranvar[,,1]
is the variance-covariance
matrix of the conditional distribution for the first group, so
sqrt(diag(ranvar[,,1]))
## [1] 12.070857 2.304839
will provide the intercept and slope standard standard deviations for
the first group’s conditional modes. If you have a scalar random effect
and used drop=TRUE
in ranef()
, then you will
(mercifully) receive only a vector of individual variances here (one for
each level of the grouping factor). The following incantation will give
a matrix of conditional variances with one row for each group and one
column for each parameters:
ng <- dim(ranvar)[3]
np <- dim(ranvar)[2]
mm <- matrix(ranvar[cbind(rep(seq(np),ng),
rep(seq(np),ng),
rep(ng,each=np))],
byrow=TRUE,
nrow=ng)
Getting the uncertainty of the coefficients (i.e., what’s returned by
coef()
: the sum of the fixed-effect and random-effect
predictions for a particular level) is not (alas) currently easy with
lme4
. If the fixed and random effects were independent then
we could simply add the conditional variance and the variance of the
fixed-effect predictions, but they aren’t in general. There is a long r-sig-mixed-models
mailing list thread that discusses the issues, focusing on (1) how
to extract the covariance between fixed-effect estimate and the
random-effect prediction; (2) whether this value (the covariance between
an estimated parameter and a predicted mode of a
conditional distribution of a random variable) even makes sense in a
frequentist framework. If you’re willing to assume independence
of the conditional variance and the fixed-effect sampling variance, then
(e.g.) the variance of the intercepts for each group would be the sum of
the fixed-effect intercept variance and the conditional variance of the
intercept for each group:
vcov(fm1)[1,1]+mm[,1]
## [1] 192.2807 192.2807 192.2807 192.2807 192.2807 192.2807 192.2807 192.2807
## [9] 192.2807 192.2807 192.2807 192.2807 192.2807 192.2807 192.2807 192.2807
## [17] 192.2807 192.2807
Power analysis for (G)LMMs is mostly done by simulation, although there are some closed-form solutions and approximations, e.g. Snijders and Bosker (1993) (Snijders has links to programs and other resources on his web page). There are resources and bits of code examples spread all over the internet. e.g. here.
Kain, Bolker, and McCoy (2015) and Johnson et al. (2015) are peer-reviewed papers that discuss power analysis via simulation in more detail.
library(sos); findFn("{power analysis} mixed simulation")
locates the fullfact
, pamm
,
simr
, and simglm
packages. Depending on the
goal, one of these packages may have sufficient flexibility to do what
you want.
length(fixef(model))
)(This topic applies to both LMMs and GLMMs, perhaps more so to LMMs, because the issues are even harder for GLMMs.)
Researchers often want to know if there is a simple (or at least
implemented-in-R) way to get an analogue of \(R^2\) or another simple goodness-of-fit
metric for LMMs or GLMMs. This is a challenging question in both the GLM
and LMM worlds (and therefore doubly so for GLMMs), because it turns out
that the wonderful simplicity of \(R^2\) breaks down in the extension to GLMs
or LMMs. If you’re trying to quantify “fraction of variance explained”
in the GLM context, should you include or exclude sampling variation
(e.g., Poisson variation around the expected mean)? [According to an
sos::findFn
search for “Nagelkerke”, one of the common
solutions to this problem, the LogRegR2
function in the
descr
package computes several different “pseudo-\(R^2\)” measures for logistic regression.]
If you’re trying to quantify it in the LMM context, should you include
or exclude variation of different random-effects terms?
The same questions apply more generally to decomposition of variance (i.e. trying to assess the contribution of various model components to the overall fit, not just trying to assess the overall goodness-of-fit of the model); there is unlikely to be a single recipe that does everything you want.
This has been discussed at various times on the mailing lists. This thread and this thread on the r-sig-mixed-models mailing list are good starting points, and this post is useful too.
In one of those threads, Doug Bates said:
Assuming that one wants to define an R^2 measure, I think an argument could be made for treating the penalized residual sum of squares from a linear mixed model in the same way that we consider the residual sum of squares from a linear model. Or one could use just the residual sum of squares without the penalty or the minimum residual sum of squares obtainable from a given set of terms, which corresponds to an infinite precision matrix. I don’t know, really. It depends on what you are trying to characterize.
In one of those threads, Jarrett Byrnes contributed the following code:
r2.corr.mer <- function(m) {
lmfit <- lm(model.response(model.frame(m)) ~ fitted(m))
summary(lmfit)$r.squared
}
\(\Omega^2_0\) (Xu 2003), which is almost the same, is based on comparing the residual variance of the full model against the residual variance of a (fixed) intercept-only null model:
1-var(residuals(m))/var(model.response(model.frame(m)))
Another possibility is the squared correlation between the response variable and the predicted values:
cor(model.response(model.frame(m)),predict(m,type="response"))^2
Gelman and Pardoe (2006)
propose/discuss Bayesian measures of \(R^2\) (I don’t know if anyone has created a
canned implementation of these measures in R). Nakagawa and Schielzeth (2013) and Johnson (2014) have also proposed a general
methodology for computing \(R^2\); J.
Lefcheck gives examples here
and here,
based on his implementation in the piecewiseSEM
package (CRAN,
Github). See also
Jaeger et al. (2017), Rights and Sterba (2018) …
A related question is how to quantify “repeatability” (i.e., ratios of variance at different levels) in GLMMs, especially how to compute the “residual error” term for GLMMs: see Nakagawa and Schielzeth (2010) and the rptR package.
The bottom line is that there are some simple recipes (and some more complex recipes that may or may not have been coded up by someone), but that ’‘’you have to think carefully about what information you want to get out of the coefficient of determination’’’, because no recipe will have all of the properties of \(R^2\) in the simple linear model case.
Packages/functions: See performance::r2()
,
MuMIn::r.squaredGLMM()
, the r2glmm
package, the standalone r2MLM
function, stuff in the piecewiseSEM
package, psycho::R2_nakagawa
,
partR2 package … (try
e.g. sos::findFn("Nakagawa Schielzeth")
for an up-to-date
list …)
The simplest way to get (within-study) measures of variable importance is to standardize the predictor variables (scaling by 1 SD or 2SD: Gelman (2008), Schielzeth (2010))
The r2glmm
package computes partial \(R^2\) values for fixed effects (only for
lmer
, lme
, and glmmPQL
models)
Henrik Singmann has a detailed answer here on why standardized measures such as partial eta-squared are problematic:
The fact that calculating a global measure of model fit (such as R2) is already riddled with complications and that no simple single number can be found, should be a hint that doing so for a subset of the model parameters (i.e., main-effects or interactions) is even more difficult. Given this, I would not recommend to try finding a measure of standardized effect sizes for mixed models.
He even gives suggested wording for responding to reviewers who want standardized measures!
No. See Doug Bates reply to this question here
lmer
/glmer
/etc.lmer
: I have heard “ell emm ee arr” (i.e. pronouncing
each letter); “elmer” (probably most common); and “lemur”glmer
: “gee ell emm ee arr”, “gee elmer”, “glimmer”, or
“gleamer”lme
and nlme
people just seem to spell
out the names (rather than saying e.g. “lemmy” and “nelmy”)Recent versions of lme4
output contain an
@optinfo
slot that stores warnings.
Copied from https://stat.ethz.ch/pipermail/r-help/2012-February/302767.html :
There’s a somewhat hack-ish solution, which is to use options(warn=2) to ‘upgrade’ warnings to errors, and then use try() or tryCatch() to catch them.
More fancily, I used code that looked something like this to save warnings as I went along (sorry about the <<- ) in a recent simulation study. You could also check w$message to do different things in the case of different warnings.
## n.b. have to set up a 3D warn array first ...
withCallingHandlers(tryCatch(fun(n=nvec[j],tau=tauvec[i],...),
error = function(e) {
warn[k,i,j] <<- paste("ERROR:",e$message)
NA_ans}),
warning = function(w) {
warn[k,i,j] <<- w$message
invokeRestart("muffleWarning")
})
glmm.wikidot.com
repeated
package: AGHQ)aov()
, nlme
, or
lme4
, or some other package?aov()
(in the stats
package in base R:
balanced, orthogonal designs only (R analogue of SAS PROC GLM)nlme
(analogue of SAS
PROC MIXED
/NLMIXED
)
aov
(unbalanced,
heteroscedasticity and/or correlation among residual errors)lme4
lme4
(also SAS
PROC MIXED
/NLMIXED
):
lme
)lme4
for GLMMs, or if you have big data (thousands
to tens of thousands of records)The following is modified from a contribution by Kingsford Jones, found 2010-03-16
nlmeODE
– Non-linear mixed-effects modelling in nlme
using differential equationslongRPart
– Recursive partitioning of longitudinal data
using mixed-effects modelsPSM
– Non-Linear Mixed-Effects modelling using
Stochastic Differential EquationsglmmADMB
(R-forge, interface to AD Model Builder)spida
, p3d
(Georges Monette)glmm
(in Jim Lindsey’s repeated
package:
at Lindsey’s web
siteOpenMx
– Advanced Structural Equation ModelingASReml-R
(commercial, but 30 days’ free use/free
license for academic or developing-country use available). Very good at
complex LMMs (fast, flexible covariance structures, etc.), but only
offers PQL for GLMMs, and the manual says: > we cannot recommend the
use of this technique for general use. It is included in the current
version of asreml() for advanced users. It is highly recommended that
its use be accompanied by some form of cross-validatory assessment for
the specific dataset concerned.” Resources:sessionInfo()
## R Under development (unstable) (2024-07-31 r86945)
## Platform: x86_64-pc-linux-gnu
## Running under: Pop!_OS 22.04 LTS
##
## Matrix products: default
## BLAS: /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
## [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
## [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/Toronto
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.5.1 RLRsim_3.1-8 nlme_3.1-165
## [4] dplyr_1.1.4 broom.mixed_0.2.9.5 glmmTMB_1.1.9-9000
## [7] equatiomatic_0.3.3 lme4_1.1-36 Matrix_1.7-0
## [10] Cairo_1.6-2 pander_0.6.5 knitr_1.48
## [13] rmarkdown_2.27
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.5 TMB_1.9.14 xfun_0.46
## [4] bslib_0.8.0 lattice_0.22-6 numDeriv_2016.8-1.1
## [7] vctrs_0.6.5 tools_4.5.0 Rdpack_2.6
## [10] generics_0.1.3 sandwich_3.1-0 parallel_4.5.0
## [13] tibble_3.2.1 fansi_1.0.6 highr_0.11
## [16] pkgconfig_2.0.3 lifecycle_1.0.4 farver_2.1.2
## [19] compiler_4.5.0 munsell_0.5.1 codetools_0.2-20
## [22] httpuv_1.6.15 htmltools_0.5.8.1 sass_0.4.9
## [25] yaml_2.3.10 crayon_1.5.3 later_1.3.2
## [28] pillar_1.9.0 furrr_0.3.1 nloptr_2.1.1
## [31] jquerylib_0.1.4 tidyr_1.3.1 MASS_7.3-61
## [34] cachem_1.1.0 reformulas_0.3.0 boot_1.3-30
## [37] multcomp_1.4-26 mime_0.12 parallelly_1.38.0
## [40] tidyselect_1.2.1 digest_0.6.36 mvtnorm_1.2-5
## [43] future_1.34.0 purrr_1.0.2 listenv_0.9.1
## [46] labeling_0.4.3 forcats_1.0.0 splines_4.5.0
## [49] fastmap_1.2.0 grid_4.5.0 colorspace_2.1-1
## [52] cli_3.6.3 magrittr_2.0.3 survival_3.7-0
## [55] utf8_1.2.4 TH.data_1.1-2 broom_1.0.6
## [58] withr_3.0.1 scales_1.3.0 promises_1.3.0
## [61] backports_1.5.0 estimability_1.5.1 emmeans_1.10.3
## [64] globals_0.16.3 zoo_1.8-12 coda_0.19-4.1
## [67] shiny_1.9.1 evaluate_0.24.0 rbibutils_2.2.16
## [70] mgcv_1.9-1 rlang_1.1.4 Rcpp_1.0.13
## [73] xtable_1.8-4 glue_1.7.0 minqa_1.2.7
## [76] jsonlite_1.8.8 R6_2.5.1
merDeriv
for standard devs of variances,
robust estimates. More on Rizopoulos packagerethinking
, brms
, …broom(.mixed)
,
emmeans
, multcomp
, …)