# Longitudinal Data Analysis example
[Julian Faraway](https://julianfaraway.github.io/)
2024-10-17
- [Data](#data)
- [Mixed Effect Model](#mixed-effect-model)
- [INLA](#inla)
- [STAN](#stan)
- [Diagnostics](#diagnostics)
- [Output Summary](#output-summary)
- [Posterior Distributions](#posterior-distributions)
- [BRMS](#brms)
- [MGCV](#mgcv)
- [GINLA](#ginla)
- [Discussion](#discussion)
- [Package version info](#package-version-info)
See the [introduction](index.md) for an overview.
This example is discussed in more detail in my book [Extending the
Linear Model with R](https://julianfaraway.github.io/faraway/ELM/).
Required libraries:
``` r
library(faraway)
library(ggplot2)
library(lme4)
library(pbkrtest)
library(RLRsim)
library(INLA)
library(knitr)
library(rstan, quietly=TRUE)
library(brms)
library(mgcv)
```
# Data
The Panel Study of Income Dynamics (PSID), begun in 1968, is a
longitudinal study of a representative sample of U.S. individuals. The
study is conducted at the Survey Research Center, Institute for Social
Research, University of Michigan, and is still continuing. There are
currently 8700 households in the study and many variables are measured.
We chose to analyze a random subset of this data, consisting of 85 heads
of household who were aged 25–39 in 1968 and had complete data for at
least 11 of the years between 1968 and 1990. The variables included were
annual income, gender, years of education and age in 1968.
Load in the data:
``` r
data(psid, package="faraway")
head(psid)
```
age educ sex income year person
1 31 12 M 6000 68 1
2 31 12 M 5300 69 1
3 31 12 M 5200 70 1
4 31 12 M 6900 71 1
5 31 12 M 7500 72 1
6 31 12 M 8000 73 1
``` r
summary(psid)
```
age educ sex income year person
Min. :25.0 Min. : 3.0 F:732 Min. : 3 Min. :68.0 Min. : 1.0
1st Qu.:28.0 1st Qu.:10.0 M:929 1st Qu.: 4300 1st Qu.:73.0 1st Qu.:20.0
Median :34.0 Median :12.0 Median : 9000 Median :78.0 Median :42.0
Mean :32.2 Mean :11.8 Mean : 13575 Mean :78.6 Mean :42.4
3rd Qu.:36.0 3rd Qu.:13.0 3rd Qu.: 18050 3rd Qu.:84.0 3rd Qu.:63.0
Max. :39.0 Max. :16.0 Max. :180000 Max. :90.0 Max. :85.0
``` r
psid$cyear <- psid$year-78
```
We have centered the time variable `year`. Here are plots for just 20 of
the subjects:
``` r
psid20 = subset(psid, person <= 20)
ggplot(psid20, aes(x=year, y=income))+geom_line()+facet_wrap(~ person)
```
![](figs/psidplot-1..svg)
``` r
ggplot(psid20, aes(x=year, y=income+100, group=person)) +geom_line()+facet_wrap(~ sex)+scale_y_log10()
```
![](figs/psidplot-2..svg)
# Mixed Effect Model
Suppose that the income change over time can be partly predicted by the
subject's age, sex and educational level. The variation may be
partitioned into two components. Clearly there are other factors that
will affect a subject's income. These factors may cause the income to be
generally higher or lower or they may cause the income to grow at a
faster or slower rate. We can model this variation with a random
intercept and slope, respectively, for each subject. We also expect that
there will be some year-to-year variation within each subject. For
simplicity, let us initially assume that this error is homogeneous and
uncorrelated. We also center the year to aid interpretation. We may
express these notions in the model:
``` r
mmod <- lmer(log(income) ~ cyear*sex +age+educ+(1 | person) + (0 + cyear | person), psid)
faraway::sumary(mmod, digits=3)
```
Fixed Effects:
coef.est coef.se
(Intercept) 6.628 0.552
cyear 0.086 0.009
sexM 1.153 0.121
age 0.011 0.014
educ 0.109 0.022
cyear:sexM -0.026 0.012
Random Effects:
Groups Name Std.Dev.
person (Intercept) 0.531
person.1 cyear 0.049
Residual 0.683
---
number of obs: 1661, groups: person, 85
AIC = 3839.7, DIC = 3753.4
deviance = 3787.6
This model can be written as:
``` math
\begin{aligned}
\log(\mathrm{income})_{ij} &= \mu + \beta_{y} \mathrm{year}_i + \beta_s \mathrm{sex}_j + \beta_{ys} \mathrm{sex}_j \times \mathrm{year}_i + \beta_e \mathrm{educ}_j + \beta_a \mathrm{age}_j \\ &+ \gamma^0_j + \gamma^1_j \mathrm{year}_i + \epsilon_{ij}
\end{aligned}
```
where $i$ indexes the year and $j$ indexes the individual. We have:
``` math
\left(
\begin{array}{c}
\gamma^0_j \\
\gamma^1_j
\end{array}
\right) \sim
N(0,\sigma^2 D)
```
We have chosen not to allow a correlation between the intercept and the
slope random effects, so $D$ is a diagonal matrix. Such a correlation can
be included in `lmer()` models but not in some of the models below. As it
happens, if you do fit a model with the correlation, you find that it is
not significant, so dropping it makes little difference in this instance.
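For reference, a correlated intercept and slope can be specified in
`lmer()` by combining the two terms into one. A minimal sketch (not run
in the main analysis; `mmodc` is our name for this alternative fit):

``` r
# random intercept and slope for each person, allowed to correlate
mmodc <- lmer(log(income) ~ cyear*sex + age + educ + (cyear | person), psid)
# the estimated intercept-slope correlation appears in this summary
VarCorr(mmodc)
```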
We can test the interaction term in the fixed effect part of the model:
``` r
mmod <- lmer(log(income) ~ cyear*sex +age+educ+(1 | person) + (0 + cyear | person),psid, REML=FALSE)
mmodr <- lmer(log(income) ~ cyear + sex +age+educ+(1 | person) + (0 + cyear | person),psid, REML=FALSE)
KRmodcomp(mmod,mmodr)
```
large : log(income) ~ cyear + sex + age + educ + (1 | person) + (0 +
cyear | person) + cyear:sex
small : log(income) ~ cyear + sex + age + educ + (1 | person) + (0 +
cyear | person)
stat ndf ddf F.scaling p.value
Ftest 4.63 1.00 81.47 1 0.034
We find that the interaction is statistically significant. We can also
compute bootstrap confidence intervals:
``` r
confint(mmod, method="boot", oldNames=FALSE)
```
2.5 % 97.5 %
sd_(Intercept)|person 0.419467 0.6008617
sd_cyear|person 0.037740 0.0577056
sigma 0.658871 0.7054057
(Intercept) 5.611348 7.6201110
cyear 0.068317 0.1035504
sexM 0.933932 1.3978847
age -0.014185 0.0358830
educ 0.065079 0.1555695
cyear:sexM -0.049683 -0.0022605
We see that the standard deviation intervals are all well away from zero.
The age effect does not look significant.
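The `RLRsim` package loaded above can make this more formal via a
simulated restricted likelihood ratio test of whether a single variance
component is zero. A sketch for the random slope SD (the three arguments
to `exactRLRT()` are: a model containing only the tested random effect,
the full model, and the model without that effect; the model names here
are ours):

``` r
# REML fits (the lme4 default) are needed for the restricted LRT
mfull <- lmer(log(income) ~ cyear*sex + age + educ +
              (1 | person) + (0 + cyear | person), psid)
mslope <- lmer(log(income) ~ cyear*sex + age + educ +
               (0 + cyear | person), psid)
mint <- lmer(log(income) ~ cyear*sex + age + educ + (1 | person), psid)
exactRLRT(mslope, mfull, mint)
```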
# INLA
Integrated nested Laplace approximation is a method of Bayesian
computation which uses approximation rather than simulation. More can be
found on this topic in [Bayesian Regression Modeling with
INLA](http://julianfaraway.github.io/brinla/) and the [chapter on
GLMMs](https://julianfaraway.github.io/brinlabook/chaglmm.html).
Use the most recent computational methodology:
``` r
inla.setOption(inla.mode="experimental")
inla.setOption("short.summary",TRUE)
```
We need to create a duplicate of the person variable because it appears
in two random effect terms and INLA requires these to be distinct. We fit
the default INLA model:
``` r
psid$slperson <- psid$person
formula <- log(income) ~ cyear*sex+age+educ + f(person, model="iid") + f(slperson, cyear , model="iid")
result <- inla(formula, family="gaussian", data=psid)
summary(result)
```
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 6.629 0.549 5.550 6.629 7.709 6.629 0
cyear 0.086 0.009 0.068 0.086 0.103 0.086 0
sexM 1.153 0.121 0.916 1.153 1.391 1.153 0
age 0.011 0.014 -0.016 0.011 0.037 0.011 0
educ 0.109 0.022 0.066 0.109 0.151 0.109 0
cyear:sexM -0.026 0.012 -0.050 -0.026 -0.002 -0.026 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 2.14 0.079 1.99 2.14 2.30 2.14
Precision for person 3.67 0.628 2.58 3.62 5.04 3.54
Precision for slperson 438.00 91.088 285.95 428.84 642.95 411.12
In this case, the default priors appear to produce believable results
for the precisions.
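Had we wanted to deviate from the defaults, penalized complexity (PC)
priors are one option. A sketch, assuming we judge an SD above one on the
log(income) scale to have only 1% prior probability (the `pc.prec` prior
and its `param = c(u, alpha)` form are part of the INLA API; the
threshold is our assumption):

``` r
pcprior <- list(prec = list(prior = "pc.prec", param = c(1, 0.01)))
formula2 <- log(income) ~ cyear*sex + age + educ +
  f(person, model = "iid", hyper = pcprior) +
  f(slperson, cyear, model = "iid", hyper = pcprior)
result2 <- inla(formula2, family = "gaussian", data = psid)
```

Returning to the default fit, we transform the precisions to the SD
scale and tabulate posterior summaries: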
``` r
sigmaint <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmaslope <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaint,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaslope,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
mm <- model.matrix(~ cyear*sex+age+educ,psid)
colnames(restab) = c("Intercept",colnames(mm)[-1],"intSD","slopeSD","epsilon")
data.frame(restab) |> kable()
```
| | Intercept | cyear | sexM | age | educ | cyear.sexM | intSD | slopeSD | epsilon |
|:-----------|:----------|:----------|:--------|:----------|:---------|:-----------|:---------|:----------|:---------|
| mean | 6.6292 | 0.085561 | 1.1531 | 0.010662 | 0.10869 | -0.026293 | 0.52745 | 0.048544 | 0.68335 |
| sd | 0.54876 | 0.0089766 | 0.12078 | 0.01364 | 0.021709 | 0.012199 | 0.044794 | 0.0049758 | 0.012445 |
| quant0.025 | 5.5498 | 0.067987 | 0.91553 | -0.016189 | 0.065982 | -0.050423 | 0.44577 | 0.039499 | 0.65928 |
| quant0.25 | 6.2605 | 0.079515 | 1.072 | 0.0015019 | 0.094099 | -0.034461 | 0.49603 | 0.045048 | 0.67483 |
| quant0.5 | 6.628 | 0.08551 | 1.1529 | 0.010638 | 0.10864 | -0.026275 | 0.5251 | 0.048281 | 0.68319 |
| quant0.75 | 6.9955 | 0.091534 | 1.2337 | 0.019769 | 0.12317 | -0.01813 | 0.55636 | 0.051748 | 0.69169 |
| quant0.975 | 7.7068 | 0.10327 | 1.3903 | 0.037427 | 0.15131 | -0.0024687 | 0.62158 | 0.059025 | 0.70815 |
Also construct a plot of the SD posteriors:
``` r
ddf <- data.frame(rbind(sigmaint,sigmaslope,sigmaepsilon),errterm=gl(3,nrow(sigmaint),labels = c("intercept","slope","epsilon")))
ggplot(ddf, aes(x,y))+geom_line()+xlab("log(income)")+ylab("density") +
facet_wrap(~ errterm, scales = "free")
```
![](figs/plotsdspsid-1..svg)
Posteriors look OK.
# STAN
[STAN](https://mc-stan.org/) performs Bayesian inference using MCMC. Set
up STAN to use multiple cores. Set the random number seed for
reproducibility.
``` r
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
set.seed(123)
```
Fit the model. This requires the STAN model file
[longitudinal.stan](../stancode/longitudinal.stan). We view the code
here:
``` r
writeLines(readLines("../stancode/longitudinal.stan"))
```
data {
int Nobs;
int Npreds;
int Ngroups;
vector[Nobs] y;
matrix[Nobs,Npreds] x;
vector[Nobs] timevar;
int group[Nobs];
}
parameters {
vector[Npreds] beta;
real sigmaint;
real sigmaslope;
real sigmaeps;
vector[Ngroups] etaint;
vector[Ngroups] etaslope;
}
transformed parameters {
vector[Ngroups] ranint;
vector[Ngroups] ranslope;
vector[Nobs] yhat;
ranint = sigmaint * etaint;
ranslope = sigmaslope * etaslope;
for (i in 1:Nobs)
yhat[i] = x[i]*beta+ranint[group[i]]+ranslope[group[i]]*timevar[i];
}
model {
etaint ~ normal(0, 1);
etaslope ~ normal(0, 1);
y ~ normal(yhat, sigmaeps);
}
No priors are declared for the fixed effects or the SDs, so STAN
defaults to flat, uninformative priors for these.
Set up the data as a list:
``` r
lmod <- lm(log(income) ~ cyear*sex +age+educ, psid)
x <- model.matrix(lmod)
psiddat <- list(Nobs = nrow(psid),
Npreds = ncol(x),
Ngroups = length(unique(psid$person)),
y = log(psid$income),
x = x,
timevar = psid$cyear,
group = psid$person)
```
Break the fitting of the model into three steps.
``` r
rt <- stanc("../stancode/longitudinal.stan")
sm <- stan_model(stanc_ret = rt, verbose=FALSE)
set.seed(123)
system.time(fit <- sampling(sm, data=psiddat))
```
user system elapsed
292.339 2.884 88.065
## Diagnostics
For the error SD:
``` r
muc <- rstan::extract(fit, pars="sigmaeps", permuted=FALSE, inc_warmup=FALSE)
mdf <- reshape2::melt(muc)
ggplot(mdf,aes(x=iterations,y=value,color=chains)) + geom_line() + ylab(mdf$parameters[1])
```
![](figs/psidsigmaeps-1..svg)
Looks OK. For the intercept SD:
``` r
pname <- "sigmaint"
muc <- rstan::extract(fit, pars=pname, permuted=FALSE, inc_warmup=FALSE)
mdf <- reshape2::melt(muc)
ggplot(mdf,aes(x=iterations,y=value,color=chains)) + geom_line() + ylab(mdf$parameters[1])
```
![](figs/psidsigmaint-1..svg)
Looks OK. For the slope SD:
``` r
pname <- "sigmaslope"
muc <- rstan::extract(fit, pars=pname, permuted=FALSE, inc_warmup=FALSE)
mdf <- reshape2::melt(muc)
ggplot(mdf,aes(x=iterations,y=value,color=chains)) + geom_line() + ylab(mdf$parameters[1])
```
![](figs/psidsigmaslope-1..svg)
Again satisfactory.
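Rather than inspecting the traces one at a time, we can also scan the
convergence diagnostics for all parameters at once:

``` r
# histogram of split-Rhat values across all parameters; all should be near 1
stan_rhat(fit)
```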
## Output Summary
Examine the main parameters of interest:
``` r
print(fit,par=c("beta","sigmaint","sigmaslope","sigmaeps"))
```
Inference for Stan model: longitudinal.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta[1] 6.62 0.02 0.54 5.54 6.26 6.62 6.97 7.72 701 1.01
beta[2] 0.09 0.00 0.01 0.07 0.08 0.09 0.09 0.10 1062 1.01
beta[3] 1.15 0.01 0.13 0.91 1.07 1.15 1.24 1.41 600 1.01
beta[4] 0.01 0.00 0.01 -0.02 0.00 0.01 0.02 0.04 747 1.01
beta[5] 0.11 0.00 0.02 0.07 0.09 0.11 0.12 0.15 740 1.00
beta[6] -0.03 0.00 0.01 -0.05 -0.04 -0.03 -0.02 0.00 1006 1.00
sigmaint 0.54 0.00 0.05 0.45 0.50 0.54 0.57 0.64 811 1.00
sigmaslope 0.05 0.00 0.01 0.04 0.05 0.05 0.05 0.06 1062 1.00
sigmaeps 0.68 0.00 0.01 0.66 0.68 0.68 0.69 0.71 4315 1.00
Samples were drawn using NUTS(diag_e) at Mon Sep 2 11:45:13 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Remember that the betas correspond to the following parameters:
``` r
colnames(x)
```
[1] "(Intercept)" "cyear" "sexM" "age" "educ" "cyear:sexM"
## Posterior Distributions
First the random effect parameters:
``` r
postsig <- rstan::extract(fit, pars=c("sigmaint","sigmaslope","sigmaeps"))
ref <- reshape2::melt(postsig)
colnames(ref)[2:3] <- c("logincome","parameter")
ggplot(data=ref,aes(x=logincome))+geom_density()+facet_wrap(~parameter,scales="free")
```
![](figs/psidpostsig-1..svg)
The slope SD is on a much smaller scale than the other two.
``` r
ref <- reshape2::melt(rstan::extract(fit, pars="beta")$beta)
colnames(ref)[2:3] <- c("parameter","logincome")
ref$parameter <- factor(colnames(x)[ref$parameter])
ggplot(ref, aes(x=logincome))+geom_density()+geom_vline(xintercept=0)+facet_wrap(~parameter,scales="free")
```
![](figs/psidpostbeta-1..svg)
We see that age and possibly the year:sex interaction are the only terms
which may not contribute much.
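One way to quantify this is the posterior probability that a coefficient
is positive, computed directly from the samples. For the age effect
(column 4 of `beta`, as listed above):

``` r
betas <- rstan::extract(fit, pars = "beta")$beta
# posterior probability that the age effect is positive
mean(betas[, 4] > 0)
```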
# BRMS
[BRMS](https://paul-buerkner.github.io/brms/) stands for Bayesian
Regression Models with STAN. It provides a convenient wrapper to STAN
functionality. We specify the model as in `lmer()` above.
``` r
suppressMessages(bmod <- brm(log(income) ~ cyear*sex +age+educ+(1 | person) + (0 + cyear | person),data=psid, cores=4))
```
We can obtain some posterior densities and diagnostics with:
``` r
plot(bmod, variable = "^s", regex=TRUE)
```
![](figs/psidbrmsdiag-1..svg)
We have chosen only the random effect hyperparameters since this is
where problems will appear first. Looks OK.
We can look at the STAN code that `brms` used with:
``` r
stancode(bmod)
```
// generated with brms 2.21.0
functions {
}
data {
int N; // total number of observations
vector[N] Y; // response variable
int K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int Kc; // number of population-level effects after centering
// data for group-level effects of ID 1
int N_1; // number of grouping levels
int M_1; // number of coefficients per level
array[N] int J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
// data for group-level effects of ID 2
int N_2; // number of grouping levels
int M_2; // number of coefficients per level
array[N] int J_2; // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // regression coefficients
real Intercept; // temporary intercept for centered predictors
real sigma; // dispersion parameter
vector[M_1] sd_1; // group-level standard deviations
array[M_1] vector[N_1] z_1; // standardized group-level effects
vector[M_2] sd_2; // group-level standard deviations
array[M_2] vector[N_2] z_2; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
vector[N_2] r_2_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
r_2_1 = (sd_2[1] * (z_2[1]));
lprior += student_t_lpdf(Intercept | 3, 9.1, 2.5);
lprior += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += student_t_lpdf(sd_2 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
target += std_normal_lpdf(z_2[1]);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
}
We see that `brms` is using Student-t distributions with 3 degrees of
freedom for the priors. For the SD parameters, these are truncated at
zero to form half-t distributions. You can get a more explicit
description of the priors with `prior_summary(bmod)`. We examine the
fit:
``` r
summary(bmod)
```
Family: gaussian
Links: mu = identity; sigma = identity
Formula: log(income) ~ cyear * sex + age + educ + (1 | person) + (0 + cyear | person)
Data: psid (Number of observations: 1661)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~person (Number of levels: 85)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.54 0.05 0.46 0.65 1.00 962 1476
sd(cyear) 0.05 0.01 0.04 0.06 1.00 1147 1969
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 6.64 0.57 5.53 7.77 1.00 978 1545
cyear 0.09 0.01 0.07 0.11 1.00 1231 1526
sexM 1.15 0.13 0.90 1.40 1.00 636 879
age 0.01 0.01 -0.02 0.04 1.00 918 1325
educ 0.11 0.02 0.07 0.15 1.00 861 1669
cyear:sexM -0.03 0.01 -0.05 -0.00 1.00 1038 1891
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.68 0.01 0.66 0.71 1.00 4357 3120
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results are consistent with those seen previously.
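Had the default priors been a concern, `brms` lets us replace them via
the `prior` argument. A sketch, assuming we wanted half-normal(0, 1)
priors on the group-level SDs (the scale is our choice, not a
recommendation):

``` r
bmod2 <- brm(log(income) ~ cyear*sex + age + educ +
             (1 | person) + (0 + cyear | person),
             data = psid,
             prior = set_prior("normal(0, 1)", class = "sd"),
             cores = 4)
```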
# MGCV
It is possible to fit some GLMMs within the GAM framework of the `mgcv`
package. An explanation of this can be found in this
[blog](https://fromthebottomoftheheap.net/2021/02/02/random-effects-in-gams/).
The `s()` function requires `person` to be a factor to achieve the
expected outcome.
``` r
psid$person = factor(psid$person)
gmod = gam(log(income) ~ cyear*sex + age + educ +
s(person, bs = 're') + s(person, cyear, bs = 're'), data=psid, method="REML")
```
and look at the summary output:
``` r
summary(gmod)
```
Family: gaussian
Link function: identity
Formula:
log(income) ~ cyear * sex + age + educ + s(person, bs = "re") +
s(person, cyear, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.62759 0.55174 12.01 <2e-16
cyear 0.08565 0.00904 9.47 <2e-16
sexM 1.15349 0.12144 9.50 <2e-16
age 0.01067 0.01372 0.78 0.437
educ 0.10874 0.02183 4.98 7e-07
cyear:sexM -0.02644 0.01229 -2.15 0.032
Approximate significance of smooth terms:
edf Ref.df F p-value
s(person) 73.6 81 14.73 <2e-16
s(person,cyear) 63.3 83 7.38 <2e-16
R-sq.(adj) = 0.673 Deviance explained = 70.1%
-REML = 1910.9 Scale est. = 0.46693 n = 1661
We get the fixed effect estimates. We also get tests on the random
effects (as described in this
[article](https://doi.org/10.1093/biomet/ast038)). The hypothesis of no
variation is rejected in both cases. This is consistent with earlier
findings.
We can get estimates of the random effect and error SDs:
``` r
gam.vcomp(gmod)
```
Standard deviations and 0.95 confidence intervals:
std.dev lower upper
s(person) 0.531482 0.449062 0.629030
s(person,cyear) 0.049289 0.040273 0.060323
scale 0.683325 0.659216 0.708315
Rank: 3/3
The point estimates are the same as the REML estimates from `lmer`
earlier. The confidence intervals differ slightly because a bootstrap
method was used for the `lmer` fit whereas `gam` uses an asymptotic
approximation.
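For comparison, the REML point estimates can be recovered from `lmer`
directly. Note that `mmod` was last refit by ML for the fixed effect
test, so we refit by REML (the default) here:

``` r
mmodreml <- lmer(log(income) ~ cyear*sex + age + educ +
                 (1 | person) + (0 + cyear | person), psid)
print(VarCorr(mmodreml), comp = "Std.Dev.")
```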
The fixed and random effect estimates can be found with:
``` r
head(coef(gmod),20)
```
(Intercept) cyear sexM age educ cyear:sexM s(person).1 s(person).2 s(person).3
6.6275863 0.0856540 1.1534910 0.0106748 0.1087393 -0.0264416 -0.0352094 -0.0070974 -0.1022664
s(person).4 s(person).5 s(person).6 s(person).7 s(person).8 s(person).9 s(person).10 s(person).11 s(person).12
0.1322106 -0.5742115 0.2137369 0.4336553 -0.6408237 0.3183853 0.1176814 0.3062349 0.3806784
s(person).13 s(person).14
0.1629332 -0.5551002
We have not printed them all out because there are too many to usefully
display.
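If we want just the 85 per-person random intercepts, we can select them
by name. A sketch relying on the `s(person).k` naming seen above:

``` r
# fixed = TRUE matches the literal string, so the slope terms
# (named s(person,cyear).k) are excluded
ri <- coef(gmod)[grepl("s(person).", names(coef(gmod)), fixed = TRUE)]
summary(ri)
```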
# GINLA
In [Wood (2019)](https://doi.org/10.1093/biomet/asz044), a simplified
version of INLA is proposed. We first construct the GAM model without
fitting it and then use the `ginla()` function to perform the computation.
``` r
gmod = gam(log(income) ~ cyear*sex + age + educ +
s(person, bs = 're') + s(person, cyear, bs = 're'), data=psid, fit = FALSE)
gimod = ginla(gmod)
```
We get the posterior density for the intercept as:
``` r
plot(gimod$beta[1,],gimod$density[1,],type="l",xlab="log(income)",ylab="density")
```
![](figs/psidginlaint-1..svg)
and for the age fixed effect as:
``` r
plot(gimod$beta[4,],gimod$density[4,],type="l",xlab="log(income) per age",ylab="density")
abline(v=0)
```
![](figs/psidginlaage-1..svg)
We see that the posterior for the age effect substantially overlaps
zero.
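We can put a rough number on this overlap by integrating the density
over the negative part of the grid. A sketch using the trapezoidal rule
(assuming the grid returned by `ginla()` is fine enough for this to be
accurate):

``` r
bgrid <- gimod$beta[4, ]
dgrid <- gimod$density[4, ]
neg <- which(bgrid < 0)
# approximate P(beta_age < 0) by the trapezoidal rule
sum(diff(bgrid[neg]) * (dgrid[neg][-1] + dgrid[neg][-length(neg)]) / 2)
```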
It is not straightforward to obtain the posterior densities of the
hyperparameters.
# Discussion
See the [Discussion of the single random effect
model](pulp.md#Discussion) for general comments.
- INLA and mgcv were unable to fit a model with a correlation between
  the random effects. STAN, brms and lme4 can do this.
- There is relatively little disagreement between the methods and much
similarity.
- There were no major computational issues with the analyses (in
  contrast with some of the other examples).
- The `mgcv` analyses took a little longer than previous analyses
  because the sample size is larger (but were still quite fast).
# Package version info
``` r
sessionInfo()
```
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.7
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mgcv_1.9-1 nlme_3.1-166 brms_2.21.0 Rcpp_1.0.13 rstan_2.32.6
[6] StanHeaders_2.32.10 knitr_1.48 INLA_24.06.27 sp_2.1-4 RLRsim_3.1-8
[11] pbkrtest_0.5.3 lme4_1.1-35.5 Matrix_1.7-0 ggplot2_3.5.1 faraway_1.0.8
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 farver_2.1.2 dplyr_1.1.4 loo_2.8.0 fastmap_1.2.0
[6] tensorA_0.36.2.1 digest_0.6.37 estimability_1.5.1 lifecycle_1.0.4 Deriv_4.1.3
[11] sf_1.0-16 magrittr_2.0.3 posterior_1.6.0 compiler_4.4.1 rlang_1.1.4
[16] tools_4.4.1 utf8_1.2.4 yaml_2.3.10 labeling_0.4.3 bridgesampling_1.1-2
[21] pkgbuild_1.4.4 classInt_0.4-10 plyr_1.8.9 abind_1.4-5 KernSmooth_2.23-24
[26] withr_3.0.1 purrr_1.0.2 grid_4.4.1 stats4_4.4.1 fansi_1.0.6
[31] xtable_1.8-4 e1071_1.7-14 colorspace_2.1-1 inline_0.3.19 emmeans_1.10.4
[36] scales_1.3.0 MASS_7.3-61 cli_3.6.3 mvtnorm_1.2-6 rmarkdown_2.28
[41] generics_0.1.3 RcppParallel_5.1.9 rstudioapi_0.16.0 reshape2_1.4.4 minqa_1.2.8
[46] DBI_1.2.3 proxy_0.4-27 stringr_1.5.1 splines_4.4.1 bayesplot_1.11.1
[51] parallel_4.4.1 matrixStats_1.3.0 vctrs_0.6.5 boot_1.3-31 jsonlite_1.8.8
[56] systemfonts_1.1.0 tidyr_1.3.1 units_0.8-5 glue_1.8.0 nloptr_2.1.1
[61] codetools_0.2-20 distributional_0.4.0 stringi_1.8.4 gtable_0.3.5 QuickJSR_1.3.1
[66] munsell_0.5.1 tibble_3.2.1 pillar_1.9.0 htmltools_0.5.8.1 Brobdingnag_1.2-9
[71] R6_2.5.1 fmesher_0.1.7 evaluate_0.24.0 lattice_0.22-6 backports_1.5.0
[76] broom_1.0.6 rstantools_2.4.0 class_7.3-22 svglite_2.1.3 coda_0.19-4.1
[81] gridExtra_2.3 checkmate_2.3.2 xfun_0.47 pkgconfig_2.0.3