Abstract

The INLA package supports different kinds of data likelihoods, including the inla.surv, which is a generalization of the Surv from the survival package, and mdata. It also support joint modeling different kind of outcomes. In this vignette we illustrate the case when we have three outcomes, of different type, and model them jointly.

Introduction

Usually correlations between outcomes are modeled by sharing common model terms in the linear predictor. These terms can be covariates, random effects or sums of these. In some cases, the likelihood requires more information than just the vector of observations. As an example, the likelihood for survival models requires both observed times and information about censoring.

In this vignette we consider a joint model for three outcomes, simulate data from this model and illustrate how to fit the joint model accounting for shared terms. The linear predictor for each outcome includes covariates and a spatial random effect modeled using the SPDE approach.

The first outcome is known at an aggregated level, in a setting were we model it considering a kind of likelihood designed in INLA to account for all the information available with respect to the aggregation. The second outcome is assumed to be censored, so that we have to consider the observed value and the censoring information. The third outcome is just the usual kind of data: one scalar per data unit.

The spatial domain

The SPDE models available in INLA could be applied for processes in \(\mathbb{R}^d\) or \(\mathbb{S}^d\), for \(d=1,2\), and the theory in Lindgren, Rue, and Lindström (2011) include higher dimentional domains. We consider a spatial domain, \(D \in \mathbb{R}^2\) to keep the example simple. Without loss off generality we consider \(D\) as a rectangle defined as

bb <- rbind(c(0, 10), c(0, 7))

Next, we store the length of each side and its average for later use

rxy <- apply(bb, 1, diff)
rr <- mean(rxy)

We sample a set of \(n\) locations, \(l_1, ..., l_n\), in this domain, \(l_i \in D\), completely at random as follows:

n <- 1000
loc <- cbind(
    runif(n, bb[1, 1], bb[1, 2]),
    runif(n, bb[2, 1], bb[2, 2]))

A spatial field

Instead of simulating from a model to play as a spatial smooth term, we define a smooth function on this domain as follows

sfn <- function(a, b)
    sin(a - b) + cos(a + b)

This function is evaluated at the \(n\) locations and visualized with

u <- sfn(2 * pi * loc[, 1] / rr,
         2 * pi * loc[, 2] / rr) 
summary(u)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> -1.98934 -0.76929 -0.01775 -0.07667  0.58622  1.99288
par(mar = c(2, 2, 0, 0), mgp = c(1.5, 0.5, 0), las = 1)
plot(loc, asp = 1, pch = 19, cex = 0.5 + (2 + u)/4, xlab = "", ylab = "")

Spatial sub-domains

In order to define the first outcome, let us partition our domain into a set of sub-regions, \(r_1, r_2, ..., r_g\), so that \(r_i \cap r_i = \oslash\) and \(\cup_{i=1}^g r_j = D\).

For simplicity of coding, but without loss of generality, we assume these small sub-regions to be pixels based in the following set of centers:

h <- 1 ## size of the side of each sub-region
group.ce <- as.matrix(expand.grid(
    seq(bb[1, 1] + h/2, bb[1, 2], h),
    seq(bb[2, 1] + h/2, bb[2, 2], h)))
(ng <- nrow(group.ce))
#> [1] 70

We can avoid the need of dealing with the general spatial objects and operations available in R and identify each location \(l_i\) to each sub-region \(r_j\) with an integer vector as

group.id <- ceiling(loc[,1]/h) + 
  ceiling(rxy[1]/h) * (ceiling(loc[,2]/h)-1)

This will be used as a grouping index, so that each group is made of the observations falling in each of the pixels. We will visualize the locations with color depending on this grouping later.

Covariates

Let us consider a set of three covariates for each location as

xxx <- cbind(x1 = runif(n), x2 = runif(n), x3 = runif(n))

We average the covariates by group as we will consider that we have data at the group level to model the first outcome.

xxx.g <- aggregate(xxx, list(g = group.id), mean)
str(xxx.g)
#> 'data.frame':    70 obs. of  4 variables:
#>  $ g : num  1 2 3 4 5 6 7 8 9 10 ...
#>  $ x1: num  0.542 0.453 0.413 0.501 0.58 ...
#>  $ x2: num  0.515 0.435 0.39 0.638 0.46 ...
#>  $ x3: num  0.548 0.558 0.607 0.557 0.49 ...

Outcomes

We will consider three outcomes.

First outcome: aggregated Gaussian

The first outcome is Gaussian distributes, with unknown mean and variance, and a known scaling factor, \(s\), such that \[ y_i \sim \textrm{N}(\mu_i, s_i\sigma^2). \]

We assume \(\boldsymbol{\eta}_y = \{\mu_1, ..., \mu_n\}\) and \[ \boldsymbol{\eta}_y = \mathbf{X}\boldsymbol{\beta}_y + \mathbf{u}. \] This is being evaluated considering the covariate and random effect averaged per group:

beta.y <- c(5, 0, -3, 3) 
u.g <- tapply(u, group.id, mean)
eta.y.g <- drop(cbind(1, as.matrix(xxx.g[, 2:ncol(xxx.g)])) %*% beta.y) + u.g

We will simulate the scaling factor for each observation with

s <- rgamma(n, 7, 3)
summary(s)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.3739  1.7863  2.2935  2.3933  2.9033  6.0581

sigma.y <- 1
y <- rnorm(n, eta.y.g[group.id], sqrt(sigma.y/s))
summary(y)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   1.090   4.075   4.964   4.947   5.841   8.347

We simulated \(y_i\) for each location \(l_i\). However, we consider that observations of this variable are available only at the aggregated level, at each of the sub-regions \(r_j\). Furthermore, for each sub-region \(r_j\), we know

how many observations were taken in each sub-region, \(n_j = \sum_{i=1}^n I(l_i \in r_j)\)
the weighted average \(\overline{y}_j\) = \((n_j)^{-1}\sum_{i,l_i\in r_j} s_iy_i\),
the sum o the scaling factors, \(m_j\) = \(\sum_{i,l_i\in r_j} s_i\),
the half of the sum of the log of the scaling factors, \(\frac{1}{2}\sum_{i,l_i\in r_j} \log(s_i)\),
and the sample variance within each unit, defined as \(v_j\) = \(\frac{1}{n_j}\sum_{i,l_i\in r_j}^n(s_i y_i^2-\overline{y}_j^2)\)

The function inla.agaussian computes these five statistics, given \(y_i\) and \(s_i\) for each group. We use this function to compute these statistics, as follows

library(INLA)
#> Loading required package: Matrix
#> Loading required package: foreach
#> Loading required package: parallel
#> Loading required package: sp
#> This is INLA_23.04.11 built 2023-04-10 20:51:17 UTC.
#>  - See www.r-inla.org/contact-us for how to get help.
#>  - To enable PARDISO sparse library; see inla.pardiso()
agg <- lapply(1:ng, function(g) {
    ig <- which(group.id==g)
    if(length(ig)>0) 
        return(inla.agaussian(y[ig], s[ig]))
    return(inla.mdata(list(NA, 0, 1, 1, NA))) ### a deal with missing
})
str(YY <- Reduce('rbind', lapply(agg, unlist))) ## five columns matrix
#>  num [1:70, 1:5] 0.218 0.418 0.15 0.389 0.242 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:70] "init" "" "" "" ...
#>   ..$ : chr [1:5] "Y1" "Y2" "Y3" "Y4" ...

Second outcome: Survival

For the second outcome we choose a Weibull distribution with the following parametrization \[ f_W(w_i) = \alpha w_i^{\alpha - 1} \lambda_i^\alpha \textrm{e}^{-(\lambda_i w_i)^\alpha}. \]

We define the linear predictor as \(\boldsymbol{\eta}_w = \{\log(\lambda_1), ..., \log(\lambda_n)\}\) and \[ \boldsymbol{\eta}_w = \mathbf{X}\beta_w + \gamma_w\mathbf{u} \] and we consider the following code to sample from a Weibull distribution

beta.w <- c(1, -1, 0, 1)
alpha <- 2
gamma.w <- 0.5
lambdaw <- exp(cbind(1, xxx) %*% beta.w + gamma.w * u)
we0 <- rweibull(n, shape = alpha, scale = 1/lambdaw)
summary(we0)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> 0.01604 0.15938 0.29232 0.40070 0.54882 2.32019

However, we suppose that some observations are censored

summary(u.ev <- runif(n, 0.3, 1)) ## censoring factor
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.3006  0.4792  0.6429  0.6466  0.8066  0.9985
table(event <- rbinom(n, 1, 0.5)) ## censored (=0) or event (=1)
#> 
#>   0   1 
#> 519 481
summary(we <- ifelse(event == 1, we0, we0 * u.ev)) ## censored outcome
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> 0.01096 0.11519 0.23098 0.32351 0.41337 2.18450

Third outcome: Count under exposure

We consider that we have different exposure at each location

summary(ee <- rgamma(n, 10, 2))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   1.779   3.875   4.833   4.989   5.933   9.679

We assume an outcome following a Poisson distribution with \[ O_i \sim \textrm{Poisson}(\delta_i E_i). \]

We assume the linear predictor as \(\boldsymbol{\eta}_p = \{\log(\delta_1),\;...,\; \log(\delta_n)\}\) and \[ \boldsymbol{\eta}_p = \mathbf{X}\beta_p + \gamma_p\mathbf{u} \]

This outcome is being simulated with

beta.p <- c(2, 1, -1, 0)
gamma.p <- -0.3
delta.p <- exp(cbind(1, xxx) %*% beta.p + gamma.p * u)
po <- rpois(n, delta.p * ee)
summary(po)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    4.00   23.00   35.00   42.27   55.00  186.00

Data model setup

We now have to setup the objects in order to fit the model.

SPDE model setup

We have to define a mesh over the domain. We do it considering the following code

mesh <- inla.mesh.2d(
    loc.domain = matrix(bb[c(1,3,3,1,1, 2,2,4,4,2)], ncol = 2), 
    offset = c(.0001, .5) * rr, ## extensions size
    max.edge = c(.05, .3) * rr, 
    cutoff = 0.025 * rr)
mesh$n
#> [1] 1157

This mesh and the locations colored by the group defined for the first outcome can be visualized with

par(mar = c(0, 0, 1, 0), mgp = c(1.5, 0.5, 0))
plot(mesh, asp = 1, edge.color = gray(0.5))
points(loc, pch = 19, col = group.id, cex = 0.3 + (2 + u)/4)

The SPDE model is defined with

spde <- inla.spde2.pcmatern(
    mesh,
    prior.range=c(1, 0.1),
    prior.sigma=c(1, 0.1))

Linear predictor components

We will fit a joint model considering the linear predictor with terms as how it was simulated. The inla.stack() function in meant to me used when working with SPDE models. However, it also helps when considering multiple likelihood data as detailed below.

One way to deal with multiple likelihoods in INLA is defining matrix data for the outcome where each column is used for each likelihood. However, in this vignette we are dealing with more complex cases. Therefore, he need to use list in the left hand side of the formula. This is the key point to implement this example.

The right hand side of the formula includes each term in the linear predictor. When working with multiple likelihood, each term in each likelihood should be named uniquely. For example, if two likelihoods include the same covariate but with a different coefficient, each one has to have its own name. Therefore, we will pass the covariate x1 as x1y, for the first outcome, x1w for the second outcome, and x1p for the third outcome. Similarly for the other ones, including the intercept.

In our example, the SPDE model is in the linear predictor for the first outcome, a copy from this random effect in in the linear predictor for the second outcome, and another copy of this term is in the linear predictor for the third outcome.

The model formula considering all of these details is defined as

fff <- list(
  inla.mdata(Y),
  inla.surv(w, ev),
  o) ~ 0 +
  a1 + x1y + x2y + x3y + f(s, model = spde) +
  a2 + x1w + x2w + x3w + f(sc1, copy = "s", fixed = FALSE) +
  a3 + x1p + x2p + x3p + f(sc2, copy = "s", fixed = FALSE)

Building data stacks

The data has to be organized so it contains all the information needed, that is all the outcomes, covariates and random effect indexes. For the SPDE models it also has to include the projector matrices.

The main work of the inla.stack() function is to check and organize outcomes, effects and projector matrices in order to guarantee that the dimentions match. Additionally, inla.stack() will deal with the problem of arranging the outcomes to work properly with multiple likelihoods.

For the first outcome, we need to supply a five column matrix, each with one of the five statistics previously computed for the ng groups. For the effects we define an intercept, covariates with names according to what the formula specify and the index for the SPDE model. The projector matrices is one for the fixed effect, which is just 1, and the one for the SPDE model.

We will include a different tag for each stack to keep track. This is useful to collect results associated to each outcome. For each outcome we can also define an indicator vector to identify it. This is useful for the case of having some missing data. In this case the indicator can be used to identify which link to be used for making the predictions.

The stack for the first outcome can be defined as

stackY <- inla.stack(
    tag = 'y',
    data = list(Y = YY),
    effects= list(
        data.frame(a1 = 1,
                   x1y = xxx.g$x1, 
                   x2y = xxx.g$x2, 
                   x3y = xxx.g$x3),
        s = 1:mesh$n),
    A=list(1,
           inla.spde.make.A(mesh, group.ce))
)

For the second outcome, we have to supply two vectors (observed time and censoring) to build the inla.surv data, the covariates, with names according to the terms in the formula, and the projector matrices, the single 1 for the covariates and the projector for the shared SPDE term.

The stack for the second outcome can be defined as

stackW <- inla.stack(
    tag = 'w', 
    data = list(w = we,
                ev = event), 
    effects = list(
        data.frame(a2 = 1,
                   x1w = xxx[, 1], 
                   x2w = xxx[, 2], 
                   x3w = xxx[, 3]),
        sc1 = 1:mesh$n),
    A = list(1,
             inla.spde.make.A(mesh, loc))
)

For the third outcome we have to supply the vector of counts and the vector with the exposure. The effects and projector are similar with the previous others, just that we have to account for its names.

The stack for the third outcome can be defined as

stackP <- inla.stack(
    tag = 'p', 
    data = list(o = po,
                E = ee),
    effects = list(
        data.frame(a3 = 1,
                   x1p = xxx[, 1], 
                   x2p = xxx[, 2], 
                   x3p = xxx[, 3]),
        sc2 = 1:mesh$n),
    A = list(1,
             inla.spde.make.A(mesh, loc))
)

Fitting the model and some results

We now fit the model and look at some results.

Model fitting

To fit the model we start by joining all the data into one data stack

stacked <- inla.stack(stackY, stackW, stackP)

We now supply this to the inla() function in order to fit the model

result <- inla(
   formula = fff,
    family = c("agaussian", "weibullsurv", "poisson"),
    data = inla.stack.data(stacked),
    E = E, 
    control.predictor = list(
        A=inla.stack.A(stacked)),
    control.family = list(
        list(),
        list(variant = 1),
        list()), 
   control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE)
)
#> Warning in inla.model.properties.generic(inla.trim.family(model), mm[names(mm) == : Model 'agaussian' in section 'likelihood' is marked as 'experimental'; changes may appear at any time.
#>   Use this model with extra care!!! Further warnings are disabled.
#> Warning in sqrt(ee): NaNs produced
result$cpu
#>        Pre    Running       Post      Total 
#>  1.7111912 72.1509206  0.2670162 74.1291280

Summaries

We now compare the true value for each fixed effect with its posterior summary.

round(cbind(
    true = c(beta.y, beta.w, beta.p),
    result$summary.fix), 2)
#>     true  mean   sd 0.025quant 0.5quant 0.975quant  mode kld
#> a1     5  5.63 3.27      -1.29     5.62      12.58  5.59   0
#> x1y    0 -0.02 0.43      -0.86    -0.02       0.82 -0.02   0
#> x2y   -3 -3.41 0.36      -4.11    -3.41      -2.70 -3.41   0
#> x3y    3  3.09 0.37       2.37     3.09       3.81  3.09   0
#> a2     1  1.03 1.53      -2.22     1.01       4.29  1.00   0
#> x1w   -1 -1.06 0.07      -1.19    -1.06      -0.92 -1.06   0
#> x2w    0  0.02 0.07      -0.11     0.02       0.15  0.02   0
#> x3w    1  1.03 0.07       0.89     1.03       1.17  1.03   0
#> a3     2  1.85 1.00      -0.28     1.86       3.97  1.86   0
#> x1p    1  1.01 0.02       0.98     1.01       1.05  1.01   0
#> x2p   -1 -1.01 0.02      -1.04    -1.01      -0.98 -1.01   0
#> x3p    0 -0.01 0.02      -0.04    -0.01       0.03 -0.01   0

For the hyper-parameters, we have one for the agaussian likelihood, one for the weibulsuv, two for the SPDE model and two that define the scaling for the shared SPDE terms in the second and third outome.

round(cbind(true = c(1/sigma.y^2, alpha, NA, NA, gamma.w, gamma.p), 
            result$summary.hy), 2)
#>                                            true  mean   sd 0.025quant 0.5quant
#> Precision for the AggGaussian observations  1.0  1.03 0.05       0.92     1.03
#> alpha parameter for weibullsurv[2]          2.0  2.14 0.05       2.07     2.14
#> Range for s                                  NA 11.83 2.43       8.15    11.47
#> Stdev for s                                  NA  1.58 0.29       1.08     1.55
#> Beta for sc1                                0.5  0.47 0.02       0.43     0.47
#> Beta for sc2                               -0.3 -0.31 0.01      -0.32    -0.31
#>                                            0.975quant  mode
#> Precision for the AggGaussian observations       1.11  1.04
#> alpha parameter for weibullsurv[2]               2.25  2.13
#> Range for s                                     17.70 10.59
#> Stdev for s                                      2.24  1.49
#> Beta for sc1                                     0.52  0.47
#> Beta for sc2                                    -0.29 -0.30

Results considering each outcome

To collect outputs associated with each data, we need to consider the tag from each data stack to get the data index from the joint stack. For example, the data index for the third outcome is

idx1 <- inla.stack.index(stacked, tag = "p")$data

We can see the summary for the fitted values for the first few observations of this outcome with

head(cbind(true = delta.p, 
           result$summary.fitted.values[idx1, ]), 3)
#>                             true      mean        sd 0.025quant  0.5quant
#> fitted.APredictor.1071  9.798490  9.943670 0.4248137   9.137619  9.934075
#> fitted.APredictor.1072 11.039282 10.826138 0.4322350  10.001900 10.817795
#> fitted.APredictor.1073  8.644928  8.256163 0.3728410   7.545437  8.249140
#>                        0.975quant      mode
#> fitted.APredictor.1071  10.804335  9.914983
#> fitted.APredictor.1072  11.697989 10.801254
#> fitted.APredictor.1073   9.007049  8.235088

For access the DIC, WAIC and log score can consider the family indicator stored in the DIC output as follows

str(result$dic)
#> List of 14
#>  $ dic              : num 8443
#>  $ p.eff            : num 114
#>  $ mean.deviance    : num 8329
#>  $ deviance.mean    : num 8215
#>  $ dic.sat          : num -637597
#>  $ mean.deviance.sat: num -637712
#>  $ deviance.mean.sat: num 1810
#>  $ family.dic       : num [1:3] 2047.7 -24.6 6420.5
#>  $ family.dic.sat   : num [1:3] 9.26e+01 -1.28e+06 1.03e+03
#>  $ family.p.eff     : num [1:3] 33.9 8.08 72.19
#>  $ family           : num [1:2070] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ local.dic        : num [1:2070] 27.1 31.6 18.8 31.2 34.1 ...
#>  $ local.dic.sat    : num [1:2070] 2.848 1.042 0.896 1.166 2.298 ...
#>  $ local.p.eff      : num [1:2070] 0.594 0.477 0.592 0.552 0.639 ...
str(result$waic)
#> List of 4
#>  $ waic       : num 8444
#>  $ p.eff      : num 103
#>  $ local.waic : num [1:2070] 28 31.3 18.4 30.9 34.6 ...
#>  $ local.p.eff: num [1:2070] 1.085 0.149 0.134 0.198 0.836 ...
str(result$cpo)
#> List of 3
#>  $ cpo    : num [1:2070] 6.15e-07 1.52e-07 1.00e-04 1.88e-07 2.26e-08 ...
#>  $ pit    : num [1:2070] 0.00326 0.35375 0.50348 0.25726 0.01392 ...
#>  $ failure: num [1:2070] 0 0 0 0 0 0 0 0 0 0 ...
result$dic$family.dic ### DIC for each outcome
#> [1] 2047.66335  -24.64605 6420.47635
tapply(result$waic$local.waic, 
       result$dic$family, ## taken from dic family id
       sum) ## WAIC for each outcome
#>          1          2          3 
#> 2040.77963  -17.66394 6421.15327
tapply(result$cpo$cpo, 
       result$dic$family, ## taken fron dic family id
        function(x) 
         c(nlCPO = -sum(log(x)))) ## -sum log CPO for each outcome
#>           1           2           3 
#> 1024.946229   -8.688735 3211.438629

References

Lindgren, F., H. Rue, and J. Lindström. 2011. “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach (with Discussion).” J. R. Statist. Soc. B 73 (4): 423–98.

SPDE example with multiple kind of outcomes

Joint modeling with `mdata`, `inla.surv` and counts with exposure

Elias T Krainski

2023, March

Abstract

Introduction

The spatial domain

A spatial field

Spatial sub-domains

Covariates

Outcomes

First outcome: aggregated Gaussian

Second outcome: Survival

Third outcome: Count under exposure

Data model setup

SPDE model setup

Linear predictor components

Building data stacks

Fitting the model and some results

Model fitting

Summaries

Results considering each outcome

References

SPDE example with multiple kind of outcomes

Joint modeling with mdata, inla.surv and counts with exposure

Elias T Krainski

2023, March

Abstract

Introduction

The spatial domain

A spatial field

Spatial sub-domains

Covariates

Outcomes

First outcome: aggregated Gaussian

Second outcome: Survival

Third outcome: Count under exposure

Data model setup

SPDE model setup

Linear predictor components

Building data stacks

Fitting the model and some results

Model fitting

Summaries

Results considering each outcome

References

Joint modeling with `mdata`, `inla.surv` and counts with exposure