In the introduction vignette, we reviewed some basic IPM theory and worked through examples of simple IPMs, which only model a single, continuously distributed state variable. General IPMs are distinguished from simple ones in that they incorporate multiple state variables and/or include transitions between continuous and discrete states. These state variables could be, for example, a continuous measure of coral colony surface area and a discrete state to denote aspergillosis infection (in Bruno et al. 2011), a mixture of trees classified by basal diameter and a discrete state for seeds in a seed bank (in Crandall & Knight 2017), or a mixture of multiple continuous and discrete state (e.g. Aikens & Roach 2014). A more comprehensive definition is provided in Ellner & Rees (2006) and in Ellner, Childs & Rees, Chapter 6 (2016).
We’ll start with the least complicated of the general IPMs and build progressively from there. If you’ve already read through the introduction vignette, then most of this will look pretty familiar. If not, it is probably best to at least skim the first few sections before proceeding here.
As noted above, we are now working with models that may have multiple continuous state variables and/or discrete states to describe the demography of our species. Thus, we need to add some more information to the model’s definition in order to fully capture these additional demographic details.
Relative to how we define simple models in ipmr
, there
are now two new bits:
Each kernel that has an integration requires that
d_statevariable
get appended to the kernel
formula
. This is equivalent to the “dz” in \(\int_L^U K(z',z) n(z,t) \mathrm{dz}\).
ipmr
automatically generates this variable internally, so
there is no need to define it in the data_list
or
define_domains()
, we can just add it any of the formula(s)
that we want to. We skipped this step in the simple IPMs because it gets
appended automatically. Unfortunately, it is less easy to infer the
correct state variable to d_z
with for general IPMs where
there may be many continuous and/or discrete states.
The implementation arguments list can now have different values
in the state_start
and state_end
slots. This
is demonstrated in a separate chunk below.
This example will use an IPM for Ligustrum obtusifolium, a
tree species that is now invasive in North America. Data were collected
outside of St. Louis, Missouri, and the full model is described in Levin
et al. 2019. The IPM consists of a single discrete stage (seed bank,
abbreviated b
) and a single continuous state (height,
abbreviated ht
/\(z,z'\)). The census timing was such
that all seeds must enter the seed bank. They can either germinate in
the next year, or they can die (so stay_discrete
= 0).
Thus, the full model takes the following form:
\(n(z', t + 1) = \int_L^U P(z', z) n(z,t)d\mathrm{z} + b(t) * leave\_discrete(z')\)
\(b(t + 1) = \int_L^Ugo\_discrete(z) n(z,t)d\mathrm{z}+ stay\_discrete\)
\(P(z',z) = s(z) * G(z',z)\)
\(go\_discrete(z) = f_s(z) * f_r(z) * g_i\)
\(leave\_discrete(z') = e_p * f_d(z')\)
\(stay\_discrete = 0\)
\(f_G\) and \(f_{r_d}\) denote Normal probability density functions. The vital rates functions and example code to fit them are:
survival (s
): A logistic regression with a squared
term
Example code:
glm(surv ~ ht_1 + I(ht_1)^2, data = my_surv_data, family = binomial())
Mathematical form: \(Logit(s(z)) = \alpha_s + \beta_{s,1} * z + \beta_{s,2} * z^2\)
growth (g
): A linear regression
example code:
lm(ht_2 ~ ht_1, data = grow_data)
Mathematical form:
\(\mu_G = \alpha_G + \beta_G * z\)
\(G(z',z) = f_G(\mu_G, \sigma_G)\)
flowering probability (r_r
): A logistic
regression
example code:
glm(repro ~ ht_1, data = my_repro_data, family = binomial())
Mathematical form: \(Logit(r_r(z)) = \alpha_{r_r} + \beta_{r_r} * z\)
seed production (r_s
): A Poisson regression
example code:
glm(seeds ~ ht_1, data = my_seed_data, family = poisson())
Mathematical form: \(Log(r_s(z)) = \alpha_{r_s} + \beta_{r_s} * z\)
Recruit size distribution (r_d
): A normal
distribution with mean r_d_mu
and standard deviation
r_d_sd
.
example code:
r_d_mu <- mean(my_recr_data$ht_2, na.rm = TRUE)
and
r_d_sd <- sd(my_rer_data$ht_2, na.rm = TRUE)
.
Mathematical form: \(r_d(z') = f_{r_d}(\mu_{r_d}, \sigma_{r_d})\)
germination (g_i
) and establishment
(e_p
) are constants. The code below assumes we have our
data in long format (each seed get its own row) and that successful
germination/establishment is coded as 1s, failures are 0s, and seeds we
don’t know the fate of for whatever reason are NA
s.
example code:
g_i <- mean(my_germ_data, na.rm = TRUE)
and
e_p <- mean(my_est_data, na.rm = TRUE)
Mathematical form: \(g_i = 0.5067, e_p = 0.15\)
Below is the code to implement this model. First, we define our our parameters in a list.
library(ipmr)
# Set up the initial population conditions and parameters
data_list <- list(
g_int = 5.781,
g_slope = 0.988,
g_sd = 20.55699,
s_int = -0.352,
s_slope = 0.122,
s_slope_2 = -0.000213,
r_r_int = -11.46,
r_r_slope = 0.0835,
r_s_int = 2.6204,
r_s_slope = 0.01256,
r_d_mu = 5.6655,
r_d_sd = 2.0734,
e_p = 0.15,
g_i = 0.5067
)
Next, we set up two functions to pass into the model. These perform
the inverse logit transformations for the probability of flowering model
(r_r
/\(r_r(z)\)) and
survival model (s
/\(s(z)\)).
# We'll set up some helper functions. The survival function
# in this model is a quadratic function, so we use an additional inverse logit function
# that can handle the quadratic term.
inv_logit <- function(int, slope, sv) {
1/(1 + exp(-(int + slope * sv)))
}
inv_logit_2 <- function(int, slope, slope_2, sv) {
1/(1 + exp(-(int + slope * sv + slope_2 * sv ^ 2)))
}
Now, we’re ready to begin making the IPM kernels. We change the
sim_gen
argument of init_ipm()
to
"general"
.
general_ipm <- init_ipm(sim_gen = "general", di_dd = "di", det_stoch = "det") %>%
define_kernel(
name = "P",
# We add d_ht to formula to make sure integration is handled correctly.
# This variable is generated internally by make_ipm(), so we don't need
# to do anything else.
formula = s * g * d_ht,
# The family argument tells ipmr what kind of transition this kernel describes.
# it can be "CC" for continuous -> continuous, "DC" for discrete -> continuous
# "CD" for continuous -> discrete, or "DD" for discrete -> discrete.
family = "CC",
# The rest of the arguments are exactly the same as in the simple models
g = dnorm(ht_2, g_mu, g_sd),
g_mu = g_int + g_slope * ht_1,
s = inv_logit_2(s_int, s_slope, s_slope_2, ht_1),
data_list = data_list,
states = list(c('ht')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions('norm',
'g')
) %>%
define_kernel(
name = "go_discrete",
formula = r_r * r_s * d_ht,
# Note that now, family = "CD" because it denotes a continuous -> discrete transition
family = 'CD',
r_r = inv_logit(r_r_int, r_r_slope, ht_1),
r_s = exp(r_s_int + r_s_slope * ht_1),
data_list = data_list,
# Note that here, we add "b" to our list in states, because this kernel
# "creates" seeds entering the seedbank
states = list(c('ht', "b")),
uses_par_sets = FALSE
) %>%
define_kernel(
name = 'stay_discrete',
# In this case, seeds in the seed bank either germinate or die, but they
# do not remain for multiple time steps. This can be adjusted as needed.
formula = 0,
# Note that now, family = "DD" because it denotes a discrete -> discrete transition
family = "DD",
# The only state variable this operates on is "b", so we can leave "ht"
# out of the states list
states = list(c('b')),
evict_cor = FALSE
) %>%
define_kernel(
# Here, the family changes to "DC" because it is the discrete -> continuous
# transition
name = 'leave_discrete',
formula = e_p * g_i * r_d,
r_d = dnorm(ht_2, r_d_mu, r_d_sd),
family = 'DC',
data_list = data_list,
# Again, we need to add "b" to the states list
states = list(c('ht', "b")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions('norm',
'r_d')
)
We’ve now defined all of the kernels, next are the implementation
details. These also differ somewhat from simple IPMs. The key difference
in the implementation arguments list lies in the
state_start
and state_end
of each kernel, and
is related to the family
argument of each kernel. Kernels
that begin with one state and end in a different state (e.g. moving from
seed bank to a plant) will have different entries in the
state_start
and state_end
slots. It is very
important to get these correct, as ipmr
uses this
information to generate the model iteration procedure automatically
(i.e. code corresponding to Equations 1-2).
general_ipm <- general_ipm %>%
define_impl(
list(
P = list(int_rule = "midpoint",
state_start = "ht",
state_end = "ht"),
go_discrete = list(int_rule = "midpoint",
state_start = "ht",
state_end = "b"),
stay_discrete = list(int_rule = "midpoint",
state_start = "b",
state_end = "b"),
leave_discrete = list(int_rule = "midpoint",
state_start = "b",
state_end = "ht")
)
)
An alternative to the list above is to use
make_impl_args_list()
. The chunk above and the chunk below
generate equivalent proto_ipm
objects.
general_ipm <- general_ipm %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "go_discrete", "stay_discrete", "leave_discrete"),
int_rule = c(rep("midpoint", 4)),
state_start = c('ht', "ht", "b", "b"),
state_end = c('ht', "b", "b", 'ht')
)
)
The rest is the same as the simple IPM. We can use our pre-defined
functions in make_ipm
, and we can use pre-defined variables
in define_domains
. We’ll create variables for the upper
(U
) and lower (L
) size bounds for the
population, and the number of meshpoints for integration
(n
). We also define initial population vectors for both
b
and ht
.
# The lower and upper bounds for the continuous state variable and the number
# of meshpoints for the midpoint rule integration. We'll also create the initial
# population vector from a random uniform distribution
L <- 1.02
U <- 624
n <- 500
set.seed(2312)
init_pop_vec <- runif(500)
init_seed_bank <- 20
general_ipm <- general_ipm %>%
define_domains(
# We can pass the variables we created above into define_domains
ht = c(L, U, n)
) %>%
define_pop_state(
# We can also pass them into define_pop_state
pop_vectors = list(
n_ht = init_pop_vec,
n_b = init_seed_bank
)
) %>%
make_ipm(iterations = 100,
usr_funs = list(inv_logit = inv_logit,
inv_logit_2 = inv_logit_2))
# lambda is a generic function to compute per-capita growth rates. It has a number
# of different options depending on the type of model
lambda(general_ipm)
# If we are worried about whether or not the model converged to stable
# dynamics, we can use the exported utility is_conv_to_asymptotic. The default
# tolerance for convergence is 1e-10, but can be changed with the 'tol' argument.
is_conv_to_asymptotic(general_ipm, tol = 1e-10)
w <- right_ev(general_ipm)
v <- left_ev(general_ipm)
Our model is now built! We can explore the asymptotic population
growth rate with the lambda()
function, which will work on
any object made my make_ipm()
. The same is true for
is_conv_to_asymptotic()
, which is a helper to function to
figure out if we’ve set the number of iterations
high
enough to actually reach asymptotic population dynamics.
left_ev
and right_ev
work for general
deterministic IPMs as well. Stochastic versions are in the works, but
are not yet implemented.
Say we wanted to calculate the mean and variance of life span conditional on initial state \(z_0\). This requires us to generate the IPM’s fundamental operator, \(N\), which is computed as follows: \(N = (I - P)^{-1}\). The following chunk is a function to compute average lifespan as a function of initial size. Note that we omit the fecundity from these calculations entirely, as our model does not assume that reproduction imposes a cost on survival.
The formula for mean lifespan is given as \(\bar{\eta}(z_0) = eN\) (where \(e\) is a constant function such that \(e(z)\equiv1\)), and the formula for its
variance is \(\sigma^2_\eta(z_0) = e(2N^2-N) -
(eN)^2\). Since the \(e\)
function has the effect of summing columns, we’ll replace that with the
R function colSums
. Our second function,
sigma_eta
, will make use of %^%
from
ipmr
, which multiples matrices by themselves, rather than
the point-wise power provided by ^
. More details on the
math underlying these calculations are provided in Ellner, Childs, &
Rees (2016), chapter 3.
make_N <- function(ipm) {
P <- ipm$sub_kernels$P
I <- diag(nrow(P))
N <- solve(I - P)
return(N)
}
eta_bar <- function(ipm) {
N <- make_N(ipm)
out <- colSums(N)
return(as.vector(out))
}
sigma_eta <- function(ipm) {
N <- make_N(ipm)
out <- colSums(2 * (N %^% 2L) - N) - colSums(N) ^ 2
return(as.vector(out))
}
mean_l <- eta_bar(general_ipm)
var_l <- sigma_eta(general_ipm)
mesh_ps <- int_mesh(general_ipm)$ht_1 %>%
unique()
par(mfrow = c(1,2))
plot(mesh_ps, mean_l, type = "l", xlab = expression( "Initial size z"[0]))
plot(mesh_ps, var_l, type = "l", xlab = expression( "Initial size z"[0]))
From prior knowledge, we happen to know that even in the best conditions, Ligustrum obtusifolium individuals usually only live 25-50 years. It appears that our model (wildly) overestimates their average lifespan, and we’d want to think about how to re-parameterize it to more accurately capture mortality events. Our survival model would seem the most likely candidate for re-examination, particularly for young and mid-sized trees. That is a problem for another day though, and so the next section will investigate how to implement general IPMs in discretely varying environments.
Discretely varying parameters can be used to construct general IPMs with little additional effort. These are typically the result of fitting a set of fixed effects vital rate models that include both continuous predictors for size/state and categorical variables like treatment or maturation state. They can also result from mixed effects models, for example, working with conditional modes for a random intercept corresponding to year or site.
ipmr
refers to these as parameter sets, and
abbreviates them par_sets
. For example, a random intercept
for year may be denoted \(\alpha_{yr}\), where \(_{yr}\) provides an index for the different
values that \(\alpha\) can take on.
If you’ve already read the Intro to ipmr
article and the
example above, then there aren’t any new concepts to introduce here.
Below is an example showing how all of these come together.
We’ll use a variation of the model above, simulating some random year-specific intercepts for growth and seed production. Parameters, functions, and kernels that are now time varying have a subscript \(x_{yr}\) appended to them.
\(n(z', t + 1) = \int_L^U P_{yr}(z', z) n(z,t)d\mathrm{z} + b(t) * leave\_discrete(z')\)
\(b(t + 1) = \int_L^Ugo\_discrete_{yr}(z) n(z,t)d\mathrm{z}+ stay\_discrete\)
\(P_{yr}(z',z) = s(z) * G_{yr}(z',z)\)
\(go\_discrete_{yr}(z) = r_{s,yr}(z) * r_r(z) * g_i\)
\(leave\_discrete(z') = e_p * r_d(z')\)
\(stay\_discrete = 0\)
The vital rate models are as follows:
survival (s
): A logistic regression
Example code:
glm(surv ~ ht_1, data = my_surv_data, family = binomial())
Mathematical form: \(Logit(s(z)) = \alpha_s + \beta_{s,1} * z + \beta_{s,2} * z^2\)
growth (g
): A linear mixed effects regression. \(f_G\) denotes a normal probability density
function.
example code:
lmer(ht_2 ~ ht_1 + (1 | year), data = grow_data)
Mathematical form:
\(\mu_{G,yr} = \alpha_G + \alpha_{G,yr} + \beta_G * z\)
\(G_{yr}(z',z) = f_G(z', \mu_{G,yr}, \sigma_G)\)
flowering probability (r_r
): A logistic
regression
example code:
glm(repro ~ ht_1, data = my_repro_data, family = binomial())
Mathematical form: \(Logit(r_r(z)) = \alpha_{r_r} + \beta_{r_r} * z\)
seed production (r_s
): A Poisson mixed effects
regression
example code:
glmer(seeds ~ ht_1 + (1 | year), data = my_seed_data, family = poisson())
Mathematical form: \(Log(r_{s,yr}(z)) = \alpha_{r_s} + \alpha_{r_s, yr} + \beta_{r_s} * z\)
Recruit size distribution (r_d
): A normal
distribution (\(f_{r_d}\)) with mean
r_d_mu
and standard deviation r_d_sd
.
example code:
r_d_mu <- mean(my_recr_data$ht_2, na.rm = TRUE)
and
r_d_sd <- sd(my_recr_data$ht_2, na.rm = TRUE)
.
Mathematical form: \(r_d(z') = f_{r_d}(z', \mu_{r_d}, \sigma_{r_d})\)
germination (g_i
) and establishment
(e_p
) are constants. The code below assumes we have our
data in long format (each seed get its own row) and that successful
germinations/establishments are coded as 1s, failures are 0s, and seeds
we don’t know the fate of for whatever reason are NA
s.
example code:
g_i <- mean(my_germ_data, na.rm = TRUE)
and
e_p <- mean(my_est_data, na.rm = TRUE)
Mathematical form: \(g_i = 0.5067, e_p = 0.15\)
In the next chunk, we’ll set up the data list with all of our constants and make sure the names are what we want them to be. This example generates intercepts for 5 years of data. This can be altered according to your own needs. Then, we’ll define a couple functions to make the vital rate expressions easier to write.
A key aspect here is setting the names on the time-varying parameters
in the data_list
. We are using the form
"g_int_year"
, and substituting the values that
"year"
can take in for the actual index. These don’t need
to be numbers, they can be words, letters, or some combination of the
two. For this example, the years will be represented with the values
1:5
. If we wanted to denote the actual years of the
censuses, we could switch 1:5
to, for example,
2002:2006
.
library(ipmr)
# Set up the initial population conditions and parameters
# Here, we are simulating random intercepts for growth
# and seed production, converting them to a list,
# and adding them into the list of constants. Equivalent code
# to produce the output for the output from lmer/glmer
# is in the comments next to each line
all_g_int <- as.list(rnorm(5, mean = 5.781, sd = 0.9)) # as.list(unlist(ranef(my_growth_model)))
all_r_s_int <- as.list(rnorm(5, mean = 2.6204, sd = 0.3)) # as.list(unlist(ranef(my_seed_model)))
names(all_g_int) <- paste("g_int_", 1:5, sep = "")
names(all_r_s_int) <- paste("r_s_int_", 1:5, sep = "")
constant_list <- list(
g_slope = 0.988,
g_sd = 20.55699,
s_int = -0.352,
s_slope = 0.122,
s_slope_2 = -0.000213,
r_r_int = -11.46,
r_r_slope = 0.0835,
r_s_slope = 0.01256,
r_d_mu = 5.6655,
r_d_sd = 2.0734,
e_p = 0.15,
g_i = 0.5067
)
all_params <- c(constant_list, all_g_int, all_r_s_int)
# The lower and upper bounds for the continuous state variable and the number
# of meshpoints for the midpoint rule integration.
L <- 1.02
U <- 624
n <- 500
set.seed(2312)
init_pop_vec <- runif(500)
init_seed_bank <- 20
# add some helper functions. The survival function
# in this model is a quadratic function, so we use an additional inverse logit function
# that can handle the quadratic term.
inv_logit <- function(int, slope, sv) {
1/(1 + exp(-(int + slope * sv)))
}
inv_logit_2 <- function(int, slope, slope_2, sv) {
1/(1 + exp(-(int + slope * sv + slope_2 * sv ^ 2)))
}
Next, we will set up our sub-kernels. We’ll append the
_year
suffix to the kernel name
, the vital
rate expressions and parameter values that have time-varying components
(g_year
, f_s_year
, g_int_year
,
f_s_int_year
), and the call to
truncated_distributions
so that the growth kernel is
correctly specified. We’ll also pass a list(year = 1:5))
into the par_set_indices
argument of
define_kernel()
. When building the model,
make_ipm()
automatically expands these expressions to
create 5 different expressions containing the various levels of
year
.
general_stoch_kern_ipm <- init_ipm(sim_gen = "general",
di_dd = "di",
det_stoch = "stoch",
kern_param = "kern") %>%
define_kernel(
# The kernel name gets indexed by _year to denote that there
# are multiple possible kernels we can build with our parameter set.
# The _year gets substituted by the values in "par_set_indices" in the
# output, so in this example we will have P_1, P_2, P_3, P_4, and P_5
name = "P_year",
# We also add _year to "g" to signify that it is going to vary across kernels.
formula = s * g_year * d_ht,
family = "CC",
# Here, we add the suffixes again, ensuring they are expanded and replaced
# during model building by the parameter names
g_year = dnorm(ht_2, g_mu_year, g_sd),
g_mu_year = g_int_year + g_slope * ht_1,
s = inv_logit_2(s_int, s_slope, s_slope_2, ht_1),
data_list = all_params,
states = list(c('ht')),
# We set uses_par_sets to TRUE, signalling that we want to expand these
# expressions across all levels of par_set_indices.
uses_par_sets = TRUE,
par_set_indices = list(year = 1:5),
evict_cor = TRUE,
# we also add the suffix to `target` here, because the value modified by
# truncated_distributions is time-varying.
evict_fun = truncated_distributions('norm',
target = 'g_year')
) %>%
define_kernel(
# again, we append the index to the kernel name, vital rate expressions,
# and in the model formula.
name = "go_discrete_year",
formula = r_r * r_s_year * g_i * d_ht,
family = 'CD',
r_r = inv_logit(r_r_int, r_r_slope, ht_1),
# Again, we modify the left and right hand side of this expression to
# show that there is a time-varying component
r_s_year = exp(r_s_int_year + r_s_slope * ht_1),
data_list = all_params,
states = list(c('ht', "b")),
uses_par_sets = TRUE,
par_set_indices = list(year = 1:5)
) %>%
define_kernel(
# This kernel has no time-varying parameters, and so is not indexed.
name = 'stay_discrete',
# In this case, seeds in the seed bank either germinate or die, but they
# do not remain for multipe time steps. This can be adjusted as needed.
formula = 0,
# Note that now, family = "DD" becuase it denotes a discrete -> discrete transition
family = "DD",
states = list(c('b')),
# This kernel has no time-varying parameters, so we don't need to designate
# it as such.
uses_par_sets = FALSE,
evict_cor = FALSE
) %>%
define_kernel(
# This kernel also doesn't get a index, because there are no varying parameters.
name = 'leave_discrete',
formula = e_p * r_d,
r_d = dnorm(ht_2, r_d_mu, r_d_sd),
family = 'DC',
data_list = all_params,
states = list(c('ht', "b")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions('norm',
'r_d')
) %>%
define_impl(
make_impl_args_list(
# We add suffixes to the kernel names here to make sure they match the names
# we specified above.
kernel_names = c("P_year", "go_discrete_year", "stay_discrete", "leave_discrete"),
int_rule = c(rep("midpoint", 4)),
state_start = c('ht', "ht", "b", "b"),
state_end = c('ht', "b", "b", 'ht')
)
) %>%
define_domains(
ht = c(L, U, n)
) %>%
define_pop_state(
n_ht = init_pop_vec,
n_b = init_seed_bank
) %>%
make_ipm(
iterations = 100,
# We can specify a sequence of kernels to select for the simulation.
# This helps others to reproduce what we did,
# and lets us keep track of the consequences of different selection
# sequences for population dynamics.
kernel_seq = sample(1:5, size = 100, replace = TRUE),
usr_funs = list(inv_logit = inv_logit,
inv_logit_2 = inv_logit_2)
)
100 isn’t too many iterations, but hopefully this demonstrates how to set up and implement such a model. There are a number of slots in the output that may be useful for further analyses.
env_seq
: This contains a character vector which
shows the sequence in which levels of the time-varying components were
chosen during model iteration. We could use this to reproduce the model
outputs later.
pop_state$lambda
: This shows the single timestep
per-capita growth rates for the simulation.
Additionally, there are functions to compute the stochastic
population growth rate (\(\lambda_s\),
computed as mean(log(pop_state$lambda))
, burn in can be
controlled with the burn_in
parameter), and the mean
sub-kernels.
mean_kernels <- mean_kernel(general_stoch_kern_ipm)
lam_s <- lambda(general_stoch_kern_ipm, burn_in = 0.15) # Remove first 15% of iterations
We can also use continuously varying parameters to construct general
IPMs. These are good tools for exploring the consequences of
environmental variation on demographic rates. ipmr
handles
these using define_env_state()
, which can take both
functions and data and generate draws from distributions during each
iteration of the model.
This example includes survival and growth models that make use of environmental covariates. The model can be written as:
\(n(z', t+1) = \int_L^U K(z',z, \theta)n(z,t)dz + B(t) * r_s * r_d * f_{c_d}(z')\)
\(B(t + 1) = r_e * r_s * \int_L^U c_r(z) * c_s(z)n(z,t)dz + B(t) * r_s * r_r\)
\(K(z',z,\theta) = P(z',z,\theta) + F(z',z)\)
\(P(z',z,\theta) = s(z, \theta) * G(z',z,\theta)\)
\(F(z',z) = c_r(z) * c_s(z) * c_d(z') * (1 - r_e)\)
\(\theta\) is a vector of time-varying environmental parameters. For this example, we’ll use a Gaussian distribution for temperature and a Gamma distribution for precipitation:
\(\theta_t \sim Norm(\mu = 8.9, \sigma = 1.2)\)
\(\theta_p \sim Gamma(k = 1000, \beta = 2)\)
Next, we’ll write out the actual vital rate functions in the model:
survival (s
/\(s(z,\theta)\)): a logistic regression.
Example model formula:
glm(survival ~ size_1 + temp + precip, data = my_surv_data, family = binomial())
Mathematical form: \(Logit(s(z,\theta)) = \alpha_s + \beta_s^z * z + \beta_s^t * temp + \beta_s^p * precip\)
growth (g
, \(G(z',z,\theta)\)): a linear regression
with a Normal error distribution (denoted \(f_G\))
Example model formula:
glm(size_2 ~ size_1 + temp + precip, data = my_grow_data)
Mathematical form:
\(G(z',z) = f_G(z', \mu_G(z,\theta), \sigma_G)\)
\(\mu_G(z, \theta) = \alpha_G + \beta_G^z * z + \beta_G^t * temp + \beta_G^p * precip\)
flower probability (c_r
/\(c_r(z)\)): A logistic regression.
Example model formula:
glm(repro ~ size_1, data = my_repro_data, family = binomial())
Mathematical form: \(Logit(c_r(z)) = \alpha_{c} + \beta_{c_r} * z\)
seed production (c_s
/\(c_s(z)\): a logistic regression.
Example model formula:
glm(flower_n ~ size_1, data = my_flower_data, family = poisson())
Mathematical form: \(Log(c_s(z)) = \alpha_{c_s} + \beta_{c_s} * z\)
recruit sizes (c_d
/\(c_d(z')\)): A Normal distribution
(denoted \(f_{c_d}\))
Example code: mean (c_d_mu
)
mean(my_recruit_data$size_2, na.rm = TRUE)
and standard
deviation (c_d_sd
)
sd(my_recruit_data$size_2, na.rm = TRUE)
Mathematical form: \(c_d(z') = f_{c_d}(z', \mu_{f_d}, \sigma_{f_d})\)
Constants: Discrete stage survival (r_s
), discrete
stage entrance probability (r_e
), discrete stage departure
probability conditional on survival (r_d
), and probability
of remaining in discrete stage (r_r
).
First, we’ll specify the constant parameters. We keep all values
related to demography in the data_list
.
library(ipmr)
# Define the fixed parameters in a list
constant_params <- list(
s_int = -5,
s_slope = 2.2,
s_precip = 0.0002,
s_temp = -0.003,
g_int = 0.2,
g_slope = 1.01,
g_sd = 1.2,
g_temp = -0.002,
g_precip = 0.004,
c_r_int = 0.3,
c_r_slope = 0.03,
c_s_int = 0.4,
c_s_slope = 0.01,
c_d_mu = 1.1,
c_d_sd = 0.1,
r_e = 0.3,
r_d = 0.3,
r_r = 0.2,
r_s = 0.2
)
In addition to creating the standard data_list
, we need
to create a function to sample the environmental covariates, and a list
of parameters that generate the environmental covariates. The function
needs to return a named list. The names in the returned list can be
referenced in vital rate expressions, kernel formulas, etc. as if we had
specified them in the data_list
. make_ipm()
only runs the functions in define_env_state()
once per
model iteration. This ensures that parameters from joint distributions
can be used consistently across kernels without losing user-any
specified correlations.
# Now, we create a set of environmental covariates. In this example, we use
# a normal distribution for temperature and a Gamma for precipitation.
env_params <- list(
temp_mu = 8.9,
temp_sd = 1.2,
precip_shape = 1000,
precip_rate = 2
)
# We define a wrapper function that samples from these distributions
sample_env <- function(env_params) {
# We generate one value for each covariate per iteration, and return it
# as a named list.
temp_now <- rnorm(1,
env_params$temp_mu,
env_params$temp_sd)
precip_now <- rgamma(1,
shape = env_params$precip_shape,
rate = env_params$precip_rate)
# The vital rate expressions can now use the names "temp" and "precip"
# as if they were in the data_list.
out <- list(temp = temp_now, precip = precip_now)
return(out)
}
# Again, we can define our own functions and pass them into calls to make_ipm. This
# isn't strictly necessary, but can make the model code more readable/less error prone.
inv_logit <- function(lin_term) {
1/(1 + exp(-lin_term))
}
We are now ready to begin implementing the model. This next chunk
should look familiar, with the caveat that we have now add the terms
temp
and precip
to vital rates in the
P
kernel.
general_stoch_param_model <- init_ipm(sim_gen = "general",
di_dd = "di",
det_stoch = "stoch",
kern_param = "param") %>%
define_kernel(
name = "P_stoch",
family = "CC",
# As in the examples above, we have to add the d_surf_area
# to ensure the integration of the functions is done.
formula = s * g * d_surf_area,
# We can reference continuously varying parameters by name
# in the vital rate expressions just as before, even though
# they are passed in define_env_state() as opposed to the kernel's
# data_list
g_mu = g_int + g_slope * surf_area_1 + g_temp * temp + g_precip * precip,
s_lin_p = s_int + s_slope * surf_area_1 + s_temp * temp + s_precip * precip,
s = inv_logit(s_lin_p),
g = dnorm(surf_area_2, g_mu, g_sd),
data_list = constant_params,
states = list(c("surf_area")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
) %>%
define_kernel(
name = "F",
family = "CC",
formula = c_r * c_s * c_d * (1 - r_e) * d_surf_area,
c_r_lin_p = c_r_int + c_r_slope * surf_area_1,
c_r = inv_logit(c_r_lin_p),
c_s = exp(c_s_int + c_s_slope * surf_area_1),
c_d = dnorm(surf_area_2, c_d_mu, c_d_sd),
data_list = constant_params,
states = list(c("surf_area")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
) %>%
define_kernel(
# Name can be anything, but it helps to make sure they're descriptive
name = "go_discrete",
# Family is now "CD" because it is a continuous -> discrete transition
family = "CD",
formula = r_e * r_s * c_r * c_s * d_surf_area,
c_r_lin_p = c_r_int + c_r_slope * surf_area_1,
c_r = inv_logit(c_r_lin_p),
c_s = exp(c_s_int + c_s_slope * surf_area_1),
data_list = constant_params,
states = list(c("surf_area", "sb")),
uses_par_sets = FALSE,
# There is not eviction to correct here, so we can set this to false
evict_cor = FALSE
) %>%
define_kernel(
name = "stay_discrete",
family = "DD",
formula = r_s * r_r,
data_list = constant_params,
states = list("sb"),
uses_par_sets = FALSE,
evict_cor = FALSE
) %>%
define_kernel(
name = "leave_discrete",
family = "DC",
formula = r_d * r_s * c_d,
c_d = dnorm(surf_area_2, c_d_mu, c_d_sd),
data_list = constant_params,
states = list(c("surf_area", "sb")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P_stoch",
"F",
"go_discrete",
"stay_discrete",
"leave_discrete"),
int_rule = rep("midpoint", 5),
state_start = c("surf_area",
"surf_area",
"surf_area",
"sb",
"sb"),
state_end = c("surf_area",
"surf_area",
"sb",
"sb",
"surf_area")
)
) %>%
define_domains(
surf_area = c(0, 10, 100)
)
Now, we need to define_env_state()
. This consists of two
parts - specifying expressions that generate environmental covariates,
and supplying the information needed to evaluate those expressions. In
this example, we want to use the env_params
in
sample_env
(i.e. sample_env(env_params)
).
define_env_state
uses the data_list
argument
to supply the env_params
, and the call to
sample_env
goes into the ...
portion of the
function. Here, we assign the result to env_covs
. The name
env_covs
isn’t important for specifying vital rate
expressions and you could call it whatever you want, but it must be
named something in order to work properly. The only thing that needs to
match are the names in the list that sample_env
returns,
and the names used in the vital rate expressions/kernels.
Note that you can pass the sample_env
function in either
the data_list
of define_env_state()
, or the
usr_funs
argument of make_ipm()
.
# In the first version, sample_env is provided in the data_list of
# define_env_state.
general_stoch_param_ipm <- define_env_state(
proto_ipm = general_stoch_param_model,
env_covs = sample_env(env_params),
data_list = list(env_params = env_params,
sample_env = sample_env)
) %>%
define_pop_state(
n_surf_area = runif(100),
n_sb = rpois(1, 20)
) %>%
make_ipm(usr_funs = list(inv_logit = inv_logit),
iterate = TRUE,
iterations = 100)
# in the second version, sample_env is provided in the usr_funs list of
# make_ipm(). These two versions are equivalent.
general_stoch_param_ipm <- define_env_state(
proto_ipm = general_stoch_param_model,
env_covs = sample_env(env_params),
data_list = list(env_params = env_params)
) %>%
define_pop_state(
n_surf_area = runif(100),
n_sb = rpois(1, 20)
) %>%
make_ipm(usr_funs = list(inv_logit = inv_logit,
sample_env = sample_env),
iterate = TRUE,
iterations = 100,
return_sub_kernels = TRUE)
Stochastic parameter-resampled models also return an
env_seq
, but this time it will be a data.frame
of parameter draws rather than a character vector. We can also compute
mean sub-kernels and \(\lambda_s\) as
we did in the kernel-resampled models. This time, each mean kernel is
computed as the average of each sub-kernel over the course of the
simulation (i.e. mean of all P
kernels). Some sub-kernels
in our example are not time-varying - they will be numerically
equivalent to the sub-kernels stored in the IPM object.
env_draws <- general_stoch_param_ipm$env_seq
mean_kerns <- mean_kernel(general_stoch_param_ipm)
lam_s <- lambda(general_stoch_param_ipm)
all.equal(mean_kerns$mean_F,
general_stoch_param_ipm$sub_kernels$F_it_1,
check.attributes = FALSE)
Longer running stochastic parameter-resampled models can take up a
lot of space in memory when all of the sub-kernels are saved from each
iteration. For example, running the model above for 10,000 iterations
would result in 20,000 \(100 \times
100\) matrices, 20,000 \(1 \times
100\)/\(100 \times 1\) matrices,
and 10,000 \(1 \times 1\) matrices in
the sub_kernels
slot of the IPM object (~16.4 GB of RAM).
This will likely result in crashes as smaller machines run out of
available RAM. Therefore, make_ipm()
contains an argument
return_sub_kernels
for these types of models that allows
you to switch off that behavior and conserve available RAM. By default,
this is set to FALSE
. If you need sub-kernels for
downstream analyses, set this option to TRUE
and make sure
you have a computer with sufficient RAM (64-128 GB range is likely
required to store all information for longer running models).
These warnings also apply to all density dependent model
classes and the same return_sub_kernels
argument can be
used for those as well.
Sometimes, we may want to work with our sub-kernels arranged into a
single block kernel. This isn’t required for any of the code in
ipmr
, except for the plot
method for
general_di_det
IPMs. Other use cases may arise for analyses
not included in this package though, so below is a brief overview of how
to generate those.
make_iter_kernel()
takes an IPM object and a vector of
symbols (for interactive use) or a character version of the expression
(for programming) showing where each sub-kernel should go. It works in
ROW MAJOR order. This example will use the model from
the general deterministic example at the top of the article.
First, re-run the model to create the IPM object (if you haven’t already).
data_list <- list(
g_int = 5.781,
g_slope = 0.988,
g_sd = 20.55699,
s_int = -0.352,
s_slope = 0.122,
s_slope_2 = -0.000213,
r_r_int = -11.46,
r_r_slope = 0.0835,
r_s_int = 2.6204,
r_s_slope = 0.01256,
r_d_mu = 5.6655,
r_d_sd = 2.0734,
e_p = 0.15,
g_i = 0.5067
)
L <- 1.02
U <- 624
n <- 500
set.seed(2312)
init_pop_vec <- runif(500)
init_seed_bank <- 20
# Initialize the state list and add some helper functions. The survival function
# in this model is a quadratic function.
inv_logit <- function(int, slope, sv) {
1/(1 + exp(-(int + slope * sv)))
}
inv_logit_2 <- function(int, slope, slope_2, sv) {
1/(1 + exp(-(int + slope * sv + slope_2 * sv ^ 2)))
}
general_ipm <- init_ipm(sim_gen = "general",
di_dd = "di",
det_stoch = "det") %>%
define_kernel(
name = "P",
formula = s * g * d_ht,
family = "CC",
g = dnorm(ht_2, g_mu, g_sd),
g_mu = g_int + g_slope * ht_1,
s = inv_logit_2(s_int, s_slope, s_slope_2, ht_1),
data_list = data_list,
states = list(c('ht')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions('norm',
'g')
) %>%
define_kernel(
name = "go_discrete",
formula = r_r * r_s * g_i,
family = 'CD',
r_r = inv_logit(r_r_int, r_r_slope, ht_1),
r_s = exp(r_s_int + r_s_slope * ht_1),
data_list = data_list,
states = list(c('ht', "b")),
uses_par_sets = FALSE
) %>%
define_kernel(
name = 'stay_discrete',
formula = 0,
family = "DD",
states = list(c('ht', "b")),
evict_cor = FALSE
) %>%
define_kernel(
name = 'leave_discrete',
formula = e_p * r_d,
r_d = dnorm(ht_2, r_d_mu, r_d_sd),
family = 'DC',
data_list = data_list,
states = list(c('ht', "b")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions('norm',
'r_d')
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "go_discrete", "stay_discrete", "leave_discrete"),
int_rule = c(rep("midpoint", 4)),
state_start = c('ht', "ht", "b", "b"),
state_end = c('ht', "b", "b", 'ht')
)
) %>%
define_domains(
ht = c(L, U, n)
) %>%
define_pop_state(
pop_vectors = list(
n_ht = init_pop_vec,
n_b = init_seed_bank
)
) %>%
make_ipm(iterations = 100,
usr_funs = list(inv_logit = inv_logit,
inv_logit_2 = inv_logit_2))
Now, we specify which kernel belongs where in row major order, using
a call to c()
.
mega_mat <- make_iter_kernel(ipm = general_ipm,
mega_mat = c(
stay_discrete, go_discrete,
leave_discrete, P
))
# These values should be almost identical, so this should ~0
Re(eigen(mega_mat[[1]])$values[1]) - lambda(general_ipm)
Say we wanted to program with this function. Passing bare expression
is difficult programatically, and how to do that is not really within
the scope of this vignette (though if you’re interested in learning how,
this
is a good start). make_iter_kernel()
also accepts text
strings in the same format as above.
# Get the names of each sub_kernel
sub_k_nms <- names(general_ipm$sub_kernels)
mega_mat_text <- c(sub_k_nms[3], sub_k_nms[2], sub_k_nms[4], sub_k_nms[1])
mega_mat_2 <- make_iter_kernel(general_ipm,
mega_mat = mega_mat_text)
# Should be TRUE
identical(mega_mat, mega_mat_2)
make_iter_kernel()
can also handle cases where you need
blocks of 0s or identity matrices. These are specified using
0
for 0s, and I
for identity matrices.
make_iter_kernel()
automatically works out the correct
dimensions internally, so you don’t need to worry about specifying
those.
Below is an example that inserts 0s and identity matrices on the
off-diagonals with the P
kernel duplicated along the
diagonal.
mega_mat <- make_iter_kernel(general_ipm,
mega_mat = c(P, 0,
I, P))
Finally, make_iter_kernel
supports ipmr
’s
parameter set index syntax as well, enabling us to generate a list of
mega-kernels for each combination of parameter set values. We’ll re-run
the "general_di_stoch_kern"
example from above to
demonstrate this.
all_g_int <- as.list(rnorm(5, mean = 5.781, sd = 0.9))
all_f_s_int <- as.list(rnorm(5, mean = 2.6204, sd = 0.3))
names(all_g_int) <- paste("g_int_", 1:5, sep = "")
names(all_f_s_int) <- paste("f_s_int_", 1:5, sep = "")
constant_list <- list(
g_slope = 0.988,
g_sd = 20.55699,
s_int = -0.352,
s_slope = 0.122,
s_slope_2 = -0.000213,
f_r_int = -11.46,
f_r_slope = 0.0835,
f_s_slope = 0.01256,
f_d_mu = 5.6655,
f_d_sd = 2.0734,
e_p = 0.15,
g_i = 0.5067
)
all_params <- c(constant_list, all_g_int, all_f_s_int)
L <- 1.02
U <- 624
n <- 500
set.seed(2312)
init_pop_vec <- runif(500)
init_seed_bank <- 20
inv_logit <- function(int, slope, sv) {
1/(1 + exp(-(int + slope * sv)))
}
inv_logit_2 <- function(int, slope, slope_2, sv) {
1/(1 + exp(-(int + slope * sv + slope_2 * sv ^ 2)))
}
general_stoch_kern_ipm <- init_ipm(sim_gen = "general",
di_dd = "di",
det_stoch = "stoch",
kern_param = "kern") %>%
define_kernel(
name = "P_year",
formula = s * g_year * d_ht,
family = "CC",
g_year = dnorm(ht_2, g_mu_year, g_sd),
g_mu_year = g_int_year + g_slope * ht_1,
s = inv_logit_2(s_int, s_slope, s_slope_2, ht_1),
data_list = all_params,
states = list(c('ht')),
uses_par_sets = TRUE,
par_set_indices = list(year = 1:5),
evict_cor = TRUE,
evict_fun = truncated_distributions('norm',
'g_year')
) %>%
define_kernel(
name = "go_discrete_year",
formula = f_r * f_s_year * g_i * d_ht,
family = 'CD',
f_r = inv_logit(f_r_int, f_r_slope, ht_1),
f_s_year = exp(f_s_int_year + f_s_slope * ht_1),
data_list = all_params,
states = list(c('ht', "b")),
uses_par_sets = TRUE,
par_set_indices = list(year = 1:5)
) %>%
define_kernel(
name = 'stay_discrete',
formula = 0,
family = "DD",
states = list(c('b')),
uses_par_sets = FALSE,
evict_cor = FALSE
) %>%
define_kernel(
name = 'leave_discrete',
formula = e_p * f_d * d_ht,
f_d = dnorm(ht_2, f_d_mu, f_d_sd),
family = 'DC',
data_list = all_params,
states = list(c('ht', "b")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions('norm',
'f_d')
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P_year", "go_discrete_year", "stay_discrete", "leave_discrete"),
int_rule = c(rep("midpoint", 4)),
state_start = c('ht', "ht", "b", "b"),
state_end = c('ht', "b", "b", 'ht')
)
) %>%
define_domains(
ht = c(L, U, n)
) %>%
define_pop_state(
n_ht = init_pop_vec,
n_b = init_seed_bank
) %>%
make_ipm(
iterations = 10,
kernel_seq = sample(1:5, size = 10, replace = TRUE),
usr_funs = list(inv_logit = inv_logit,
inv_logit_2 = inv_logit_2)
)
Next, we call make_iter_kernel()
using the kernel names
like so:
block_list <- make_iter_kernel(general_stoch_kern_ipm,
mega_mat = c(stay_discrete, go_discrete_year,
leave_discrete, P_year))
make_iter_kernel()
also works for simple models, but
assumes that all sub-kernels are combined additively (i.e. \(K(z',z) = P(z',z) + F('z,z)\)).
It can handle parameter set index syntax as well, but does not require
the mega_mat
argument, and can just be called with an IPM
object.