ipmr
is a package for implementing integral projection
models of varying degrees of complexity. It uses mathematical-ish
expressions to build up the iteration kernels from smaller pieces, as
well as helpers to ensure that models are implemented correctly.
Finally, it provides machinery for stochastic simulations. basic
analyses, and model diagnostics. More complicated analysis functions are
being implemented in a separate package.
This package does not help with fitting regression models to
demographic data! This is a distinct enough problem that it
should not be in the purview of this package - and there are much better
tools out there already that can do a much better job of helping you
with that than I can. Some of my favorites are lme4
,
brms
, mgcv
, and nlme
.
IPMpack
handles the regression modeling and IPM
construction for some, but not all, types of IPMs. IPMpack
is no longer on CRAN, but can be installed from the
MetaCRAN/IPMpack
Github page or running
devtools::install_github("CRAN/IPMpack")
.
Unfortunately, ipmr
does not yet have built-in functions
for dealing with uncertainty. The goal is to change that soon, but for
now, doing this will require for
loops and storing the
results of each iteration. Some example code is provided at the end of
this vignette.
NB: If you are already familiar with how to build IPMs, you can probably skip this part and get straight into the next section.
An IPM describes how the abundance and distribution of trait values (also called state variables/states, and denoted \(z\) and \(z'\)) for a population changes in discrete time. The distribution of trait values in a population at time \(t\) is given by the function \(n(z,t)\). A simple IPM for the trait distribution \(z'\) at time \(t+1\) is then:
\(K(z',z)\), known as the projection kernel, describes all possible transitions of existing individuals and recruitment of new individuals from \(t\) to \(t+1\), generating a new trait distribution \(n(z',t+1)\). \(L,U\) are the lower and upper bounds for values that the trait \(z\) can have, which defines the domain over which the integration is performed. The integral \(\int_L^Un(z,t)\mathrm{dz}\) gives the total population size at time \(t\).
To make the model more biologically interpretable, the projection kernel \(K(z',z)\) is usually decomposed into sub-kernels (Eq 2). For example, a projection kernel to describe a lifecycle where individuals can survive, transition to different state values, and reproduce via sexual and asexual pathways, can be decomposed as follows:
\(P(z',z)\) is a sub-kernel describing transitions due to survival and trait changes of existing individuals, \(F(z',z)\) is a sub-kernel describing per-capita sexual contributions of existing individuals to recruitment, and \(C(z',z)\) is a sub-kernel describing per-capita asexual contributions of existing individuals to recruitment. The sub-kernels are typically comprised of functions derived from regression models that relate an individuals trait value \(z\) at time \(t\) to a new trait value at \(t+1\). For example, the \(P\) kernel for Soay sheep (Ovis aries) on St. Kilda (Eq 3) may contain two regression models: (i) a logistic regression of survival on log body mass (Eq 4), and (ii) a linear regression of log body mass at \(t+1\) on log body mass at \(t\) (Eq 5-6). Here, \(f_G\) is used to denote a normal probability density function, and \(\sigma_G\) is computed as the standard deviation of the residuals of the linear regression.
\(P(z',z) = s(z) * G(z',z)\)
\(Logit(s(z)) = \alpha_s + \beta_s * z\)
\(G(z',z) = f_G(z', \mu_G, \sigma_G)\)
\(\mu_G = \alpha_G + \beta_G * z\)
Projecting the population state \(n(z',t+1)\) requires a solution to the
integral in Eq 1. Analytical solutions to the integral in Eq 1 are
usually not possible (Ellner & Rees 2006). However, numerical
approximations of these integrals can be constructed using a numerical
integration rule. A commonly used rule is the midpoint rule (this is
currently the only integration rule available in ipmr
,
though others should be implemented soon(ish)). The midpoint rule
divides the domain \([L,U]\) into \(m\) artificial size bins centered at \(z_i\) with width \(h = (U-L) / m\). The midpoints \(z_i = L + (i - 0.5) * h\) for \(i = \textrm{1, 2, ...}, m\). The midpoint
rule approximation for Eq 1 then becomes:
In practice, the numerical approximation of the integral converts the continuous projection kernel into a (large) discretized matrix. A matrix multiplication of the discretized projection kernel and the discretized trait distribution then generates a new trait distribution, a process referred to as model iteration (sensu Easterling et al. 2000).
While ipmr
intends to mimic the notation above as
closely as possible, it does provide some abstraction layers. For
example, Equations 1 and 2 are generated internally by
ipmr
, so you will not need to define those.
ipmr
The first step of defining a model in ipmr
is to
initialize the model using init_ipm()
. This function has
five arguments: sim_gen
, di_dd
,
det_stoch
, kern_param
, and
uses_age
. We will ignore uses_age
for now,
because age-size models are less common and have their own
vignette.
The combination of these arguments defines the type of IPM, and makes sure that the machinery for subsequent analyses works correctly. The possible entries for each argument are as follows:
sim_gen
:
"simple"
/"general"
di_dd
: "di"
/"dd"
A. di: This is used to denote a density-independent IPM.
B. dd: This is used to denote a density-dependent IPM.
det_stoch
:
"det"
/"stoch"
A. det: This is used to denote a deterministic
IPM. If this is the third argument of init_ipm
,
kern_param
must be left as NULL
.
B. stoch: This is used to denote a stochastic
IPM. If this is the third argument of init_ipm
,
kern_param
must be specified. The two possibilities for the
fourth are described next.
kern_param
: "kern"
/"param"
(Complete definitions found in Metcalf
et al. 2015)
A. kern: This describes an IPM with discretely
varying parameters such that their values are known before the IPM is
specified. This is usually the case for vital rate models that estimate
discrete fixed and/or random year/site effects and we do not wish to
sample from parameter distributions. These models can be a bit more
computationally efficient than the param
alternative
because all kernels can be constructed before the iteration procedure
begins, as opposed to requiring reconstruction for every single
iteration. ipmr
provides a parameter set index syntax to
help describe these models quickly and safely. See below for more details on this feature.
B. param: This describes an IPM with parameters
that are re-sampled from some distribution at each iteration of the
model. This could be a multivariate normal defined by covarying slopes
and intercepts, or distributions of environmental variables that change
from time to time. All that is required is that the parameters for the
distribution are specified and that the function that generates the
parameters at each iteration returns named lists that correspond to the
parameter names in the model. Jump down to the
"simple_di_stoch_param"
example for some inspiration in
writing those.
The rest of this vignette will deal with simple, density independent IPMs. If you already know that you need a general IPM (i.e. an IPM with discrete stages and/or multiple continuous state variables), I still strongly recommend reading at least one complete example here before you start with those.
This is the simplest model that ipmr
works with. It is
an IPM with a single continuous state variable and no density dependent
functions. We’ll walk through the steps required to implement such an
IPM before getting into more complex models.
The model that we are going to implement here is similar to the one above, except there is not asexual reproduction (i.e. no \(C(z', z)\)). It can be written like this:
\(n(z',t+1) = \int_L^U K(z',z)n(z,t)dz\)
\(K(z',z) = P(z',z) + F(z',z)\)
\(P(z',z) = s(z) * G(z',z)\)
\(Logit(s(z)) = \alpha_s + \beta_s * z\)
\(G(z',z) = f_G(z', \mu_G, \sigma_G)\)
\(\mu_G = \alpha_G + \beta_G * z\)
\(F(z',z) = r_r(z) * r_s(z) * r_d(z')\)
\(Logit(r_r(z)) = \alpha_{r_r} + \beta_{r_r} * z\)
\(Log(r_s(z)) = \alpha_{r_s} + \beta_{r_s} * z\)
\(r_d(z') = f_{r_d}(z', \mu_{r_d}, \sigma_{r_d})\)
\(f_{r_d}\) and \(f_G\) denote normal probability density functions. The vital rate functions and code that might be used to generate models corresponding to them are below.
Survival (\(s(z)\)/s
): a generalized
linear model w/ a logit link.
glm(surv ~ size_1, data = my_surv_data, family = binomial())
Growth (\(G(z',z)\)/G
): a linear
model with a normal error distribution.
lm(size_2 ~ size_1, data = my_grow_data)
Pr(flowering) (\(r_r(z)\)/r_r
): a generalized
linear model w/ a logit link.
glm(flower ~ size_1, data = my_repro_data, family = binomial())
Seed production (\(r_s(z)\)/r_s
): a generalized
linear model w/ log link.
glm(seeds ~ size_1, data = my_flower_data, family = poisson())
Recruit size distribution (\(r_d(z')\)/r_d
): a normal
distribution w parameters mu_rd
(mean) and
sd_rd
(standard deviation).
mu_fd = mean(my_recr_data$size_2, na.rm = TRUE)
and
sd_fd = sd(my_recr_data$size_2, na.rm = TRUE)
The first step is to decide what class of model we want to implement.
We have one continuous state variable and no spatial or temporal
variation to deal with. Thus, we have a simple, density independent,
deterministic IPM. We initialize it with init_ipm()
.
my_ipm <- init_ipm(sim_gen = "simple", di_dd = "di", det_stoch = "det")
The next step is to define the sub-kernels comprising the IPM. This
corresponds to equations 3-10. These are defined individually with calls
to define_kernel()
.
my_ipm <- define_kernel(
proto_ipm = my_ipm,
name = "P",
# The formula describes how the vital rates generate the sub-kernel. This is
# equivalent to equation 3 above.
formula = s * g,
# Next, we define the vital rate expressions for the P kernel (Eqs 4-6).
# Here, we create an expression for the inverse of the link function from
# our GLM for survival. In this case, it's the inverse logit (Eq 4).
s = 1/(1 + exp(-(s_int + s_slope * dbh_1))),
# Growth is defined by Eqs 5-6 above. There is no inverse link function required
# because we use a Gaussian GLM with an identity link function (i.e. no
# transformation).
g = dnorm(dbh_2, g_mu, g_sd), # Eq 5
g_mu = g_int + g_slope * dbh_1, # Eq 6
# The family describes the type of transition that kernel produces. See below.
family = "CC",
data_list = list(
s_int = 0.2, # coef(my_surv_mod)[1]
s_slope = 0.5, # coef(my_surv_mod)[2]
g_int = 0.1, # coef(my_grow_mod)[1]
g_slope = 1.033, # coef(my_grow_mod)[2]
g_sd = 2.2 # sd(resid(my_grow_mod))
),
# states should be a list of the state variables that the kernel operates on
states = list(c('dbh')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
)
This function takes the kernel name
, the mathematical
formula
for the kernel, and expressions for the vital rates
that comprise it (s
, g
,
g_mu
).
The family
argument refers to the type of transition
that the kernel describes and has 4 options:
"CC"
- a continuous -> continuous
transition
"CD"
- a continuous -> discrete
transition
"DC"
- a discrete -> continuous
transition
"DD"
- a discrete -> discrete transition
These aren’t required for simple
IPMs, as all
transitions are from a continuous state to a continuous state, and so
family = "CC"
every time. If you forget to specify this for
simple models, ipmr
will automatically add it for you.
However, it will throw an error for general models. As such, it is
probably good to get in the habit of specifying it for every kernel.
The data_list
argument holds the constant parameters
(e.g. regression coefficient estimates).
The states
argument is a list containing the names of
the state variables in use. ipmr
internally appends
_1
and _2
to the names in states
list. These correspond the trait values at \(t\) and \(t+1\), respectively (distributions, if
continuously distributed states, single numbers if discrete traits).
They can also be referenced in the vital rate expressions. If they are
continuously distributed state variables, ipmr
will also
generate meshpoints for them that are defined in
define_domains()
(see below).
uses_par_sets
is a logical indicating whether or not
some or all of the model parameters have multiple values they can take
on. These usually correspond vital rates that are estimated from
populations in different years and/or sites. In this example, the model
is a simple, deterministic IPM and so this is set to FALSE
.
There are examples later on that introduce their usage and syntax, so we
will ignore it for now.
evict_cor
refers to whether or not to correct for
eviction in the kernel. If this is set to TRUE
, then you
must supply a function specifying which expressions need to be corrected
and the correction to apply. ipmr
provides
truncated_distributions
for now, though others will
eventually be implemented as well. Subsequent additions are mainly to
accommodate models in PADRINO, and I very strongly suggest sticking to
truncated_distributions
for your own models.
Finally, ipmr
is designed to be pipe-friendly. All functions
prefixed with define_*
take a proto_ipm
as
their first argument and always return a proto_ipm
, meaning
operations can be chained together with the %>%
operator. This function is included in ipmr
, so there is no
need to load any other packages to access it (e.g. dplyr
or
magrittr
). The first chunk below is equivalent to the two
chunks above.
# the %>% takes the result of the first operation and passes it as the first
# argument to the second function.
my_ipm <- init_ipm(sim_gen = "simple", di_dd = "di", det_stoch = "det") %>%
define_kernel(
name = "P",
formula = s * g,
family = "CC",
s = 1/(1 + exp(-(s_int + s_slope * dbh_1))),
g = dnorm(dbh_2, g_mu, g_sd),
g_mu = g_int + g_slope * dbh_1,
data_list = list(
s_int = -3,
s_slope = 0.5,
g_int = 0.1,
g_slope = 1.033,
g_sd = 2.2
),
states = list(c('dbh')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions(fun = "norm",
target = "g")
)
We’ll now continue with specifying Eqs 7-10, which comprise the sexual reproduction portion of the IPM.
The rest of the model definition sequence in this example will use
the %>%
operator. It is not a requirement - you
can assign a value to my_ipm
at each step and the model
will be identical. Choose which ever process is more comfortable for
you!
my_ipm <- my_ipm %>%
define_kernel(
name = "F",
formula = r_r * r_s * r_d, # Eq 7
family = "CC",
# Eq 8. Again, we use the inverse logit transformation to compute pr(flowering)
r_r = 1/(1 + exp(-(r_r_int + r_r_slope * dbh_1))),
# Eq 9. We exponentiate this because of the log link in our seed production model
r_s = exp(r_s_int + r_s_slope * dbh_1),
# Eq 10. In this case, both the mean and standard deviation are constants
r_d = dnorm(dbh_2, r_d_mu, r_d_sd),
data_list = list(
r_r_int = -5, # coef(my_flower_mod)[1]
r_r_slope = 0.1, # coef(my_flower_mod)[2]
r_s_int = -3, # coef(my_seed_mod)[1]
r_s_slope = 0.03, # coef(my_seed_mod)[2]
r_d_mu = 1.2, # mean(my_recr_data$size_2, na.rm = TRUE)
r_d_sd = 0.7 # sd(my_recr_data$size_2, na.rm = TRUE)
),
states = list(c('dbh')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
)
We are now ready to describe how to implement the model numerically.
impl_args
)
Next, we need to define how the kernels are implemented numerically,
and how they act on state values at time \(t\) to create new state values at \(t+1\). This step is where we define the
integration rule int_rule
, the state the kernels modify
state_start
, and the state that they produce
state_end
. This is done with the define_impl()
function. It takes a named list where names correspond to the kernel
names, and each entry is itself a list containing 3 slots:
int_rule
, state_start
, and
state_end
. That information is used to generate Eq 1 from
above internally - you will not ever need to define Eqs 1-2 when using
ipmr
.
my_ipm <- my_ipm %>%
define_impl(
# Create one list of lists with all the info
list(
# The components of the big list should be the sub-kernels. Each sub-kernel
# requires 3 pieces of information: the numerical integration rule, the
# state variable it modifies (state_start), and the state variable it
# generates (state_end).
P = list(int_rule = "midpoint",
state_start = "dbh",
state_end = "dbh"),
F = list(int_rule = "midpoint",
state_start = "dbh",
state_end = "dbh")
)
)
Because this format is a bit specific, there is a helper function
that can be called within define_impl
or called before
initializing the IPM to make sure everything is formatted correctly -
make_ipml_args_list()
.
The first argument to the make_ipml_args_list()
function
is kernel_names
. This is a character vector with kernel
names for which the implementation arguments are being supplied. These
should be the same as name
from
define_kernel
.
Next, int_rule
is a character vector, and currently
'midpoint'
is the only option that is implemented.
'b2b'
(bin to bin method) and 'cdf'
(cumulative density function) are on the to-do list. Additional rules
may be added if there is more demand for certain ones.
state_start
and state_end
are always the
same in simple_*
IPMs. In general_*
ones, they
may be different as individuals move from, for example, discrete to
continuous states or vice versa. These must always match the names
provided in the states
list.
Elements of each vector in each argument in
make_impl_args_list()
are matched by position. Thus, if you
specify "P"
as the first entry in
kernel_names
, then the first entries of
int_rule
, state_start
, and
state_end
should all correspond to the P
kernel. If you specify "F"
as the second entry, the second
entries in int_rule
, state_start
, and
state_end
should be the rules that correspond to the
F
kernel, and so on.
It is important to note that make_impl_args_list
does not return a proto_ipm
. Thus, it must either
be called before beginning the model definition, or inside of
define_impl
!
# Alternative 1 - call make_impl_args_list() before beginning the IPM creation pipe
impl_args <- make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep("midpoint", 2),
state_start = rep("dbh", 2),
state_end = rep("dbh", 2)
)
my_ipm <- my_ipm %>%
define_impl(impl_args)
my_ipm <- my_ipm %>%
# Alternative 2, put the call to make_impl_args_list() inside of define_impl().
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep("midpoint", 2),
state_start = rep("dbh", 2),
state_end = rep("dbh", 2)
)
)
The next step in creating an IPM with ipmr
is to define
the domain of each continuous state variable in the states
list. This is done with the define_domains()
function. When
the int_rule
is "midpoint"
, this takes a named
set of vectors that have 3 entries each. The name corresponds to the
state
it is associated with, the first entry is the lower
bound, the second entry is the upper bound, and the third entry is the
number of meshpoints.
Note that for other int_rule
s, the vectors associated
with each domain will look different. However, those are not yet
implemented and so beyond the scope of this vignette for now.
my_ipm <- my_ipm %>%
define_domains(
dbh = c( # the name of the state variable.
1, # the lower bound for the domain. L from Eq 1
30, # the upper bound for the domain. U from Eq 1
200 # the number of mesh points to use for integration
)
)
ipmr
computes all values by iterating models (as opposed
to eigenvalue methods). This means we have to specify the initial
conditions for each simulation. In this case, the only initial condition
we need to specify is the initial population state. We do this using
define_pop_state()
. It takes a set of named expressions:
the name corresponds to the state variable, and the expression generates
values. The name of each state variable should be prefixed with an
n_
. The resulting vector should be the same length as the
number of meshpoints for the state variable. This chunk generates a
uniform distribution to represent the initial population state.
my_ipm <- my_ipm %>%
define_pop_state(n_dbh = rep(1/200, 200))
The minimal set of information to generate a deterministic IPM is now
wrapped up in our proto_ipm
and it is time to run the
model. make_ipm()
is a generic function and can be used for
any type of proto_ipm
generated with ipmr
.
my_ipm <- make_ipm(my_ipm,
iterations = 100)
lambda_ipmr <- lambda(my_ipm)
repro_value <- left_ev(my_ipm)
stable_dist <- right_ev(my_ipm)
# make_ipm_report works on either proto_ipms or made ipm objects. This
# requires the 'rmarkdown' package to run.
make_ipm_report(my_ipm$proto_ipm, title = "my_ipm_report", render_output = TRUE)
In this example, there are no additional arguments that need to be
passed to make_ipm()
- the proto_ipm
has all
of the information needed to generate a deterministic iteration kernel.
All of the make_ipm()
methods return a list with a length
of 5 with entries:
sub_kernels
contains, in this example,
P
and F
.
env_list
contains an environment named
main_env
. This environment holds key information for
implementing the model, such as the meshpoints, and upper and lower
state variable bounds. This can be omitted by setting
return_main_env = FALSE
.
return_all_envs = TRUE
in make_ipm()
returns
the evaluation environments for each sub-kernel. These are used by
subsequent analysis methods and so are important for internal usage, but
are probably of limited direct use to most users.
env_seq
contains either a character vector with the
sequence of indices used to select kernels from the
sub_kernels
during a stochastic simulation
(*_stoch_kern
), or a matrix of parameter estimates from
each iteration of the stochastic simulation (*_kern_param
).
Not relevant for *_det
methods and so contains
NA
.
pop_state
contains a list of matrices for each item
defined in define_pop_state()
. The rows of each matrix
correspond to the population state and columns are time steps.
Additionally, contains a slot lambda
with a 1 x
iterations
matrix containing per-capita growth rates for
every time step of the model.
proto_ipm
contains the proto_ipm
object
used to generate the model. This is used extensively by other methods in
the package.
lambda
, left_ev
, and right_ev
are generic functions corresponding to the dominant eigenvalue, dominant
left eigenvector, and dominant right eigenvector of the iteration
kernel, respectively. lambda
is available for all classes
of IPMs, while left/right_ev
is available for all
deterministic IPMs. Stochastic equivalents of the latter will get
implemented eventually. make_ipm_report()
generates an
Rmarkdown file containing Latex equations and parameter values used to
implement the IPM. This may be useful for publications/appendices, or
for sending an IPM to the PADRINO project for
archiving.
predict()
methods in vital rate expressions
The vast majority of IPM pipelines entail fitting a statistical models to data to create vital rate functions. Thus, we’re typically working with model objects rather than just parameter values. Sometimes the models have many terms, including interactions and/or discrete grouping effects, and the vital rate expressions are complicated to write out. Furthermore, the flexibility of IPMs means our vital rate models may be semi- or fully non-parametric, and writing out the functional form may not even be possible prior to the model fitting exercise (e.g. a GAM).
To deal with this, many regression modeling packages provide a
predict
method for the class of models that they implement.
We can use these in ipmr
to take the place of the vital
rate expressions (more common) or the kernel formulas (probably not as
common).
We’ll go through a quick example of how to incorporate these
expressions into the model building process using a data set for
Carpobrotus edulis, an invasive succulent. This is real data
collected in Israel and appears in Bogdan
et al. (2021). The data set is also included in ipmr
and can be loaded by running data(iceplant_ex)
.
First, we’ll load in the example data set and fit some simple vital rate models. Some notes about the data:
log_size
/log_size_next
are log
transformed surface areas in \(\mathrm{m}^2\). This will be the state
variable for the model, and is abbreviated sa_1
and
sa_2
in the model implementation code below.
repro
indicates whether or not a plant flowered and
is either 0 or 1
flower_n
indicates the number of flowers a
reproductive plant made, so is either a positive integer or
NA
.
\(f_G\) and \(f_{r_d}\) denote normal probability density
functions. The vital rates models and the IPM variable names are as
follows (vital_rate_model
,
IPM_variable_name
):
grow_mod
, g
)\(G(z',z) = f_G(z', \mu_G, \sigma_G)\)
\(\mu_G = \alpha_G + \beta_G * z\)
surv_mod
, s
)repr_mod
,
r_p
)seed_mod
,
r_s
)recr_n
, recr_n
)flow_n
, flow_n
)NA
,
r_r
)recr_mu, recr_sd
,
r_d
)
library(ipmr)
data(iceplant_ex)
# growth model
grow_mod <- lm(log_size_next ~ log_size, data = iceplant_ex)
grow_sd <- sd(resid(grow_mod))
# survival model
surv_mod <- glm(survival ~ log_size, data = iceplant_ex, family = binomial())
# Pr(flowering) model
repr_mod <- glm(repro ~ log_size, data = iceplant_ex, family = binomial())
# Number of flowers per plant model
seed_mod <- glm(flower_n ~ log_size, data = iceplant_ex, family = poisson())
# New recruits have no size(t), but do have size(t + 1)
recr_data <- subset(iceplant_ex, is.na(log_size))
recr_mu <- mean(recr_data$log_size_next)
recr_sd <- sd(recr_data$log_size_next)
# This data set doesn't include information on germination and establishment.
# Thus, we'll compute the realized recruitment parameter as the number
# of observed recruits divided by the number of flowers produced in the prior
# year.
recr_n <- length(recr_data$log_size_next)
flow_n <- sum(iceplant_ex$flower_n, na.rm = TRUE)
# Now, we bundle everything into a list as we did before. We can
# pass the model objects directly into the list, and do not need to extract
# coefficients.
params <- list(recr_mu = recr_mu,
recr_sd = recr_sd,
grow_sd = grow_sd,
surv_mod = surv_mod,
grow_mod = grow_mod,
repr_mod = repr_mod,
seed_mod = seed_mod,
recr_n = recr_n,
flow_n = flow_n)
# The lower and upper bounds for integration. Adding 20% on either end to minimize
# eviction. The minimum value is negative, so we multiply by 1.2, rather than 0.8.
# If it were positive, we would need to change that to 0.8.
L <- min(c(iceplant_ex$log_size, iceplant_ex$log_size_next), na.rm = TRUE) * 1.2
U <- max(c(iceplant_ex$log_size, iceplant_ex$log_size_next), na.rm = TRUE) * 1.2
We now have everything we need to get started. We use almost the same
code as the first example to implement the model, but substitute
predict()
in for the mathematical form of the vital rate
expressions. When we do this, we have to make sure that the names of the
newdata
argument in predict
match the names in
our model formula, and that the values we put into it are the names of
the state variables in our model. These same caveats apply to using
predict
interactively.
pred_ipm <- init_ipm(sim_gen = "simple", di_dd = "di", det_stoch = "det") %>%
define_kernel(
name = "P",
family = "CC",
formula = s * g,
# Instead of the inverse logit transformation, we use predict() here.
# We have to be sure that the "newdata" argument of predict is correctly specified.
# This means matching the names used in the model itself (log_size) to the names
# we give the domains (sa_1). In this case, we use "sa" (short for surface area) as
# the new data to generate predictions for.
s = predict(surv_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
g_mu = predict(grow_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
# We specify the rest of the kernel the same way.
g = dnorm(sa_2, g_mu, grow_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions(fun = "norm",
target = "g")
) %>%
define_kernel(
name = "F",
family = "CC",
formula = r_p * r_s * r_d * r_r,
# As above, we use predict(model_object). We make sure the names of the "newdata"
# match the names in the vital rate model formulas, and the values match the
# names of domains they use.
r_p = predict(repr_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_s = predict(seed_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_r = recr_n / flow_n,
r_d = dnorm(sa_2, recr_mu, recr_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions(fun = "norm",
target = "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep("sa", 2),
state_end = rep("sa", 2)
)
) %>%
define_domains(
sa = c(L,
U,
100)
) %>%
define_pop_state(n_sa = runif(100)) %>%
make_ipm(iterate = TRUE,
iterations = 200)
make_ipm()
can handle most types of widely used model
classes without any additional effort (e.g. lm
,
glm
, lme4
, brms
,
nlme
,betareg
). The list of method available in
ipmr
is not exhaustive though, and sometimes there will be
errors.
If you are sure that a predict
method exists for your
model type, but you get errors along the lines of
Element <some number> of '.x' must be a vector, not a call
,
this usually means that I forgot to add the class of model that you’re
trying to use to ipmr
. We can get around this by wrapping
those model objects in use_vr_model()
. Please also create
an Issue here
describing the model class you’re trying to work with so it can be added
to future versions.
Say our growth model isn’t recognized by ipmr
. We try to
make_ipm()
, and get our obscure error message about
.x
. The snippet below shows how to rectify this. We wrap
our model objects in use_vr_model()
when creating the
data_list
, and then re-run it with the exact same code as
above. use_vr_model()
tells ipmr
that this
object is a model object, not a raw parameter value. As the package
(hopefully) gets used more widely, this issue should disappear, and
use_vr_model()
will (hopefully) become obsolete.
params <- list(recr_mu = recr_mu,
recr_sd = recr_sd,
grow_sd = grow_sd,
surv_mod = use_vr_model(surv_mod), # wrap the model
grow_mod = use_vr_model(grow_mod), # wrap the model
repr_mod = use_vr_model(repr_mod), # wrap the model
seed_mod = use_vr_model(seed_mod), # wrap the model
recr_n = recr_n,
flow_n = flow_n)
pred_ipm <- init_ipm(sim_gen = "simple", di_dd = "di", det_stoch = "det") %>%
define_kernel(
name = "P",
family = "CC",
formula = s * g,
s = predict(surv_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
g_mu = predict(grow_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
g = dnorm(sa_2, g_mu, grow_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
) %>%
define_kernel(
name = "F",
family = "CC",
formula = r_p * r_s * r_d * r_r,
r_p = predict(repr_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_s = predict(seed_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_r = recr_n / flow_n,
r_d = dnorm(sa_2, recr_mu, recr_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep("sa", 2),
state_end = rep("sa", 2)
)
) %>%
define_domains(
sa = c(L,
U,
100)
) %>%
define_pop_state(n_sa = runif(100)) %>%
make_ipm(iterate = TRUE,
iterations = 200)
Usually, we want a bit more out of our model than just the
eigenvalues and eigenvectors. Here is a quick example of how to compute
generation time (\(T\)) and the
per-generation growth rate (\(R_0\)).
We’ll write two functions to help out. These are not part of
ipmr
, but will eventually be incorporated into a separate
package for life history analyses with IPMs which is designed to work
with ipmr
’s models.
R_0 <- function(ipm) {
Fm <- ipm$sub_kernels$F
Pm <- ipm$sub_kernels$P
I <- diag(nrow(Pm))
N <- solve(I - Pm)
R <- Fm %*% N
out <- Re(eigen(R)$values[1])
return(out)
}
gen_T <- function(ipm) {
R0 <- R_0(ipm)
lam <- lambda(ipm) %>% as.vector()
out <- log(R0) / log(lam)
return(out)
}
R_0(pred_ipm)
gen_T(pred_ipm)
In the example above, we created a model with a single, deterministic kernel defined by a single state variable. Next, we’ll go through an example showing how to build kernels that are constructed from discretely varying parameter estimates.
Say we have multiple sites sampled in the same year, but are only
interested in each site’s asymptotic dynamics (perhaps to compare
performance across space). ipmr
uses a syntax that closely
mirrors the mathematical notation of these models to successively build
up more complicated expressions without requiring that much extra code
as compared to the examples above. ipmr
refers to these as
parameter sets, and abbreviates them par_sets
. For
example, a random intercept for site may be denoted \(\alpha_{site}\), where \(_{site}\) provides an index for the
different values that \(\alpha\) can
take on.
The general idea is to append an index suffix to each variable in the
code that is modified, and supply a list of values that the index can
actually take in the par_set_indices
slot of
define_kernel()
. In the example below, the \(site\) index subscripts are incorporated
into the ipmr
code using the _site
suffix.
There is a longer vignette dedicated to translating math to R code to
the ipmr
syntax here.
NB: because of the way these indices are handled internally, they should always appear at the end of the name they modify. For example, use
s_slope_site
instead ofs_site_slope
. Multiple indices are fine as long as they chained together at the end (e.g.s_slope_site_year
). Models may still work if formatted with indices in the middle of a name, but they also may not (often with obscure error messages).
For this example, we’ll use the following vital rate models. The
(g)lmer
functions are from the lme4
package.
survival (s_site
): a logistic regression with a
random site intercept (s_r_site
).
Example mathematical notation: \(Logit(s_{site}(z)) = \alpha_{s,site} + \alpha_s + \beta_s * z\)
Example model formula:
glmer(surv ~ size_1 + (1 | site), data = my_surv_data, family = binomial()))
growth (g_site
): A linear regression random site
intercept (g_r_site
).
Example mathematical notation:
\(\mu_G = \alpha_{G,site} + \alpha_G + \beta_G * z\)
\(G_{site}(z',z) = f_G(z', \mu_{G,site}, \sigma_G)\)
Example model formula:
lmer(size_2 ~ size_1 + (1 | site), data = my_grow_data, family = gaussian()))
pr(flowering) (p_r
): A logistic regression. This has
no random site effect.
Example mathematical notation: \(Logit(p_r(z)) = \alpha_{r_p} + \beta_{r_p} * z\)
Example model formula:
glm(flower ~ size_1 , data = my_flower_data, family = binomial()))
seed production (r_s_site
): A Poisson regression
with a random site intercept (r_s_r_site
)
Example mathematical notation: \(Log(r_{s_{site}}(z)) = \alpha_{r_s, site} + \alpha_{r_s} + \beta_{r_s} * z\)
Example model formula:
glmer(seed_num ~ size_1 + (1 | site), data = my_surv_data, family = poisson()))
recruit size distribution (r_d
): A normal
distribution with two constant parameters, the mean (mu_rd
)
and standard deviation (sd_rd
).
The chunk below takes the place of fitting regression models to
actual data, so the code that replaces this chunk will look a little
different (and probably involve the use of
fixef(some_vital_rate_model
and
ranef(some_vital_rate_model)
)).
library(ipmr)
# Define some fixed parameters
fixed_list <- list(
s_int = 1.03, # fixef(my_surv_mod)[1] - uses fixef because we now have a model with random effects
s_slope = 2.2, # fixef(my_surv_mod)[2]
g_int = 3.7, # fixef(my_grow_mod)[1]
g_slope = 0.92, # fixef(my_grow_mod)[2]
sd_g = 0.9, # sd(resid(my_grow_mod))
r_r_int = 0.09, # coef(my_repro_mod)[1] - uses coef because there are no random effects in this model
r_r_slope = 0.05, # coef(my_repro_mod)[2]
r_s_int = 0.1, # fixef(my_flower_mod)[1]
r_s_slope = 0.005, # fixef(my_flower_mod)[2]
mu_rd = 9, # mean(my_recr_data$size_2, na.rm = TRUE)
sd_rd = 2 # sd(my_recr_data$size_2, na.rm = TRUE)
)
We’ve defined a fixed_list
that holds all of the fixed
parameters in our model. Next, we’ll make up some random site specific
intercepts, and add those to the fixed_list
, naming it
all_params_list
. You don’t necessarily need to rename
anything, this is just for disambiguation.
# Now, simulate some random intercepts for growth (g_), survival (s_),
# and offspring production (r_s_). This part is for the purpose of the example.
# First, we create vector of values that each random component can take.
g_r_int <- rnorm(5, 0, 0.3) # unlist(ranef(my_grow_mod)) for an lme4 output
s_r_int <- rnorm(5, 0, 0.7) # unlist(ranef(my_surv_mod)) for an lme4 output
r_s_r_int <- rnorm(5, 0, 0.2) # unlist(ranef(my_flower_mod)) for an lme4 output
# We'll call our hypothetical sites 1, 2, 3, 4, and 5. The "r" prefix is to
# remind us that these are random quantities.
nms <- paste("r_", 1:5, sep = "")
names(g_r_int) <- paste('g_', nms, sep = "")
names(s_r_int) <- paste('s_', nms, sep = "")
names(r_s_r_int) <- paste('r_s_', nms, sep = "")
# Each set of parameters is converted to a named list. The names should match
# the variables referenced in each define_kernel() call.
g_params <- as.list(g_r_int)
s_params <- as.list(s_r_int)
r_s_params <- as.list(r_s_r_int)
# add them all together using c()
all_params_list <- c(fixed_list, g_params, s_params, r_s_params)
We’ve created a list where each entry is named and contains a single
parameter value. This is now ready for use in
define_kernel()
.
my_ipm <- init_ipm(sim_gen = "simple", di_dd = "di", det_stoch = "det") %>%
define_kernel(
# Our P kernels will vary from site to site, so we index it with "_site"
name = 'P_site',
# Similarly, our survival and growth functions will vary from site to site
# so these are also indexed
formula = s_site * g_site,
family = "CC",
# The linear predictor for the survival function can be split out
# into its own expression as well. This might help keep track of things.
# Survival is indexed by site as well.
s_lin_site = s_int + s_r_site + s_slope * ht_1,
s_site = 1 / (1 + exp(-s_lin_site)),
# Again, we modify the vital rate expression to include "_site".
g_site = dnorm(ht_2, mean = mu_g_site, sd = sd_g),
mu_g_site = g_int + g_slope * ht_1 + g_r_site,
data_list = all_params_list,
states = list(c('ht')),
# Here, we tell ipmr that the model has some parameter sets, and
# provide a list describing the values the index can take. The values in
# par_set_indices are substituted for "site" everywhere in the model, except
# for the data list. This is why we had to make sure that the names there
# matched the levels we supply here.
uses_par_sets = TRUE,
par_set_indices = list(site = 1:5),
# We must also index the variables in the eviction function
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g_site")
) %>%
define_kernel(
# The F kernel also varies from site to site
name = "F_site",
formula = r_r * r_s_site * r_d,
family = "CC",
# In this example, we didn't include a site level effect for probability
# of flowering, only seed production. Thus, this expression is NOT indexed.
r_r_lin = r_r_int + r_r_slope * ht_1,
r_r = 1 / (1 + exp(- r_r_lin)),
# We index the seed production expression with the site effect
r_s_site = exp(r_s_int + r_s_r_site + r_s_slope * ht_1),
r_d = dnorm(ht_2, mean = mu_rd, sd = sd_rd),
data_list = all_params_list,
states = list(c('ht')),
# As in the P kernel, we specify the values the index can have.
uses_par_sets = TRUE,
par_set_indices = list(site = 1:5),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
# The impl_args are also modified with the index
kernel_names = c("P_site", "F_site"),
int_rule = rep("midpoint", 2),
state_start = rep("ht", 2),
state_end = rep("ht", 2)
)
) %>%
define_domains(ht = c(0.2, 40, 100)) %>%
# We also append the suffix in define_pop_state(). THis will create a deterministic
# simulation for every "site"
define_pop_state(n_ht_site = runif(100)) %>%
make_ipm(iterate = TRUE,
iterations = 100)
lambda(my_ipm)
Aside from appending the _site
index to a number of
variables, the code is pretty much the same as the first example.
ipmr
automatically substitutes 1, 2, 3, 4, and 5 for each
occurrence of site
in the kernel formulas and vital rate
expressions. Thus, P_site
is expanded to P_1
,
P_2
, P_3
, P_4
, P_5
,
s_site
to s_1
, s_2
,
s_3
, s_4
, and s_5
.
s_r_site
is converted to s_r_1
,
s_r_2
, etc. This is why we needed to make sure the names in
all_params_list
had the actual numbers appended to
them.
For more complicated models, you may need multiple indices. In those
cases, you can append them in any order you like to each variable. For
example, sites (_site
) and years (_yr
) could
be appended to survival (s_
) like so:
s_site_yr = some_expression
The par_set_indices
list then gets modified like so:
par_set_indices = list(site = c("A", "B", "C"), yr = c(2001:2005))
In the examples above, we first created a model with a single, deterministic kernel defined by a single state variable. Then we modeled multiple sites with deterministic kernels that varied from site to site. Next, we’ll go through an example showing how to build stochastic models from kernels that are constructed from discretely varying parameter estimates.
The example below is the same as the last example, except that this
time, we call our random effect “year” (indexed with _yr
)
and we’ll run a simulation by randomly choosing a year’s kernel for each
model iteration (“kernel resampling” sensu Metcalf
et al. 2015). Along the way, we’ll also learn how to pass custom
functions to the model.
The vital rate models are as follows:
survival (s_yr
): a logistic regression with a random
year intercept (s_r_yr
).
glmer(surv ~ size_1 + (1 | yr), data = my_surv_data, family = binomial()))
growth (g_yr
): A linear regression random year
intercept (g_r_yr
).
lmer(size_2 ~ size_1 + (1 | yr), data = my_grow_data, family = gaussian()))
pr(flowering) (p_r
): A logistic regression. This has
no random year effect.
glm(flower ~ size_1 , data = my_surv_data, family = binomial()))
seed production (r_s_yr
): A Poisson regression with
a random year intercept (r_s_r_yr
)
glmer(seed_num ~ size_1 + (1 | yr), data = my_surv_data, family = poisson()))
recruit size distribution (r_d
): A normal
distribution with two constant parameters, the mean (mu_rd
)
and standard deviation (sd_rd
).
The chunk below takes the place of fitting regression models to
actual data, so the code that replaces this chunk will look a little
different (and probably involve the use of
fixef(some_vital_rate_model)
and/or
ranef(some_vital_rate_model)
).
library(ipmr)
# Define some fixed parameters
fixed_list <- list(
s_int = 1.03, # fixef(my_surv_mod)[1] - uses fixef because we now have a model with random effects
s_slope = 2.2, # fixef(my_surv_mod)[2]
g_int = 3.7, # fixef(my_grow_mod)[1]
g_slope = 0.92, # fixef(my_grow_mod)[2]
sd_g = 0.9, # sd(resid(my_grow_mod))
r_r_int = 0.09, # coef(my_repro_mod)[1] - uses coef because there are no random effects in this model
r_r_slope = 0.05, # coef(my_repro_mod)[2]
r_s_int = 0.1, # fixef(my_flower_mod)[1]
r_s_slope = 0.005, # fixef(my_flower_mod)[2]
mu_fd = 9, # mean(my_recr_data$size_2, na.rm = TRUE)
sd_fd = 2 # sd(my_recr_data$size_2, na.rm = TRUE)
)
We’ve defined a fixed_list
that holds all of the fixed
parameters in our model. Next, we’ll make up some random year specific
intercepts, and add those to the fixed_list
, naming it
all_params_list
. You don’t necessarily need to rename
anything, this is just for disambiguation.
# Now, simulate some random intercepts for growth (g_), survival (s_),
# and offspring production (r_s_). This part is for the purpose of the example.
# First, we create vector of values corresponding to
g_r_int <- rnorm(5, 0, 0.3) # unlist(ranef(my_grow_mod)) for an lme4 output
s_r_int <- rnorm(5, 0, 0.7) # unlist(ranef(my_surv_mod)) for an lme4 output
r_s_r_int <- rnorm(5, 0, 0.2) # unlist(ranef(my_flower_mod)) for an lme4 output
nms <- paste("r_", 1:5, sep = "")
names(g_r_int) <- paste('g_', nms, sep = "")
names(s_r_int) <- paste('s_', nms, sep = "")
names(r_s_r_int) <- paste('r_s_', nms, sep = "")
# Each set of parameters is converted to a named list. The names should match
# the variables referenced in each define_kernel() call.
g_params <- as.list(g_r_int)
s_params <- as.list(s_r_int)
r_s_params <- as.list(r_s_r_int)
# add them all together using c()
all_params_list <- c(fixed_list, g_params, s_params, r_s_params)
We’ve created a list where each entry is named and contains a single
parameter value. This is now ready for use in
define_kernel()
We have our parameter values. In the next step, we’ll compose some
helper functions to make the vital rate expressions inside of
define_kernel()
a bit more concise and expressive. These
are passed in a list to the usr_funs
argument of
make_ipm()
.
inv_logit <- function(sv, int, slope) {
return(
1/(1 + exp(-(int + slope * sv)))
)
}
# same as above, but handles the extra term from the random effect we simulated.
inv_logit_r <- function(sv, int, slope, r_eff) {
return(
1/(1 + exp(-(int + slope * sv + r_eff)))
)
}
pois_r <- function(sv, int, slope, r_eff) {
return(
exp(
int + slope * sv + r_eff
)
)
}
my_funs <- list(inv_logit = inv_logit,
inv_logit_r = inv_logit_r,
pois_r = pois_r)
The only requirement for these functions is that they contain valid R code, and they return either real numbers or integers.
With the functions and parameter values defined, we are now ready to
begin composing the model. Each expression’s syntax will look very
similar to the deterministic example - the main difference is we are now
pretending that our expressions are indexed by year (_yr
)
as opposed to site, and we change the det_stoch
and
kern_param
arguments in init_ipm()
.
my_ipm <- init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "stoch",
kern_param = "kern") %>%
define_kernel(
name = 'P_yr', # P becomes P_yr
formula = s_yr * g_yr, # g and s become g_yr and s_yr, respectively
family = "CC",
# Note the usage of the inv_logit_r, which we defined in the block above.
# it is passed to make_ipm()
s_yr = inv_logit_r(ht_1, s_int, s_slope, s_r_yr),
g_yr = dnorm(ht_2, mean = mu_g_yr, sd = sd_g),
mu_g_yr = g_int + g_slope * ht_1 + g_r_yr,
# all_params_list contains the named parameters g_r_1, g_r_2, s_r_1, s_r_2, etc.
# This is the only level where the user is required to fully expand the name
# X par_set_indices combinations.
data_list = all_params_list,
states = list(c('ht')),
uses_par_sets = TRUE,
par_set_indices = list(yr = 1:5),
evict_cor = TRUE,
# reference to g_yr in evict_fun is also updated
evict_fun = truncated_distributions("norm", "g_yr")
) %>%
define_kernel(
name = "F_yr", # Update the names as we did for the P kernel
formula = r_r * r_s_yr * r_d,
family = "CC",
r_r = inv_logit(ht_1, r_r_int, r_r_slope),
r_s_yr = pois_r(ht_1, r_s_int, r_s_slope, r_s_r_yr),
r_d = dnorm(ht_2, mean = mu_fd, sd = sd_fd),
data_list = all_params_list,
states = list(c('ht')),
uses_par_sets = TRUE,
par_set_indices = list(yr = 1:5),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P_yr", "F_yr"),
int_rule = rep("midpoint", 2),
state_start = rep("ht", 2),
state_end = rep("ht", 2)
)
) %>%
define_domains(ht = c(0.2, 40, 100)) %>%
define_pop_state(n_ht = runif(100)) %>%
make_ipm(usr_funs = my_funs,
kernel_seq = sample(1:5, 100, replace = TRUE),
iterate = TRUE,
iterations = 100)
# The default for stochastic lambda is to compute mean(log(ipm$pop_state$lambda)).
# It also removes the first 10% of iterations to handle "burn in". The amount
# removed can be adjusted with the burn_in parameter.
stoch_lambda <- lambda(my_ipm)
# lambda(comp_method = 'pop_size', type = 'all') will compute the population
# growth rate for every time step as the sum(n_ht_t_1) / sum(n_ht_t).
iteration_lambdas <- lambda(my_ipm, type_lambda = 'all')
The only major difference between the IPM definition in the first
example and this one is that we’ve passed custom functions to the call
to make_ipm()
, and altered our vital rate expressions to
use them instead of the pure math for each variable transformation. On
the other hand, the contents of the output will look a little
different.
The sub_kernels
slot of my_ipm
now
contains 5 P and 5 F kernels (10 total).
The env_seq
slot will now contain a character vector
with the indices used to select sub-kernels for each model iteration.
These can be used to reproduce results later on.
All other slots will look the same as in the previous
simple_di_det
example.
100 iterations is not enough to estimate stochastic growth rates (\(\lambda_s\)), but the computations can take some time with a lot of iterations, and are not practical for demonstration purposes.
In some cases, it is not desirable to work with single estimates of a random variable, and we would prefer to work with the distributions they come from. Environmental variables like temperature and precipitation may be random with reasonably well known distributions. A stochastic simulation can incorporate this information to help us understand the consequences of this random variability.
This is where the kern_param = "param"
methods come in
handy. Unfortunately, these methods are less computationally efficient
than their kern_param = "kern"
counterparts because they
must rebuild each kernel for every single iteration. However, they are
fantastic tools for exploring the consequences of continuous
environmental variation.
Below is an example that demonstrates how to work with environmental
variation. ipmr
is careful to only evaluate each expression
in define_env_state()
once per iteration, so we can safely
work with multivariate distributions using very little additional code.
It is important to note that this not necessarily the same thing as
incorporating parameter uncertainty into your model. Parameter
uncertainty is covered in the final section of this vignette.
The parameters for the growth and survival functions will be sampled
from distributions for temperature and precipitation one time per
iteration, and an example of a function to pass to
define_env_state()
is included in the first chunk
below.
Stochastic simulations require specification of the initial
conditions. ipmr
aims to make this straightforward for you
by providing two helpers - define_pop_state()
and
define_env_state()
. We were introduced
define_pop_state()
above, and will now cover
define_env_state()
.
define_env_state()
define_env_state()
takes a named set of expressions in
the ...
and then a data list, much like how
define_kernel()
takes them. The values created need to be
in a list that has names corresponding to the parameter names in the
vital rate expressions. In this example, they are called
temp
and precip
. The data_list
in
define_env_state()
should contain any variables used in the
function we define (in this example, sample_env()
). we can
reference the names in list that sample_env
returns in the
vital rate expressions in each kernel definition as if they were in the
data_list
of define_kernel
.
The first chunk below initializes the parameters and functions that the model uses. It takes the place of the usual vital rate model fitting process.
This example uses models for survival and growth that include
environmental covariates. To limit complexity, we won’t include an
interaction term, but you are free to include as many as you want in
your own models. We define a single function to sample the environmental
distributions to illustrate how to use continuously varying parameter
distributions in ipmr
. This only uses two models and one
function to limit the the complexity of the example, however there is no
upper limit on the number of parameters or functions you can use in your
own models.
The vital rate functions are described here:
survival (s
): a logistic regression.
glm(survival ~ size_1 + temp + precip, data = my_surv_data, family = binomial())
growth (g
): a linear regression
glm(size_2 ~ size_1 + temp + precip, data = my_grow_data)
flower probability (r_r
): A logistic regression.
glm(repro ~ size_1, data = my_repro_data, family = binomial())
seed production (r_s
): a logistic regression.
glm(flower_n ~ size_1, data = my_flower_data, family = poisson())
recruit sizes (r_d
): A normal distribution
r_d_mu
)
mean(my_recruit_data$size_2, na.rm = TRUE)
and standard
deviation (r_d_sd
)
sd(my_recruit_data$size_2, na.rm = TRUE)
And the the parameter values are given here:
library(ipmr)
# Define the fixed parameters in a list. The temperature and precipitation
# coefficients are defined as s_temp, s_precip, g_temp, and g_precip.
constant_params <- list(
s_int = -10,
s_slope = 1.5,
s_precip = 0.00001,
s_temp = -0.003,
g_int = 0.2,
g_slope = 1.01,
g_sd = 1.2,
g_temp = -0.002,
g_precip = 0.004,
r_r_int = -3.2,
r_r_slope = 0.55,
r_s_int = -0.4,
r_s_slope = 0.5,
r_d_mu = 1.1,
r_d_sd = 0.1
)
# Now, we create a set of environmental covariates. In this example, we use
# a normal distribution for temperature and a Gamma for precipitation.
env_params <- list(
temp_mu = 8.9,
temp_sd = 1.2,
precip_shape = 1000,
precip_rate = 2
)
# We define a wrapper function that samples from these distributions
sample_env <- function(env_params) {
# We generate one value for each covariate per iteration, and return it
# as a named list. We can reference the names in this list in vital rate
# expressions.
temp_now <- rnorm(1,
env_params$temp_mu,
env_params$temp_sd)
precip_now <- rgamma(1,
shape = env_params$precip_shape,
rate = env_params$precip_rate)
out <- list(temp = temp_now, precip = precip_now)
return(out)
}
# Again, we can define our own functions and pass them into calls to make_ipm. This
# isn't strictly necessary, but can make the model code more readable/less error prone.
inv_logit <- function(lin_term) {
1/(1 + exp(-lin_term))
}
We now have parameter estimates. Time to build the IPM!
init_pop_vec <- runif(100)
param_resamp_model <- init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "stoch",
kern_param = "param") %>%
define_kernel(
name = 'P',
formula = s * g,
family = 'CC',
# Parameters created by define_env_state() can be referenced by name just like
# any other parameter in the model.
g_mu = g_int + g_slope * surf_area_1 + g_temp * temp + g_precip * precip,
s_lin_p = s_int + s_slope * surf_area_1 + s_temp * temp + s_precip * precip,
s = inv_logit(s_lin_p),
g = dnorm(surf_area_2, g_mu, g_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
) %>%
define_kernel(
name = 'F',
formula = r_r * r_s * r_d,
family = 'CC',
r_r_lin_p = r_r_int + r_r_slope * surf_area_1,
r_r = inv_logit(r_r_lin_p),
r_s = exp(r_s_int + r_s_slope * surf_area_1),
r_d = dnorm(surf_area_2, r_d_mu, r_d_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep('surf_area', 2),
state_end = rep('surf_area', 2)
)
) %>%
define_domains(surf_area = c(0, 10, 100))
The continuously varying parameters and the expressions that generate
them should be passed into define_env_state()
.
make_ipm()
will ensure that these are evaluated only once
per iteration of the model, so that we can safely work with joint
distributions that generate multiple parameter estimates per draw.
The ...
part of define_env_state()
should
be expressions that generate the variables we would like to reference.
temp
and precip
are the names that will be
generated by the IPM code, and so they are the names that should be
referenced in the kernels’ vital rate expressions, not
env_covs
. we don’t need to remember that we called this
particular object env_covs
when we write our vital rate
expressions, but it does have to be named something for the IPM code to
run.
The data_list
contains the list of parameter values for
the expressions in ...
, and any custom functions that we
specify to sample with. In this example, the environmental variables
don’t co-vary, but this is not a requirement. We could just as easily
specify them as coming from a multivariate distribution, and modify our
env_sample
function accordingly.
param_resamp_model <- param_resamp_model %>%
define_env_state(
env_covs = sample_env(env_params),
data_list = list(env_params = env_params,
sample_env = sample_env)
) %>%
define_pop_state(
pop_vectors = list(
n_surf_area = init_pop_vec
)
) %>%
make_ipm(usr_funs = list(inv_logit = inv_logit),
iterate = TRUE,
iterations = 10,
return_sub_kernels = TRUE)
# By default, lambda computes stochastic lambda with stochastic models
lambda(param_resamp_model)
# We can get lambdas for each time step by adding type_lambda = "all"
lambda(param_resamp_model, type_lambda = 'all')
# If we want to see the actual draws that were used at each step of the
# model iteration, we can access these using the output's $env_seq slot.
param_resamp_model$env_seq
Longer running stochastic parameter-resampled models can take up a
lot of space in memory when all of the sub-kernels are saved from each
iteration. For example, running the model above for 10,000 iterations
would result in 20,000 \(100 \times
100\) matrices in the sub_kernels
slot of the IPM
object (~16 GB of RAM). This will likely result in crashes as smaller
machines run out of available RAM. Therefore, make_ipm()
contains an argument return_sub_kernels
for these types of
models that allows you to switch off that behavior and conserve
available RAM. By default, this is set to FALSE
. If you
need sub-kernels for downstream analyses, set this option to
TRUE
and make sure you have a computer with sufficient RAM
(64-128 GB range is likely required to store all information for longer
running models).
These warnings also apply to all density dependent model
classes and the same return_sub_kernels
argument can be
used for those as well.
In some cases, we might want to provide a sequence of environmental
values ahead of time, as opposed to sampling them at each iteration. We
can do this by re-writing the sample_env
function so that
it takes the current model iteration as a parameter that selects values
from the pre-specified environmental values. ipmr
defines
the variable t
internally, which we can use to access the
current model iteration. First, we re-define the model:
param_resamp_model <- init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "stoch",
kern_param = "param") %>%
define_kernel(
name = 'P',
formula = s * g,
family = 'CC',
# Parameters created by define_env_state() can be referenced by name just like
# any other parameter in the model.
g_mu = g_int + g_slope * surf_area_1 + g_temp * temp + g_precip * precip,
s_lin_p = s_int + s_slope * surf_area_1 + s_temp * temp + s_precip * precip,
s = inv_logit(s_lin_p),
g = dnorm(surf_area_2, g_mu, g_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
) %>%
define_kernel(
name = 'F',
formula = r_r * r_s * r_d,
family = 'CC',
r_r_lin_p = r_r_int + r_r_slope * surf_area_1,
r_r = inv_logit(r_r_lin_p),
r_s = exp(r_s_int + r_s_slope * surf_area_1),
r_d = dnorm(surf_area_2, r_d_mu, r_d_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep('surf_area', 2),
state_end = rep('surf_area', 2)
)
) %>%
define_domains(surf_area = c(0, 10, 100))
Next, we generate some parameter values for temp
and
precip
:
env_states <- data.frame(precip = rgamma(10, shape = 1000, rate = 2),
temp = rnorm(10, mean = 8.9, sd = 1.2))
In this case, we are trying to sample these in a specific order - the
1st row first, the 2nd row second, 3rd row third, etc. Instead of
writing a function that generates random draws from a distribution, we
write one that samples rows from this data frame and creates a named
list. ipmr
generates a helper variable, t
,
that lets us access the current model iteration. Thus, our function to
generate the environmental variables at each time step could look like
this:
sample_env <- function(env_states, iteration) {
out <- as.list(env_states[iteration, ])
names(out) <- names(env_states)
return(out)
}
We can now define_env_state()
and
define_pop_state()
:
param_resamp_model <- param_resamp_model %>%
define_env_state(env_params = sample_env(env_states,
iteration = t), # "t" indexes the current model iteration
data_list = list(
env_states = env_states,
sample_env = sample_env
)) %>%
define_pop_state(
pop_vectors = list(
n_surf_area = init_pop_vec
)
) %>%
make_ipm(usr_funs = list(inv_logit = inv_logit),
iterate = TRUE,
iterations = 10)
Notice that we set iteration = t
in the call to
sample_env
. Otherwise, everything else looks the same as
before.
Above, we defined the environment explicitly ahead of time. However, there may be cases where we’d rather incorporate other environmental models directly into our stochastic IPM (e.g. a specific climate change scenario). We could sample our environmental model ahead of time and simply incorporate those directly into the model, indexing by time, similar to what we did above. However, this may not be convenient for some purposes.
Below is an example that is similar to above, but includes information on time when drawing environmental covariate values. This simulation will set up a scenario in which temperature increases and precipitation decreases with time, and adds random noise to each draw.
param_resamp_model <- init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "stoch",
kern_param = "param") %>%
define_kernel(
name = 'P',
formula = s * g,
family = 'CC',
# Parameters created by define_env_state() can be referenced by name just like
# any other parameter in the model.
g_mu = g_int + g_slope * surf_area_1 + g_temp * temp + g_precip * precip,
s_lin_p = s_int + s_slope * surf_area_1 + s_temp * temp + s_precip * precip,
s = inv_logit(s_lin_p),
g = dnorm(surf_area_2, g_mu, g_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g")
) %>%
define_kernel(
name = 'F',
formula = r_r * r_s * r_d,
family = 'CC',
r_r_lin_p = r_r_int + r_r_slope * surf_area_1,
r_r = inv_logit(r_r_lin_p),
r_s = exp(r_s_int + r_s_slope * surf_area_1),
r_d = dnorm(surf_area_2, r_d_mu, r_d_sd),
data_list = constant_params,
states = list(c('surf_area')),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep('surf_area', 2),
state_end = rep('surf_area', 2)
)
) %>%
define_domains(surf_area = c(0, 10, 100))
Next, we’ll set up a model for temperature and precipitation, and a
function that samples from those models to generate values we can use in
our IPM simulation. Setting the initial values
init_temp
/init_precip
is optional, but making
them parameters makes it a bit easier to specify a different model later
on.
env_sampler <- function(time, init_temp = 10, init_precip = 1500) {
t_est <- init_temp + 0.2 * time + rnorm(1, mean = 0, sd = 0.2)
p_est <- init_precip + -3.3 * time + rnorm(1, mean = 0, sd = 100)
if(p_est <= 0) p_est <- 1
out <- list(temp = t_est,
precip = p_est)
return(out)
}
We’re pretty much ready to run our simulation now!
param_resamp_ipm <- param_resamp_model %>%
define_env_state(
env_params = env_sampler(time = t, init_temp = 10, init_precip = 1500),
data_list = list()
) %>%
define_pop_state(
n_surf_area = init_pop_vec
) %>%
make_ipm(
iterations = 100,
usr_funs = list(env_sampler = env_sampler,
inv_logit = inv_logit)
)
lambda(param_resamp_ipm)
Currently, ipmr
doesn’t contain functions to deal with
uncertainty. The goal is to change that soon. In the meantime, here is
an example of how to deal with that in the context of a simple,
deterministic IPM. The procedure is similar for kernel- and
parameter-resampled models, and can be adapted to suit those as
needed.
There are a multitude of ways to incorporate uncertainty and other sources of continuous variation into regression models and IPMs. That variety is beyond the scope of this vignette. The following resources are excellent introductions to both and themselves contain a multitude of further readings:
For the purpose of this example, we’ll use the iceplant data set and do a bootstrapping procedure on the raw data. At each iteration of the bootstrap, we’ll re-run the regression models, and then re-build the IPM. We’ll store the lambda values for each one, and then repeat the procedure 50 times to keep the example short. Normally, many more re-samplings would be required to obtain reasonable confidence intervals.
First, we construct our vital rate models and IPM from the observed data.
library(ipmr)
data(iceplant_ex)
grow_mod <- lm(log_size_next ~ log_size, data = iceplant_ex)
grow_sd <- sd(resid(grow_mod))
surv_mod <- glm(survival ~ log_size, data = iceplant_ex, family = binomial())
repr_mod <- glm(repro ~ log_size, data = iceplant_ex, family = binomial())
seed_mod <- glm(flower_n ~ log_size, data = iceplant_ex, family = poisson())
recr_data <- subset(iceplant_ex, is.na(log_size))
recr_mu <- mean(recr_data$log_size_next)
recr_sd <- sd(recr_data$log_size_next)
recr_n <- length(recr_data$log_size_next)
flow_n <- sum(iceplant_ex$flower_n, na.rm = TRUE)
params <- list(recr_mu = recr_mu,
recr_sd = recr_sd,
grow_sd = grow_sd,
surv_mod = surv_mod,
grow_mod = grow_mod,
repr_mod = repr_mod,
seed_mod = seed_mod,
recr_n = recr_n,
flow_n = flow_n)
L <- min(c(iceplant_ex$log_size, iceplant_ex$log_size_next), na.rm = TRUE) * 1.2
U <- max(c(iceplant_ex$log_size, iceplant_ex$log_size_next), na.rm = TRUE) * 1.2
obs_ipm <- init_ipm(sim_gen = "simple",
di_dd = "di",
det_stoch = "det") %>%
define_kernel(
name = "P",
family = "CC",
formula = s * g,
# Instead of the inverse logit transformation, we use predict() here.
# We have to be sure that the "newdata" argument of predict is correctly specified.
# This means matching the names used in the model itself (log_size) to the names
# we give the domains. In this case, we use "sa" (short for surface area) as
# the new data to generate predictions for.
s = predict(surv_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
g_mu = predict(grow_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
# We specify the rest of the kernel the same way.
g = dnorm(sa_2, g_mu, grow_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions(fun = "norm",
target = "g")
) %>%
define_kernel(
name = "F",
family = "CC",
formula = r_p * r_s * r_d * r_r,
# As above, we use predict(model_object). We make sure the names of the "newdata"
# match the names in the vital rate model formulas, and the values match the
# names of domains they use.
r_p = predict(repr_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_s = predict(seed_mod,
newdata = data.frame(log_size = sa_1),
type = 'response'),
r_r = recr_n / flow_n,
r_d = dnorm(sa_2, recr_mu, recr_sd),
states = list(c("sa")),
data_list = params,
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions(fun = "norm",
target = "r_d")
) %>%
define_impl(
make_impl_args_list(
kernel_names = c("P", "F"),
int_rule = rep('midpoint', 2),
state_start = rep("sa", 2),
state_end = rep("sa", 2)
)
) %>%
define_domains(
sa = c(L,
U,
100)
) %>%
define_pop_state(n_sa = runif(100)) %>%
make_ipm(iterate = TRUE,
iterations = 100)
lambda_obs <- lambda(obs_ipm)
Next, we’ll initialize a vector to hold the final time-step lambdas for each resampled dataset. We’re also going to split out the new recruits from the other data. We only have 12, and the model building won’t work if we happen to not pull any of them out when we resample the data.
all_lambdas <- numeric(50L)
recr_data <- subset(iceplant_ex, is.na(log_size))
adults <- subset(iceplant_ex, !is.na(log_size))
use_proto <- obs_ipm$proto_ipm
Now, we refit the vital rate models, and use the
parameters<-
setter function to update the original
proto_ipm
object with the new vital rate models. This saves
us from re-typing the whole model pipeline again. Normally, we would
bootstrap more than 50 times, but for the sake of this example and
saving time, we will only do 50.
for(i in 1:50) {
sample_ind <- seq(1, nrow(adults), by = 1)
boot_ind <- sample(sample_ind, size = nrow(adults), replace = TRUE)
boot_data <- rbind(adults[boot_ind, ],
recr_data)
grow_mod <- lm(log_size_next ~ log_size, data = boot_data)
grow_sd <- sd(resid(grow_mod))
surv_mod <- glm(survival ~ log_size, data = boot_data, family = binomial())
repr_mod <- glm(repro ~ log_size, data = boot_data, family = binomial())
seed_mod <- glm(flower_n ~ log_size, data = boot_data, family = poisson())
recr_mu <- mean(recr_data$log_size_next)
recr_sd <- sd(recr_data$log_size_next)
recr_n <- length(recr_data$log_size_next)
flow_n <- sum(boot_data$flower_n, na.rm = TRUE)
params <- list(recr_mu = recr_mu,
recr_sd = recr_sd,
grow_sd = grow_sd,
surv_mod = surv_mod,
grow_mod = grow_mod,
repr_mod = repr_mod,
seed_mod = seed_mod,
recr_n = recr_n,
flow_n = flow_n)
# Insert the new vital rate models into the proto_ipm, then rebuild the IPM.
parameters(use_proto) <- params
boot_ipm <- use_proto %>%
make_ipm(iterate = TRUE,
iterations = 100)
all_lambdas[i] <- lambda(boot_ipm)
}
# Plot the results
hist(all_lambdas)
abline(v = lambda_obs, col = 'red', lwd = 2, lty = 2)
You can adapt this code to store any quantity of interest (e.g. regression coefficients, population structures).
An article on these is available on the website and from an R session:
browseVignettes("ipmr")