| Title: | Bayesian Multi-State Models for Intermittently-Observed Data |
|---|---|
| Description: | Bayesian multi-state models for intermittently-observed data. Markov and phase-type semi-Markov models, and misclassification hidden Markov models. |
| Authors: | Christopher Jackson [aut, cre] (ORCID: <https://orcid.org/0000-0002-6656-8913>) |
| Maintainer: | Christopher Jackson <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.3 |
| Built: | 2026-05-06 17:39:03 UTC |
| Source: | https://github.com/chjackson/msmbayes |
For an introduction to and overview of the msmbayes package, and full documentation, see
http://chjackson.github.io/msmbayes.
For more resources on multi-state modelling, see the msm package and its documentation.
Maintainer: Christopher Jackson [email protected] (ORCID)
Useful links:
A simulated multistate dataset with lots of observations and covariates
bigdatbigdat
See data-raw/bigdata.R in the source for simulation settings.
Convert between canonical parameters and rates for a phase-type distribution
canpars_to_rates(pars, type = "vector") rates_to_canpars(rates, type = "vector")canpars_to_rates(pars, type = "vector") rates_to_canpars(rates, type = "vector")
pars |
Canonical parameters, supplied in the order:
or a list with three components, one vector for each of these three parameter types. |
type |
|
rates |
List with two components for progression and absorption rates, in increasing order of phase, or a vector with these concatenated. |
A list with components
p progression rates between phases
a absorption rates
or a vector with these components concatenated, depending on the "type" argument.
Misclassification error probabilities from an msmbayes model
edf(draws)edf(draws)
draws |
Object returned by |
A data frame with one row per modelled misclassification probability.
from indicates the true state, and to indicates the observed state.
Error probabilities fixed by the user are not included.
qdf for more information about the format.
Matrix of misclassification error probabilities from an msmbayes model
ematrix(draws, type = "posterior")ematrix(draws, type = "posterior")
draws |
Object returned by |
type |
|
An array or matrix of rvar objects containing the
misclassification error matrix for each new prediction data point
Example fitted model objects used for testing msmbayes
infsim_model infsim_modelc infsim_modelp infsim_modelpcinfsim_model infsim_modelc infsim_modelp infsim_modelpc
An object of class msmbayes, obtained by fitting a
Markov model to the dataset infsim. See
data-raw/infsim.R in the source for the model
specification code
infsim_modelc includes covariates on the transition
intensities.
infsim_modelp is a phase-type model.
infsim_modelpc is a phase-type model with covariates.
Simulated
A simulated dataset from an illness-death model
illdeath_datailldeath_data
See data-raw/illdeath.R in the source for simulation settings.
Simulated infection testing data
infsim infsim2infsim infsim2
infsim has 3600 rows, with 36 state observations for each of 100 people. A smaller dataset infsim2 has only 360 rows, from 20 people, and
is simulated using a sojourn time of 60 days in the test-negative state
and 10 days in test-positive.
Columns are:
subject Subject identifier
days Observation time (in days)
months Observation time (in moths)
state State simulated from a Markov model with no covariates
sex: "male" or "female".
age10: Age, in units of 10 years since age 50
statec: State simulated from a Markov model with covariates
statep: State simulated from a phase-type model (unused in any examples. See data-raw/infsim.R in the source for simulation settings)
statepc: State simuated from a phase-type model with covariates (unused in any examples)
An object of class data.frame with 360 rows and 14 columns.
The transition intensities used for the simulation are defined using a mean sojourn time of 180 days in the "test negative" state and 10 days in the "test positive" state.
For the model with covariates, the log hazard ratios are
2 for male on the 1-2 transition, 1 for age10 on the 1-2 transition,
and -1 for age10 on the 2-1 transition. Baseline intensities are
for female, age 50 (i.e. age10= 0).
In the state data, state 1 is negative and 2 is positive.
Simulated
In semi-Markov models specified with pastates, these only include
covariate effects on transitions out of "Markov" states, that is, those not included
in pastates. Effects on scale parameters of sojourn distributions
in "semi-Markov" states are extracted with logtaf.
loghr(draws) hr(draws)loghr(draws) hr(draws)
draws |
Object returned by |
A data frame containing samples from the posterior distribution.
See qdf for notes on this format and how to summarise.
hr returns these on the natural scale, loghr
returns them on the log scale.
Log likelihood from an msmbayes model
loglik(draws) ## S3 method for class 'msmbayes' logLik(object, ...)loglik(draws) ## S3 method for class 'msmbayes' logLik(object, ...)
draws |
Object returned by |
object |
Object returned by |
... |
For loglik, a data frame with rows for the log likelihood, log prior density and log posterior density,
and columns for the posterior (as an rvar object) and the mode (only if optimisation was used to fit the model, rather than MCMC).
If msmbayes was called with priors="mle", the maximised log posterior and log likelihood should be the same.
For logLik (note the different capitalisation), just the likelihood (mode if available, or posterior if not) is returned. This is a
method for the generic logLik function in the stats package.
Log odds of transition to a competing destination state, relative
to baseline destination state. Only applicable to phase-type
approximation models, specified with pastates.
logoddsnext(draws, new_data = NULL, keep_covid = FALSE)logoddsnext(draws, new_data = NULL, keep_covid = FALSE)
draws |
Object returned by |
new_data |
Data frame with covariate values to predict for |
keep_covid |
(logical) Keep the integer column |
pnext
Extracts the covariate effects on scale parameters in phase-type approximation
sojourn distributions in semi-Markov models. Note these are not hazard
ratios for transitions on the observable state space. They are time acceleration
factors, such that an increase in one covariate unit makes time pass at taf
times the speed, such that the sojourn time is expected to reduce by .
The log TAFs are labelled in the paper and vignettes.
logtaf(draws) taf(draws)logtaf(draws) taf(draws)
draws |
Object returned by |
See loghr for effects of covariates on transitions out of Markov states,
which are log hazard ratios.
A data frame containing samples from the posterior distribution.
See qdf for notes on this format and how to summarise.
taf returns these on the natural scale, logtaf
returns them on the log scale.
Mean sojourn times from an msmbayes model
mean_sojourn(draws, new_data = NULL, states = "obs", keep_covid = FALSE)mean_sojourn(draws, new_data = NULL, states = "obs", keep_covid = FALSE)
draws |
Object returned by |
new_data |
Data frame with covariate values to predict for |
states |
If If |
keep_covid |
(logical) Keep the integer column |
A data frame containing samples from the posterior distribution.
See qdf for notes on this format and how to summarise.
Fit a multi-state model to longitudinal data consisting of intermittent observation of a discrete state. Bayesian, approximate Bayesian or maximum likelihood estimation is used, via the Stan software.
msmbayes( data, state = "state", time = "time", subject = "subject", Q, covariates = NULL, pastates = NULL, pafamily = "gamma", panphase = NULL, E = NULL, Efix = NULL, obstype = NULL, deathexact = FALSE, obstrue = NULL, censor_states = NULL, constraint = NULL, nphase = NULL, priors = NULL, prob_initstate = NULL, soj_priordata = NULL, fit_method = NULL, keep_data = FALSE, ... )msmbayes( data, state = "state", time = "time", subject = "subject", Q, covariates = NULL, pastates = NULL, pafamily = "gamma", panphase = NULL, E = NULL, Efix = NULL, obstype = NULL, deathexact = FALSE, obstrue = NULL, censor_states = NULL, constraint = NULL, nphase = NULL, priors = NULL, prob_initstate = NULL, soj_priordata = NULL, fit_method = NULL, keep_data = FALSE, ... )
data |
Data frame giving the observed data. |
state |
Character string naming the observed state variable in
the data. This variable must either be an integer in 1,2,...,K,
where K is the number of states, or a factor with these integers
as level labels. If omitted, this is assumed to be |
time |
Character string naming the observation time variable in the data.
If omitted, this is assumed to be |
subject |
Character string naming the individual ID variable in the data.
If omitted, this is assumed to be |
Q |
Matrix indicating the transition structure. A zero entry
indicates that instantaneous transitions from (row) to (column)
are disallowed. An entry of 1 (or any other positive value)
indicates that the instantaneous transition is allowed. The
diagonal of There is no need to "guess" initial values and put them here, as
is sometimes done in |
covariates |
Specification of covariates on transition intensities. This should be a list of formulae, or a single formula. If a list is supplied, each formula should have a
left-hand side that looks like For example,
specifies that the log of the 1-2 transition intensity is an additive linear function of age and sex, and the log 2-1 transition intensity is a linear function of age. You do not have to list all of the intensities here if some of them are not influenced by covariates. If a single formula is supplied, this is assumed to apply to all intensities. If doing this, then take care with potential lack of identifiability of effects from sparse data. In models with phase-type approximated states (specified with
In models with phase-type approximations and competing exit states,
covariates on the relative risk of different exit states are
specified with a formula with
In phase-type models specified with |
pastates |
This indicates which states (if any) are given a
Weibull or Gamma sojourn distribution approximated by a phase-type model.
Ignored if |
pafamily |
|
panphase |
Number of phases to use for each state given a
phase-type Gamma or Weibull approximation. Vector of same length
as |
E |
By default, |
Efix |
Misclassfication probabilities in Markov models are
commonly not identifiable from data, particulary if the data are
intermittently observed. Instead of estimating them, a Markov
model with misclassification can be specified by supplying
assumed misclassification probabilities in the |
obstype |
Character string, giving a variable in the data which defines what a "row of the data" means. The variable must contain only the following values, which may be different in different rows:
This is the same feature as in the |
deathexact |
Set to |
obstrue |
Only applicable to models with misclassification. A character string indicating a variable in the data whose value is 1 if the true state is known to equal the value in "state", and 0 otherwise. |
censor_states |
A named list indicating the codes used for
"censored" states. This is used when there are observations that
are known to be one of a subset of states, but it is not known
which. The names of the list indicate codes that may appear in
the
means that a code of 99 in the Note the names of the list must be quoted strings that are
interpretable as integers, since the In misclassification models, the subset refers to values of the
true state if Unlike in |
constraint |
Constraints that a covariate has an equal effect on a particular set of transition intensities. A list with one component for each covariate that has constraints. Each component is a list of sets (or a single set) of intensities where the effect of that covariate is equal. For example, to constrain the effect of age to be equal for transitions 1-2 and 2-3, and also equal for transitions 1-4 and 2-4, and the effect of sexMALE to be equal for transitions 1-2 and 2-3, specify
This is the same feature as in |
nphase |
Only required for models with phase-type sojourn
distributions specified directly (not through |
priors |
A list specifying priors. Each component should be
the result of a call to In phase-type approximation models, the default priors are normal with mean 2, SD 2 for scale parameters (i.e. the log inverse of the default prior for the rate), normal(0, SD=0.5) truncated on the supported region for log shape parameters, and normal(0,1) for log odds of transition (relative to first exit state) in structures with competing exit states. See If only one parameter is given a non-default prior, a single Maximum likelihood estimation can be performed by setting
|
prob_initstate |
Probabilities of true states at a person's first observation time in a misclassification or model. If supplied, this should be a matrix with a row for each individual subject, and a column for each true state, or a vector with one element for each state that is assumed to apply to all individuals. If not supplied, every person is assumed to be in state 1 with probability 1 in misclassification models, or phase 1 of the observed state with probability 1 in phase-type models. Note no warning is currently given if the first observed state would be impossible if the person was really in state 1. This applies to both misclassification models, and phase-type models where a person's first observed state is phased. If the first observed state is not phased or misclassified, then this is ignored. |
soj_priordata |
Synthetic data that represents prior information about the mean sojourn times. Experimental, undocumented feature. |
fit_method |
Quoted string specifying the algorithm to fit the
model.
|
keep_data |
Store a copy of the cleaned data in the returned
object. |
... |
Other arguments to be passed to the function from
|
A data frame in the draws format of the
posterior package, containing draws from the posterior of
the model parameters.
Attributes are added to give information about the model structure,
and a class "msmbayes" is prepended.
See, e.g. summary.msmbayes, qdf,
hr, and similar functions, to extract parameter
estimates from the fitted model.
Called in the same way as msmbayes. The data should
still be supplied in this function, to ensure we are simulating
from a valid msmbayes model, but it is sufficient to
supply an empty data frame with no rows, and columns named as if
we were fitting a model with the given priors.
msmbayes_prior_sample( data, state = "state", time = "time", subject = "subject", Q, covariates = NULL, pastates = NULL, pafamily = "weibull", panphase = NULL, nphase = NULL, E = NULL, priors = NULL, nsim = 1 )msmbayes_prior_sample( data, state = "state", time = "time", subject = "subject", Q, covariates = NULL, pastates = NULL, pafamily = "weibull", panphase = NULL, nphase = NULL, E = NULL, priors = NULL, nsim = 1 )
data |
Data frame giving the observed data. |
state |
Character string naming the observed state variable in
the data. This variable must either be an integer in 1,2,...,K,
where K is the number of states, or a factor with these integers
as level labels. If omitted, this is assumed to be |
time |
Character string naming the observation time variable in the data.
If omitted, this is assumed to be |
subject |
Character string naming the individual ID variable in the data.
If omitted, this is assumed to be |
Q |
Matrix indicating the transition structure. A zero entry
indicates that instantaneous transitions from (row) to (column)
are disallowed. An entry of 1 (or any other positive value)
indicates that the instantaneous transition is allowed. The
diagonal of There is no need to "guess" initial values and put them here, as
is sometimes done in |
covariates |
Specification of covariates on transition intensities. This should be a list of formulae, or a single formula. If a list is supplied, each formula should have a
left-hand side that looks like For example,
specifies that the log of the 1-2 transition intensity is an additive linear function of age and sex, and the log 2-1 transition intensity is a linear function of age. You do not have to list all of the intensities here if some of them are not influenced by covariates. If a single formula is supplied, this is assumed to apply to all intensities. If doing this, then take care with potential lack of identifiability of effects from sparse data. In models with phase-type approximated states (specified with
In models with phase-type approximations and competing exit states,
covariates on the relative risk of different exit states are
specified with a formula with
In phase-type models specified with |
pastates |
This indicates which states (if any) are given a
Weibull or Gamma sojourn distribution approximated by a phase-type model.
Ignored if |
pafamily |
|
panphase |
Number of phases to use for each state given a
phase-type Gamma or Weibull approximation. Vector of same length
as |
nphase |
Only required for models with phase-type sojourn
distributions specified directly (not through |
E |
By default, |
priors |
A list specifying priors. Each component should be
the result of a call to In phase-type approximation models, the default priors are normal with mean 2, SD 2 for scale parameters (i.e. the log inverse of the default prior for the rate), normal(0, SD=0.5) truncated on the supported region for log shape parameters, and normal(0,1) for log odds of transition (relative to first exit state) in structures with competing exit states. See If only one parameter is given a non-default prior, a single Maximum likelihood estimation can be performed by setting
|
nsim |
Number of samples to generate |
A data frame with one column per model parameter (on a transformed scale, e.g. log intensities), and one row per sample. The names are in the natural
format as specified in priors.
An attribute "stan_names" contains the names of the
corresponding parameters in the draws object that would be
returned by msmbayes if this model were to be fitted to data.
These are the names used internally by Stan, and not meant to be
interpretable by users.
An attribute "expand" contains the same sample but with
parameters for covariate effects referring to state transitions
on the latent space. Used internally for posterior predictive
sampling.
This generates a single sample of parameters from the prior, then
generates observed states from a multi-state model with those
parameters. The data argument should contain the time and
subject indicators at which states are to be simulated (by default),
or the maximum observation time (if complete_obs=FALSE).
msmbayes_priorpred_sample( data, state = "state", time = "time", subject = "subject", Q, covariates = NULL, pastates = NULL, pafamily = "gamma", panphase = NULL, nphase = NULL, E = NULL, priors = NULL, complete_obs = FALSE, cov_format = "orig" )msmbayes_priorpred_sample( data, state = "state", time = "time", subject = "subject", Q, covariates = NULL, pastates = NULL, pafamily = "gamma", panphase = NULL, nphase = NULL, E = NULL, priors = NULL, complete_obs = FALSE, cov_format = "orig" )
data |
Data frame giving the observed data. |
state |
Character string naming the observed state variable in
the data. This variable must either be an integer in 1,2,...,K,
where K is the number of states, or a factor with these integers
as level labels. If omitted, this is assumed to be |
time |
Character string naming the observation time variable in the data.
If omitted, this is assumed to be |
subject |
Character string naming the individual ID variable in the data.
If omitted, this is assumed to be |
Q |
Matrix indicating the transition structure. A zero entry
indicates that instantaneous transitions from (row) to (column)
are disallowed. An entry of 1 (or any other positive value)
indicates that the instantaneous transition is allowed. The
diagonal of There is no need to "guess" initial values and put them here, as
is sometimes done in |
covariates |
Specification of covariates on transition intensities. This should be a list of formulae, or a single formula. If a list is supplied, each formula should have a
left-hand side that looks like For example,
specifies that the log of the 1-2 transition intensity is an additive linear function of age and sex, and the log 2-1 transition intensity is a linear function of age. You do not have to list all of the intensities here if some of them are not influenced by covariates. If a single formula is supplied, this is assumed to apply to all intensities. If doing this, then take care with potential lack of identifiability of effects from sparse data. In models with phase-type approximated states (specified with
In models with phase-type approximations and competing exit states,
covariates on the relative risk of different exit states are
specified with a formula with
In phase-type models specified with |
pastates |
This indicates which states (if any) are given a
Weibull or Gamma sojourn distribution approximated by a phase-type model.
Ignored if |
pafamily |
|
panphase |
Number of phases to use for each state given a
phase-type Gamma or Weibull approximation. Vector of same length
as |
nphase |
Only required for models with phase-type sojourn
distributions specified directly (not through |
E |
By default, |
priors |
A list specifying priors. Each component should be
the result of a call to In phase-type approximation models, the default priors are normal with mean 2, SD 2 for scale parameters (i.e. the log inverse of the default prior for the rate), normal(0, SD=0.5) truncated on the supported region for log shape parameters, and normal(0,1) for log odds of transition (relative to first exit state) in structures with competing exit states. See If only one parameter is given a non-default prior, a single Maximum likelihood estimation can be performed by setting
|
complete_obs |
If If |
cov_format |
If |
For phase-type approximation models, this simulates from the phase-type approximation, not the Weibull or Gamma (e.g) distribution that it is designed to approximate.
A data frame or a list, see msm::simmulti.msm or msm::sim.msm respectively.
This works similarly to a histogram. The state observations are
binned into time intervals with roughly equal numbers of
observations. Within each bin, the probability that an
observation comes from each state is estimated.
msmhist( data, state = "state", time = "time", subject = "subject", nbins, absorbing = NULL, censtimes = NULL, stacked = TRUE )msmhist( data, state = "state", time = "time", subject = "subject", nbins, absorbing = NULL, censtimes = NULL, stacked = TRUE )
data |
Data frame giving the observed data. |
state |
Character string naming the observed state variable in
the data. This variable must either be an integer in 1,2,...,K,
where K is the number of states, or a factor with these integers
as level labels. If omitted, this is assumed to be |
time |
Character string naming the observation time variable in the data.
If omitted, this is assumed to be |
subject |
Character string naming the individual ID variable in the data.
If omitted, this is assumed to be |
nbins |
Number of time intervals to bin the state observations into. The underlying distribution of states illustrated by the plot will be assumed constant within each interval. |
absorbing |
Indices of any absorbing states. Individuals are assumed to stay in their absorbing state, and contribute one observation to each bin after their absorption time. By default, no states are assumed to be absorbing. |
censtimes |
Vector of maximum intended follow-up times for the people in the data who entered absorbing states. This supposes that had the person not entered the absorbing state, they would not have been observed after this time. |
stacked |
If If |
If each subject has at most one observation in a bin, then
is estimated as the proportion of observations in the bin that are
of that state.
More generally, if an individual has more than one observation in
the bin, is estimated as follows. For each observed
individual and each state , we define a variable
equal to the proportion of individual 's
observations that are of state . For example, in a
three-state model, where a person has two observations in a bin,
and these are states 2 and 3, then for
states 1, 2 and 3 respectively. The bin-specific estimate of
is then the average of over individuals
who have at least one observation in that bin.
The results are visualised as a stacked bar plot. The individual observations of states are represented as points placed at random y positions within each state-specific bar.
This is intended as an alternative to the "observed prevalences"
plot in the function prevalence.msm from the msm package,
with a clearer connection to the data. It can be overlaid
with predictions of transition probabilities from a msmbayes
or msm model, to check the fit of the model.
The method used by "observed prevalences" plots places a strong assumption on the (unobserved) individual data, that individuals stay in the same state between observations, or transition at the midpoint between observations.
msmhist places no assumption on the individual data. Instead
the assumption is placed on the distribution underlying the data.
In a similar fashion to a histogram, it assumes that the
distribution of states is the same at all times within each
time interval bin.
A ggplot2 plot object.
msmhist_bardata to extract the numbers
behind this plot so the plot can be customised by hand.
msmhist(infsim, "state", "months", "subject", nbins=30) msmhist(infsim2, "state", "months", "subject", nbins=6)msmhist(infsim, "state", "months", "subject", nbins=30) msmhist(infsim2, "state", "months", "subject", nbins=6)
msmhist
Estimate state occupation probabilities to be illustrated by a bar
plot in msmhist
msmhist_bardata( data, state = "state", time = "time", subject = "subject", nbins, absorbing = NULL, censtimes = NULL )msmhist_bardata( data, state = "state", time = "time", subject = "subject", nbins, absorbing = NULL, censtimes = NULL )
data |
Data frame giving the observed data. |
state |
Character string naming the observed state variable in
the data. This variable must either be an integer in 1,2,...,K,
where K is the number of states, or a factor with these integers
as level labels. If omitted, this is assumed to be |
time |
Character string naming the observation time variable in the data.
If omitted, this is assumed to be |
subject |
Character string naming the individual ID variable in the data.
If omitted, this is assumed to be |
nbins |
Number of time intervals to bin the state observations into. The underlying distribution of states illustrated by the plot will be assumed constant within each interval. |
absorbing |
Indices of any absorbing states. Individuals are assumed to stay in their absorbing state, and contribute one observation to each bin after their absorption time. By default, no states are assumed to be absorbing. |
censtimes |
Vector of maximum intended follow-up times for the people in the data who entered absorbing states. This supposes that had the person not entered the absorbing state, they would not have been observed after this time. |
Data frame with one row per bin and state, and columns:
binid: Integer ID for bin
binlabel: Character label for bin, with time interval
state: State
binstart, binend: Start and end time of the bin (numeric)
props: estimates of state $s$ occupancy proportions $p(s)$ for each bin
cumpstart, cumpend: Cumulative sum of props over the set of
states, where cumpstart starts at 0, and cumpend ends at
Intended for creating stacked bar plots with geom_rect or
similar.
Constructor for a prior distribution in msmbayes
msmprior( par, mean = NULL, sd = NULL, median = NULL, lower = NULL, upper = NULL )msmprior( par, mean = NULL, sd = NULL, median = NULL, lower = NULL, upper = NULL )
par |
Character string indicating the model parameter to place the prior on. This should start with one of the following:
Covariate effects on competing transitions out of semi-Markov
states are specified with In general, the indices or the covariate name can be omitted to indicate that
the same prior will used for all transitions, or/and all
covariates. This can be done with or without the brackets, e.g.
|
mean |
Prior mean. This is only used for the parameters that have direct normal priors, that is |
sd |
Prior standard deviation (only for parameters with direct normal priors) |
median |
Prior median |
lower |
Prior lower 95% quantile |
upper |
Prior upper 95% quantile |
In msmbayes, a normal prior is used for the log
transition intensities (logq) and log hazard ratios (loghr).
The goal of this function is to determine the mean and SD of this
prior. It can be used in two ways:
(a) directly specifying the prior mean and SD of logq or loghr'
(b) specifying prior quantiles for more interpretable
transformations of these. These may include q (the transition
intensity) or time (the reciprocal of the intensity,
interpreted as a mean time to this transition when observing a
sequence of individuals at risk of it). Or hr (hazard ratio)
Two quantiles (out of the median, lower or upper) should be provided.
If all three are provided, then the upper quantile is ignored. These
are transformed back to the scale of logq or loghr, and the
unique normal prior with these quantiles is deduced.
A list of class "msmprior", with components
par (as supplied by the user)
par_base (e.g. "logq" if "time" was provided, or "loghr" if "hr" was provided)
covname (name of covariate effect)
ind1, ind2 (as supplied by the user)
mean (of log-normal prior on par_base)
sd (of log-normal prior on par_base)
priors <- list( msmprior("logq(1,2)", median=-2, lower=-4), msmprior("q(2,1)", median=0.1, upper=10) ) Q <- rbind(c(0,1),c(1,0)) mod <- msmbayes(data=infsim2, state="state", time="months", subject="subject", Q=Q, priors=priors, fit_method="optimize") summary(mod)priors <- list( msmprior("logq(1,2)", median=-2, lower=-4), msmprior("q(2,1)", median=0.1, upper=10) ) Q <- rbind(c(0,1),c(1,0)) mod <- msmbayes(data=infsim2, state="state", time="months", subject="subject", Q=Q, priors=priors, fit_method="optimize") summary(mod)
From Bobbio et al. (Theorem 3.1)
n3_moment_bounds(n2, n)n3_moment_bounds(n2, n)
n2 |
Second normalised moment |
n |
Number of phases |
List with components lower, upper defining the
bounds on the third normalised moment (n3) required for
n2 and n3 to be the moments of a phase type distribution
with n phases.
Density, probability distribution, quantile, moment, hazard and random number generation functions for the Coxian phase-type distribution with any number of phases.
dnphase(x, prate, arate, initp = NULL, method = "expm") pnphase(q, prate, arate, initp = NULL, method = "expm", lower.tail = TRUE) hnphase(x, prate, arate, initp = NULL, method = "expm") mean_nphase(prate, arate, initp = NULL) var_nphase(prate, arate, initp = NULL) skewness_nphase(prate, arate, initp = NULL) ncmoment_nphase(prate, arate, i, initp = NULL) rnphase(n, prate, arate) qnphase(p, prate, arate, lower.tail = TRUE, log.p = FALSE, method = "expm")dnphase(x, prate, arate, initp = NULL, method = "expm") pnphase(q, prate, arate, initp = NULL, method = "expm", lower.tail = TRUE) hnphase(x, prate, arate, initp = NULL, method = "expm") mean_nphase(prate, arate, initp = NULL) var_nphase(prate, arate, initp = NULL) skewness_nphase(prate, arate, initp = NULL) ncmoment_nphase(prate, arate, i, initp = NULL) rnphase(n, prate, arate) qnphase(p, prate, arate, lower.tail = TRUE, log.p = FALSE, method = "expm")
x |
Value at which to evaluate the PDF, CDF, or hazard. |
prate |
Progression rates. Either a vector of length
|
arate |
Absorption rates. Either a vector of length |
initp |
Vector of probabilities of occupying each phase at the start of the sojourn. By default, the first phase has probability 1. |
method |
If |
q |
Value at which to evaluate the CDF. |
lower.tail |
If |
i |
which moment to return from |
n |
Number of random samples to generate. |
p |
Probability at which to evaluate the quantile |
log.p |
return log probability |
The number of phases, nphase, is taken from the
dimensions of the object supplied as arate. If arate is a
vector, then the number of phases is assumed to equal the length
of this vector. If arate is a matrix, then the number of
phases is assumed to be the number of columns.
mean_nphase, var_nphase, skewness_nphase and
ncmoment_nphase return the mean, variance, skewness and general
non-central moments of the distribution.
These functions work in a vectorised way, so that alternative
parameter values or evaluation values x can be supplied. The
number of alternative values is determined from the number of rows
nrep of arate. Then if necessary, prate and x are
replicated to match the size of arate.
A vector of length n or length(x).
Given a phase-type sojourn distribution, return the corresponding Markov intensity matrix where the last state is the absorbing state, and the the time to absorption is the sojourn distribution.
nphase_Q(prate, arate)nphase_Q(prate, arate)
prate |
Progression rates. Either a vector of length
|
arate |
Absorption rates. Either a vector of length |
In models with covariates on the scale parameter, this currently only presents these parameters for covariate values of zero.
phaseapprox_pars(draws, log = FALSE)phaseapprox_pars(draws, log = FALSE)
draws |
Object returned by |
log |
Return parameters on log scale |
Transition probability matrix from an msmbayes model
pmatrix( draws, t = 1, new_data = NULL, states = "obs", drop = TRUE, type = "posterior" )pmatrix( draws, t = 1, new_data = NULL, states = "obs", drop = TRUE, type = "posterior" )
draws |
Object returned by |
t |
prediction time or vector of prediction times |
new_data |
Data frame with covariate values to predict for |
states |
If If |
drop |
Only used if there are no covariates supplied in
|
type |
|
Array or matrix of rvar objects giving the transition probability matrix at each requested prediction time and covariate value. See qdf for notes on the rvar format.
For phase-type models, if states="obs", so that we want
transition probabilities on the observable space, this returns the
probability of transition to any phase of each "destination" state,
for an individual who is in the first phase of each "starting"
state.
pmatrixdf returns the same information in a tidy
data frame format.
Transition probabilities from an msmbayes model, presented as a tidy data frame
pmatrixdf(draws, t = 1, new_data = NULL, states = "obs")pmatrixdf(draws, t = 1, new_data = NULL, states = "obs")
draws |
Object returned by |
t |
prediction time or vector of prediction times |
new_data |
Data frame with covariate values to predict for |
states |
If If |
A data frame containing samples from the posterior distribution.
See qdf for notes on this format and how to summarise.
For phase-type models, if states="obs", so that we want
transition probabilities on the observable space, this returns the
probability of transition to any phase of each "destination" state,
for an individual who is in the first phase of each "starting"
state.
Given an individual is currently in state , these are the
probabilities that when leaving state , the individual will
move to a particular state .
pnext(draws, new_data = NULL)pnext(draws, new_data = NULL)
draws |
Object returned by |
new_data |
Data frame with covariate values to predict for |
In a Markov model, this is defined as the transition intensity
from to divided by the sum of all transition
intensities out of .
In semi-Markov models, this quantity is a model parameter in itself. In phase-type approximation models, the parameters consist of the parameters of the sojourn distribution and the next-state probabilities, which (as in a Markov model) are assumed to be independent of the sojourn time.
As the models in msmbayes work in continuous time, the
next-state probability is different from the transition
probability. The transition probability is the probability that
the individual is in state at a specific time in the
future, and can be obtained from an msmbayes model with the
functions pdf, pmatrix.
Transition intensities from an msmbayes model, presented as a tidy data frame
qdf(draws, new_data = NULL, keep_covid = FALSE)qdf(draws, new_data = NULL, keep_covid = FALSE)
draws |
Object returned by |
new_data |
Data frame with covariate values to predict for |
keep_covid |
(logical) Keep the integer column |
A data frame with one row per from-state / to-state / covariate value.
Column posterior is in the rvar format of the
posterior package, representing a sample from a posterior
distribution. Use the summary function on the data frame to
produce summary statistics such as the posterior median or mean (see
summary.msmbayes).
qmatrix returns the same information in matrix format
qdf(infsim_model) summary(qdf(infsim_model)) summary(qdf(infsim_model), median, ~quantile(.x, c(0.025, 0.975))) qdf(infsim_modelc, new_data = data.frame(sex=c("female","male")))qdf(infsim_model) summary(qdf(infsim_model)) summary(qdf(infsim_model), median, ~quantile(.x, c(0.025, 0.975))) qdf(infsim_modelc, new_data = data.frame(sex=c("female","male")))
Transition intensity matrix from an msmbayes model
qmatrix(draws, new_data = NULL, X = NULL, drop = TRUE, type = "posterior")qmatrix(draws, new_data = NULL, X = NULL, drop = TRUE, type = "posterior")
draws |
Object returned by |
new_data |
Data frame with covariate values to predict for |
X |
Lower-level alternative to specifying |
drop |
Only used if there are no covariates supplied in
|
type |
|
An array or matrix of rvar objects or numbers, representing the
transition intensity matrix for each new prediction data point
qdf returns the same information in a
tidy data frame format
qmatrix(infsim_model) summary(qmatrix(infsim_model)) summary(qmatrix(infsim_model), median, ~quantile(.x, c(0.025, 0.975)))qmatrix(infsim_model) summary(qmatrix(infsim_model)) summary(qmatrix(infsim_model), median, ~quantile(.x, c(0.025, 0.975)))
Convert a multi-state model intensity matrix with one or more non-Markov states to an intensity matrix on a phase-type state space, where the non-Markov states are modelled with a phase-type approximation of a shape/scale distribution.
qphaseapprox( qmatrix, pastates, shape, scale = NULL, family = "gamma", nphase = NULL, att = FALSE )qphaseapprox( qmatrix, pastates, shape, scale = NULL, family = "gamma", nphase = NULL, att = FALSE )
qmatrix |
Intensity matrix on the observable state space. Only the rates for transitions out of Markov states are used, and values of rates for transitions out of the non-Markov state are ignored, unless there are competing next states. In that case the relative value of the intensities are interpreted as the transition probability to each next state. These transition probabilities are multiplied by the phase transition rates of the sojourn distribution in the non-Markov state to get the transition rates from the phases to the destination state. |
pastates |
This indicates which states (if any) are given a
Weibull or Gamma sojourn distribution approximated by a phase-type model.
Ignored if |
shape |
shape parameter. This can be vectorised. |
scale |
scale parameter. This can be vectorised. |
family |
parametric family approximated by the phase-type distribution: |
nphase |
Only required for models with phase-type sojourn
distributions specified directly (not through |
att |
keep attributes indicating progression and absorption states |
Intensity matrix on the latent state space.
Effects of covariates on competing exit transitions in phase type models
logrrnext(draws) rrnext(draws)logrrnext(draws) rrnext(draws)
draws |
Object returned by |
Only applicable to phase-type approximation models,
specified with pastates.
logrrnext gives the Linear effect of covariates on log relative
risk of transition to a competing destination state, relative to
baseline destination state.
rrnext gives the exponential of the linear effect,
interpretable as a hazard ratio. See the semi-Markov models
vignette and paper for the mathematical details.
A data frame containing samples from the posterior distribution.
See qdf for notes on this format and how to summarise.
Test whether a shape parameter of is in the bounds required for a valid phase-type approximation
gamma_shape_in_bounds(shape, nphase) weibull_shape_in_bounds(shape, nphase)gamma_shape_in_bounds(shape, nphase) weibull_shape_in_bounds(shape, nphase)
shape |
Shape parameter or vector) |
nphase |
Number of phases |
Also verifies that the parameter satisfies Case 1 of Theorem 1 in Bobbio et al.
Vector or logicals, whether each shape parameter is in the bounds require for a phase-type approximation with that number of phases.
Upper bound for shape parameter in moment-based phase-type approximations
shape_ubound(nphase, family)shape_ubound(nphase, family)
nphase |
Number of approximating phases |
family |
|
Upper bound for shape parameter
Determine parameters of a phase-type model that approximate a parametric shape-scale distribution
shapescale_to_rates( shape, scale = 1, family = "gamma", canonical = FALSE, nphase = 3, list = FALSE, drop = TRUE )shapescale_to_rates( shape, scale = 1, family = "gamma", canonical = FALSE, nphase = 3, list = FALSE, drop = TRUE )
shape |
shape parameter. This can be vectorised. |
scale |
scale parameter. This can be vectorised. |
family |
parametric family approximated by the phase-type distribution: |
canonical |
Return the phase-type parameters in canonical form (phase 1 sojourn rate, sojourn rate increments in subsequent states, absorption probabilities). If |
nphase |
Number of phases. |
list |
If |
drop |
If shape or scale have both have one element, and |
The approximating phase-type distribution is one for which the first three moments are the same as those of the target distribution. See the vignettes and paper for full details.
Either or both states can have any sojourn distribution that we know how to simulate from.
sim_2state_smm( nindiv, obstimes, rfn1 = rexp, pars1 = list(rate = 1), rfn2 = exp, pars2 = list(rate = 1) )sim_2state_smm( nindiv, obstimes, rfn1 = rexp, pars1 = list(rate = 1), rfn2 = exp, pars2 = list(rate = 1) )
nindiv |
Number of individuals. |
obstimes |
Observation times, common between individuals. |
rfn1 |
Function to simulate from the sojourn distribution in state 2. By default this is the exponential. |
pars1 |
Named list of arguments and their values to be passed
to |
rfn2 |
Function to simulate from the sojourn distribution in state 2. |
pars2 |
Named list of arguments and their values to be passed
to |
Sojourn probability in a state of a msmbayes model
soj_prob(draws, t, state, new_data = NULL, method = "analytic")soj_prob(draws, t, state, new_data = NULL, method = "analytic")
draws |
Object returned by |
t |
Time since state entry. A single time or a vector can be supplied. |
state |
State of interest (A single integer) |
new_data |
Data frame with covariate values to predict for |
method |
Only applicable to phase-type models. Method for computing
the matrix exponential involved in the phase-type sojourn distribution.
See |
For the inverse of this function, see soj_quantile.
A data frame with column posterior giving the posterior
distribution for the probability of remaining in state by time
t since state entry, as an rvar object. Other columns give
the time and any covariate values.
Note this is the analogue of the survival probability, or one minus the CDF of the time-to-event distribution
See qdf for notes on the rvar format.
Quantiles of the sojourn distribution in a state of a msmbayes model
soj_quantile(draws, p, state, new_data = NULL, method = "analytic")soj_quantile(draws, p, state, new_data = NULL, method = "analytic")
draws |
Object returned by |
p |
Vector of probabilities at which to evaluate quantiles of the sojourn distribution |
state |
State of interest (A single integer) |
new_data |
Data frame with covariate values to predict for |
method |
Only applicable to phase-type models. Method for computing
the matrix exponential involved in the phase-type sojourn distribution.
See |
For the inverse of this function, see soj_prob.
A data frame with column posterior giving the posterior
distribution (as an rvar object) for the time in state for
which the probability of remaining in this state by this time is
p.
Standardised outputs are outputs from models with covariates, that are defined by marginalising (averaging) over covariate values in a given population, rather than being conditional on a given covariate value.
standardise_to(new_data) standardize_to(new_data)standardise_to(new_data) standardize_to(new_data)
new_data |
Data frame with covariate values to predict for |
Standardised outputs are produced from a Monte Carlo sample from
the joint distribution of parameters and covariate
values , , where
is defined by the empirical distribution of covariates
in the standard population. This joint sample is obtained by
concatenating samples of covariate-specific outputs.
Hence applying an output function (such as the transition
probability) to this sample produces a sample from the posterior of
: the average transition probability (say)
for a heterogeneous population.
nd <- data.frame(sex=c("female","male")) ## gender-specific outputs qdf(infsim_modelc, new_data = nd) ## averaged over men and women in the same proportions as are in `nd` ## in this case, `nd` has two rows, so we take a 50/50 average qdf(infsim_modelc, new_data = standardise_to(nd))nd <- data.frame(sex=c("female","male")) ## gender-specific outputs qdf(infsim_modelc, new_data = nd) ## averaged over men and women in the same proportions as are in `nd` ## in this case, `nd` has two rows, so we take a 50/50 average qdf(infsim_modelc, new_data = standardise_to(nd))
Tabulate observed transitions between states over successive observations, by from-state, to-state and (optionally) time interval length and covariate values.
statetable( data, state = "state", subject = "subject", time = "time", covariates = NULL, time_groups = 1, format = "wide" )statetable( data, state = "state", subject = "subject", time = "time", covariates = NULL, time_groups = 1, format = "wide" )
data |
Data frame giving the observed data. |
state |
Character string naming the observed state variable in
the data. This variable must either be an integer in 1,2,...,K,
where K is the number of states, or a factor with these integers
as level labels. If omitted, this is assumed to be |
subject |
Character string naming the individual ID variable in the data.
If omitted, this is assumed to be |
time |
Character string naming the observation time variable in the data.
If omitted, this is assumed to be |
covariates |
Vector of names of covariates to summarise counts by. |
time_groups |
Number of groups to summarise the time intervals by. The transitions are categorised into groups according to equally-spaced quantiles of the time interval length. |
format |
|
This is like the function statetable.msm in msm, except that it
uses msmbayes syntax for specifying the data, it summarises
the length of the time intervals between successive observations,
and it returns a tidy data frame.
Warning: it is not appropriate to choose the transition structure
(the Q argument to msmbayes()) on the basis of this summary.
statetable counts transitions over a time interval, whereas
Q indicates which instantaneous transitions are possible.
The structures will not be the same. For example, in a model with
instananeous transitions from mild to moderate illness, and moderate
to severe, we might observe transitions from mild to severe over
an interval of 1 year (say), but the instantaneous transition from
mild to severe is impossible.
Note this is not fully tidy-friendly, as it will not work
if data is grouped using dplyr.
A data frame with columns fromstate, timelag and
n (count of transitions), and column or columns for tostate.
Summarise basic parameter estimates from an msmbayes model
## S3 method for class 'msmbayes' summary(object, pars = NULL, ...)## S3 method for class 'msmbayes' summary(object, pars = NULL, ...)
object |
Object returned by |
pars |
Character string indicating the parameters to include in the summary. This can include:
This defaults to whichever of |
... |
A data frame with one row for each basic model parameter,
plus rows for the mean sojourn times. The posterior distribution
for the parameter is encoded in the column posterior, which
has the rvar data type defined by the posterior
package. This distribution can be summarised in any way by
calling summary again on the data frame (see the
examples).
Transition intensities, or transformations of transition intensities, are those for covariate values of zero.
Remaining parameters (in non-HMMs) are log hazard ratios for covariate effects.
The columns prior and prior_string summarise the
corresponding prior distribution in two different ways.
prior is a quasi-random sample from the prior in the rvar
data type, and is printed as mean and standard deviation. This
sample can then be used to produce any summary or plot of the
prior. The string prior_string is a summary of this sample,
showing the median and 95% equal tailed credible interval.
qdf, hr, loghr,
posterior::summarise_draws
summary(infsim_model) summary(summary(infsim_model)) summary(summary(infsim_model), median, ~quantile(.x, 0.025, 0.975))summary(infsim_model) summary(summary(infsim_model)) summary(summary(infsim_model), median, ~quantile(.x, 0.025, 0.975))
See msm::totlos.msm() for the theory behind the method used to
calculate this. The analytic formula is used, not numerical integration.
totlos( draws, t, new_data = NULL, fromt = 0, pstart = NULL, discount = 0, states = "obs" )totlos( draws, t, new_data = NULL, fromt = 0, pstart = NULL, discount = 0, states = "obs" )
draws |
Object returned by |
t |
End point of the time interval over which to measure length of stay in each state |
new_data |
Data frame with covariate values to predict for |
fromt |
Starting point of the time interval, by default 0 |
pstart |
Vector giving distribution of states at time 0 |
discount |
Discount rate in continuous time |
states |
If If |
Data frame with one row for each state and covariate value, giving the expected amount of time spent in that state over the forecast interval.