Title: | Bayesian Multi-State Models for Intermittently-Observed Data |
---|---|
Description: | Bayesian multi-state models for intermittently-observed data. Markov and phase-type semi-Markov models, and misclassification hidden Markov models. |
Authors: | Christopher Jackson [aut, cre]
|
Maintainer: | Christopher Jackson <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.2 |
Built: | 2025-02-16 05:37:42 UTC |
Source: | https://github.com/chjackson/msmbayes |
For an introduction to and overview of the msmbayes
package, and full documentation, see
http://chjackson.github.io/msmbayes.
For more resources on multi-state modelling, see the msm
package and its documentation.
Maintainer: Christopher Jackson [email protected] (ORCID)
Useful links:
A simulated multistate dataset with lots of observations and covariates
bigdat
bigdat
See infsim/bigdata.R
in the source for simulation settings.
Misclassification probabilities from an msmbayes model
edf(draws)
edf(draws)
draws |
Object returned by |
A data frame with one row per misclassification probability.
from
indicates the true state, and to
indicates the observed state.
qdf
for more information about the format.
Example fitted model objects used for testing msmbayes
infsim_model infsim_modelc infsim_modelp infsim_modelpc cav_misc
infsim_model infsim_modelc infsim_modelp infsim_modelpc cav_misc
An object of class msmbayes
, obtained by fitting a
Markov model to the dataset infsim
. See
data-raw/infsim.R
in the source for the model
specification code
infsim_modelc
includes covariates on the transition
intensities.
infsim_modelp
is a phase-type model.
infsim_modelpc
is a phase-type model with covariates.
cav_misc
is the misclassification model fitted to the CAV data from msm in the "Advanced multi-state models in msmbayes" vignette.
intensities.
Simulated
Hazard ratios for covariates on transition intensities
hr(draws)
hr(draws)
draws |
Object returned by |
A data frame containing samples from the posterior distribution.
See qdf
for notes on this format and how to summarise.
Simulated infection testing data
infsim infsim2
infsim infsim2
infsim
has 3600 rows, with 36 state observations for each of 100 people. Columns are:
subject
Subject identifier
days
Observation time (in days)
months
Observation time (in moths)
state
State simulated from a Markov model with no covariates
sex
: "male" or "female".
age10
: Age, in units of 10 years since age 50
statec
: State simulated from a Markov model with covariates
statep
: State simulated from a phase-type model (unused in any examples. See data-raw/infsim.R
in the source for simulation settings)
statepc
: State simuated from a phase-type model with covariates (unused in any examples)
A smaller dataset infsim2
has only 360 rows, from 20 people, and
is simulated using a sojourn time of 60 days in the test-negative state
and 10 days in test-positive.
An object of class data.frame
with 360 rows and 14 columns.
Simulated
Log hazard ratios for covariates on transition intensities
loghr(draws)
loghr(draws)
draws |
Object returned by |
A data frame containing samples from the posterior distribution.
See qdf
for notes on this format and how to summarise.
Mean sojourn times from an msmbayes model
mean_sojourn(draws, new_data = NULL, by_phase = TRUE)
mean_sojourn(draws, new_data = NULL, by_phase = TRUE)
draws |
Object returned by |
new_data |
Data frame with covariate values to predict for |
by_phase |
For states with phase type distributions: If If |
A data frame containing samples from the posterior distribution.
See qdf
for notes on this format and how to summarise.
Fit a multi-state model to longitudinal data consisting of intermittent observation of a discrete state. Bayesian estimation is used, via the Stan software.
msmbayes( data, state, time, subject, Q, E = NULL, covariates = NULL, nphase = NULL, priors = NULL, soj_priordata = NULL, fit_method = "sample", keep_data = FALSE, ... )
msmbayes( data, state, time, subject, Q, E = NULL, covariates = NULL, nphase = NULL, priors = NULL, soj_priordata = NULL, fit_method = "sample", keep_data = FALSE, ... )
data |
Data frame giving the observed data. |
state |
Character string naming the observed state variable in the data. This variable must either be an integer in 1,2,...,K, where K is the number of states, or a factor with these integers as level labels. |
time |
Character string naming the observation time variable in the data. |
subject |
Character string naming the individual ID variable in the data. |
Q |
Matrix indicating the transition structure. A zero entry
indicates that instantaneous transitions from (row) to (column)
are disallowed. An entry of 1 (or any other positive value)
indicates that the instantaneous transition is allowed. The
diagonal of There is no need to "guess" initial values and put them here, as
is sometimes done in |
E |
If |
covariates |
Specification of covariates on transition intensities.
This should be a list of formulae. Each formula should have a
left-hand side that looks like For example,
specifies that the log of the 1-2 transition intensity is an additive linear function of age and sex, and the log 2-1 transition intensity is a linear function of age. You do not have to list all of the intensities here if some of them are not influenced by covariates. |
nphase |
For phase-type models, this is a vector with one element per state, giving the number of phases per state. This element is 1 for states that do not have phase-type sojourn distributions. Not required for non-phase-type models. |
priors |
A list specifying priors. Each component should be
the result of a call to If only one parameter is given a non-default prior, a single msmprior call can be supplied here instead of a list. |
soj_priordata |
Synthetic data that represents prior information about the mean sojourn time. Experimental feature, currently undocumented. |
fit_method |
Quoted name of a function from the |
keep_data |
Store a copy of the cleaned data in the returned
object. |
... |
Other arguments to be passed to the function from
|
A data frame in the draws
format of the
posterior package, containing draws from the posterior of
the model parameters.
Attributes are added to give information about the model structure,
and a class "msmbayes"
is appended.
See, e.g. summary.msmbayes
, qdf
,
hr
, and similar functions, to extract parameter
estimates from the fitted model.
This works similarly to a histogram. The state observations are
binned into time intervals with roughly equal numbers of
observations. Within each bin, the probability that an
observation comes from each state
is estimated.
msmhist( data, state, time, subject, nbins, absorbing = NULL, censtimes = NULL, stacked = TRUE )
msmhist( data, state, time, subject, nbins, absorbing = NULL, censtimes = NULL, stacked = TRUE )
data |
Data frame giving the observed data. |
state |
Character string naming the observed state variable in the data. This variable must either be an integer in 1,2,...,K, where K is the number of states, or a factor with these integers as level labels. |
time |
Character string naming the observation time variable in the data. |
subject |
Character string naming the individual ID variable in the data. |
nbins |
Number of time intervals to bin the state observations into. The underlying distribution of states illustrated by the plot will be assumed constant within each interval. |
absorbing |
Indices of any absorbing states. Individuals are assumed to stay in their absorbing state, and contribute one observation to each bin after their absorption time. By default, no states are assumed to be absorbing. |
censtimes |
Vector of maximum intended follow-up times for the people in the data who entered absorbing states. This supposes that had the person not entered the absorbing state, they would not have been observed after this time. |
stacked |
If If |
If each subject has at most one observation in a bin, then
is estimated as the proportion of observations in the bin that are
of that state.
More generally, if an individual has more than one observation in
the bin, is estimated as follows. For each observed
individual
and each state
, we define a variable
equal to the proportion of individual
's
observations that are of state
. For example, in a
three-state model, where a person has two observations in a bin,
and these are states 2 and 3, then
for
states 1, 2 and 3 respectively. The bin-specific estimate of
is then the average of
over individuals
who have at least one observation in that bin.
The results are visualised as a stacked bar plot. The individual observations of states are represented as points placed at random y positions within each state-specific bar.
This is intended as an alternative to the "observed prevalences"
plot in the function prevalence.msm
from the msm
package,
with a clearer connection to the data. It can be overlaid
with predictions of transition probabilities from a msmbayes
or msm
model, to check the fit of the model.
The method used by "observed prevalences" plots places a strong assumption on the individual data, that individuals stay in the same state between observations, or transition at the midpoint between observations.
msmhist
places no assumption on the individual data. Instead
the assumption is placed on the distribution underlying the data.
The histogram-like visualisation assumes, essentially, that the
distribution of states is the same at all times within each bin.
A ggplot2
plot object.
msmhist_bardata
to extract the numbers
behind this plot so the plot can be customised by hand.
msmhist(infsim, "state", "months", "subject", nbins=30) msmhist(infsim2, "state", "months", "subject", nbins=6)
msmhist(infsim, "state", "months", "subject", nbins=30) msmhist(infsim2, "state", "months", "subject", nbins=6)
msmhist
Estimate state occupation probabilities to be illustrated by a bar
plot in msmhist
msmhist_bardata( data, state, time, subject, nbins, absorbing = NULL, censtimes = NULL )
msmhist_bardata( data, state, time, subject, nbins, absorbing = NULL, censtimes = NULL )
data |
Data frame giving the observed data. |
state |
Character string naming the observed state variable in the data. This variable must either be an integer in 1,2,...,K, where K is the number of states, or a factor with these integers as level labels. |
time |
Character string naming the observation time variable in the data. |
subject |
Character string naming the individual ID variable in the data. |
nbins |
Number of time intervals to bin the state observations into. The underlying distribution of states illustrated by the plot will be assumed constant within each interval. |
absorbing |
Indices of any absorbing states. Individuals are assumed to stay in their absorbing state, and contribute one observation to each bin after their absorption time. By default, no states are assumed to be absorbing. |
censtimes |
Vector of maximum intended follow-up times for the people in the data who entered absorbing states. This supposes that had the person not entered the absorbing state, they would not have been observed after this time. |
Data frame with one row per bin and state, and columns:
binid
: Integer ID for bin
binlabel
: Character label for bin, with time interval
state
: State
binstart
, binend
: Start and end time of the bin (numeric)
props
: estimates of state $s$ occupancy proportions $p(s)$ for each bin
cumpstart
, cumpend
: Cumulative sum of props
over the set of
states, where cumpstart
starts at 0, and cumpend
ends at
Intended for creating stacked bar plots with geom_rect
or
similar.
Constructor for a prior distribution in msmbayes
msmprior( par, mean = NULL, sd = NULL, median = NULL, lower = NULL, upper = NULL )
msmprior( par, mean = NULL, sd = NULL, median = NULL, lower = NULL, upper = NULL )
par |
Character string indicating the model parameter to place the prior on This should start with one of the following:
Then for transition intensities, it should two include indices
indicating the transition, e.g. For covariate effects, the covariate name is supplied alongside the
transition indices, e.g. For factor covariates, this should include the level,
e.g. The indices or the covariate name can be omitted to indicate that
the same prior will used for all transitions, or/and all
covariates. This can be done with or without the brackets, e.g.
|
mean |
Prior mean (only used for logq or loghr) |
sd |
Prior standard deviation (only used for logq or loghr) |
median |
Prior median |
lower |
Prior lower 95% quantile |
upper |
Prior upper 95% quantile |
In msmbayes
, a normal prior is used for the log
transition intensities (logq
) and log hazard ratios (loghr
).
The goal of this function is to determine the mean and SD of this
prior. It can be used in two ways:
(a) directly specifying the prior mean and SD of logq or
loghr'
(b) specifying prior quantiles for more interpretable
transformations of these. These may include q
(the transition
intensity) or time
(the reciprocal of the intensity,
interpreted as a mean time to this transition when observing a
sequence of individuals at risk of it). Or hr
(hazard ratio2)
Two quantiles out of the median, lower or upper should be provided.
If three are provided, then the upper quantile is ignored. These
are transformed back to the scale of logq
or loghr
, and the
unique normal prior with these quantiles is deduced.
A list of class "msmprior"
, with components
par
(as supplied by the user)
par_base
(either "logq"
or "loghr"
)
covname
(name of covariate effect)
ind1
, ind2
(as supplied by the user)
mean
(of log-normal prior on par_base
)
sd
(of log-normal prior on par_base
)
priors <- list( msmprior("logq(1,2)", median=-2, lower=-4), msmprior("q(2,1)", median=0.1, upper=10) ) Q <- rbind(c(0,1),c(1,0)) mod <- msmbayes(data=infsim2, state="state", time="months", subject="subject", Q=Q, priors=priors, fit_method="optimize") summary(mod)
priors <- list( msmprior("logq(1,2)", median=-2, lower=-4), msmprior("q(2,1)", median=0.1, upper=10) ) Q <- rbind(c(0,1),c(1,0)) mod <- msmbayes(data=infsim2, state="state", time="months", subject="subject", Q=Q, priors=priors, fit_method="optimize") summary(mod)
Density, probability distribution, hazard and random number generation functions for the Coxian phase-type distribution with any number of phases.
dnphase(x, prate, arate, method = "analytic") pnphase(x, prate, arate, method = "analytic") hnphase(x, prate, arate, method = "analytic") rnphase(n, prate, arate)
dnphase(x, prate, arate, method = "analytic") pnphase(x, prate, arate, method = "analytic") hnphase(x, prate, arate, method = "analytic") rnphase(n, prate, arate)
x |
Value at which to evaluate the PDF, CDF, or hazard. |
prate |
Progression rates. Either a vector of length
|
arate |
Absorption rates. Either a vector of length |
method |
If |
n |
Number of random samples to generate. |
The number of phases, nphase
, is taken from the
dimensions of the object supplied as arate
. If arate
is a
vector, then the number of phases is assumed to equal the length
of this vector. If arate
is a matrix, then the number of
phases is assumed to be the number of columns.
These functions work in a vectorised way, so that alternative
parameter values or evaluation values x
can be supplied. The
number of alternative values is determined from the number of rows
nrep
of arate
. Then if necessary, prate
and x
are
replicated to match the size of arate
.
A vector of length n
or length(x)
.
Transition probability matrix from an msmbayes model
pmatrix(draws, t = 1, new_data = NULL, X = NULL, drop = TRUE)
pmatrix(draws, t = 1, new_data = NULL, X = NULL, drop = TRUE)
draws |
Object returned by |
t |
prediction time or vector of prediction times |
new_data |
Data frame with covariate values to predict for |
X |
Lower-level alternative to specifying |
drop |
Only used if there are no covariates supplied in
|
Array or matrix of rvar
objects giving the transition probability matrix at each requested prediction time and covariate value. See qdf
for notes on the rvar
format.
pmatrixdf
returns the same information in a tidy
data frame format.
Transition probabilities from an msmbayes model, presented as a tidy data frame
pmatrixdf(draws, t = 1, new_data = NULL)
pmatrixdf(draws, t = 1, new_data = NULL)
draws |
Object returned by |
t |
prediction time or vector of prediction times |
new_data |
Data frame with covariate values to predict for |
A data frame containing samples from the posterior distribution.
See qdf
for notes on this format and how to summarise.
Transition intensities from an msmbayes model, presented as a tidy data frame
qdf(draws, new_data = NULL)
qdf(draws, new_data = NULL)
draws |
Object returned by |
new_data |
Data frame with covariate values to predict for |
A data frame with one row per from-state / to-state / covariate value.
Column value
is in the rvar
format of the
posterior package, representing a sample from a posterior
distribution. Use the summary
function on the data frame to
produce summary statistics such as the posterior median or mean (see
summary.msmbayes
).
qmatrix
returns the same information in matrix format
qdf(infsim_model) summary(qdf(infsim_model)) summary(qdf(infsim_model), median, ~quantile(.x, c(0.025, 0.975))) qdf(infsim_modelc, new_data = data.frame(sex=c("female","male")))
qdf(infsim_model) summary(qdf(infsim_model)) summary(qdf(infsim_model), median, ~quantile(.x, c(0.025, 0.975))) qdf(infsim_modelc, new_data = data.frame(sex=c("female","male")))
Transition intensity matrix from an msmbayes model
qmatrix(draws, new_data = NULL, X = NULL, drop = TRUE)
qmatrix(draws, new_data = NULL, X = NULL, drop = TRUE)
draws |
Object returned by |
new_data |
Data frame with covariate values to predict for |
X |
Lower-level alternative to specifying |
drop |
Only used if there are no covariates supplied in
|
An array or matrix of rvar
objects containing the
transition intensity matrix for each new prediction data point
qdf
returns the same information in a
tidy data frame format
qmatrix(infsim_model) summary(qmatrix(infsim_model)) summary(qmatrix(infsim_model), median, ~quantile(.x, c(0.025, 0.975)))
qmatrix(infsim_model) summary(qmatrix(infsim_model)) summary(qmatrix(infsim_model), median, ~quantile(.x, c(0.025, 0.975)))
Sojourn probability in a state of a msmbayes model
soj_prob(draws, t, state, new_data = NULL, method = "analytic")
soj_prob(draws, t, state, new_data = NULL, method = "analytic")
draws |
Object returned by |
t |
Time since state entry |
state |
State of interest (integer) |
new_data |
Data frame with covariate values to predict for |
method |
If |
A data frame with column value
giving the probability of
remaining in state
by time t
since state entry, as an rvar
object. Other columns give the time and any covariate values.
See qdf
for notes on the rvar
format.
Standardised outputs are outputs from models with covariates, that are defined by marginalising (averaging) over covariate values in a given population, rather than being conditional on a given covariate value.
standardise_to(new_data) standardize_to(new_data)
standardise_to(new_data) standardize_to(new_data)
new_data |
Data frame with covariate values to predict for |
Standardised outputs are produced from a Monte Carlo sample from
the joint distribution of parameters and covariate
values
,
, where
is defined by the empirical distribution of covariates
in the standard population. This joint sample is obtained by
concatenating samples of covariate-specific outputs.
Hence applying an output function (such as the transition
probability) to this sample produces a sample from the posterior of
: the average transition probability (say)
for a heterogeneous population.
nd <- data.frame(sex=c("female","male")) ## gender-specific outputs qdf(infsim_modelc, new_data = nd) ## averaged over men and women in the same proportions as are in `nd` ## in this case, `nd` has two rows, so we take a 50/50 average qdf(infsim_modelc, new_data = standardise_to(nd))
nd <- data.frame(sex=c("female","male")) ## gender-specific outputs qdf(infsim_modelc, new_data = nd) ## averaged over men and women in the same proportions as are in `nd` ## in this case, `nd` has two rows, so we take a 50/50 average qdf(infsim_modelc, new_data = standardise_to(nd))
Summarise basic parameter estimates from an msmbayes model
## S3 method for class 'msmbayes' summary(object, log = FALSE, time = FALSE, ...)
## S3 method for class 'msmbayes' summary(object, log = FALSE, time = FALSE, ...)
object |
Object returned by |
log |
Present log transition intensities and log hazard ratios, rather than transition intensities and hazard ratios. |
time |
Present inverse transition intensities (i.e. mean times to events) |
... |
A data frame with one row for each basic model parameter,
plus rows for the mean sojourn times. The posterior distribution
for the parameter is encoded in the column value
, which
has the rvar
data type defined by the posterior
package. This distribution can be summarised in any way by
calling summary
again on the data frame (see the
examples).
A string summarising a sample from the prior distribution, as a
median and 95% equal-tailed credible interval, is given in the
prior
column.
Transition intensities, or transformations of transition intensities, are those for covariate values of zero.
Remaining parameters (in non-HMMs) are log hazard ratios for covariate effects.
qdf
, hr
, loghr
,
posterior::summarise_draws
summary(infsim_model) summary(summary(infsim_model)) summary(summary(infsim_model), median, ~quantile(.x, 0.025, 0.975))
summary(infsim_model) summary(summary(infsim_model)) summary(summary(infsim_model), median, ~quantile(.x, 0.025, 0.975))
See msm::totlos.msm()
for the theory behind the method used to
calculate this. The analytic formula is used, not numerical integration.
totlos(draws, t, new_data = NULL, fromt = 0, pstart = NULL, discount = 0)
totlos(draws, t, new_data = NULL, fromt = 0, pstart = NULL, discount = 0)
draws |
Object returned by |
t |
End point of the time interval over which to measure length of stay in each state |
new_data |
Data frame with covariate values to predict for |
fromt |
Starting point of the time interval, by default 0 |
pstart |
Vector giving distribution of states at time 0 |
discount |
Discount rate in continuous time |
Data frame with one row for each state and covariate value, giving the expected amount of time spent in that state over the forecast interval.