Package 'msmbayes'

Title: Bayesian Multi-State Models for Intermittently-Observed Data
Description: Bayesian multi-state models for intermittently-observed data. Markov and phase-type semi-Markov models, and misclassification hidden Markov models.
Authors: Christopher Jackson [aut, cre]
Maintainer: Christopher Jackson <[email protected]>
License: GPL (>= 3)
Version: 0.2
Built: 2025-02-16 05:37:42 UTC
Source: https://github.com/chjackson/msmbayes

Help Index


The 'msmbayes' package for Bayesian multi-state modelling of intermittently-observed data

Description

For an introduction to and overview of the msmbayes package, and full documentation, see

http://chjackson.github.io/msmbayes.

For more resources on multi-state modelling, see the msm package and its documentation.

Author(s)

Maintainer: Christopher Jackson [email protected] (ORCID)

See Also

Useful links:


A simulated multistate dataset with lots of observations and covariates

Description

A simulated multistate dataset with lots of observations and covariates

Usage

bigdat

Format

See infsim/bigdata.R in the source for simulation settings.


Misclassification probabilities from an msmbayes model

Description

Misclassification probabilities from an msmbayes model

Usage

edf(draws)

Arguments

draws

Object returned by msmbayes.

Value

A data frame with one row per misclassification probability. from indicates the true state, and to indicates the observed state.

See Also

qdf for more information about the format.


Example fitted model objects used for testing msmbayes

Description

Example fitted model objects used for testing msmbayes

Usage

infsim_model

infsim_modelc

infsim_modelp

infsim_modelpc

cav_misc

Format

An object of class msmbayes, obtained by fitting a Markov model to the dataset infsim. See data-raw/infsim.R in the source for the model specification code

infsim_modelc includes covariates on the transition intensities.

infsim_modelp is a phase-type model.

infsim_modelpc is a phase-type model with covariates.

cav_misc is the misclassification model fitted to the CAV data from msm in the "Advanced multi-state models in msmbayes" vignette. intensities.

Source

Simulated


Hazard ratios for covariates on transition intensities

Description

Hazard ratios for covariates on transition intensities

Usage

hr(draws)

Arguments

draws

Object returned by msmbayes.

Value

A data frame containing samples from the posterior distribution. See qdf for notes on this format and how to summarise.

See Also

loghr


Simulated infection testing data

Description

Simulated infection testing data

Usage

infsim

infsim2

Format

infsim has 3600 rows, with 36 state observations for each of 100 people. Columns are:

  • subject Subject identifier

  • days Observation time (in days)

  • months Observation time (in moths)

  • state State simulated from a Markov model with no covariates

  • sex: "male" or "female".

  • age10: Age, in units of 10 years since age 50

  • statec: State simulated from a Markov model with covariates

  • statep: State simulated from a phase-type model (unused in any examples. See data-raw/infsim.R in the source for simulation settings)

  • statepc: State simuated from a phase-type model with covariates (unused in any examples)

A smaller dataset infsim2 has only 360 rows, from 20 people, and is simulated using a sojourn time of 60 days in the test-negative state and 10 days in test-positive.

An object of class data.frame with 360 rows and 14 columns.

Source

Simulated


Log hazard ratios for covariates on transition intensities

Description

Log hazard ratios for covariates on transition intensities

Usage

loghr(draws)

Arguments

draws

Object returned by msmbayes.

Value

A data frame containing samples from the posterior distribution. See qdf for notes on this format and how to summarise.

See Also

hr


Mean sojourn times from an msmbayes model

Description

Mean sojourn times from an msmbayes model

Usage

mean_sojourn(draws, new_data = NULL, by_phase = TRUE)

Arguments

draws

Object returned by msmbayes.

new_data

Data frame with covariate values to predict for

by_phase

For states with phase type distributions:

If TRUE then one mean sojourn time per phase is returned.

If FALSE, then one overall mean sojourn time for the state is returned. This is not generally equal to the sum of the phase-specific mean sojourn times, because an individual may transition out of the state before progressing to the next phase.

Value

A data frame containing samples from the posterior distribution. See qdf for notes on this format and how to summarise.


Bayesian multi-state models for intermittently-observed data

Description

Fit a multi-state model to longitudinal data consisting of intermittent observation of a discrete state. Bayesian estimation is used, via the Stan software.

Usage

msmbayes(
  data,
  state,
  time,
  subject,
  Q,
  E = NULL,
  covariates = NULL,
  nphase = NULL,
  priors = NULL,
  soj_priordata = NULL,
  fit_method = "sample",
  keep_data = FALSE,
  ...
)

Arguments

data

Data frame giving the observed data.

state

Character string naming the observed state variable in the data. This variable must either be an integer in 1,2,...,K, where K is the number of states, or a factor with these integers as level labels.

time

Character string naming the observation time variable in the data.

subject

Character string naming the individual ID variable in the data.

Q

Matrix indicating the transition structure. A zero entry indicates that instantaneous transitions from (row) to (column) are disallowed. An entry of 1 (or any other positive value) indicates that the instantaneous transition is allowed. The diagonal of Q is ignored.

There is no need to "guess" initial values and put them here, as is sometimes done in msm. Initial values for fitting are determined by Stan from the prior distributions, and the specific values supplied for positive entries of Q are disregarded.

E

If NULL a non-hidden Markov model is fitted. If non-NULL this should be a matrix indicating the structure of allowed misclassifications, where rows are the true states, and columns are the observed states. A zero (r,s)(r,s) entry indicates that true state rr cannot be observed as observed state ss. A non-zero (r,s)(r,s) entry indicates an initial value for a permitted misclassification probability. The diagonal of E is ignored.

covariates

Specification of covariates on transition intensities. This should be a list of formulae. Each formula should have a left-hand side that looks like Q(r,s), and a right hand side defining the regression model for the log of the transition intensity from state rr to state ss.

For example,

covariates = list(Q(1,2) ~ age + sex, Q(2,1) ~ age)

specifies that the log of the 1-2 transition intensity is an additive linear function of age and sex, and the log 2-1 transition intensity is a linear function of age. You do not have to list all of the intensities here if some of them are not influenced by covariates.

nphase

For phase-type models, this is a vector with one element per state, giving the number of phases per state. This element is 1 for states that do not have phase-type sojourn distributions. Not required for non-phase-type models.

priors

A list specifying priors. Each component should be the result of a call to msmprior. Any parameters with priors not specified here are given default priors (normal with mean -2 and SD 2 for log intensities, and normal with mean 0 and SD 10 for log hazard ratios).

If only one parameter is given a non-default prior, a single msmprior call can be supplied here instead of a list.

soj_priordata

Synthetic data that represents prior information about the mean sojourn time. Experimental feature, currently undocumented.

fit_method

Quoted name of a function from the cmdstanr package specifying the algorithm to fit the model. The default "sample" uses MCMC, via cmdstanr::sample(). Alternatives are cmdstanr::optimize(), cmdstanr::pathfinder(), cmdstanr::laplace() or cmdstanr::variational().

keep_data

Store a copy of the cleaned data in the returned object. FALSE by default.

...

Other arguments to be passed to the function from cmdstanr that fits the model.

Value

A data frame in the draws format of the posterior package, containing draws from the posterior of the model parameters.

Attributes are added to give information about the model structure, and a class "msmbayes" is appended.

See, e.g. summary.msmbayes, qdf, hr, and similar functions, to extract parameter estimates from the fitted model.


Illustrate the empirical distribution of states against time in intermittently-observed multistate data

Description

This works similarly to a histogram. The state observations are binned into time intervals with roughly equal numbers of observations. Within each bin, the probability p(s)p(s) that an observation comes from each state ss is estimated.

Usage

msmhist(
  data,
  state,
  time,
  subject,
  nbins,
  absorbing = NULL,
  censtimes = NULL,
  stacked = TRUE
)

Arguments

data

Data frame giving the observed data.

state

Character string naming the observed state variable in the data. This variable must either be an integer in 1,2,...,K, where K is the number of states, or a factor with these integers as level labels.

time

Character string naming the observation time variable in the data.

subject

Character string naming the individual ID variable in the data.

nbins

Number of time intervals to bin the state observations into. The underlying distribution of states illustrated by the plot will be assumed constant within each interval.

absorbing

Indices of any absorbing states. Individuals are assumed to stay in their absorbing state, and contribute one observation to each bin after their absorption time. By default, no states are assumed to be absorbing.

censtimes

Vector of maximum intended follow-up times for the people in the data who entered absorbing states. This supposes that had the person not entered the absorbing state, they would not have been observed after this time.

stacked

If TRUE do a bar chart with the probabilities for different states stacked on top of each other, so the y-axis spans 0 to 1 exactly. This is more compact.

If FALSE, plot one panel per state, as is done in prevalence.msm. This is more convenient for constructing a check of the model fit.

Details

If each subject has at most one observation in a bin, then p(s)p(s) is estimated as the proportion of observations in the bin that are of that state.

More generally, if an individual has more than one observation in the bin, p(s)p(s) is estimated as follows. For each observed individual ii and each state ss, we define a variable p(i,s)p(i,s) equal to the proportion of individual ii's observations that are of state ss. For example, in a three-state model, where a person has two observations in a bin, and these are states 2 and 3, then p(i,s)=0,0.5,0.5p(i,s) = 0, 0.5, 0.5 for states 1, 2 and 3 respectively. The bin-specific estimate of p(s)p(s) is then the average of p(i,s)p(i,s) over individuals ss who have at least one observation in that bin.

The results are visualised as a stacked bar plot. The individual observations of states are represented as points placed at random y positions within each state-specific bar.

This is intended as an alternative to the "observed prevalences" plot in the function prevalence.msm from the msm package, with a clearer connection to the data. It can be overlaid with predictions of transition probabilities from a msmbayes or msm model, to check the fit of the model.

The method used by "observed prevalences" plots places a strong assumption on the individual data, that individuals stay in the same state between observations, or transition at the midpoint between observations.

msmhist places no assumption on the individual data. Instead the assumption is placed on the distribution underlying the data. The histogram-like visualisation assumes, essentially, that the distribution of states is the same at all times within each bin.

Value

A ggplot2 plot object.

See Also

msmhist_bardata to extract the numbers behind this plot so the plot can be customised by hand.

Examples

msmhist(infsim, "state", "months", "subject", nbins=30)
msmhist(infsim2, "state", "months", "subject", nbins=6)

Estimate state occupation probabilities to be illustrated by a bar plot in msmhist

Description

Estimate state occupation probabilities to be illustrated by a bar plot in msmhist

Usage

msmhist_bardata(
  data,
  state,
  time,
  subject,
  nbins,
  absorbing = NULL,
  censtimes = NULL
)

Arguments

data

Data frame giving the observed data.

state

Character string naming the observed state variable in the data. This variable must either be an integer in 1,2,...,K, where K is the number of states, or a factor with these integers as level labels.

time

Character string naming the observation time variable in the data.

subject

Character string naming the individual ID variable in the data.

nbins

Number of time intervals to bin the state observations into. The underlying distribution of states illustrated by the plot will be assumed constant within each interval.

absorbing

Indices of any absorbing states. Individuals are assumed to stay in their absorbing state, and contribute one observation to each bin after their absorption time. By default, no states are assumed to be absorbing.

censtimes

Vector of maximum intended follow-up times for the people in the data who entered absorbing states. This supposes that had the person not entered the absorbing state, they would not have been observed after this time.

Value

Data frame with one row per bin and state, and columns:

  • binid: Integer ID for bin

  • binlabel: Character label for bin, with time interval

  • state: State

  • binstart, binend: Start and end time of the bin (numeric)

  • props: estimates of state $s$ occupancy proportions $p(s)$ for each bin

  • cumpstart, cumpend: Cumulative sum of props over the set of states, where cumpstart starts at 0, and cumpend ends at

    1. Intended for creating stacked bar plots with geom_rect or similar.

See Also

msmhist


Constructor for a prior distribution in msmbayes

Description

Constructor for a prior distribution in msmbayes

Usage

msmprior(
  par,
  mean = NULL,
  sd = NULL,
  median = NULL,
  lower = NULL,
  upper = NULL
)

Arguments

par

Character string indicating the model parameter to place the prior on

This should start with one of the following:

"logq". Log transition intensity.

"q", Transition intensity

"time". Defined as 1/q. This can be interpreted as the mean time to the next transition to state $s$ for people in state $r$ (from the point of view of someone observing one person at a time, and switching to observing a different person if a competing transition happens).

"loghr". Log hazard ratio

"hr". Hazard ratio

Then for transition intensities, it should two include indices indicating the transition, e.g. "logq(2,3)" for the log transition intensity from state 2 to state 3.

For covariate effects, the covariate name is supplied alongside the transition indices, e.g. "loghr(age,2,3)" for the effect of age on the log hazard ratio of transitioning from state 2 to state 3.

For factor covariates, this should include the level, e.g. "loghr(sexMALE,2,3)" for level "MALE" of factor "sex".

The indices or the covariate name can be omitted to indicate that the same prior will used for all transitions, or/and all covariates. This can be done with or without the brackets, e.g. "logq()" or "logq" are both understood.

mean

Prior mean (only used for logq or loghr)

sd

Prior standard deviation (only used for logq or loghr)

median

Prior median

lower

Prior lower 95% quantile

upper

Prior upper 95% quantile

Details

In msmbayes, a normal prior is used for the log transition intensities (logq) and log hazard ratios (loghr). The goal of this function is to determine the mean and SD of this prior. It can be used in two ways:

(a) directly specifying the prior mean and SD of ⁠logq or ⁠loghr'

(b) specifying prior quantiles for more interpretable transformations of these. These may include q (the transition intensity) or time (the reciprocal of the intensity, interpreted as a mean time to this transition when observing a sequence of individuals at risk of it). Or hr (hazard ratio2)

Two quantiles out of the median, lower or upper should be provided. If three are provided, then the upper quantile is ignored. These are transformed back to the scale of logq or loghr, and the unique normal prior with these quantiles is deduced.

Value

A list of class "msmprior", with components

par (as supplied by the user)

par_base (either "logq" or "loghr")

covname (name of covariate effect)

ind1, ind2 (as supplied by the user)

mean (of log-normal prior on par_base)

sd (of log-normal prior on par_base)

Examples

priors <- list(
   msmprior("logq(1,2)", median=-2, lower=-4),
   msmprior("q(2,1)",    median=0.1, upper=10)
)
Q <- rbind(c(0,1),c(1,0))
mod <- msmbayes(data=infsim2, state="state", time="months", subject="subject",
                Q=Q,  priors=priors, fit_method="optimize")
summary(mod)

Density, probability distribution, hazard and random number generation functions for the Coxian phase-type distribution with any number of phases.

Description

Density, probability distribution, hazard and random number generation functions for the Coxian phase-type distribution with any number of phases.

Usage

dnphase(x, prate, arate, method = "analytic")

pnphase(x, prate, arate, method = "analytic")

hnphase(x, prate, arate, method = "analytic")

rnphase(n, prate, arate)

Arguments

x

Value at which to evaluate the PDF, CDF, or hazard.

prate

Progression rates. Either a vector of length nphase-1, or a matrix with npar rows and nphase-1 columns.

arate

Absorption rates. Either a vector of length nphase, or a matrix with npar rows and nphase columns.

method

If "analytic" then for nphase 5 or less, an analytic solution to the matrix exponential is employed in the calculation. For nphase 6 or more, or if method="mexp" the matrix exponential is determined using numerical methods, via expm::expm().

n

Number of random samples to generate.

Details

The number of phases, nphase, is taken from the dimensions of the object supplied as arate. If arate is a vector, then the number of phases is assumed to equal the length of this vector. If arate is a matrix, then the number of phases is assumed to be the number of columns.

These functions work in a vectorised way, so that alternative parameter values or evaluation values x can be supplied. The number of alternative values is determined from the number of rows nrep of arate. Then if necessary, prate and x are replicated to match the size of arate.

Value

A vector of length n or length(x).


Transition probability matrix from an msmbayes model

Description

Transition probability matrix from an msmbayes model

Usage

pmatrix(draws, t = 1, new_data = NULL, X = NULL, drop = TRUE)

Arguments

draws

Object returned by msmbayes.

t

prediction time or vector of prediction times

new_data

Data frame with covariate values to predict for

X

Lower-level alternative to specifying new_data, for developer use only. X is a numeric matrix formed from column-binding the covariate design matrices for each transition in turn.

drop

Only used if there are no covariates supplied in new_data. Then if drop=TRUE this returns a nstates x nstates matrix, or if drop=FALSE this returns a 3D array with first dimension ncovs=1.

Value

Array or matrix of rvar objects giving the transition probability matrix at each requested prediction time and covariate value. See qdf for notes on the rvar format.

See Also

pmatrixdf returns the same information in a tidy data frame format.


Transition probabilities from an msmbayes model, presented as a tidy data frame

Description

Transition probabilities from an msmbayes model, presented as a tidy data frame

Usage

pmatrixdf(draws, t = 1, new_data = NULL)

Arguments

draws

Object returned by msmbayes.

t

prediction time or vector of prediction times

new_data

Data frame with covariate values to predict for

Value

A data frame containing samples from the posterior distribution. See qdf for notes on this format and how to summarise.


Transition intensities from an msmbayes model, presented as a tidy data frame

Description

Transition intensities from an msmbayes model, presented as a tidy data frame

Usage

qdf(draws, new_data = NULL)

Arguments

draws

Object returned by msmbayes.

new_data

Data frame with covariate values to predict for

Value

A data frame with one row per from-state / to-state / covariate value.

Column value is in the rvar format of the posterior package, representing a sample from a posterior distribution. Use the summary function on the data frame to produce summary statistics such as the posterior median or mean (see summary.msmbayes).

See Also

qmatrix returns the same information in matrix format

Examples

qdf(infsim_model)
summary(qdf(infsim_model))
summary(qdf(infsim_model), median, ~quantile(.x, c(0.025, 0.975)))

qdf(infsim_modelc,
    new_data = data.frame(sex=c("female","male")))

Transition intensity matrix from an msmbayes model

Description

Transition intensity matrix from an msmbayes model

Usage

qmatrix(draws, new_data = NULL, X = NULL, drop = TRUE)

Arguments

draws

Object returned by msmbayes.

new_data

Data frame with covariate values to predict for

X

Lower-level alternative to specifying new_data, for developer use only. X is a numeric matrix formed from column-binding the covariate design matrices for each transition in turn.

drop

Only used if there are no covariates supplied in new_data. Then if drop=TRUE this returns a nstates x nstates matrix, or if drop=FALSE this returns a 3D array with first dimension ncovs=1.

Value

An array or matrix of rvar objects containing the transition intensity matrix for each new prediction data point

See Also

qdf returns the same information in a tidy data frame format

Examples

qmatrix(infsim_model)
summary(qmatrix(infsim_model))
summary(qmatrix(infsim_model), median, ~quantile(.x, c(0.025, 0.975)))

Sojourn probability in a state of a msmbayes model

Description

Sojourn probability in a state of a msmbayes model

Usage

soj_prob(draws, t, state, new_data = NULL, method = "analytic")

Arguments

draws

Object returned by msmbayes.

t

Time since state entry

state

State of interest (integer)

new_data

Data frame with covariate values to predict for

method

If "analytic" then for nphase 5 or less, an analytic solution to the matrix exponential is employed in the calculation. For nphase 6 or more, or if method="mexp" the matrix exponential is determined using numerical methods, via expm::expm().

Value

A data frame with column value giving the probability of remaining in state by time t since state entry, as an rvar object. Other columns give the time and any covariate values.

See qdf for notes on the rvar format.


Constructor for a standardising population used for model outputs

Description

Standardised outputs are outputs from models with covariates, that are defined by marginalising (averaging) over covariate values in a given population, rather than being conditional on a given covariate value.

Usage

standardise_to(new_data)

standardize_to(new_data)

Arguments

new_data

Data frame with covariate values to predict for

Details

Standardised outputs are produced from a Monte Carlo sample from the joint distribution of parameters θ\theta and covariate values XX, p(X,θ)=p(θX)p(X)p(X,\theta) = p(\theta|X)p(X), where p(X)p(X) is defined by the empirical distribution of covariates in the standard population. This joint sample is obtained by concatenating samples of covariate-specific outputs.

Hence applying an output function g()g() (such as the transition probability) to this sample produces a sample from the posterior of g(θX)dX\int g(\theta|X) dX: the average transition probability (say) for a heterogeneous population.

Examples

nd <- data.frame(sex=c("female","male"))

## gender-specific outputs
qdf(infsim_modelc, new_data = nd)

## averaged over men and women in the same proportions as are in `nd`
## in this case, `nd` has two rows, so we take a 50/50 average
qdf(infsim_modelc, new_data = standardise_to(nd))

Summarise basic parameter estimates from an msmbayes model

Description

Summarise basic parameter estimates from an msmbayes model

Usage

## S3 method for class 'msmbayes'
summary(object, log = FALSE, time = FALSE, ...)

Arguments

object

Object returned by msmbayes.

log

Present log transition intensities and log hazard ratios, rather than transition intensities and hazard ratios.

time

Present inverse transition intensities (i.e. mean times to events)

...

Further arguments passed to both qdf and loghr.

Value

A data frame with one row for each basic model parameter, plus rows for the mean sojourn times. The posterior distribution for the parameter is encoded in the column value, which has the rvar data type defined by the posterior package. This distribution can be summarised in any way by calling summary again on the data frame (see the examples).

A string summarising a sample from the prior distribution, as a median and 95% equal-tailed credible interval, is given in the prior column.

Transition intensities, or transformations of transition intensities, are those for covariate values of zero.

Remaining parameters (in non-HMMs) are log hazard ratios for covariate effects.

See Also

qdf, hr, loghr, posterior::summarise_draws

Examples

summary(infsim_model)
summary(summary(infsim_model))
summary(summary(infsim_model), median, ~quantile(.x, 0.025, 0.975))

Total length of stay in each state over an interval

Description

See msm::totlos.msm() for the theory behind the method used to calculate this. The analytic formula is used, not numerical integration.

Usage

totlos(draws, t, new_data = NULL, fromt = 0, pstart = NULL, discount = 0)

Arguments

draws

Object returned by msmbayes.

t

End point of the time interval over which to measure length of stay in each state

new_data

Data frame with covariate values to predict for

fromt

Starting point of the time interval, by default 0

pstart

Vector giving distribution of states at time 0

discount

Discount rate in continuous time

Value

Data frame with one row for each state and covariate value, giving the expected amount of time spent in that state over the forecast interval.