Title: | Flexible Parametric Survival and Multi-State Models |
---|---|
Description: | Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models, based on either cause-specific hazards or mixture models. |
Authors: | Christopher Jackson [aut, cre], Paul Metcalfe [ctb], Jordan Amdahl [ctb], Matthew T. Warkentin [ctb], Michael Sweeting [ctb], Kevin Kunzmann [ctb] |
Maintainer: | Christopher Jackson <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.3.2 |
Built: | 2025-01-17 18:29:04 UTC |
Source: | https://github.com/chjackson/flexsurv-dev |
flexsurv: Flexible parametric models for time-to-event data, including the generalized gamma, the generalized F and the Royston-Parmar spline model, and extensible to user-defined distributions.
flexsurvreg
fits parametric models for time-to-event
(survival) data. Data may be right-censored, and/or left-censored, and/or
left-truncated. Several built-in parametric distributions are available.
Any user-defined parametric model can also be employed by supplying a list
with basic information about the distribution, including the density or
hazard and ideally also the cumulative distribution or hazard.
Covariates can be included using a linear model on any parameter of the distribution, log-transformed to the real line if necessary. This typically defines an accelerated failure time or proportional hazards model, depending on the distribution and parameter.
flexsurvspline
fits the flexible survival model of Royston
and Parmar (2002) in which the log cumulative hazard is modelled as a
natural cubic spline function of log time. Covariates can be included on
any of the spline parameters, giving either a proportional hazards model or
an arbitrarily-flexible time-dependent effect. Alternative proportional
odds or probit parameterisations are available.
Output from the models can be presented as survivor, cumulative hazard and
hazard functions (summary.flexsurvreg
). These can be plotted
against nonparametric estimates (plot.flexsurvreg
) to assess
goodness-of-fit. Any other user-defined function of the parameters may be
summarised in the same way.
Multi-state models for time-to-event data can also be fitted with the same
functions. Predictions from those models can then be made using the
functions pmatrix.fs
, pmatrix.simfs
,
totlos.fs
, totlos.simfs
, or
sim.fmsm
, or alternatively by msfit.flexsurvreg
followed by mssample
or probtrans
from the package
mstate.
Distribution (“dpqr”) functions for the generalized gamma and F
distributions are given in GenGamma
, GenF
(preferred parameterisations) and GenGamma.orig
,
GenF.orig
(original parameterisations).
flexsurv
also includes the standard Gompertz distribution
with unrestricted shape parameter, see Gompertz
.
The flexsurv user guide vignette explains the methods in detail, and gives several worked examples. A further vignette flexsurv-examples gives a few more complicated examples, and users are encouraged to submit their own.
Christopher Jackson [email protected]
Jackson, C. (2016). flexsurv: A Platform for Parametric Survival Modeling in R. Journal of Statistical Software, 70(8), 1-33. doi:10.18637/jss.v070.i08
Royston, P. and Parmar, M. (2002). Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21(1):2175-2197.
Cox, C. (2008). The generalized distribution: An umbrella for
parametric survival analysis. Statistics in Medicine 27:4301-4312.
Cox, C., Chu, H., Schneider, M. F. and Muñoz, A. (2007). Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine 26:4252-4374
Useful links:
Report bugs at https://github.com/chjackson/flexsurv/issues
Defined as the sum of the AICs of the transition-specific models.
## S3 method for class 'fmsm' AIC(object, ..., k = 2)
## S3 method for class 'fmsm' AIC(object, ..., k = 2)
object |
Object returned by |
... |
Further arguments (currently unused). |
k |
Penalty applied to number of parameters (defaults to the standard 2). |
The sum of the AICs of the transition-specific models.
Generic function for the second-order Akaike information criterion.
The only current implementation of this in flexsurv is
in AICc.flexsurvreg
, see there for more details.
AICc(object, ...) AICC(object, ...)
AICc(object, ...) AICC(object, ...)
object |
Fitted model object. |
... |
Other arguments (currently unused). |
This can be spelt either as AICC
or AICc
.
Second-order or "corrected" Akaike information criterion, often
known as AICc (see, e.g. Burnham and Anderson 2002). This is
defined as -2 log-likelihood + (2*p*n)/(n - p -1)
, whereas
the standard AIC is defined as -2 log-likelihood + 2*p
,
where p
is the number of parameters and n
is the
sample size. The correction is intended to adjust AIC for
small-sample bias, hence it only makes a difference to the result
for small n
.
## S3 method for class 'flexsurvreg' AICc(object, cens = TRUE, ...) ## S3 method for class 'flexsurvreg' AICC(object, cens = TRUE, ...)
## S3 method for class 'flexsurvreg' AICc(object, cens = TRUE, ...) ## S3 method for class 'flexsurvreg' AICC(object, cens = TRUE, ...)
object |
Fitted model returned by |
cens |
Include censored observations in the sample size term
( |
... |
Other arguments (currently unused). |
This can be spelt either as AICC
or AICc
.
The second-order AIC of the fitted model.
Burnham, K. P., Anderson, D. R. (2002) Model Selection and Multimodel Inference: a practical information-theoretic approach. Second edition. Springer: New York.
BIC
, AIC
, BIC.flexsurvreg
, nobs.flexsurvreg
Given a fitted flexsurvmix model, return the Aalen-Johansen estimates of the
probability of occupying each state at a series of times covering the
observed data. State 1 represents not having experienced any of the
competing events, while state 2 and any further states correspond to having
experienced each of the competing events respectively. These estimates can
be compared with the fitted probabilities returned by
p_flexsurvmix
to check the fit of a flexsurvmix
model.
ajfit(x, newdata = NULL, tidy = TRUE)
ajfit(x, newdata = NULL, tidy = TRUE)
x |
Fitted model returned by |
newdata |
Data frame of alternative covariate values to check fit for. Only factor covariates are supported. |
tidy |
If |
This is only supported for models with no covariates or models containing only factor covariates.
For models with factor covariates, the Aalen-Johansen estimates are computed
for the subsets of the data defined in newdata
. If newdata
is
not supplied, then this function returns state occupancy probabilities for
all possible combinations of the factor levels.
The Aalen-Johansen estimates are computed using
survfit
from the survival
package (Therneau
2020).
Therneau T (2020). _A Package for Survival Analysis in R_. R package version 3.2-3, <URL: https://CRAN.R-project.org/package=survival>.
This computes Aalen-Johansen estimates of the probability of occupying each
state at a series of times, using ajfit
. The equivalent
estimates from the parametric model are then produced using
p_flexsurvmix
, and concatenated with the nonparametric
estimates to form a tidy data frame. This data frame can then simply be
plotted using ggplot
.
ajfit_flexsurvmix(x, maxt = NULL, startname = "Start", B = NULL)
ajfit_flexsurvmix(x, maxt = NULL, startname = "Start", B = NULL)
x |
Fitted model returned by |
maxt |
Maximum time to produce parametric estimates for. By default this is the maximum event time in the data, the maximum time we have nonparametric estimates for. |
startname |
Label to give the state corresponding to "no event happened
yet". By default this is |
B |
Number of simulation replications to use to calculate a confidence
interval for the parametric estimates in |
Computes both parametric and comparable Aalen-Johansen nonparametric estimates from a flexible parametric multi-state model, and returns them together in a tidy data frame. Only models with no covariates, or only factor covariates, are supported. If there are factor covariates, then the nonparametric estimates are computed for subgroups defined by combinations of the covariates. Another restriction of this function is that all transitions must have the same covariates on them.
ajfit_fmsm(x, maxt = NULL, newdata = NULL)
ajfit_fmsm(x, maxt = NULL, newdata = NULL)
x |
Object returned by |
maxt |
Maximum time to compute parametric estimates to. |
newdata |
Data frame defining the subgroups to consider. This must have a column for each covariate in the model. If omitted, then all potential subgroups defined by combinations of factor covariates are included. Continuous covariates are not supported. |
Tidy data frame containing both Aalen-Johansen and parametric estimates of state occupancy over time, and by subgroup if subgroups are included.
Augment accepts a model object and a dataset and adds information about each observation in the dataset. Most commonly, this includes predicted values in the .fitted
column, residuals in the .resid
column, and standard errors for the fitted values in a .se.fit
column. New columns always begin with a . prefix to avoid overwriting columns in the original dataset.
## S3 method for class 'flexsurvreg' augment( x, data = NULL, newdata = NULL, type.predict = "response", type.residuals = "response", ... )
## S3 method for class 'flexsurvreg' augment( x, data = NULL, newdata = NULL, type.predict = "response", type.residuals = "response", ... )
x |
Output from |
data |
A |
newdata |
A |
type.predict |
Character indicating type of prediction to use. Passed to the |
type.residuals |
Character indicating type of residuals to use. Passed to the type argument of |
... |
Additional arguments. Not currently used. |
If neither of data
or newdata
are specified, then model.frame(x)
will be used. It is worth noting that model.frame(x)
will include a Surv
object and not the original time-to-event variables used when fitting the flexsurvreg
object. If the original data is desired, specify data
.
A tibble
containing data
or newdata
and possible additional columns:
.fitted
Fitted values of model
.se.fit
Standard errors of fitted values
.resid
Residuals (not present if newdata
specified)
fit <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "exp") augment(fit, data = ovarian)
fit <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "exp") augment(fit, data = ovarian)
Compute a basis for a natural cubic spline, by default using the parameterisation described by Royston and Parmar (2002). Used for flexible parametric survival models.
basis(knots, x, spline = "rp")
basis(knots, x, spline = "rp")
knots |
Vector of knot locations in increasing order, including the boundary knots at the beginning and end. |
x |
Vector of ordinates to compute the basis for. |
spline |
|
The exact formula for the basis is given in flexsurvspline
.
A matrix with one row for each ordinate and one column for each knot.
basis
returns the basis, and dbasis
returns its derivative
with respect to x
.
fss
and dfss
are the same, but with the order of the
arguments swapped around for consistency with similar functions in other R
packages.
Christopher Jackson <[email protected]>
Royston, P. and Parmar, M. (2002). Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21(1):2175-2197.
Wang W, Yan J (2021). Shape-Restricted Regression Splines with R Package splines2. Journal of Data Science, 19(3), 498-517.
Survival times of 686 patients with primary node positive breast cancer.
bc
bc
A data frame with 686 rows.
censrec |
(numeric) | 1=dead, 0=censored |
rectime |
(numeric) | Time of death or censoring in days |
group |
(numeric) |
Prognostic group: "Good" ,"Medium" or "Poor" , |
from a regression model developed by Sauerbrei and Royston (1999). | ||
German Breast Cancer Study Group, 1984-1989. Used as a reference
dataset for the spline-based survival model of Royston and Parmar (2002),
implemented here in flexsurvspline
. Originally provided with
the stpm
(Royston 2001, 2004) and stpm2
(Lambert 2009, 2010)
Stata modules.
Royston, P. and Parmar, M. (2002). Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21(1):2175-2197.
Sauerbrei, W. and Royston, P. (1999). Building multivariable prognostic and diagnostic models: transformation of the predictors using fractional polynomials. Journal of the Royal Statistical Society, Series A 162:71-94.
Bayesian Information Criterion (BIC) for comparison of flexsurvreg models
## S3 method for class 'flexsurvreg' BIC(object, cens = TRUE, ...)
## S3 method for class 'flexsurvreg' BIC(object, cens = TRUE, ...)
object |
Fitted model returned by |
cens |
Include censored observations in the sample size term
( |
... |
Other arguments (currently unused). |
There is no "official" definition of what the sample size should be for the use of BIC in censored survival analysis. BIC is based on an approximation to Bayesian model comparison using Bayes factors and an implicit vague prior with the same amount of information as one observation "unit". Informally, the sample size describes the number of "units" giving rise to a distinct piece of information (Kass and Raftery 1995). However censored observations provide less information than observed events, in principle. The default used here is the number of individuals, for consistency with more familiar kinds of statistical modelling. However if censoring is heavy, then the number of events may be a better represent the amount of information. Following these principles, the best approximation would be expected to be somewhere in between.
AIC and BIC are intended to measure different things. Briefly,
AIC measures predictive ability, whereas BIC is expected to choose
the true model from a set of models where one of them is the
truth. Therefore BIC chooses simpler models for all but the
tiniest sample sizes (,
). AIC might be preferred in the
typical situation where
"all models are wrong but some are useful". AIC also gives similar
results to cross-validation (Stone 1977).
The BIC of the fitted model. This is minus twice the log likelihood plus p*log(n)
, where
p
is the number of parameters and n
is the sample
size of the data. If weights
was supplied to
flexsurv
, the sample size is defined as the sum of the
weights.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773-795.
Stone, M. (1977). An asymptotic equivalence of choice of model by cross‐validation and Akaike's criterion. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 44-47.
BIC
, AIC
, AICC.flexsurvreg
, nobs.flexsurvreg
Calculate a confidence interval for a model output by repeatedly replacing the parameters in a fitted model object with a draw from the multivariate normal distribution of the maximum likelihood estimates, then recalculating the output function.
bootci.fmsm( x, B, fn, cl = 0.95, attrs = NULL, cores = NULL, sample = FALSE, ... )
bootci.fmsm( x, B, fn, cl = 0.95, attrs = NULL, cores = NULL, sample = FALSE, ... )
x |
Output from |
B |
Number of parameter draws to use |
fn |
Function to bootstrap the results of. It must have an argument named |
cl |
Width of symmetric confidence interval, by default 0.95 |
attrs |
Any attributes of the value returned from |
cores |
Number of cores to use for parallel processing. |
sample |
If |
... |
Additional arguments to pass to |
A matrix with two rows, giving the upper and lower confidence limits respectively. Each row is a vector of the same length as the unlisted result of the function corresponding to fncall
.
## How to use bootci.fmsm ## Write a function with one argument called x giving a fitted model, ## and returning some results of the model. The results may be in any form. tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) bexp <- flexsurvreg(Surv(Tstart, Tstop, status) ~ trans, data=bosms3, dist="exp") summfn <- function(x, t){ resp <- flexsurv::pmatrix.fs(x, trans=tmat, t=t) rest <- flexsurv::totlos.fs(x, trans=tmat, t=t) list(resp, rest) } ## Use bootci.fmsm to obtain the confidence interval ## The matrix columns are in the order of the unlisted results of the original ## summfn. You will have to rearrange them into the format that you want. ## If summfn has any extra arguments, in this case \code{t}, make sure they are ## passed through via the ... argument to bootci.fmsm bootci.fmsm(bexp, B=3, fn=summfn, t=10) bootci.fmsm(bexp, B=3, fn=summfn, t=5)
## How to use bootci.fmsm ## Write a function with one argument called x giving a fitted model, ## and returning some results of the model. The results may be in any form. tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) bexp <- flexsurvreg(Surv(Tstart, Tstop, status) ~ trans, data=bosms3, dist="exp") summfn <- function(x, t){ resp <- flexsurv::pmatrix.fs(x, trans=tmat, t=t) rest <- flexsurv::totlos.fs(x, trans=tmat, t=t) list(resp, rest) } ## Use bootci.fmsm to obtain the confidence interval ## The matrix columns are in the order of the unlisted results of the original ## summfn. You will have to rearrange them into the format that you want. ## If summfn has any extra arguments, in this case \code{t}, make sure they are ## passed through via the ... argument to bootci.fmsm bootci.fmsm(bexp, B=3, fn=summfn, t=10) bootci.fmsm(bexp, B=3, fn=summfn, t=5)
A dataset containing histories of bronchiolitis obliterans syndrome (BOS) from lung transplant recipients. BOS is a chronic decline in lung function, often observed after lung transplantation.
A data frame containing a sequence of observed or censored transitions to the next stage of severity or death. It is grouped by patient and includes histories of 204 patients. All patients start in state 1 (no BOS) at six months after transplant, and may subsequently develop BOS or die.
bosms3
contains the data for a three-state model: no BOS, BOS or
death. bosms4
uses a four-state representation: no BOS, mild BOS,
moderate/severe BOS or death.
id |
(numeric) | Patient identification number |
from |
(numeric) | Observed starting state of the transition |
to |
(numeric) | Observed or potential ending state of the transition |
Tstart |
(numeric) | Time at the start of the interval |
Tstop |
(numeric) | Time at the end of the interval |
time |
(numeric) | Time difference Tstart -Tstop |
status
|
(numeric) | 1 if the transition to state to was observed, or
0 if the transition to state to was censored (for example, if the
patient was observed to move to a competing state) |
trans |
(factor) | Number of the transition from -to in the set of
all ntrans allowed transitions, numbered from 1 to ntrans . |
The entry time of each patient into each stage of BOS was estimated by clinicians, based on their history of lung function measurements and acute rejection and infection episodes. BOS is only assumed to occur beyond six months after transplant. In the first six months the function of each patient's new lung stabilises. Subsequently BOS is diagnosed by comparing the lung function against the "baseline" value.
The same data are provided in the msm package, but in the native format of msm to allow Markov models to be fitted. In flexsurv, much more flexible models can be fitted.
Papworth Hospital, U.K.
Heng. D. et al. (1998). Bronchiolitis Obliterans Syndrome: Incidence, Natural History, Prognosis, and Risk Factors. Journal of Heart and Lung Transplantation 17(12)1255–1263.
Extract model coefficients from fitted flexible survival models. This presents all parameter estimates, transformed to the real line if necessary. For example, shape or scale parameters, which are constrained to be positive, are returned on the log scale.
## S3 method for class 'flexsurvreg' coef(object, ...)
## S3 method for class 'flexsurvreg' coef(object, ...)
object |
Output from |
... |
Further arguments passed to or from other methods. Currently unused. |
This matches the behaviour of coef.default
for standard R model
families such as glm
, where intercepts in regression
models are presented on the same scale as the covariate effects. Note that
any parameter in a distribution fitted by flexsurvreg
or
flexsurvreg
may be an intercept in a regression model.
This returns the mod$res.t[,"est"]
component of the fitted
model object mod
. See flexsurvreg
,
flexsurvspline
for full documentation of all components.
C. H. Jackson [email protected]
Cox-Snell residuals from a parametric survival model
coxsnell_flexsurvreg(x)
coxsnell_flexsurvreg(x)
x |
Object returned by |
A data frame with a column called est
giving the Cox-Snell residual, defined as the fitted cumulative hazard at each data point.
fitted cumulative hazard at the given observed data point, and other columns indicating the observation time,
observed event status, and covariate values defining the data at this point.
The cumulative hazards est
should form a censored sample from an Exponential(1).
Therefore to check the fit of the model, plot a nonparametric estimate of the cumulative
hazard curve against a diagonal line through the origin, which is the theoretical cumulative
hazard trajectory of the Exponential(1).
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "gengamma") cs <- coxsnell_flexsurvreg(fitg) ## Model appears to fit well, with some small sample noise surv <- survfit(Surv(cs$est, ovarian$fustat) ~ 1) plot(surv, fun="cumhaz") abline(0, 1, col="red")
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "gengamma") cs <- coxsnell_flexsurvreg(fitg) ## Model appears to fit well, with some small sample noise surv <- survfit(Surv(cs$est, ovarian$fustat) ~ 1) plot(surv, fun="cumhaz") abline(0, 1, col="red")
In a mixture model for competing events, an individual can experience one of a set of different events. We specify a model for the probability that they will experience each event before the others, and a model for the time to the event conditionally on that event occurring first.
flexsurvmix( formula, data, event, dists, pformula = NULL, anc = NULL, partial_events = NULL, initp = NULL, inits = NULL, fixedpars = NULL, dfns = NULL, method = "direct", em.control = NULL, optim.control = NULL, aux = NULL, sr.control = survreg.control(), integ.opts, hess.control = NULL, ... )
flexsurvmix( formula, data, event, dists, pformula = NULL, anc = NULL, partial_events = NULL, initp = NULL, inits = NULL, fixedpars = NULL, dfns = NULL, method = "direct", em.control = NULL, optim.control = NULL, aux = NULL, sr.control = survreg.control(), integ.opts, hess.control = NULL, ... )
formula |
Survival model formula. The left hand side is a Alternatively, A list of formulae may also be used to indicate that for particular individuals, different events may be observed in different ways, with different censoring mechanisms. Each list component specifies the data and censoring scheme for that mixture component. For example, suppose we are studying people admitted to hospital,and the competing states are death in hospital and discharge from hospital. At time t we know that a particular individual is still alive, but we do not know whether they are still in hospital, or have been discharged. In this case, if the individual were to die in hospital, their death time would be right censored at t. If the individual will be (or has been) discharged before death, their discharge time is completely unknown, thus interval-censored on (0,Inf). Therefore, we need to store different event time and status variables in the data for different alternative events. This is specified here as
where for this individual, The "dot" notation commonly used to indicate "all remaining variables" in a
formula is not supported in |
data |
Data frame containing variables mentioned in |
event |
Variable in the data that specifies which of the alternative
events is observed for which individual. If the individual's follow-up is
right-censored, or if the event is otherwise unknown, this variable must
have the value Ideally this should be a factor, since the mixture components can then be
easily identified in the results with a name instead of a number. If this
is not already a factor, it is coerced to one. Then the levels of the
factor define the required order for the components of the list arguments
|
dists |
Vector specifying the parametric distribution to use for each
component. The same distributions are supported as in
|
pformula |
Formula describing covariates to include on the component membership proabilities by multinomial logistic regression. The first component is treated as the baseline. The "dot" notation commonly used to indicate "all remaining variables" in a formula is not supported. |
anc |
List of component-specific lists, of length equal to the number of components. Each component-specific list is a list of formulae representing covariate effects on parameters of the distribution. If there are covariates for one component but not others, then a list
containing one null formula on the location parameter should be supplied
for the component with no covariates, e.g Covariates on the location parameter may also be supplied here instead of
in |
partial_events |
List specifying the factor levels of For example, suppose there are three alternative events called
|
initp |
Initial values for component membership probabilities. By default, these are assumed to be equal for each component. |
inits |
List of component-specific vectors. Each component-specific
vector contains the initial values for the parameters of the
component-specific model, as would be supplied as the |
fixedpars |
Indexes of parameters to fix at their initial values and
not optimise. Arranged in the order: baseline mixing probabilities,
covariates on mixing probabilities, time-to-event parameters by mixing
component. Within mixing components, time-to-event parameters are ordered
in the same way as in If Not currently supported when using the EM algorithm. |
dfns |
List of lists of user-defined distribution functions, one for
each mixture component. Each list component is specified as the
|
method |
Method for maximising the likelihood. Either |
em.control |
List of settings to control EM algorithm fitting. The only options currently available are
For example, |
optim.control |
List of options to pass as the |
aux |
A named list of other arguments to pass to custom distribution
functions. This is used, for example, by |
sr.control |
For the models which use |
integ.opts |
List of named arguments to pass to
|
hess.control |
List of options to control covariance matrix computation. Available options are:
The Hessian is positive definite, thus invertible, at the maximum
likelihood. If the Hessian computed after optimisation convergence can't
be inverted, this is either because the converged result is not the
maximum likelihood (e.g. it could be a "saddle point"), or because the
numerical methods used to obtain the Hessian were inaccurate. If you
suspect that the Hessian was computed wrongly enough that it is not
invertible, but not wrongly enough that the nearest valid inverse would be
an inaccurate estimate of the covariance matrix, then these tolerance
values can be modified (reducing |
... |
Optional arguments to the general-purpose optimisation routine
|
This differs from the more usual "competing risks" models, where we specify "cause-specific hazards" describing the time to each competing event. This time will not be observed for an individual if one of the competing events happens first. The event that happens first is defined by the minimum of the times to the alternative events.
The flexsurvmix
function fits a mixture model to data consisting of a
single time to an event for each individual, and an indicator for what type
of event occurs for that individual. The time to event may be observed or
censored, just as in flexsurvreg
, and the type of event may be
known or unknown. In a typical application, where we follow up a set of
individuals until they experience an event or a maximum follow-up time is
reached, the event type is known if the time is observed, and the event type
is unknown when follow-up ends and the time is right-censored.
The model is fitted by maximum likelihood, either directly or by using an
expectation-maximisation (EM) algorithm, by wrapping
flexsurvreg
to compute the likelihood or to implement the E
and M steps.
Some worked examples are given in the package vignette about multi-state
modelling, which can be viewed by running vignette("multistate", package="flexsurv")
.
List of objects containing information about the fitted model. The
important one is res
, a data frame containing the parameter
estimates and associated information.
Jackson, C. H. and Tom, B. D. M. and Kirwan, P. D. and Mandal, S. and Seaman, S. R. and Kunzmann, K. and Presanis, A. M. and De Angelis, D. (2022) A comparison of two frameworks for multi-state modelling, applied to outcomes after hospital admissions with COVID-19. Statistical Methods in Medical Research 31(9) 1656-1674.
Larson, M. G., & Dinse, G. E. (1985). A mixture model for the regression analysis of competing risks data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 34(3), 201-211.
Lau, B., Cole, S. R., & Gange, S. J. (2009). Competing risk regression models for epidemiologic data. American Journal of Epidemiology, 170(2), 244-256.
Parametric modelling or regression for time-to-event data. Several built-in distributions are available, and users may supply their own.
flexsurvreg( formula, anc = NULL, data, weights, bhazard, rtrunc, subset, na.action, dist, inits, fixedpars = NULL, dfns = NULL, aux = NULL, cl = 0.95, integ.opts = NULL, sr.control = survreg.control(), hessian = TRUE, hess.control = NULL, ... )
flexsurvreg( formula, anc = NULL, data, weights, bhazard, rtrunc, subset, na.action, dist, inits, fixedpars = NULL, dfns = NULL, aux = NULL, cl = 0.95, integ.opts = NULL, sr.control = survreg.control(), hessian = TRUE, hess.control = NULL, ... )
formula |
A formula expression in conventional R linear modelling
syntax. The response must be a survival object as returned by the
If there are no covariates, specify If the right hand side is specified as By default, covariates are placed on the “location” parameter of the
distribution, typically the "scale" or "rate" parameter, through a linear
model, or a log-linear model if this parameter must be positive. This
gives an accelerated failure time model or a proportional hazards model
(see Covariates can be placed on other (“ancillary”) parameters by using the
name of the parameter as a “function” in the formula. For example, in a
Weibull model, the following expresses the scale parameter in terms of age
and a treatment variable
However, if the names of the ancillary parameters clash with any real
functions that might be used in formulae (such as
|
|||||||||||||||||||||||||||||||||||||||||||||
anc |
An alternative and safer way to model covariates on ancillary parameters, that is, parameters other than the main location parameter of the distribution. This is a named list of formulae, with the name of each component giving the parameter to be modelled. The model above can also be defined as:
|
|||||||||||||||||||||||||||||||||||||||||||||
data |
A data frame in which to find variables supplied in
|
|||||||||||||||||||||||||||||||||||||||||||||
weights |
Optional numeric variable giving weights for each individual in the data. The fitted model is then defined by maximising the weighted sum of the individual-specific log-likelihoods. |
|||||||||||||||||||||||||||||||||||||||||||||
bhazard |
Optional variable giving expected hazards for relative survival models. The model is described by Nelson et al. (2007).
If For relative survival models, the log-likelihood returned by |
|||||||||||||||||||||||||||||||||||||||||||||
rtrunc |
Optional variable giving individual-specific right-truncation times. Used for analysing data with "retrospective ascertainment". For example, suppose we want to estimate the distribution of the time from onset of a disease to death, but have only observed cases known to have died by the current date. In this case, times from onset to death for individuals in the data are right-truncated by the current date minus the onset date. Predicted survival times for new cases can then be described by an un-truncated version of the fitted distribution. These models can suffer from weakly identifiable parameters and
badly-behaved likelihood functions, and it is advised to compare
convergence for different initial values by supplying different
|
|||||||||||||||||||||||||||||||||||||||||||||
subset |
Vector of integers or logicals specifying the subset of the observations to be used in the fit. |
|||||||||||||||||||||||||||||||||||||||||||||
na.action |
a missing-data filter function, applied after any 'subset'
argument has been used. Default is |
|||||||||||||||||||||||||||||||||||||||||||||
dist |
Typically, one of the strings in the first column of the following table, identifying a built-in distribution. This table also identifies the location parameters, and whether covariates on these parameters represent a proportional hazards (PH) or accelerated failure time (AFT) model. In an accelerated failure time model, the covariate speeds up or slows down the passage of time. So if the coefficient (presented on the log scale) is log(2), then doubling the covariate value would give half the expected survival time.
Alternatively, Very flexible spline-based distributions can also be fitted with
The parameterisations of the built-in distributions used here are the same
as in their built-in distribution functions: A package vignette "Distributions reference" lists the survivor functions and covariate effect parameterisations used by each built-in distribution. For the Weibull, exponential and log-normal distributions,
The Weibull parameterisation is different from that in
Similarly in the exponential distribution, the rate, rather than the mean, is modelled on covariates. The object |
|||||||||||||||||||||||||||||||||||||||||||||
inits |
An optional numeric vector giving initial values for each unknown parameter. These are numbered in the order: baseline parameters (in the order they appear in the distribution function, e.g. shape before scale in the Weibull), covariate effects on the location parameter, covariate effects on the remaining parameters. This is the same order as the printed estimates in the fitted model. If not specified, default initial values are chosen from a simple summary
of the survival or censoring times, for example the mean is often used to
initialize scale parameters. See the object |
|||||||||||||||||||||||||||||||||||||||||||||
fixedpars |
Vector of indices of parameters whose values will be fixed
at their initial values during the optimisation. The indices are ordered
as in |
|||||||||||||||||||||||||||||||||||||||||||||
dfns |
An alternative way to define a custom survival distribution (see
section “Custom distributions” below). A list whose components may
include
If |
|||||||||||||||||||||||||||||||||||||||||||||
aux |
A named list of other arguments to pass to custom distribution
functions. This is used, for example, by |
|||||||||||||||||||||||||||||||||||||||||||||
cl |
Width of symmetric confidence intervals for maximum likelihood estimates, by default 0.95. |
|||||||||||||||||||||||||||||||||||||||||||||
integ.opts |
List of named arguments to pass to
|
|||||||||||||||||||||||||||||||||||||||||||||
sr.control |
For the models which use |
|||||||||||||||||||||||||||||||||||||||||||||
hessian |
Calculate the covariances and confidence intervals for the
parameters. Defaults to |
|||||||||||||||||||||||||||||||||||||||||||||
hess.control |
List of options to control covariance matrix computation. Available options are:
The Hessian is positive definite, thus invertible, at the maximum
likelihood. If the Hessian computed after optimisation convergence can't
be inverted, this is either because the converged result is not the
maximum likelihood (e.g. it could be a "saddle point"), or because the
numerical methods used to obtain the Hessian were inaccurate. If you
suspect that the Hessian was computed wrongly enough that it is not
invertible, but not wrongly enough that the nearest valid inverse would be
an inaccurate estimate of the covariance matrix, then these tolerance
values can be modified (reducing |
|||||||||||||||||||||||||||||||||||||||||||||
... |
Optional arguments to the general-purpose optimisation routine
|
Parameters are estimated by maximum likelihood using the algorithms
available in the standard R optim
function. Parameters
defined to be positive are estimated on the log scale. Confidence intervals
are estimated from the Hessian at the maximum, and transformed back to the
original scale of the parameters.
The usage of flexsurvreg
is intended to be similar to
survreg
in the survival package.
A list of class "flexsurvreg"
containing information about
the fitted model. Components of interest to users may include:
call |
A copy of the function call, for use in post-processing. |
dlist |
List defining the survival distribution used. |
res |
Matrix of maximum likelihood estimates and confidence limits, with parameters on their natural scales. Note the standard error |
res.t |
Matrix of maximum
likelihood estimates and confidence limits, with parameters all
transformed to the real line (using a log transform for all built-in
models where this is necessary). The
|
coefficients |
The transformed maximum likelihood
estimates, as in |
loglik |
Log-likelihood. This will differ from Stata, where the sum of the log uncensored survival times is added to the log-likelihood in survival models, to remove dependency on the time scale. For relative survival models specified with |
logliki |
Vector of individual contributions to the log-likelihood |
AIC |
Akaike's information criterion (-2*log likelihood + 2*number of estimated parameters) |
cov |
Covariance matrix of the parameters, on
the real-line scale (e.g. log scale), which can be extracted with
|
data |
Data used in the model fit. To extract
this in the standard R formats, use use
|
flexsurvreg
is intended to be
easy to extend to handle new distributions. To define a new distribution
for use in flexsurvreg
, construct a list with the following
elements:
"name"
A string naming the distribution. If this
is called "dist"
, for example, then there must be visible in the
working environment, at least, either
a) a function called ddist
which defines the probability density,
or
b) a function called hdist
which defines the hazard.
Ideally, in case a) there should also be a function called pdist
which defines the probability distribution or cumulative density, and in
case b) there should be a function called Hdist
defining the
cumulative hazard. If these additional functions are not provided,
flexsurv attempts to automatically create them by numerically
integrating the density or hazard function. However, model fitting will
be much slower, or may not even work at all, if the analytic versions of
these functions are not available.
The functions must accept vector arguments (representing different times,
or alternative values for each parameter) and return the results as a
vector. The function Vectorize
may be helpful for doing
this: see the example below.
These functions may be in an add-on package (see below for an example) or
may be user-written. If they are user-written they must be defined in the
global environment, or supplied explicitly through the dfns
argument
to flexsurvreg
. The latter may be useful if the functions are
created dynamically (as in the source of flexsurvspline
) and thus
not visible through R's scoping rules.
Arguments other than parameters must be named in the conventional way –
for example x
for the first argument of the density function or
hazard, as in dnorm(x, ...)
and q
for the first
argument of the probability function. Density functions should also have
an argument log
, after the parameters, which when TRUE
,
computes the log density, using a numerically stable additive formula if
possible.
Additional functions with names beginning with "DLd"
and
"DLS"
may be defined to calculate the derivatives of the log density
and log survival probability, with respect to the parameters of the
distribution. The parameters are expressed on the real line, for example
after log transformation if they are defined as positive. The first
argument must be named t
, representing the time, and the remaining
arguments must be named as the parameters of the density function. The
function must return a matrix with rows corresponding to times, and columns
corresponding to the parameters of the distribution. The derivatives are
used, if available, to speed up the model fitting with optim
.
"pars"
Vector of strings naming the parameters of the distribution. These must be the same names as the arguments of the density and probability functions.
"location"
Name of the main parameter governing the mean of
the distribution. This is the default parameter on which covariates are
placed in the formula
supplied to flexsurvreg
.
"transforms"
List of R
functions which transform the range of values taken by each parameter onto
the real line. For example, c(log, log)
for a distribution with two
positive parameters.
"inv.transforms"
List of R functions defining the
corresponding inverse transformations. Note these must be lists, even for
single parameter distributions they should be supplied as, e.g.
c(exp)
or list(exp)
.
"inits"
A function of the
observed survival times t
(including right-censoring times, and
using the halfway point for interval-censored times) which returns a vector
of reasonable initial values for maximum likelihood estimation of each
parameter. For example, function(t){ c(1, mean(t)) }
will always
initialize the first of two parameters at 1, and the second (a scale
parameter, for instance) at the mean of t
.
For example, suppose we want to use an extreme value survival distribution.
This is available in the CRAN package eha, which provides
conventionally-defined density and probability functions called
eha::dEV
and eha::pEV
. See the Examples below
for the custom list in this case, and the subsequent command to fit the
model.
Christopher Jackson <[email protected]>
Jackson, C. (2016). flexsurv: A Platform for Parametric Survival Modeling in R. Journal of Statistical Software, 70(8), 1-33. doi:10.18637/jss.v070.i08
Cox, C. (2008) The generalized distribution: An umbrella for
parametric survival analysis. Statistics in Medicine 27:4301-4312.
Cox, C., Chu, H., Schneider, M. F. and Muñoz, A. (2007) Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine 26:4252-4374
Jackson, C. H. and Sharples, L. D. and Thompson, S. G. (2010) Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. International Journal of Biostatistics 6(1):Article 34.
Nelson, C. P., Lambert, P. C., Squire, I. B., & Jones, D. R. (2007). Flexible parametric models for relative survival, with application in coronary heart disease. Statistics in medicine, 26(30), 5486-5498.
flexsurvspline
for flexible survival modelling using
the spline model of Royston and Parmar.
plot.flexsurvreg
and lines.flexsurvreg
to plot
fitted survival, hazards and cumulative hazards from models fitted by
flexsurvreg
and flexsurvspline
.
## Compare generalized gamma fit with Weibull fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="gengamma") fitg fitw <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="weibull") fitw plot(fitg) lines(fitw, col="blue", lwd.ci=1, lty.ci=1) ## Identical AIC, probably not enough data in this simple example for a ## very flexible model to be worthwhile. ## Custom distribution ## make "dEV" and "pEV" from eha package (if installed) ## available to the working environment if (require("eha")) { custom.ev <- list(name="EV", pars=c("shape","scale"), location="scale", transforms=c(log, log), inv.transforms=c(exp, exp), inits=function(t){ c(1, median(t)) }) fitev <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.ev) fitev lines(fitev, col="purple", col.ci="purple") } ## Custom distribution: supply the hazard function only hexp2 <- function(x, rate=1){ rate } # exponential distribution hexp2 <- Vectorize(hexp2) custom.exp2 <- list(name="exp2", pars=c("rate"), location="rate", transforms=c(log), inv.transforms=c(exp), inits=function(t)1/mean(t)) flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.exp2) flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist="exp") ## should give same answer
## Compare generalized gamma fit with Weibull fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="gengamma") fitg fitw <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="weibull") fitw plot(fitg) lines(fitw, col="blue", lwd.ci=1, lty.ci=1) ## Identical AIC, probably not enough data in this simple example for a ## very flexible model to be worthwhile. ## Custom distribution ## make "dEV" and "pEV" from eha package (if installed) ## available to the working environment if (require("eha")) { custom.ev <- list(name="EV", pars=c("shape","scale"), location="scale", transforms=c(log, log), inv.transforms=c(exp, exp), inits=function(t){ c(1, median(t)) }) fitev <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.ev) fitev lines(fitev, col="purple", col.ci="purple") } ## Custom distribution: supply the hazard function only hexp2 <- function(x, rate=1){ rate } # exponential distribution hexp2 <- Vectorize(hexp2) custom.exp2 <- list(name="exp2", pars=c("rate"), location="rate", transforms=c(log), inv.transforms=c(exp), inits=function(t)1/mean(t)) flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.exp2) flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist="exp") ## should give same answer
This function estimates the distribution of the time between an initial and final event, in situations where individuals are only observed if they have experienced both events before a certain time, thus they are right-truncated at this time. The time of the initial event provides information about the time from initial to final event, given the truncated observation scheme, and initial events are assumed to occur with an exponential growth rate.
flexsurvrtrunc( t, tinit, rtrunc, tmax, data = NULL, method = "joint", dist, theta = NULL, fixed.theta = TRUE, inits = NULL, fixedpars = NULL, dfns = NULL, integ.opts = NULL, cl = 0.95, optim_control = list() )
flexsurvrtrunc( t, tinit, rtrunc, tmax, data = NULL, method = "joint", dist, theta = NULL, fixed.theta = TRUE, inits = NULL, fixedpars = NULL, dfns = NULL, integ.opts = NULL, cl = 0.95, optim_control = list() )
t |
Vector of time differences between an initial and final event for a set of individuals. |
|||||||||||||||||||||||||||||||||||||||||||||
tinit |
Absolute time of the initial event for each individual. |
|||||||||||||||||||||||||||||||||||||||||||||
rtrunc |
Individual-specific right truncation points on the same scale as |
|||||||||||||||||||||||||||||||||||||||||||||
tmax |
Maximum possible time between initial and final events that could have been observed. This is only used in |
|||||||||||||||||||||||||||||||||||||||||||||
data |
Data frame containing |
|||||||||||||||||||||||||||||||||||||||||||||
method |
If |
|||||||||||||||||||||||||||||||||||||||||||||
dist |
Typically, one of the strings in the first column of the following table, identifying a built-in distribution. This table also identifies the location parameters, and whether covariates on these parameters represent a proportional hazards (PH) or accelerated failure time (AFT) model. In an accelerated failure time model, the covariate speeds up or slows down the passage of time. So if the coefficient (presented on the log scale) is log(2), then doubling the covariate value would give half the expected survival time.
Alternatively, Very flexible spline-based distributions can also be fitted with
The parameterisations of the built-in distributions used here are the same
as in their built-in distribution functions: A package vignette "Distributions reference" lists the survivor functions and covariate effect parameterisations used by each built-in distribution. For the Weibull, exponential and log-normal distributions,
The Weibull parameterisation is different from that in
Similarly in the exponential distribution, the rate, rather than the mean, is modelled on covariates. The object |
|||||||||||||||||||||||||||||||||||||||||||||
theta |
Initial value (or fixed value) for the exponential growth rate |
|||||||||||||||||||||||||||||||||||||||||||||
fixed.theta |
Should |
|||||||||||||||||||||||||||||||||||||||||||||
inits |
Initial values for the parameters of the parametric survival distributon. If not supplied, a heuristic is used. as is done in |
|||||||||||||||||||||||||||||||||||||||||||||
fixedpars |
Integer indices of the parameters of the survival distribution that should be fixed to their values supplied in |
|||||||||||||||||||||||||||||||||||||||||||||
dfns |
An alternative way to define a custom survival distribution (see
section “Custom distributions” below). A list whose components may
include
If |
|||||||||||||||||||||||||||||||||||||||||||||
integ.opts |
List of named arguments to pass to
|
|||||||||||||||||||||||||||||||||||||||||||||
cl |
Width of symmetric confidence intervals for maximum likelihood estimates, by default 0.95. |
|||||||||||||||||||||||||||||||||||||||||||||
optim_control |
List to supply as the |
Covariates are not currently supported.
Note that flexsurvreg
, with an rtrunc
argument, can fit models for a similar kind of data, but those models ignore the information provided by the time of the initial event.
A nonparametric estimator of survival under right-truncation is also provided in survrtrunc
. See Seaman et al. (2020) for a full comparison of the alternative models.
Seaman, S., Presanis, A. and Jackson, C. (2020) Estimating a Time-to-Event Distribution from Right-Truncated Data in an Epidemic: a Review of Methods
set.seed(1) ## simulate time to initial event X <- rexp(1000, 0.2) ## simulate time between initial and final event T <- rgamma(1000, 2, 10) ## right-truncate to keep only those with final event time ## before tmax tmax <- 40 obs <- X + T < tmax rtrunc <- tmax - X dat <- data.frame(X, T, rtrunc)[obs,] flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gamma", theta=0.2) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gamma", theta=0.2, fixed.theta=FALSE) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gamma", theta=0.2, inits=c(1, 8)) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gamma", theta=0.2, method="final") flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gamma", fixed.theta=TRUE) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="weibull", fixed.theta=TRUE) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="lnorm", fixed.theta=TRUE) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gengamma", fixed.theta=TRUE) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gompertz", fixed.theta=TRUE)
set.seed(1) ## simulate time to initial event X <- rexp(1000, 0.2) ## simulate time between initial and final event T <- rgamma(1000, 2, 10) ## right-truncate to keep only those with final event time ## before tmax tmax <- 40 obs <- X + T < tmax rtrunc <- tmax - X dat <- data.frame(X, T, rtrunc)[obs,] flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gamma", theta=0.2) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gamma", theta=0.2, fixed.theta=FALSE) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gamma", theta=0.2, inits=c(1, 8)) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gamma", theta=0.2, method="final") flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gamma", fixed.theta=TRUE) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="weibull", fixed.theta=TRUE) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="lnorm", fixed.theta=TRUE) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gengamma", fixed.theta=TRUE) flexsurvrtrunc(t=T, rtrunc=rtrunc, tinit=X, tmax=40, data=dat, dist="gompertz", fixed.theta=TRUE)
Flexible parametric modelling of time-to-event data using the spline model of Royston and Parmar (2002).
flexsurvspline( formula, data, weights, bhazard, rtrunc, subset, k = 0, knots = NULL, bknots = NULL, scale = "hazard", timescale = "log", spline = "rp", ... )
flexsurvspline( formula, data, weights, bhazard, rtrunc, subset, k = 0, knots = NULL, bknots = NULL, scale = "hazard", timescale = "log", spline = "rp", ... )
formula |
A formula expression in conventional R linear modelling
syntax. The response must be a survival object as returned by the
specifies a model where the log cumulative hazard (by default, see
If there are no covariates, specify Time-varying covariate effects can be specified using the method described
in Therefore a model with one internal spline knot, where the equivalents of
the Weibull shape and scale parameters, but not the higher-order term
or alternatively (and more safely, see
|
data |
A data frame in which to find variables supplied in
|
weights |
Optional variable giving case weights. |
bhazard |
Optional variable giving expected hazards for relative survival models. |
rtrunc |
Optional variable giving individual right-truncation times (see |
subset |
Vector of integers or logicals specifying the subset of the observations to be used in the fit. |
k |
Number of knots in the spline. The default |
knots |
Locations of knots on the axis of log time (or time, see
|
bknots |
Locations of boundary knots, on the axis of log time (or
time, see |
scale |
If If If |
timescale |
If |
spline |
|
... |
Any other arguments to be passed to or through
|
This function works as a wrapper around flexsurvreg
by
dynamically constructing a custom distribution using
dsurvspline
, psurvspline
and
unroll.function
.
In the spline-based survival model of Royston and Parmar (2002), a
transformation of the survival function is modelled as a
natural cubic spline function of log time
plus linear effects of covariates
.
The proportional hazards model (scale="hazard"
) defines
, the
log cumulative hazard.
The proportional odds model (scale="odds"
) defines
, the log
cumulative odds.
The probit model (scale="normal"
) defines , where
is the inverse normal
distribution function
qnorm
.
With no knots, the spline reduces to a linear function, and these models are equivalent to Weibull, log-logistic and lognormal models respectively.
The spline coefficients , which are called the "ancillary parameters" above, may also be
modelled as linear functions of covariates
, as
giving a model where the effects of covariates are arbitrarily flexible functions of time: a non-proportional hazards or odds model.
Natural cubic splines are cubic splines constrained to be linear beyond
boundary knots . The spline function is
defined as
where is the
th basis function
and .
A list of class "flexsurvreg"
with the same elements as
described in flexsurvreg
, and including extra components
describing the spline model. See in particular:
k |
Number of knots. |
knots |
Location of knots on the log time axis. |
scale |
The |
res |
Matrix of maximum likelihood estimates and confidence limits.
Spline coefficients are labelled Coefficients In the Weibull model, for example, In the log-logistic model with shape In the log-normal model with log-scale mean |
loglik |
The maximised log-likelihood. This will differ from Stata, where the sum of the log uncensored survival times is added to the log-likelihood in survival models, to remove dependency on the time scale. |
Christopher Jackson <[email protected]>
Royston, P. and Parmar, M. (2002). Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21(1):2175-2197.
Wang W, Yan J (2021). Shape-Restricted Regression Splines with R Package splines2. Journal of Data Science, 19(3), 498-517.
Jackson, C. (2016). flexsurv: A Platform for Parametric Survival Modeling in R. Journal of Statistical Software, 70(8), 1-33. doi:10.18637/jss.v070.i08
flexsurvreg
for flexible survival modelling using
general parametric distributions.
plot.flexsurvreg
and lines.flexsurvreg
to plot
fitted survival, hazards and cumulative hazards from models fitted by
flexsurvspline
and flexsurvreg
.
## Best-fitting model to breast cancer data from Royston and Parmar (2002) ## One internal knot (2 df) and cumulative odds scale spl <- flexsurvspline(Surv(recyrs, censrec) ~ group, data=bc, k=1, scale="odds") ## Fitted survival plot(spl, lwd=3, ci=FALSE) ## Simple Weibull model fits much less well splw <- flexsurvspline(Surv(recyrs, censrec) ~ group, data=bc, k=0, scale="hazard") lines(splw, col="blue", ci=FALSE) ## Alternative way of fitting the Weibull ## Not run: splw2 <- flexsurvreg(Surv(recyrs, censrec) ~ group, data=bc, dist="weibull") ## End(Not run)
## Best-fitting model to breast cancer data from Royston and Parmar (2002) ## One internal knot (2 df) and cumulative odds scale spl <- flexsurvspline(Surv(recyrs, censrec) ~ group, data=bc, k=1, scale="odds") ## Fitted survival plot(spl, lwd=3, ci=FALSE) ## Simple Weibull model fits much less well splw <- flexsurvspline(Surv(recyrs, censrec) ~ group, data=bc, k=0, scale="hazard") lines(splw, col="blue", ci=FALSE) ## Alternative way of fitting the Weibull ## Not run: splw2 <- flexsurvreg(Surv(recyrs, censrec) ~ group, data=bc, dist="weibull") ## End(Not run)
Constructor for a mixture multi-state model based on flexsurvmix
fmixmsm(...)
fmixmsm(...)
... |
Named arguments. Each argument should be a fitted model as
returned by |
A list of flexsurvmix
objects, with the following
attribute(s):
pathways
A list of all potential pathways until absorption, for
models without cycles. For models with cycles this will have an element
has_cycle=TRUE
, plus the pathways discovered before the function
found the cycle and gave up.
Construct a multi-state model from a set of parametric survival models
fmsm(..., trans)
fmsm(..., trans)
... |
Objects returned by |
trans |
A matrix of integers specifying which models correspond to
which transitions. The |
A list containing the objects given in ...
, and with
attributes "trans"
and "statenames"
defining the transition
structure matrix and state names, and with list components named to
describe the transitions they correspond to. If any of the arguments in
...
are named, then these are used to define the transition names,
otherwise default names are chosen based on the state names.
Density, distribution function, hazards, quantile function and random generation for the generalized F distribution, using the reparameterisation by Prentice (1975).
dgenf(x, mu = 0, sigma = 1, Q, P, log = FALSE) pgenf(q, mu = 0, sigma = 1, Q, P, lower.tail = TRUE, log.p = FALSE) Hgenf(x, mu = 0, sigma = 1, Q, P) hgenf(x, mu = 0, sigma = 1, Q, P) qgenf(p, mu = 0, sigma = 1, Q, P, lower.tail = TRUE, log.p = FALSE) rgenf(n, mu = 0, sigma = 1, Q, P)
dgenf(x, mu = 0, sigma = 1, Q, P, log = FALSE) pgenf(q, mu = 0, sigma = 1, Q, P, lower.tail = TRUE, log.p = FALSE) Hgenf(x, mu = 0, sigma = 1, Q, P) hgenf(x, mu = 0, sigma = 1, Q, P) qgenf(p, mu = 0, sigma = 1, Q, P, lower.tail = TRUE, log.p = FALSE) rgenf(n, mu = 0, sigma = 1, Q, P)
x , q
|
Vector of quantiles. |
mu |
Vector of location parameters. |
sigma |
Vector of scale parameters. |
Q |
Vector of first shape parameters. |
P |
Vector of second shape parameters. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
Vector of probabilities. |
n |
number of observations. If |
If , and
then
has the original generalized F distribution with
location parameter
, scale parameter
and shape parameters
.
In this more stable version described by Prentice (1975),
are replaced by shape parameters
, with
, and
equivalently
Define , and
,
then the probability density function of
is
The
original parameterisation is available in this package as
dgenf.orig
, for the sake of completion / compatibility. With
the above definitions,
dgenf(x, mu=mu, sigma=sigma, Q=Q, P=P) = dgenf.orig(x, mu=mu,
sigma=sigma/delta, s1=s1, s2=s2)
The generalized F distribution with P=0
is equivalent to the
generalized gamma distribution dgengamma
, so that
dgenf(x, mu, sigma, Q, P=0)
equals dgengamma(x, mu, sigma,
Q)
. The generalized gamma reduces further to several common
distributions, as described in the GenGamma
help page.
The generalized F distribution includes the log-logistic distribution (see
eha::dllogis
) as a further special case:
dgenf(x, mu=mu, sigma=sigma, Q=0, P=1) = eha::dllogis(x,
shape=sqrt(2)/sigma, scale=exp(mu))
The range of hazard trajectories available under this distribution are discussed in detail by Cox (2008). Jackson et al. (2010) give an application to modelling oral cancer survival for use in a health economic evaluation of screening.
dgenf
gives the density, pgenf
gives the distribution
function, qgenf
gives the quantile function, rgenf
generates
random deviates, Hgenf
retuns the cumulative hazard and hgenf
the hazard.
The parameters Q
and P
are usually called and
in the literature - they were made upper-case in these R functions
to avoid clashing with the conventional arguments
q
in the
probability function and p
in the quantile function.
Christopher Jackson <[email protected]>
R. L. Prentice (1975). Discrimination among some parametric models. Biometrika 62(3):607-614.
Cox, C. (2008). The generalized distribution: An umbrella for
parametric survival analysis. Statistics in Medicine 27:4301-4312.
Jackson, C. H. and Sharples, L. D. and Thompson, S. G. (2010). Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. International Journal of Biostatistics 6(1):Article 34.
Density, distribution function, quantile function and random generation for the generalized F distribution, using the less flexible original parameterisation described by Prentice (1975).
dgenf.orig(x, mu = 0, sigma = 1, s1, s2, log = FALSE) pgenf.orig(q, mu = 0, sigma = 1, s1, s2, lower.tail = TRUE, log.p = FALSE) Hgenf.orig(x, mu = 0, sigma = 1, s1, s2) hgenf.orig(x, mu = 0, sigma = 1, s1, s2) qgenf.orig(p, mu = 0, sigma = 1, s1, s2, lower.tail = TRUE, log.p = FALSE) rgenf.orig(n, mu = 0, sigma = 1, s1, s2)
dgenf.orig(x, mu = 0, sigma = 1, s1, s2, log = FALSE) pgenf.orig(q, mu = 0, sigma = 1, s1, s2, lower.tail = TRUE, log.p = FALSE) Hgenf.orig(x, mu = 0, sigma = 1, s1, s2) hgenf.orig(x, mu = 0, sigma = 1, s1, s2) qgenf.orig(p, mu = 0, sigma = 1, s1, s2, lower.tail = TRUE, log.p = FALSE) rgenf.orig(n, mu = 0, sigma = 1, s1, s2)
x , q
|
vector of quantiles. |
mu |
Vector of location parameters. |
sigma |
Vector of scale parameters. |
s1 |
Vector of first F shape parameters. |
s2 |
vector of second F shape parameters. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
vector of probabilities. |
n |
number of observations. If |
If , and
then
has the original generalized F distribution with
location parameter
, scale parameter
and shape parameters
. The probability density
function of
is
where , and
is the beta function.
As , the distribution
of
tends towards an original generalized gamma
distribution with the following parameters:
dgengamma.orig(x, shape=1/sigma, scale=exp(mu) /
s1^sigma, k=s1)
See GenGamma.orig
for how this includes several
other common distributions as special cases.
The alternative parameterisation of the generalized F
distribution, originating from Prentice (1975) and given in this
package as GenF
, is preferred for statistical
modelling, since it is more stable as tends to
infinity, and includes a further new class of distributions with
negative first shape parameter. The original is provided here for
the sake of completion and compatibility.
dgenf.orig
gives the density, pgenf.orig
gives the
distribution function, qgenf.orig
gives the quantile function,
rgenf.orig
generates random deviates, Hgenf.orig
retuns the
cumulative hazard and hgenf.orig
the hazard.
Christopher Jackson <[email protected]>
R. L. Prentice (1975). Discrimination among some parametric models. Biometrika 62(3):607-614.
Density, distribution function, hazards, quantile function and random generation for the generalized gamma distribution, using the parameterisation originating from Prentice (1974). Also known as the (generalized) log-gamma distribution.
dgengamma(x, mu = 0, sigma = 1, Q, log = FALSE) pgengamma(q, mu = 0, sigma = 1, Q, lower.tail = TRUE, log.p = FALSE) Hgengamma(x, mu = 0, sigma = 1, Q) hgengamma(x, mu = 0, sigma = 1, Q) qgengamma(p, mu = 0, sigma = 1, Q, lower.tail = TRUE, log.p = FALSE) rgengamma(n, mu = 0, sigma = 1, Q)
dgengamma(x, mu = 0, sigma = 1, Q, log = FALSE) pgengamma(q, mu = 0, sigma = 1, Q, lower.tail = TRUE, log.p = FALSE) Hgengamma(x, mu = 0, sigma = 1, Q) hgengamma(x, mu = 0, sigma = 1, Q) qgengamma(p, mu = 0, sigma = 1, Q, lower.tail = TRUE, log.p = FALSE) rgengamma(n, mu = 0, sigma = 1, Q)
x , q
|
vector of quantiles. |
mu |
Vector of “location” parameters. |
sigma |
Vector of “scale” parameters. Note the inconsistent
meanings of the term “scale” - this parameter is analogous to the
(log-scale) standard deviation of the log-normal distribution, “sdlog” in
|
Q |
Vector of shape parameters. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
vector of probabilities. |
n |
number of observations. If |
If , and
, then
follows the generalized gamma distribution with
probability density function
This parameterisation is preferred to the original
parameterisation of the generalized gamma by Stacy (1962) since it
is more numerically stable near to (the log-normal
distribution), and allows
. The original is available
in this package as
dgengamma.orig
, for the sake of
completion and compatibility with other software - this is
implicitly restricted to Q
>0 (or k
>0 in the original
notation). The parameters of dgengamma
and
dgengamma.orig
are related as follows.
dgengamma.orig(x, shape=shape, scale=scale, k=k) =
dgengamma(x, mu=log(scale) + log(k)/shape, sigma=1/(shape*sqrt(k)),
Q=1/sqrt(k))
The generalized gamma distribution simplifies to the gamma, log-normal and Weibull distributions with the following parameterisations:
dgengamma(x, mu, sigma, Q=0) |
= |
dlnorm(x, mu, sigma) |
dgengamma(x, mu, sigma, Q=1) |
= |
dweibull(x, shape=1/sigma, scale=exp(mu)) |
dgengamma(x, mu, sigma, Q=sigma) |
= |
dgamma(x,
shape=1/sigma^2, rate=exp(-mu) / sigma^2) |
The properties of the generalized gamma and its applications to survival analysis are discussed in detail by Cox (2007).
The generalized F distribution GenF
extends the generalized
gamma to four parameters.
dgengamma
gives the density, pgengamma
gives the
distribution function, qgengamma
gives the quantile function,
rgengamma
generates random deviates, Hgengamma
retuns the
cumulative hazard and hgengamma
the hazard.
Christopher Jackson <[email protected]>
Prentice, R. L. (1974). A log gamma model and its maximum likelihood estimation. Biometrika 61(3):539-544.
Farewell, V. T. and Prentice, R. L. (1977). A study of distributional shape in life testing. Technometrics 19(1):69-75.
Lawless, J. F. (1980). Inference in the generalized gamma and log gamma distributions. Technometrics 22(3):409-419.
Cox, C., Chu, H., Schneider, M. F. and Muñoz, A. (2007). Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine 26:4252-4374
Stacy, E. W. (1962). A generalization of the gamma distribution. Annals of Mathematical Statistics 33:1187-92
GenGamma.orig
, GenF
,
Lognormal
, GammaDist
, Weibull
.
Density, distribution function, hazards, quantile function and random generation for the generalized gamma distribution, using the original parameterisation from Stacy (1962).
dgengamma.orig(x, shape, scale = 1, k, log = FALSE) pgengamma.orig(q, shape, scale = 1, k, lower.tail = TRUE, log.p = FALSE) Hgengamma.orig(x, shape, scale = 1, k) hgengamma.orig(x, shape, scale = 1, k) qgengamma.orig(p, shape, scale = 1, k, lower.tail = TRUE, log.p = FALSE) rgengamma.orig(n, shape, scale = 1, k)
dgengamma.orig(x, shape, scale = 1, k, log = FALSE) pgengamma.orig(q, shape, scale = 1, k, lower.tail = TRUE, log.p = FALSE) Hgengamma.orig(x, shape, scale = 1, k) hgengamma.orig(x, shape, scale = 1, k) qgengamma.orig(p, shape, scale = 1, k, lower.tail = TRUE, log.p = FALSE) rgengamma.orig(n, shape, scale = 1, k)
x , q
|
vector of quantiles. |
shape |
vector of “Weibull” shape parameters. |
scale |
vector of scale parameters. |
k |
vector of “Gamma” shape parameters. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
vector of probabilities. |
n |
number of observations. If |
If , then
follows the original generalised gamma distribution with the
parameterisation given here (Stacy 1962). Defining
shape
,
scale
,
has
probability density
The original generalized gamma distribution simplifies to the gamma, exponential and Weibull distributions with the following parameterisations:
dgengamma.orig(x, shape, scale, k=1) |
=
|
dweibull(x, shape, scale) |
dgengamma.orig(x,
shape=1, scale, k) |
= |
dgamma(x, shape=k,
scale) |
dgengamma.orig(x, shape=1, scale, k=1) |
=
|
dexp(x, rate=1/scale) |
Also as k tends to infinity, it tends to the log normal (as in
dlnorm
) with the following parameters (Lawless,
1980):
dlnorm(x, meanlog=log(scale) + log(k)/shape,
sdlog=1/(shape*sqrt(k)))
For more stable behaviour as the distribution tends to the log-normal, an
alternative parameterisation was developed by Prentice (1974). This is
given in dgengamma
, and is now preferred for statistical
modelling. It is also more flexible, including a further new class of
distributions with negative shape k
.
The generalized F distribution GenF.orig
, and its similar
alternative parameterisation GenF
, extend the generalized
gamma to four parameters.
dgengamma.orig
gives the density, pgengamma.orig
gives the distribution function, qgengamma.orig
gives the quantile
function, rgengamma.orig
generates random deviates,
Hgengamma.orig
retuns the cumulative hazard and
hgengamma.orig
the hazard.
Christopher Jackson <[email protected]>
Stacy, E. W. (1962). A generalization of the gamma distribution. Annals of Mathematical Statistics 33:1187-92.
Prentice, R. L. (1974). A log gamma model and its maximum likelihood estimation. Biometrika 61(3):539-544.
Lawless, J. F. (1980). Inference in the generalized gamma and log gamma distributions. Technometrics 22(3):409-419.
GenGamma
, GenF.orig
,
GenF
, Lognormal
, GammaDist
,
Weibull
.
Evaluate baseline time-to-event distribution parameters given covariate values in a flexsurvmix model
get_basepars(x, newdata, event)
get_basepars(x, newdata, event)
x |
Fitted model object |
newdata |
Data frame of alternative covariate values |
event |
Event |
Glance accepts a model object and returns a tibble with exactly one row of model summaries.
## S3 method for class 'flexsurvreg' glance(x, ...)
## S3 method for class 'flexsurvreg' glance(x, ...)
x |
Output from |
... |
Not currently used. |
A one-row tibble
containing columns:
N
Number of observations used in fitting
events
Number of events
censored
Number of censored events
trisk
Total length of time-at-risk (i.e. follow-up)
df
Degrees of freedom (i.e. number of estimated parameters)
logLik
Log-likelihood
AIC
Akaike's "An Information Criteria"
BIC
Bayesian Information Criteria
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "gengamma") glance(fitg)
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "gengamma") glance(fitg)
Density, distribution function, hazards, quantile function and random generation for the Gompertz distribution with unrestricted shape.
dgompertz(x, shape, rate = 1, log = FALSE) pgompertz(q, shape, rate = 1, lower.tail = TRUE, log.p = FALSE) qgompertz(p, shape, rate = 1, lower.tail = TRUE, log.p = FALSE) rgompertz(n, shape = 1, rate = 1) hgompertz(x, shape, rate = 1, log = FALSE) Hgompertz(x, shape, rate = 1, log = FALSE)
dgompertz(x, shape, rate = 1, log = FALSE) pgompertz(q, shape, rate = 1, lower.tail = TRUE, log.p = FALSE) qgompertz(p, shape, rate = 1, lower.tail = TRUE, log.p = FALSE) rgompertz(n, shape = 1, rate = 1) hgompertz(x, shape, rate = 1, log = FALSE) Hgompertz(x, shape, rate = 1, log = FALSE)
x , q
|
vector of quantiles. |
shape , rate
|
vector of shape and rate parameters. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
vector of probabilities. |
n |
number of observations. If |
The Gompertz distribution with shape
parameter and
rate
parameter has probability density function
and hazard
The hazard is increasing for shape and decreasing for
.
For
the Gompertz is equivalent to the exponential distribution
with constant hazard and rate
.
The probability distribution function is
Thus if is negative, letting
tend to infinity shows that
there is a non-zero probability
of living
forever. On these occasions
qgompertz
and rgompertz
will
return Inf
.
dgompertz
gives the density, pgompertz
gives the
distribution function, qgompertz
gives the quantile function,
hgompertz
gives the hazard function, Hgompertz
gives the
cumulative hazard function, and rgompertz
generates random deviates.
Some implementations of the Gompertz restrict to be strictly
positive, which ensures that the probability of survival decreases to zero
as
increases to infinity. The more flexible implementation given
here is consistent with
streg
in Stata.
The functions eha::dgompertz
and similar available in the
package eha label the parameters the other way round, so that what is
called the shape
there is called the rate
here, and what is
called 1 / scale
there is called the shape
here. The
terminology here is consistent with the exponential dexp
and
Weibull dweibull
distributions in R.
Christopher Jackson <[email protected]>
Stata Press (2007) Stata release 10 manual: Survival analysis and epidemiological tables.
Hazard and cumulative hazard functions for distributions which are built into flexsurv, and whose distribution functions are in base R.
hexp(x, rate = 1, log = FALSE) Hexp(x, rate = 1, log = FALSE) hgamma(x, shape, rate = 1, log = FALSE) Hgamma(x, shape, rate = 1, log = FALSE) hlnorm(x, meanlog = 0, sdlog = 1, log = FALSE) Hlnorm(x, meanlog = 0, sdlog = 1, log = FALSE) hweibull(x, shape, scale = 1, log = FALSE) Hweibull(x, shape, scale = 1, log = FALSE)
hexp(x, rate = 1, log = FALSE) Hexp(x, rate = 1, log = FALSE) hgamma(x, shape, rate = 1, log = FALSE) Hgamma(x, shape, rate = 1, log = FALSE) hlnorm(x, meanlog = 0, sdlog = 1, log = FALSE) Hlnorm(x, meanlog = 0, sdlog = 1, log = FALSE) hweibull(x, shape, scale = 1, log = FALSE) Hweibull(x, shape, scale = 1, log = FALSE)
x |
Vector of quantiles |
rate |
Rate parameter (exponential and gamma) |
log |
Compute log hazard or log cumulative hazard |
shape |
Shape parameter (Weibull and gamma) |
meanlog |
Mean on the log scale (log normal) |
sdlog |
Standard deviation on the log scale (log normal) |
scale |
Scale parameter (Weibull) |
For the exponential and the Weibull these are available analytically, and so are programmed here in numerically stable and efficient forms.
For the gamma and log-normal, these are simply computed as minus the log of the survivor function (cumulative hazard) or the ratio of the density and survivor function (hazard), so are not expected to be robust to extreme values or quick to compute.
Hazard (functions beginning 'h') or cumulative hazard (functions beginning 'H').
Christopher Jackson <[email protected]>
dexp
,dweibull
,dgamma
,dlnorm
,dgompertz
,dgengamma
,dgenf
Hazard ratio as a function of time from a parametric survival model
hr_flexsurvreg( x, newdata = NULL, t = NULL, start = 0, ci = TRUE, B = 1000, cl = 0.95, na.action = na.pass )
hr_flexsurvreg( x, newdata = NULL, t = NULL, start = 0, ci = TRUE, B = 1000, cl = 0.95, na.action = na.pass )
x |
Object returned by |
newdata |
A data frame with two rows, each specifying a set of covariate values.
The hazard ratio is calculated as hazard(z2)/hazard(z1), where z1 is the first row
of
|
t |
Times to calculate fitted values for. By default, these are the sorted unique observation (including censoring) times in the data - for left-truncated datasets these are the "stop" times. |
start |
Optional left-truncation time or times. The returned
survival, hazard or cumulative hazard will be conditioned on survival up to
this time. Predicted times returned with A vector of the same length as |
ci |
Set to |
B |
Number of simulations from the normal asymptotic distribution of
the estimates used to calculate confidence intervals or standard errors.
Decrease for greater
speed at the expense of accuracy, or set |
cl |
Width of symmetric confidence intervals, relative to 1. |
na.action |
Function determining what should be done with missing values in |
A data frame with estimate and confidence limits for the hazard ratio, and
one row for each of the times requested in t
.
Add fitted survival (or hazard or cumulative hazard) curves from a
flexsurvreg
model fit to an existing plot.
## S3 method for class 'flexsurvreg' lines( x, newdata = NULL, X = NULL, type = "survival", t = NULL, est = TRUE, ci = NULL, B = 1000, cl = 0.95, col = "red", lty = 1, lwd = 2, col.ci = NULL, lty.ci = 2, lwd.ci = 1, ... )
## S3 method for class 'flexsurvreg' lines( x, newdata = NULL, X = NULL, type = "survival", t = NULL, est = TRUE, ci = NULL, B = 1000, cl = 0.95, col = "red", lty = 1, lwd = 2, col.ci = NULL, lty.ci = 2, lwd.ci = 1, ... )
x |
Output from |
newdata |
Covariate values to produce fitted curves for, as a data
frame, as described in |
X |
Covariate values to produce fitted curves for, as a matrix, as
described in |
type |
|
t |
Vector of times to plot fitted values for. |
est |
Plot fitted curves ( |
ci |
Plot confidence intervals for fitted curves. |
B |
Number of simulations controlling accuracy of confidence
intervals, as used in |
cl |
Width of confidence intervals, by default 0.95 for 95% intervals. |
col |
Colour of the fitted curve(s). |
lty |
Line type of the fitted curve(s). |
lwd |
Line width of the fitted curve(s). |
col.ci |
Colour of the confidence limits, defaulting to the same as for the fitted curve. |
lty.ci |
Line type of the confidence limits. |
lwd.ci |
Line width of the confidence limits, defaulting to the same as for the fitted curve. |
... |
Other arguments to be passed to the generic |
Equivalent to plot.flexsurvreg(...,add=TRUE)
.
C. H. Jackson [email protected]
Density, distribution function, hazards, quantile function and random generation for the log-logistic distribution.
dllogis(x, shape = 1, scale = 1, log = FALSE) pllogis(q, shape = 1, scale = 1, lower.tail = TRUE, log.p = FALSE) qllogis(p, shape = 1, scale = 1, lower.tail = TRUE, log.p = FALSE) rllogis(n, shape = 1, scale = 1) hllogis(x, shape = 1, scale = 1, log = FALSE) Hllogis(x, shape = 1, scale = 1, log = FALSE)
dllogis(x, shape = 1, scale = 1, log = FALSE) pllogis(q, shape = 1, scale = 1, lower.tail = TRUE, log.p = FALSE) qllogis(p, shape = 1, scale = 1, lower.tail = TRUE, log.p = FALSE) rllogis(n, shape = 1, scale = 1) hllogis(x, shape = 1, scale = 1, log = FALSE) Hllogis(x, shape = 1, scale = 1, log = FALSE)
x , q
|
vector of quantiles. |
shape , scale
|
vector of shape and scale parameters. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are
|
p |
vector of probabilities. |
n |
number of observations. If |
The log-logistic distribution with shape
parameter
and
scale
parameter has probability
density function
and hazard
for . The hazard is decreasing for shape
, and unimodal for
.
The probability distribution function is
If , the mean is
, and if
the variance is
, where
, otherwise these are undefined.
dllogis
gives the density, pllogis
gives the
distribution function, qllogis
gives the quantile function,
hllogis
gives the hazard function, Hllogis
gives the
cumulative hazard function, and rllogis
generates random
deviates.
Various different parameterisations of this distribution are
used. In the one used here, the interpretation of the parameters
is the same as in the standard Weibull distribution
(dweibull
). Like the Weibull, the survivor function
is a transformation of from the non-negative real line to [0,1],
but with a different link function. Covariates on
represent time acceleration factors, or ratios of expected
survival.
The same parameterisation is also used in
eha::dllogis
in the eha package.
Christopher Jackson <[email protected]>
Stata Press (2007) Stata release 10 manual: Survival analysis and epidemiological tables.
Log likelihood from a flexsurvreg model
## S3 method for class 'flexsurvreg' logLik(object, ...)
## S3 method for class 'flexsurvreg' logLik(object, ...)
object |
A fitted model object of class
|
... |
Other arguments (currently unused). |
Log-likelihood (numeric) with additional attributes df
(degrees of freedom, or number
of parameters that were estimated), and number of observations nobs
(including observed
events and censored observations).
Mean and restricted mean survival time functions for distributions which are built into flexsurv.
mean_exp(rate = 1) rmst_exp(t, rate = 1, start = 0) mean_gamma(shape, rate = 1) rmst_gamma(t, shape, rate = 1, start = 0) rmst_genf(t, mu, sigma, Q, P, start = 0) mean_genf(mu, sigma, Q, P) rmst_genf.orig(t, mu, sigma, s1, s2, start = 0) mean_genf.orig(mu, sigma, s1, s2) rmst_gengamma(t, mu = 0, sigma = 1, Q, start = 0) mean_gengamma(mu = 0, sigma = 1, Q) rmst_gengamma.orig(t, shape, scale = 1, k, start = 0) mean_gengamma.orig(shape, scale = 1, k) rmst_gompertz(t, shape, rate = 1, start = 0) mean_gompertz(shape, rate = 1) mean_llogis(shape = 1, scale = 1) rmst_llogis(t, shape = 1, scale = 1, start = 0) mean_lnorm(meanlog = 0, sdlog = 1) rmst_lnorm(t, meanlog = 0, sdlog = 1, start = 0) mean_weibull(shape, scale = 1) rmst_weibull(t, shape, scale = 1, start = 0) rmst_weibullPH(t, shape, scale = 1, start = 0) mean_weibullPH(shape, scale = 1)
mean_exp(rate = 1) rmst_exp(t, rate = 1, start = 0) mean_gamma(shape, rate = 1) rmst_gamma(t, shape, rate = 1, start = 0) rmst_genf(t, mu, sigma, Q, P, start = 0) mean_genf(mu, sigma, Q, P) rmst_genf.orig(t, mu, sigma, s1, s2, start = 0) mean_genf.orig(mu, sigma, s1, s2) rmst_gengamma(t, mu = 0, sigma = 1, Q, start = 0) mean_gengamma(mu = 0, sigma = 1, Q) rmst_gengamma.orig(t, shape, scale = 1, k, start = 0) mean_gengamma.orig(shape, scale = 1, k) rmst_gompertz(t, shape, rate = 1, start = 0) mean_gompertz(shape, rate = 1) mean_llogis(shape = 1, scale = 1) rmst_llogis(t, shape = 1, scale = 1, start = 0) mean_lnorm(meanlog = 0, sdlog = 1) rmst_lnorm(t, meanlog = 0, sdlog = 1, start = 0) mean_weibull(shape, scale = 1) rmst_weibull(t, shape, scale = 1, start = 0) rmst_weibullPH(t, shape, scale = 1, start = 0) mean_weibullPH(shape, scale = 1)
rate |
Rate parameter (exponential and gamma) |
t |
Vector of times to which restricted mean survival time is evaluated |
start |
Optional left-truncation time or times. The returned restricted mean survival will be conditioned on survival up to this time. |
shape |
Shape parameter (Weibull, gamma, log-logistic, generalized gamma [orig], generalized F [orig]) |
mu |
Mean on the log scale (generalized gamma, generalized F) |
sigma |
Standard deviation on the log scale (generalized gamma, generalized F) |
Q |
Vector of first shape parameters (generalized gamma, generalized F) |
P |
Vector of second shape parameters (generalized F) |
s1 |
Vector of first F shape parameters (generalized F [orig]) |
s2 |
vector of second F shape parameters (generalized F [orig]) |
scale |
Scale parameter (Weibull, log-logistic, generalized gamma [orig], generalized F [orig]) |
k |
vector of shape parameters (generalized gamma [orig]). |
meanlog |
Mean on the log scale (log normal) |
sdlog |
Standard deviation on the log scale (log normal) |
For the exponential, Weibull, log-logistic, lognormal, and gamma, mean survival is provided analytically. Restricted mean survival for the exponential distribution is also provided analytically. Mean and restricted means for other distributions are calculated via numeric integration.
mean survival (functions beginning 'mean') or restricted mean survival (functions beginning 'rmst_').
Christopher Jackson <[email protected]>
dexp
,dweibull
,dgamma
,dlnorm
,dgompertz
,dgengamma
,dgenf
This returns the mean of each event-specific parametric time-to-event distribution in the mixture model, which is the mean time to event conditionally on that event being the one that happens.
mean_flexsurvmix(x, newdata = NULL, B = NULL)
mean_flexsurvmix(x, newdata = NULL, B = NULL)
x |
Fitted model object returned from |
newdata |
Data frame or list of covariate values. If omitted for a model with covariates, a default is used, defined by all combinations of factors if the only covariates in the model are factors, or all covariate values of zero if there are any non-factor covariates in the model. |
B |
Number of simulations to use to compute 95% confidence intervals,
based on the asymptotic multivariate normal distribution of the basic
parameter estimates. If |
Mean times to next event conditionally on each alternative event, given the specified covariate values.
Calculate the mean time from the start of the process to a final (or "absorbing") state in a mixture multi-state model. Models with cycles are not supported.
meanfinal_fmixmsm(x, newdata = NULL, final = FALSE, B = NULL)
meanfinal_fmixmsm(x, newdata = NULL, final = FALSE, B = NULL)
x |
Object returned by |
newdata |
Data frame or list of covariate values. If omitted for a model with covariates, a default is used, defined by all combinations of factors if the only covariates in the model are factors, or all covariate values of zero if there are any non-factor covariates in the model. |
final |
If |
B |
Number of simulations to use to compute 95% confidence intervals,
based on the asymptotic multivariate normal distribution of the basic
parameter estimates. If |
A data frame of mean times to absorption, by covariate values and pathway (or by final state)
Returns a list of data frames, with each component containing the data that were used for the model fit for that mixture component.
## S3 method for class 'flexsurvmix' model.frame(formula, ...)
## S3 method for class 'flexsurvmix' model.frame(formula, ...)
formula |
Fitted model object from |
... |
Further arguments (currently unused). |
A list of data frames
flexsurvreg
objects.Extract the data from a model fitted with flexsurvreg
.
## S3 method for class 'flexsurvreg' model.frame(formula, ...) ## S3 method for class 'flexsurvreg' model.matrix(object, par = NULL, ...)
## S3 method for class 'flexsurvreg' model.frame(formula, ...) ## S3 method for class 'flexsurvreg' model.matrix(object, par = NULL, ...)
formula |
A fitted model object, as returned by
|
... |
Further arguments (not used). |
object |
A fitted model object, as returned by
|
par |
String naming the parameter whose linear model matrix is desired. The default value of |
model.frame
returns a data frame with all the original
variables used for the model fit.
model.matrix
returns a design matrix for a part of the model that
includes covariates. The required part is indicated by the "par"
argument (see above).
C. H. Jackson [email protected]
flexsurvreg
, model.frame
,
model.matrix
.
Cumulative transition-specific intensity/hazard functions for fully-parametric multi-state or competing risks models, using a piecewise-constant approximation that will allow prediction using the functions in the mstate package.
msfit.flexsurvreg( object, t, newdata = NULL, variance = TRUE, tvar = "trans", trans, B = 1000 )
msfit.flexsurvreg( object, t, newdata = NULL, variance = TRUE, tvar = "trans", trans, B = 1000 )
object |
Output from The model should have been fitted to data consisting of one row for each observed transition and additional rows corresponding to censored times to competing transitions. This is the "long" format, or counting process format, as explained in the flexsurv vignette. The model should contain a categorical covariate indicating the transition.
In Alternatively, if the parameters (including covariate effects) are assumed to be different between different transitions, then a list of transition-specific models can be formed. This list has one component for each permitted transition in the multi-state model. This is more computationally efficient, particularly for larger models and datasets. See the example below, and the vignette. |
t |
Vector of times. These do not need to be the same as the observed
event times, and since the model is parametric, they can be outside the
range of the data. A grid of more frequent times will provide a better
approximation to the cumulative hazard trajectory for prediction with
|
newdata |
A data frame specifying the values of covariates in the
fitted model, other than the transition number. This must be specified if
there are other covariates. The variable names should be the same as those
in the fitted model formula. There must be either one value per covariate
(the typical situation) or |
variance |
Calculate the variances and covariances of the transition
cumulative hazards ( |
tvar |
Name of the categorical variable in the model formula that
represents the transition number. This should have been defined as a factor,
with factor levels that
correspond to elements of |
trans |
Matrix indicating allowed transitions in the multi-state
model, in the format understood by mstate: a matrix of integers whose
|
B |
Number of simulations from the normal asymptotic distribution used to calculate variances. Decrease for greater speed at the expense of accuracy. |
An object of class "msfit"
, in the same form as the objects
used in the mstate package. The msfit
method
from mstate returns the equivalent cumulative intensities for Cox
regression models fitted with coxph
.
C. H. Jackson [email protected]
Liesbeth C. de Wreede, Marta Fiocco, Hein Putter (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software, 38(7), 1-30. doi:10.18637/jss.v038.i07
Mandel, M. (2013). "Simulation based confidence intervals for functions with complicated derivatives." The American Statistician 67(2):76-81
flexsurv provides alternative functions designed
specifically for predicting from parametric multi-state models without
calling mstate. These include pmatrix.fs
and
pmatrix.simfs
for the transition probability matrix, and
totlos.fs
and totlos.simfs
for expected total
lengths of stay in states. These are generally more efficient than going
via mstate.
## 3 state illness-death model for bronchiolitis obliterans ## Compare clock-reset / semi-Markov multi-state models ## Simple exponential model (reduces to Markov) bexp <- flexsurvreg(Surv(years, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) mexp <- msfit.flexsurvreg(bexp, t=seq(0,12,by=0.1), trans=tmat, tvar="trans", variance=FALSE) ## Cox semi-parametric model within each transition bcox <- coxph(Surv(years, status) ~ strata(trans), data=bosms3) if (require("mstate")){ mcox <- mstate::msfit(bcox, trans=tmat) ## Flexible parametric spline-based model bspl <- flexsurvspline(Surv(years, status) ~ trans + gamma1(trans), data=bosms3, k=3) mspl <- msfit.flexsurvreg(bspl, t=seq(0,12,by=0.1), trans=tmat, tvar="trans", variance=FALSE) ## Compare fit: exponential model is OK but the spline is better plot(mcox, lwd=1, xlim=c(0, 12), ylim=c(0,4)) cols <- c("black","red","green") for (i in 1:3){ lines(mexp$Haz$time[mexp$Haz$trans==i], mexp$Haz$Haz[mexp$Haz$trans==i], col=cols[i], lwd=2, lty=2) lines(mspl$Haz$time[mspl$Haz$trans==i], mspl$Haz$Haz[mspl$Haz$trans==i], col=cols[i], lwd=3) } legend("topright", lwd=c(1,2,3), lty=c(1,2,1), c("Cox", "Exponential", "Flexible parametric"), bty="n") } ## Fit a list of models, one for each transition ## More computationally efficient, but only valid if parameters ## are different between transitions. ## Not run: bexp.list <- vector(3, mode="list") for (i in 1:3) { bexp.list[[i]] <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==i), data=bosms3, dist="exp") } ## The list of models can be passed to this and other functions, ## as if it were a single multi-state model. msfit.flexsurvreg(bexp.list, t=seq(0,12,by=0.1), trans=tmat) ## End(Not run)
## 3 state illness-death model for bronchiolitis obliterans ## Compare clock-reset / semi-Markov multi-state models ## Simple exponential model (reduces to Markov) bexp <- flexsurvreg(Surv(years, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) mexp <- msfit.flexsurvreg(bexp, t=seq(0,12,by=0.1), trans=tmat, tvar="trans", variance=FALSE) ## Cox semi-parametric model within each transition bcox <- coxph(Surv(years, status) ~ strata(trans), data=bosms3) if (require("mstate")){ mcox <- mstate::msfit(bcox, trans=tmat) ## Flexible parametric spline-based model bspl <- flexsurvspline(Surv(years, status) ~ trans + gamma1(trans), data=bosms3, k=3) mspl <- msfit.flexsurvreg(bspl, t=seq(0,12,by=0.1), trans=tmat, tvar="trans", variance=FALSE) ## Compare fit: exponential model is OK but the spline is better plot(mcox, lwd=1, xlim=c(0, 12), ylim=c(0,4)) cols <- c("black","red","green") for (i in 1:3){ lines(mexp$Haz$time[mexp$Haz$trans==i], mexp$Haz$Haz[mexp$Haz$trans==i], col=cols[i], lwd=2, lty=2) lines(mspl$Haz$time[mspl$Haz$trans==i], mspl$Haz$Haz[mspl$Haz$trans==i], col=cols[i], lwd=3) } legend("topright", lwd=c(1,2,3), lty=c(1,2,1), c("Cox", "Exponential", "Flexible parametric"), bty="n") } ## Fit a list of models, one for each transition ## More computationally efficient, but only valid if parameters ## are different between transitions. ## Not run: bexp.list <- vector(3, mode="list") for (i in 1:3) { bexp.list[[i]] <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==i), data=bosms3, dist="exp") } ## The list of models can be passed to this and other functions, ## as if it were a single multi-state model. msfit.flexsurvreg(bexp.list, t=seq(0,12,by=0.1), trans=tmat) ## End(Not run)
Number of observations contributing to a fitted flexible survival model
## S3 method for class 'flexsurvreg' nobs(object, cens = TRUE, ...)
## S3 method for class 'flexsurvreg' nobs(object, cens = TRUE, ...)
object |
Output from |
cens |
Include censored observations in the number. |
... |
Further arguments passed to or from other methods. Currently unused. |
By default, this matches the behaviour of the nobs
method for survreg
objects, including both censored and uncensored observations.
If a weighted flexsurvreg
analysis was done, then this function returns the sum of the weights.
This returns the mod$N
component of the fitted
model object mod
. See flexsurvreg
,
flexsurvspline
for full documentation of all components.
C. H. Jackson [email protected]
Produce a matrix of alternative parameter estimates under sampling
uncertainty, at covariate values supplied by the user. Used by
summary.flexsurvreg
for obtaining confidence intervals around
functions of parameters.
normboot.flexsurvreg( x, B, newdata = NULL, X = NULL, transform = FALSE, raw = FALSE, tidy = FALSE, rawsim = NULL )
normboot.flexsurvreg( x, B, newdata = NULL, X = NULL, transform = FALSE, raw = FALSE, tidy = FALSE, rawsim = NULL )
x |
A fitted model from |
B |
Number of samples. |
newdata |
Data frame or list containing the covariate values to
evaluate the parameters at. If there are covariates in the model, at least
one of |
X |
Alternative (less convenient) format for covariate values: a
matrix with one row, with one column for each covariate or factor contrast.
Formed from all the "model matrices", one for each named parameter of the
distribution, with intercepts excluded, |
transform |
|
raw |
Return samples of the baseline parameters and the covariate effects, rather than the default of adjusting the baseline parameters for covariates. |
tidy |
If |
rawsim |
allows input of raw samples from a previous run of
|
If newdata
includes only one covariate combination, a matrix
will be returned with B
rows, and one column for each named
parameter of the survival distribution.
If more than one covariate combination is requested (e.g. newdata
is
a data frame with more than one row), then a list of matrices will be
returned, one for each covariate combination.
C. H. Jackson [email protected]
Mandel, M. (2013). "Simulation based confidence intervals for functions with complicated derivatives." The American Statistician (in press).
fite <- flexsurvreg(Surv(futime, fustat) ~ age, data = ovarian, dist="exp") normboot.flexsurvreg(fite, B=10, newdata=list(age=50)) normboot.flexsurvreg(fite, B=10, X=matrix(50,nrow=1)) normboot.flexsurvreg(fite, B=10, newdata=list(age=0)) ## closer to... fite$res
fite <- flexsurvreg(Surv(futime, fustat) ~ age, data = ovarian, dist="exp") normboot.flexsurvreg(fite, B=10, newdata=list(age=50)) normboot.flexsurvreg(fite, B=10, X=matrix(50,nrow=1)) normboot.flexsurvreg(fite, B=10, newdata=list(age=0)) ## closer to... fite$res
These quantities are variously known as transition probabilities, or state
occupancy probabilities, or values of the "cumulative incidence" function,
or values of the "subdistribution" function. They are the probabilities that
an individual has experienced an event of a particular kind by time
t
.
p_flexsurvmix(x, newdata = NULL, startname = "start", t = 1, B = NULL)
p_flexsurvmix(x, newdata = NULL, startname = "start", t = 1, B = NULL)
x |
Fitted model object returned from |
newdata |
Data frame or list of covariate values. If omitted for a model with covariates, a default is used, defined by all combinations of factors if the only covariates in the model are factors, or all covariate values of zero if there are any non-factor covariates in the model. |
startname |
Name of the state where individuals start. This considers the model as a multi-state model where people start in this state, and may transition to one of the competing events. |
t |
Vector of times |
B |
Number of simulations to use to compute 95% confidence intervals,
based on the asymptotic multivariate normal distribution of the basic
parameter estimates. If |
Note that "cumulative incidence" is a misnomer, as "incidence" typically means a hazard, and the quantities computed here are not cumulative hazards, but probabilities.
A data frame with transition probabilities by time, covariate value and destination state.
List of maximum likelihood estimates of transition-specific parameters in a flexible parametric multi-state model, at given covariate values.
pars.fmsm(x, trans, newdata = NULL, tvar = "trans")
pars.fmsm(x, trans, newdata = NULL, tvar = "trans")
x |
A multi-state model fitted with
|
trans |
Matrix indicating allowed transitions. See
|
newdata |
A data frame specifying the values of covariates in the
fitted model, other than the transition number. See
|
tvar |
Variable in the data representing the transition type. Not
required if |
A list with one component for each permitted transition. Each component has one element for each parameter of the parametric distribution that generates the corresponding event in the multi-state model.
Christopher Jackson [email protected].
This returns an estimate of the probability density for the time to each competing event, at a vector of times supplied by the user.
pdf_flexsurvmix(x, newdata = NULL, t = NULL)
pdf_flexsurvmix(x, newdata = NULL, t = NULL)
x |
Fitted model object returned from |
newdata |
Data frame or list of covariate values. If omitted for a model with covariates, a default is used, defined by all combinations of factors if the only covariates in the model are factors, or all covariate values of zero if there are any non-factor covariates in the model. |
t |
Vector of times at which to evaluate the probability density |
A data frame with each row giving the fitted density dens
for
a combination of covariate values, time and competing event.
This requires the model to be Markov, and is not valid for semi-Markov
models, as it works by wrapping pmatrix.fs
to calculate the
transition probability over a very large time. As it also works on a
fmsm
object formed from transition-specific time-to-event models,
it therefore only works on competing risks models, defined by just one starting
state with multiple destination states representing competing events.
For these models, this function returns the probability governing which
competing event happens next. However this function simply wraps pmatrix.fs
,
so for other models, pmatrix.fs
or pmatrix.simfs
can be used with a
large forecast time t
.
pfinal_fmsm(x, newdata = NULL, fromstate, maxt = 1e+05, B = 0, cores = NULL)
pfinal_fmsm(x, newdata = NULL, fromstate, maxt = 1e+05, B = 0, cores = NULL)
x |
Object returned by |
newdata |
Data frame of covariate values, with one column per covariate, and one row per alternative value. |
fromstate |
State from which to calculate the transition probability
state. This should refer to the name of a row of the transition matrix
|
maxt |
Large time to use for forecasting final state probabilities.
The transition probability from zero to this time is used. Note
|
B |
Number of simulations to use to calculate 95% confidence intervals
based on the asymptotic normal distribution of the basic parameter
estimates. If |
cores |
Number of processor cores to use. If |
A data frame with one row per covariate value and destination state,
giving the state in column state
, and probability in column
val
. Additional columns lower
and upper
for the
confidence limits are returned if B=0
.
Plot fitted survival, cumulative hazard or hazard from a parametric model against nonparametric estimates to diagnose goodness-of-fit. Alternatively plot a user-defined function of the model parameters against time.
## S3 method for class 'flexsurvreg' plot( x, newdata = NULL, X = NULL, type = "survival", fn = NULL, t = NULL, start = 0, est = TRUE, ci = NULL, B = 1000, cl = 0.95, col.obs = "black", lty.obs = 1, lwd.obs = 1, col = "red", lty = 1, lwd = 2, col.ci = NULL, lty.ci = 2, lwd.ci = 1, ylim = NULL, add = FALSE, ... )
## S3 method for class 'flexsurvreg' plot( x, newdata = NULL, X = NULL, type = "survival", fn = NULL, t = NULL, start = 0, est = TRUE, ci = NULL, B = 1000, cl = 0.95, col.obs = "black", lty.obs = 1, lwd.obs = 1, col = "red", lty = 1, lwd = 2, col.ci = NULL, lty.ci = 2, lwd.ci = 1, ylim = NULL, add = FALSE, ... )
x |
Output from |
newdata |
Data frame containing covariate values to produce fitted
values for. See If there are only factor covariates in the model, then Kaplan-Meier (or
nonparametric hazard...) curves are plotted for all distinct groups, and
by default, fitted curves are also plotted for these groups. To plot
Kaplan-Meier and fitted curves for only a subset of groups, use
If there are any continuous covariates, then a single population Kaplan-Meier curve is drawn. By default, a single fitted curve is drawn with the covariates set to their mean values in the data - for categorical covariates, the means of the 0/1 indicator variables are taken. |
X |
Alternative way to supply covariate values, as a model matrix.
See |
type |
Ignored if |
fn |
Custom function of the parameters to summarise against time. The
first two arguments of the function must be |
t |
Vector of times to plot fitted values for, see
|
start |
Left-truncation points, see |
est |
Plot fitted curves ( |
ci |
Plot confidence intervals for fitted curves. By default, this is
|
B |
Number of simulations controlling accuracy of confidence
intervals, as used in |
cl |
Width of confidence intervals, by default 0.95 for 95% intervals. |
col.obs |
Colour of the nonparametric curve. |
lty.obs |
Line type of the nonparametric curve. |
lwd.obs |
Line width of the nonparametric curve. |
col |
Colour of the fitted parametric curve(s). |
lty |
Line type of the fitted parametric curve(s). |
lwd |
Line width of the fitted parametric curve(s). |
col.ci |
Colour of the fitted confidence limits, defaulting to the same as for the fitted curve. |
lty.ci |
Line type of the fitted confidence limits. |
lwd.ci |
Line width of the fitted confidence limits. |
ylim |
y-axis limits: vector of two elements. |
add |
If |
... |
Other options to be passed to |
Some standard plot arguments such as "xlim","xlab"
may not
work. This function was designed as a quick check of model fit. Users
wanting publication-quality graphs are advised to set up an empty plot with
the desired axes first (e.g. with plot(...,type="n",...)
), then use
suitable lines
functions to add lines.
If case weights were used to fit the model, these are used when producing nonparametric estimates of survival and cumulative hazard, but not for the hazard estimates.
C. H. Jackson [email protected]
Plot standardized metrics such as the marginal survival, restricted mean survival and hazard, based on a fitted flexsurv model.
## S3 method for class 'standsurv' plot(x, contrast = FALSE, ci = FALSE, expected = FALSE, ...)
## S3 method for class 'standsurv' plot(x, contrast = FALSE, ci = FALSE, expected = FALSE, ...)
x |
A standsurv object returned by |
contrast |
Should contrasts of standardized metrics be plotted. Defaults to FALSE |
ci |
Should confidence intervals be plotted (if calculated in
|
expected |
Should the marginal expected survival / hazard also be
plotted? This can only be invoked if |
... |
Not currently used |
A ggplot showing the standardized metric calculated by
standsurv
over time. Modification of the plot is
possible by adding further ggplot objects, see Examples.
## Use bc dataset, with an age variable appended ## mean age is higher in those with smaller observed survival times newbc <- bc newbc$age <- rnorm(dim(bc)[1], mean = 65-scale(newbc$recyrs, scale=FALSE), sd = 5) ## Fit a Weibull flexsurv model with group and age as covariates weib_age <- flexsurvreg(Surv(recyrs, censrec) ~ group+age, data=newbc, dist="weibull") ## Calculate standardized survival and the difference in standardized survival ## for the three levels of group across a grid of survival times standsurv_weib_age <- standsurv(weib_age, at = list(list(group="Good"), list(group="Medium"), list(group="Poor")), t=seq(0,7, length=100), contrast = "difference", ci=TRUE, boot = TRUE, B=10, seed=123) plot(standsurv_weib_age) plot(standsurv_weib_age) + ggplot2::theme_bw() + ggplot2::ylab("Survival") + ggplot2::xlab("Time (years)") + ggplot2::guides(color=ggplot2::guide_legend(title="Prognosis"), fill=ggplot2::guide_legend(title="Prognosis")) plot(standsurv_weib_age, contrast=TRUE, ci=TRUE) + ggplot2::ylab("Difference in survival")
## Use bc dataset, with an age variable appended ## mean age is higher in those with smaller observed survival times newbc <- bc newbc$age <- rnorm(dim(bc)[1], mean = 65-scale(newbc$recyrs, scale=FALSE), sd = 5) ## Fit a Weibull flexsurv model with group and age as covariates weib_age <- flexsurvreg(Surv(recyrs, censrec) ~ group+age, data=newbc, dist="weibull") ## Calculate standardized survival and the difference in standardized survival ## for the three levels of group across a grid of survival times standsurv_weib_age <- standsurv(weib_age, at = list(list(group="Good"), list(group="Medium"), list(group="Poor")), t=seq(0,7, length=100), contrast = "difference", ci=TRUE, boot = TRUE, B=10, seed=123) plot(standsurv_weib_age) plot(standsurv_weib_age) + ggplot2::theme_bw() + ggplot2::ylab("Survival") + ggplot2::xlab("Time (years)") + ggplot2::guides(color=ggplot2::guide_legend(title="Prognosis"), fill=ggplot2::guide_legend(title="Prognosis")) plot(standsurv_weib_age, contrast=TRUE, ci=TRUE) + ggplot2::ylab("Difference in survival")
plot.survrtrunc
creates a new plot, while lines.survrtrunc
adds lines to an exising plot.
## S3 method for class 'survrtrunc' plot(x, ...) ## S3 method for class 'survrtrunc' lines(x, ...)
## S3 method for class 'survrtrunc' plot(x, ...) ## S3 method for class 'survrtrunc' lines(x, ...)
x |
Object of class |
... |
Other arguments to be passed to |
The transition probability matrix for time-inhomogeneous Markov multi-state
models fitted to time-to-event data with flexsurvreg
. This
has entry giving the probability that an individual is in state
at time
, given they are in state
at time
.
pmatrix.fs( x, trans = NULL, t = 1, newdata = NULL, condstates = NULL, ci = FALSE, tvar = "trans", sing.inf = 1e+10, B = 1000, cl = 0.95, tidy = FALSE, ... )
pmatrix.fs( x, trans = NULL, t = 1, newdata = NULL, condstates = NULL, ci = FALSE, tvar = "trans", sing.inf = 1e+10, B = 1000, cl = 0.95, tidy = FALSE, ... )
x |
A model fitted with
|
trans |
Matrix indicating allowed transitions. See
|
t |
Time or vector of times to predict state occupancy probabilities for. |
newdata |
A data frame specifying the values of covariates in the
fitted model, other than the transition number. See
|
condstates |
xInstead of the unconditional probability of being in state This is used, for example, in competing risks situations, e.g. if the competing states are death or recovery from a disease, and we want to compute the probability a patient has died, given they have died or recovered. If these are absorbing states, then as |
ci |
Return a confidence interval calculated by simulating from the
asymptotic normal distribution of the maximum likelihood estimates. Turned
off by default, since this is computationally intensive. If turned on,
users should increase |
tvar |
Variable in the data representing the transition type. Not
required if |
sing.inf |
If there is a singularity in the observed hazard, for
example a Weibull distribution with |
B |
Number of simulations from the normal asymptotic distribution used to calculate variances. Decrease for greater speed at the expense of accuracy. |
cl |
Width of symmetric confidence intervals, relative to 1. |
tidy |
If TRUE then return the results as a tidy data frame |
... |
Arguments passed to |
This is computed by solving the Kolmogorov forward differential equation numerically, using the methods in the deSolve package. The equation is
where is the transition probability matrix for time
, and
is the transition hazard or intensity as a function of
.
The initial condition is
.
Note that the package msm has a similar method pmatrix.msm
.
pmatrix.fs
should give the same results as pmatrix.msm
when
both of these conditions hold:
the time-to-event distribution is exponential for all
transitions, thus the flexsurvreg
model was fitted with
dist="exp"
and the model is time-homogeneous.
the msm
model was fitted with exacttimes=TRUE
, thus all the event times are
known, and there are no time-dependent covariates.
msm only allows exponential or piecewise-exponential time-to-event distributions, while flexsurvreg allows more flexible models. msm however was designed in particular for panel data, where the process is observed only at arbitrary times, thus the times of transition are unknown, which makes flexible models difficult.
This function is only valid for Markov ("clock-forward") multi-state
models, though no warning or error is currently given if the model is not
Markov. See pmatrix.simfs
for the equivalent for semi-Markov
("clock-reset") models.
The transition probability matrix, if t
is of length 1. If t
is longer, return a list of matrices, or a data frame if tidy
is TRUE.
If ci=TRUE
, each element has attributes "lower"
and
"upper"
giving matrices of the corresponding confidence limits.
These are formatted for printing but may be extracted using attr()
.
Christopher Jackson [email protected].
pmatrix.simfs
, totlos.fs
,
msfit.flexsurvreg
.
# BOS example in vignette, and in msfit.flexsurvreg bexp <- flexsurvreg(Surv(Tstart, Tstop, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) # more likely to be dead (state 3) as time moves on, or if start with # BOS (state 2) pmatrix.fs(bexp, t=c(5,10), trans=tmat)
# BOS example in vignette, and in msfit.flexsurvreg bexp <- flexsurvreg(Surv(Tstart, Tstop, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) # more likely to be dead (state 3) as time moves on, or if start with # BOS (state 2) pmatrix.fs(bexp, t=c(5,10), trans=tmat)
The transition probability matrix for semi-Markov multi-state models fitted
to time-to-event data with flexsurvreg
. This has
entry giving the probability that an individual is in state
at time
, given they are in state
at time
.
pmatrix.simfs( x, trans, t = 1, newdata = NULL, ci = FALSE, tvar = "trans", tcovs = NULL, M = 1e+05, B = 1000, cl = 0.95, cores = NULL, tidy = FALSE )
pmatrix.simfs( x, trans, t = 1, newdata = NULL, ci = FALSE, tvar = "trans", tcovs = NULL, M = 1e+05, B = 1000, cl = 0.95, cores = NULL, tidy = FALSE )
x |
A model fitted with
|
trans |
Matrix indicating allowed transitions. See
|
t |
Time to predict state occupancy probabilities for. This can be a single number or a vector of different numbers. |
newdata |
A data frame specifying the values of covariates in the
fitted model, other than the transition number. See
|
ci |
Return a confidence interval calculated by simulating from the
asymptotic normal distribution of the maximum likelihood estimates. This
is turned off by default, since two levels of simulation are required. If
turned on, users should adjust |
tvar |
Variable in the data representing the transition type. Not
required if |
tcovs |
Predictable time-dependent covariates such as age, see
|
M |
Number of individuals to simulate in order to approximate the transition probabilities. Users should adjust this to obtain the required precision. |
B |
Number of simulations from the normal asymptotic distribution used to calculate confidence limits. Decrease for greater speed at the expense of accuracy. |
cl |
Width of symmetric confidence intervals, relative to 1. |
cores |
Number of processor cores used when calculating confidence limits by repeated simulation. The default uses single-core processing. |
tidy |
If |
This is computed by simulating a large number of individuals M
using
the maximum likelihood estimates of the fitted model and the function
sim.fmsm
. Therefore this requires a random sampling function
for the parametric survival model to be available: see the "Details"
section of sim.fmsm
. This will be available for all built-in
distributions, though users may need to write this for custom models.
Note the random sampling method for flexsurvspline
models is
currently very inefficient, so that looping over the M
individuals
will be very slow.
pmatrix.fs
is a more efficient method based on solving the
Kolmogorov forward equation numerically, which requires the multi-state
model to be Markov. No error or warning is given if running
pmatrix.simfs
with a Markov model, but this is still invalid.
The transition probability matrix. If ci=TRUE
, there are
attributes "lower"
and "upper"
giving matrices of the
corresponding confidence limits. These are formatted for printing but may
be extracted using attr()
.
Christopher Jackson [email protected].
pmatrix.fs
,sim.fmsm
,totlos.simfs
,
msfit.flexsurvreg
.
# BOS example in vignette, and in msfit.flexsurvreg bexp <- flexsurvreg(Surv(years, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) # more likely to be dead (state 3) as time moves on, or if start with # BOS (state 2) pmatrix.simfs(bexp, t=5, trans=tmat) pmatrix.simfs(bexp, t=10, trans=tmat) # these results should converge to those in help(pmatrix.fs), as M # increases here and ODE solving precision increases there, since with # an exponential distribution, the semi-Markov model is the same as the # Markov model.
# BOS example in vignette, and in msfit.flexsurvreg bexp <- flexsurvreg(Surv(years, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) # more likely to be dead (state 3) as time moves on, or if start with # BOS (state 2) pmatrix.simfs(bexp, t=5, trans=tmat) pmatrix.simfs(bexp, t=10, trans=tmat) # these results should converge to those in help(pmatrix.fs), as M # increases here and ODE solving precision increases there, since with # an exponential distribution, the semi-Markov model is the same as the # Markov model.
Probability of each pathway taken through a mixture multi-state model
ppath_fmixmsm(x, newdata = NULL, final = FALSE, B = NULL)
ppath_fmixmsm(x, newdata = NULL, final = FALSE, B = NULL)
x |
Object returned by |
newdata |
Data frame or list of covariate values. If omitted for a model with covariates, a default is used, defined by all combinations of factors if the only covariates in the model are factors, or all covariate values of zero if there are any non-factor covariates in the model. |
final |
If |
B |
Number of simulations to use to compute 95% confidence intervals,
based on the asymptotic multivariate normal distribution of the basic
parameter estimates. If |
Data frame of pathway probabilities by covariate value and pathway.
Predict outcomes from flexible survival models at the covariate values
specified in newdata
.
## S3 method for class 'flexsurvreg' predict( object, newdata, type = "response", times, start = 0, conf.int = FALSE, conf.level = 0.95, se.fit = FALSE, p = c(0.1, 0.9), B = 1000, ... )
## S3 method for class 'flexsurvreg' predict( object, newdata, type = "response", times, start = 0, conf.int = FALSE, conf.level = 0.95, se.fit = FALSE, p = c(0.1, 0.9), B = 1000, ... )
object |
Output from |
newdata |
Data frame containing covariate values at which to produce
fitted values. There must be a column for every covariate in the model
formula used to fit If |
type |
Character vector for the type of predictions desired.
|
times |
Vector of time horizons at which to compute fitted values.
Only applies when If not specified, predictions for For |
start |
Optional left-truncation time or times. The returned
survival, hazard, or cumulative hazard will be conditioned on survival up
to this time. |
conf.int |
Logical. Should confidence intervals be returned?
Default is |
conf.level |
Width of symmetric confidence intervals, relative to 1. |
se.fit |
Logical. Should standard errors of fitted values be returned?
Default is |
p |
Vector of quantiles at which to return fitted values when
|
B |
Number of simulations from the normal asymptotic distribution of
the estimates used to calculate confidence intervals or standard errors.
Decrease for greater
speed at the expense of accuracy, or set |
... |
Not currently used. |
A tibble
with same number of rows as
newdata
and in the same order. If multiple predictions are
requested, a tibble
containing a single list-column
of data frames.
For the list-column of data frames - the dimensions of each data frame
will be identical. Rows are added for each value of times
or
p
requested.
This function is a wrapper around summary.flexsurvreg
,
designed to help flexsurv to integrate with the "tidymodels"
ecosystem, in particular the censored package.
summary.flexsurvreg
returns the same results but in a more
conventional format.
Matthew T. Warkentin (https://github.com/mattwarkentin)
summary.flexsurvreg
,
residuals.flexsurvreg
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "gengamma") ## Simplest prediction: mean or median, for covariates defined by original dataset predict(fitg) predict(fitg, type = "quantile", p = 0.5) ## Simple prediction for user-defined covariate values predict(fitg, newdata = data.frame(age = c(40, 50, 60))) predict(fitg, type = "quantile", p = 0.5, newdata = data.frame(age = c(40,50,60))) ## Predict multiple quantiles and unnest require(tidyr) pr <- predict(fitg, type = "survival", times = c(600, 800)) tidyr::unnest(pr, .pred)
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "gengamma") ## Simplest prediction: mean or median, for covariates defined by original dataset predict(fitg) predict(fitg, type = "quantile", p = 0.5) ## Simple prediction for user-defined covariate values predict(fitg, newdata = data.frame(age = c(40, 50, 60))) predict(fitg, type = "quantile", p = 0.5, newdata = data.frame(age = c(40,50,60))) ## Predict multiple quantiles and unnest require(tidyr) pr <- predict(fitg, type = "survival", times = c(600, 800)) tidyr::unnest(pr, .pred)
Probabilities of competing events from a flexsurvmix model
probs_flexsurvmix(x, newdata = NULL, B = NULL)
probs_flexsurvmix(x, newdata = NULL, B = NULL)
x |
Fitted model object returned from |
newdata |
Data frame or list of covariate values. If omitted for a model with covariates, a default is used, defined by all combinations of factors if the only covariates in the model are factors, or all covariate values of zero if there are any non-factor covariates in the model. |
B |
Number of simulations to use to compute 95% confidence intervals,
based on the asymptotic multivariate normal distribution of the basic
parameter estimates. If |
A data frame containing the probability that each of the competing
events will occur next, by event and by any covariate values specified in
newdata
.
Calculate the quantiles of the time from the start of the process to each possible final (or "absorbing") state in a mixture multi-state model. Models with cycles are not supported.
qfinal_fmixmsm( x, newdata = NULL, final = FALSE, B = NULL, n = 10000, probs = c(0.025, 0.5, 0.975) )
qfinal_fmixmsm( x, newdata = NULL, final = FALSE, B = NULL, n = 10000, probs = c(0.025, 0.5, 0.975) )
x |
Object returned by |
newdata |
Data frame or list of covariate values. If omitted for a model with covariates, a default is used, defined by all combinations of factors if the only covariates in the model are factors, or all covariate values of zero if there are any non-factor covariates in the model. |
final |
If |
B |
Number of simulations to use to compute 95% confidence intervals,
based on the asymptotic multivariate normal distribution of the basic
parameter estimates. If |
n |
Number of individual-level simulations to use to characterise the time-to-event distributions |
probs |
Quantiles to calculate, by default, |
Data frame of quantiles of the time to final state by pathway and covariate value, or by final state and covariate value.
Generic function to find the quantiles of a distribution, given the equivalent probability distribution function.
qgeneric(pdist, p, matargs = NULL, scalarargs = NULL, ...)
qgeneric(pdist, p, matargs = NULL, scalarargs = NULL, ...)
pdist |
Probability distribution function, for example,
|
p |
Vector of probabilities to find the quantiles for. |
matargs |
Character vector giving the elements of |
scalarargs |
Character vector naming scalar arguments of the distribution function that cannot be vectorised. This is used for the arguments |
... |
The remaining arguments define parameters of the distribution
This may also contain the standard arguments If the distribution is bounded above or below, then this should contain
arguments |
This function is used by default for custom distributions for which a quantile function is not provided.
It works by finding the root of the equation .
Starting from the interval
, the interval width is expanded by
50% until
is of opposite sign at either end. The root is then
found using
uniroot
.
This assumes a suitably smooth, continuous distribution.
Vector of quantiles of the distribution at p
.
Christopher Jackson <[email protected]>
qnorm(c(0.025, 0.975), 0, 1) qgeneric(pnorm, c(0.025, 0.975), mean=0, sd=1) # must name the arguments
qnorm(c(0.025, 0.975), 0, 1) qgeneric(pnorm, c(0.025, 0.975), mean=0, sd=1) # must name the arguments
This returns the quantiles of each event-specific parametric time-to-event distribution in the mixture model, which describes the time to the event conditionally on that event being the one that happens.
quantile_flexsurvmix(x, newdata = NULL, B = NULL, probs = c(0.025, 0.5, 0.975))
quantile_flexsurvmix(x, newdata = NULL, B = NULL, probs = c(0.025, 0.5, 0.975))
x |
Fitted model object returned from |
newdata |
Data frame or list of covariate values. If omitted for a model with covariates, a default is used, defined by all combinations of factors if the only covariates in the model are factors, or all covariate values of zero if there are any non-factor covariates in the model. |
B |
Number of simulations to use to compute 95% confidence intervals,
based on the asymptotic multivariate normal distribution of the basic
parameter estimates. If |
probs |
Vector of alternative quantiles, by default |
Calculates residuals for flexsurvreg
or flexsurvspline
model fits.
## S3 method for class 'flexsurvreg' residuals(object, type = "response", ...)
## S3 method for class 'flexsurvreg' residuals(object, type = "response", ...)
object |
Output from |
type |
Character string for the type of residual desired. Currently only |
... |
Not currently used. |
Residuals of type = "response"
are calculated as the naive difference between the observed survival and the covariate-specific predicted mean survival from predict.flexsurvreg
, ignoring whether the event time is observed or censored.
type="coxsnell"
returns the Cox-Snell residual, defined as the estimated cumulative hazard at each data point. To check the fit of the
A more fully featured utility for this is provided in the function coxsnell_flexsurvreg
.
Numeric vector with the same length as nobs(object)
.
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "gengamma") residuals(fitg, type="response")
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "gengamma") residuals(fitg, type="response")
This returns the restricted mean of each event-specific parametric time-to-event
distribution in the mixture model, which is the mean time to event
conditionally on that event being the one that happens, and conditionally
on the event time being less than some time horizon tot
.
rmst_flexsurvmix(x, newdata = NULL, tot = Inf, B = NULL)
rmst_flexsurvmix(x, newdata = NULL, tot = Inf, B = NULL)
x |
Fitted model object returned from |
newdata |
Data frame or list of covariate values. If omitted for a model with covariates, a default is used, defined by all combinations of factors if the only covariates in the model are factors, or all covariate values of zero if there are any non-factor covariates in the model. |
tot |
Time horizon to compute the restricted mean until. |
B |
Number of simulations to use to compute 95% confidence intervals,
based on the asymptotic multivariate normal distribution of the basic
parameter estimates. If |
Restricted mean times to next event conditionally on each alternative event, given the specified covariate values.
Generic function to find the restricted mean of a distribution, given the equivalent probability distribution function, using numeric integration.
rmst_generic(pdist, t, start = 0, matargs = NULL, scalarargs = NULL, ...)
rmst_generic(pdist, t, start = 0, matargs = NULL, scalarargs = NULL, ...)
pdist |
Probability distribution function, for example,
|
t |
Vector of times at which rmst is evaluated |
start |
Optional left-truncation time or times. The returned restricted mean survival will be conditioned on survival up to this time. |
matargs |
Character vector giving the elements of |
scalarargs |
Character vector naming scalar arguments of the distribution function that cannot be vectorised. This is used, for example, for the arguments |
... |
The remaining arguments define parameters of the distribution
|
This function is used by default for custom distributions for which an
rmst
function is not provided.
This assumes a suitably smooth, continuous distribution.
Vector of restricted mean survival times of the distribution at
p
.
Christopher Jackson <[email protected]>
rmst_lnorm(500, start=250, meanlog=7.4225, sdlog = 1.1138) rmst_generic(plnorm, 500, start=250, meanlog=7.4225, sdlog = 1.1138) # must name the arguments
rmst_lnorm(500, start=250, meanlog=7.4225, sdlog = 1.1138) rmst_generic(plnorm, 500, start=250, meanlog=7.4225, sdlog = 1.1138) # must name the arguments
Simulate changes of state and transition times from a semi-Markov
multi-state model fitted using flexsurvreg
.
sim.fmsm( x, trans = NULL, t, newdata = NULL, start = 1, M = 10, tvar = "trans", tcovs = NULL, tidy = FALSE )
sim.fmsm( x, trans = NULL, t, newdata = NULL, start = 1, M = 10, tvar = "trans", tcovs = NULL, tidy = FALSE )
x |
A model fitted with Alternatively |
trans |
Matrix indicating allowed transitions. See
|
t |
Time, or vector of times for each of the |
newdata |
A data frame specifying the values of covariates in the
fitted model, other than the transition number. See
|
start |
Starting state, or vector of starting states for each of the
|
M |
Number of individual trajectories to simulate. |
tvar |
Variable in the data representing the transition type. Not
required if |
tcovs |
Names of "predictable" time-dependent covariates in
|
tidy |
If |
sim.fmsm
relies on the presence of a function to sample random
numbers from the parametric survival distribution used in the fitted model
x
, for example rweibull
for Weibull models. If
x
was fitted using a custom distribution, called dist
say,
then there must be a function called (something like) rdist
either
in the working environment, or supplied through the dfns
argument to
flexsurvreg
. This must be in the same format as standard R
functions such as rweibull
, with first argument n
, and
remaining arguments giving the parameters of the distribution. It must be
vectorised with respect to the parameter arguments.
This function is only valid for semi-Markov ("clock-reset") models, though no warning or error is currently given if the model is not of this type. An equivalent for time-inhomogeneous Markov ("clock-forward") models has currently not been implemented.
If tidy=TRUE
, a data frame with one row for each simulated transition, giving the individual ID id
, start state start
, end state end
, transition label trans
, time of the transition since the start of the process (time
), and time since the previous transition (delay
).
If tidy=FALSE
, a list of two matrices named st
and t
. The rows of
each matrix represent simulated individuals. The columns of t
contain the times when the individual changes state, to the corresponding
states in st
.
The first columns will always contain the starting states and the starting
times. The last column of t
represents either the time when the
individual moves to an absorbing state, or right-censoring in a transient
state at the time given in the t
argument to sim.fmsm
.
Christopher Jackson [email protected].
bexp <- flexsurvreg(Surv(years, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) sim.fmsm(bexp, M=10, t=5, trans=tmat)
bexp <- flexsurvreg(Surv(years, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) sim.fmsm(bexp, M=10, t=5, trans=tmat)
Estimates the probability of each final outcome ("absorbing" state), and the mean and quantiles of the time to that outcome for people who experience it, by simulating a large sample of individuals from the model. This can be used for both Markov and semi-Markov models.
simfinal_fmsm( x, newdata = NULL, probs = c(0.025, 0.5, 0.975), t = 1000, M = 1e+05, B = 0, cores = NULL )
simfinal_fmsm( x, newdata = NULL, probs = c(0.025, 0.5, 0.975), t = 1000, M = 1e+05, B = 0, cores = NULL )
x |
Object returned by |
newdata |
Data frame of covariate values, with one column per covariate, and one row per alternative value. |
probs |
Quantiles to calculate, by default, |
t |
Maximum time to simulate to, passed to |
M |
Number of individuals to simulate. |
B |
Number of simulations to use to calculate 95% confidence intervals
based on the asymptotic normal distribution of the basic parameter
estimates. If |
cores |
Number of processor cores to use. If |
For a competing risks model, i.e. a model defined by just one starting state and multiple destination states representing competing events, this returns the probability governing the next event that happens, and the distribution of the time to each event conditionally on that event happening.
A tidy data frame with rows for each combination of covariate values
and quantity of interest. The quantity of interest is identified in the
column quantity
, and the value of the quantity is in val
,
with additional columns lower
and upper
giving 95%
confidence intervals for the quantity, if B>0
.
Reformat simulated multi-state data with one row per simulated transition
simfs_bytrans(simfs)
simfs_bytrans(simfs)
simfs |
Output from |
Data frame with four columns giving transition start state, transition end state, transition name and the time taken by the transition.
Simulate times to competing events from a mixture multi-state model
simt_flexsurvmix(x, newdata = NULL, n)
simt_flexsurvmix(x, newdata = NULL, n)
x |
Fitted model object returned from |
newdata |
Data frame or list of covariate values. If omitted for a model with covariates, a default is used, defined by all combinations of factors if the only covariates in the model are factors, or all covariate values of zero if there are any non-factor covariates in the model. |
n |
Number of simulations |
Data frame with n*m
rows and a column for each competing
event, where m
is the number of alternative covariate values, that
is the number of rows of newdata
. The simulated time represents
the time to that event conditionally on that event being the one that
occurs. This function doesn't simulate which event occurs.
Simulate censored time-to-event data from a fitted flexsurvreg model
## S3 method for class 'flexsurvreg' simulate( object, nsim = 1, seed = NULL, newdata = NULL, start = NULL, censtime = NULL, tidy = FALSE, ... )
## S3 method for class 'flexsurvreg' simulate( object, nsim = 1, seed = NULL, newdata = NULL, start = NULL, censtime = NULL, tidy = FALSE, ... )
object |
Object returned by |
nsim |
Number of simulations per row in |
seed |
Random number seed. This is returned with the result of this
function, as described in |
newdata |
Data frame defining alternative sets of covariate values to simulate with. If omitted, this defaults to the data originally used to fit the model. |
start |
Optional left-truncation time or times. The returned
survival, hazard or cumulative hazard will be conditioned on survival up to
this time. Predicted times returned with A vector of the same length as |
censtime |
A right-censoring time, or vector of times matching the rows
of |
tidy |
If If In either case, the simulated time and indicator for whether the time is an event time (rather than a time of right-censoring) are returned in different columns. |
... |
Other arguments (not currently used). |
A data frame, with format determined by whether tidy
was specified.
fit <- flexsurvreg(formula = Surv(futime, fustat) ~ rx, data = ovarian, dist="weibull") fit2 <- flexsurvspline(formula = Surv(futime, fustat) ~ rx, data = ovarian, k=3) nd = data.frame(rx=1:2) simulate(fit, seed=1002, newdata=nd) simulate(fit, seed=1002, newdata=nd, start=500) simulate(fit2, nsim=3, seed=1002, newdata=nd) simulate(fit2, nsim=3, seed=1002, newdata=nd, start=c(500,1000))
fit <- flexsurvreg(formula = Surv(futime, fustat) ~ rx, data = ovarian, dist="weibull") fit2 <- flexsurvspline(formula = Surv(futime, fustat) ~ rx, data = ovarian, k=3) nd = data.frame(rx=1:2) simulate(fit, seed=1002, newdata=nd) simulate(fit, seed=1002, newdata=nd, start=500) simulate(fit2, nsim=3, seed=1002, newdata=nd) simulate(fit2, nsim=3, seed=1002, newdata=nd, start=c(500,1000))
Returns a tidy data.frame of marginal survival probabilities, or hazards, restricted mean survival, or quantiles of the marginal survival function at user-defined time points and covariate patterns. Standardization is performed over any undefined covariates in the model. The user provides the data to standardize over. Contrasts can be calculated resulting in estimates of the average treatment effect or the average treatment effect in the treated if a treated subset of the data are supplied.
standsurv( object, newdata = NULL, at = list(list()), atreference = 1, type = "survival", t = NULL, ci = FALSE, se = FALSE, boot = FALSE, B = NULL, cl = 0.95, trans = "log", contrast = NULL, trans.contrast = NULL, seed = NULL, rmap, ratetable, scale.ratetable = 365.25, n.gauss.quad = 100, quantiles = 0.5, interval = c(1e-08, 500) )
standsurv( object, newdata = NULL, at = list(list()), atreference = 1, type = "survival", t = NULL, ci = FALSE, se = FALSE, boot = FALSE, B = NULL, cl = 0.95, trans = "log", contrast = NULL, trans.contrast = NULL, seed = NULL, rmap, ratetable, scale.ratetable = 365.25, n.gauss.quad = 100, quantiles = 0.5, interval = c(1e-08, 500) )
object |
Output from |
newdata |
Data frame containing covariate values to produce marginal
values for. If not specified then the fitted model data.frame is used.
There must be a column for every covariate in the model formula
for which the user wishes to standardize over. These are in the same format
as the original data, with factors as a single variable, not 0/1 contrasts.
Any covariates that are to be fixed should be specified in |
at |
A list of scenarios in which specific covariates are fixed to
certain values. Each element of |
atreference |
The reference scenario for making contrasts. Default is 1
(i.e. the first element of |
type |
|
t |
Times to calculate marginal values at. |
ci |
Should confidence intervals be calculated? Defaults to FALSE |
se |
Should standard errors be calculated? Defaults to FALSE |
boot |
Should bootstrapping be used to calculate standard error and confidence intervals? Defaults to FALSE, in which case the delta method is used |
B |
Number of bootstrap simulations from the normal asymptotic
distribution of the estimates used to calculate confidence intervals or
standard errors. Decrease for greater speed at the expense of accuracy. Only
specify if |
cl |
Width of symmetric confidence intervals, relative to 1. |
trans |
Transformation to apply when calculating standard errors via the delta method to obtain confidence intervals. The default transformation is "log". Other possible names are "none", "loglog", "logit". |
contrast |
Contrasts between standardized measures defined by |
trans.contrast |
Transformation to apply when calculating standard errors for contrasts via the delta method to obtain confidence intervals. The default transformation is "none" for differences in survival, hazard, quantiles, or RMST, and "log" for ratios of survival, hazard, quantiles or RMST. |
seed |
The random seed to use (for bootstrapping confidence intervals) |
rmap |
An list that maps data set names to expected ratetable names. This must be specified if all-cause survival and hazards are required after fitting a relative survival model. This can also be specified if expected rates are required for plotting purposes. See the details section below. |
ratetable |
A table of expected event rates
(see |
scale.ratetable |
Transformation from the time scale of the fitted
flexsurv model to the time scale in |
n.gauss.quad |
Number of Gaussian quadrature points used for integrating the all-cause survival function when calculating RMST in a relative survival framework (default = 100) |
quantiles |
If |
interval |
Interval of survival times for quantile root finding. Default is c(1e-08, 500). |
The syntax of standsurv
follows closely that of Stata's
standsurv
command written by Paul Lambert and Michael Crowther. The
function calculates standardized (marginal) measures including standardized
survival functions, standardized restricted mean survival times, quantiles
and the hazard of standardized survival. The standardized survival is defined as
The hazard of the standardized survival is a weighted average of individual hazard functions at time t, weighted by the survival function at this time:
Marginal expected survival and hazards can be calculated by providing a
population-based lifetable of class ratetable in ratetable
and a
mapping between stratification factors in the lifetable and the user dataset
using rmap
. If these stratification factors are not in the fitted
survival model then the user must specify them in newdata
along with
the covariates of the model. The marginal expected survival is calculated
using the "Ederer" method that assumes no censoring as this is most relevant
approach for forecasting (see
survexp
). A worked example is given below.
Marginal all-cause survival and hazards can be calculated after fitting a relative survival model, which utilise the expected survival from a population ratetable. See Rutherford et al. (Chapter 6) for further details.
A tibble
containing one row for each
time-point. The column naming convention is at{i}
for the ith scenario
with corresponding confidence intervals (if specified) named at{i}_lci
and at{i}_uci
. Contrasts are named contrast{k}_{j}
for the
comparison of the kth versus the jth at
scenario.
In addition tidy long-format data.frames are returned in the attributes
standsurv_at
and standsurv_contrast
. These can be passed to
ggplot
for plotting purposes (see plot.standsurv
).
Michael Sweeting <[email protected]>
Paul Lambert, 2021. "STANDSURV: Stata module to compute standardized (marginal) survival and related functions," Statistical Software Components S458991, Boston College Department of Economics. https://ideas.repec.org/c/boc/bocode/s458991.html
Rutherford, MJ, Lambert PC, Sweeting MJ, Pennington B, Crowther MJ, Abrams KR, Latimer NR. 2020. "NICE DSU Technical Support Document 21: Flexible Methods for Survival Analysis" https://nicedsu.sites.sheffield.ac.uk/tsds/flexible-methods-for-survival-analysis-tsd
## mean age is higher in those with smaller observed survival times newbc <- bc set.seed(1) newbc$age <- rnorm(dim(bc)[1], mean = 65-scale(newbc$recyrs, scale=FALSE), sd = 5) ## Fit a Weibull flexsurv model with group and age as covariates weib_age <- flexsurvreg(Surv(recyrs, censrec) ~ group+age, data=newbc, dist="weibull") ## Calculate standardized survival and the difference in standardized survival ## for the three levels of group across a grid of survival times standsurv_weib_age <- standsurv(weib_age, at = list(list(group="Good"), list(group="Medium"), list(group="Poor")), t=seq(0,7, length.out=100), contrast = "difference", ci=FALSE) standsurv_weib_age ## Calculate hazard of standardized survival and the marginal hazard ratio ## for the three levels of group across a grid of survival times ## 10 bootstraps for confidence intervals (this should be larger) ## Not run: haz_standsurv_weib_age <- standsurv(weib_age, at = list(list(group="Good"), list(group="Medium"), list(group="Poor")), t=seq(0,7, length.out=100), type="hazard", contrast = "ratio", boot = TRUE, B=10, ci=TRUE) haz_standsurv_weib_age plot(haz_standsurv_weib_age, ci=TRUE) ## Hazard ratio plot shows a decreasing marginal HR ## Whereas the conditional HR is constant (model is a PH model) plot(haz_standsurv_weib_age, contrast=TRUE, ci=TRUE) ## Calculate standardized survival from a Weibull model together with expected ## survival matching to US lifetables # age at diagnosis in days. This is required to match to US ratetable, whose # timescale is measured in days newbc$agedays <- floor(newbc$age * 365.25) ## Create some random diagnosis dates centred on 01/01/2010 with SD=1 year ## These will be used to match to expected rates in the lifetable newbc$diag <- as.Date(floor(rnorm(dim(newbc)[1], mean = as.Date("01/01/2010", "%d/%m/%Y"), sd=365)), origin="1970-01-01") ## Create sex (assume all are female) newbc$sex <- factor("female") standsurv_weib_expected <- standsurv(weib_age, at = list(list(group="Good"), list(group="Medium"), list(group="Poor")), t=seq(0,7, length.out=100), rmap=list(sex = sex, year = diag, age = agedays), ratetable = survival::survexp.us, scale.ratetable = 365.25, newdata = newbc) ## Plot marginal survival with expected survival superimposed plot(standsurv_weib_expected, expected=TRUE) ## End(Not run)
## mean age is higher in those with smaller observed survival times newbc <- bc set.seed(1) newbc$age <- rnorm(dim(bc)[1], mean = 65-scale(newbc$recyrs, scale=FALSE), sd = 5) ## Fit a Weibull flexsurv model with group and age as covariates weib_age <- flexsurvreg(Surv(recyrs, censrec) ~ group+age, data=newbc, dist="weibull") ## Calculate standardized survival and the difference in standardized survival ## for the three levels of group across a grid of survival times standsurv_weib_age <- standsurv(weib_age, at = list(list(group="Good"), list(group="Medium"), list(group="Poor")), t=seq(0,7, length.out=100), contrast = "difference", ci=FALSE) standsurv_weib_age ## Calculate hazard of standardized survival and the marginal hazard ratio ## for the three levels of group across a grid of survival times ## 10 bootstraps for confidence intervals (this should be larger) ## Not run: haz_standsurv_weib_age <- standsurv(weib_age, at = list(list(group="Good"), list(group="Medium"), list(group="Poor")), t=seq(0,7, length.out=100), type="hazard", contrast = "ratio", boot = TRUE, B=10, ci=TRUE) haz_standsurv_weib_age plot(haz_standsurv_weib_age, ci=TRUE) ## Hazard ratio plot shows a decreasing marginal HR ## Whereas the conditional HR is constant (model is a PH model) plot(haz_standsurv_weib_age, contrast=TRUE, ci=TRUE) ## Calculate standardized survival from a Weibull model together with expected ## survival matching to US lifetables # age at diagnosis in days. This is required to match to US ratetable, whose # timescale is measured in days newbc$agedays <- floor(newbc$age * 365.25) ## Create some random diagnosis dates centred on 01/01/2010 with SD=1 year ## These will be used to match to expected rates in the lifetable newbc$diag <- as.Date(floor(rnorm(dim(newbc)[1], mean = as.Date("01/01/2010", "%d/%m/%Y"), sd=365)), origin="1970-01-01") ## Create sex (assume all are female) newbc$sex <- factor("female") standsurv_weib_expected <- standsurv(weib_age, at = list(list(group="Good"), list(group="Medium"), list(group="Poor")), t=seq(0,7, length.out=100), rmap=list(sex = sex, year = diag, age = agedays), ratetable = survival::survexp.us, scale.ratetable = 365.25, newdata = newbc) ## Plot marginal survival with expected survival superimposed plot(standsurv_weib_expected, expected=TRUE) ## End(Not run)
Return fitted survival, cumulative hazard or hazard at a series of times
from a fitted flexsurvreg
or flexsurvspline
model.
## S3 method for class 'flexsurvreg' summary( object, newdata = NULL, X = NULL, type = "survival", fn = NULL, t = NULL, quantiles = 0.5, start = 0, cross = TRUE, ci = TRUE, se = FALSE, B = 1000, cl = 0.95, tidy = FALSE, na.action = na.pass, ... )
## S3 method for class 'flexsurvreg' summary( object, newdata = NULL, X = NULL, type = "survival", fn = NULL, t = NULL, quantiles = 0.5, start = 0, cross = TRUE, ci = TRUE, se = FALSE, B = 1000, cl = 0.95, tidy = FALSE, na.action = na.pass, ... )
object |
Output from |
newdata |
Data frame containing covariate values to produce fitted values for. Or a list that can be coerced to such a data frame. There must be a column for every covariate in the model formula, and one row for every combination of covariates the fitted values are wanted for. These are in the same format as the original data, with factors as a single variable, not 0/1 contrasts. If this is omitted, if there are any continuous covariates, then a single summary is provided with all covariates set to their mean values in the data - for categorical covariates, the means of the 0/1 indicator variables are taken. If there are only factor covariates in the model, then all distinct groups are used by default. |
X |
Alternative way of defining covariate values to produce fitted
values for. Since version 0.4, Columns of For “factor” (categorical) covariates, the values of the contrasts
representing factor levels (as returned by the |
type |
Ignored if |
fn |
Custom function of the parameters to summarise against time.
This has optional first two arguments |
t |
Times to calculate fitted values for. By default, these are the sorted unique observation (including censoring) times in the data - for left-truncated datasets these are the "stop" times. |
quantiles |
If |
start |
Optional left-truncation time or times. The returned
survival, hazard or cumulative hazard will be conditioned on survival up to
this time. Predicted times returned with A vector of the same length as |
cross |
If If |
ci |
Set to |
se |
Set to |
B |
Number of simulations from the normal asymptotic distribution of
the estimates used to calculate confidence intervals or standard errors.
Decrease for greater
speed at the expense of accuracy, or set |
cl |
Width of symmetric confidence intervals, relative to 1. |
tidy |
If |
na.action |
Function determining what should be done with missing values in |
... |
Further arguments passed to or from other methods. Currently unused. |
Time-dependent covariates are not currently supported. The covariate values are assumed to be constant through time for each fitted curve.
If tidy=FALSE
, a list with one component for each unique
covariate value (if there are only categorical covariates) or one component
(if there are no covariates or any continuous covariates). Each of these
components is a matrix with one row for each time in t
, giving the
estimated survival (or cumulative hazard, or hazard) and 95% confidence
limits. These list components are named with the covariate names and
values which define them.
If tidy=TRUE
, a data frame is returned instead. This is formed by
stacking the above list components, with additional columns to identify the
covariate values that each block corresponds to.
If there are multiple summaries, an additional list component named
X
contains a matrix with the exact values of contrasts (dummy
covariates) defining each summary.
The plot.flexsurvreg
function can be used to quickly plot
these model-based summaries against empirical summaries such as
Kaplan-Meier curves, to diagnose model fit.
Confidence intervals are obtained by sampling randomly from the asymptotic normal distribution of the maximum likelihood estimates and then taking quantiles (see, e.g. Mandel (2013)).
C. H. Jackson [email protected]
Mandel, M. (2013). "Simulation based confidence intervals for functions with complicated derivatives." The American Statistician (in press).
This function extracts quantities of interest from the untruncated
version of a model with individual-specific right truncation points
fitted by flexsurvrtrunc
. Note that covariates are
currently not supported by flexsurvrtrunc
.
## S3 method for class 'flexsurvrtrunc' summary( object, type = "survival", fn = NULL, t = NULL, quantiles = 0.5, ci = TRUE, se = FALSE, B = 1000, cl = 0.95, ... )
## S3 method for class 'flexsurvrtrunc' summary( object, type = "survival", fn = NULL, t = NULL, quantiles = 0.5, ci = TRUE, se = FALSE, B = 1000, cl = 0.95, ... )
object |
Output from |
type |
Ignored if |
fn |
Custom function of the parameters to summarise against time.
This has optional first argument |
t |
Times to calculate fitted values for. By default, these are the sorted unique observation (including censoring) times in the data - for left-truncated datasets these are the "stop" times. |
quantiles |
If |
ci |
Set to |
se |
Set to |
B |
Number of simulations from the normal asymptotic distribution of
the estimates used to calculate confidence intervals or standard errors.
Decrease for greater
speed at the expense of accuracy, or set |
cl |
Width of symmetric confidence intervals, relative to 1. |
... |
Further arguments passed to or from other methods. Currently unused. |
Estimates the survivor function from right-truncated, uncensored data by reversing time, interpreting the data as left-truncated, applying the Kaplan-Meier / Lynden-Bell estimator and transforming back.
survrtrunc(t, rtrunc, tmax, data = NULL, eps = 0.001, conf.int = 0.95)
survrtrunc(t, rtrunc, tmax, data = NULL, eps = 0.001, conf.int = 0.95)
t |
Vector of observed times from an initial event to a final event. |
rtrunc |
Individual-specific right truncation points, so that each
individual's survival time |
tmax |
Maximum possible time to event that could have been observed. |
data |
Data frame to find |
eps |
Small number that is added to |
conf.int |
Confidence level, defaulting to 0.95. |
Note that this does not estimate the untruncated survivor function - instead it estimates the survivor function truncated above at a time defined by the maximum possible time that might have been observed in the data.
Define as the time of the initial event,
as the time of the
final event, then we wish to determine the distribution of
.
Observations are only recorded if . Then the
distribution of
in the resulting sample is right-truncated by
rtrunc
.
Equivalently, the distribution of is left-truncated, since
it is only observed if
. Then the standard
Kaplan-Meier type estimator as implemented in
survfit
is used (as described by Lynden-Bell, 1971)
and the results transformed back.
This situation might happen in a disease epidemic, where is the date
of disease onset for an individual,
is the date of death, and we
wish to estimate the distribution of the time
from onset to death,
given we have only observed people who have died by the date
.
If the estimated survival is unstable at the highest times, then consider
replacing tmax
by a slightly lower value, then if necessary, removing
individuals with t > tmax
, so that the estimand is changed to the
survivor function truncated over a slightly narrower interval.
A list with components:
time
Time points where the estimated survival changes.
surv
Estimated survival at time
, truncated above at
tmax
.
se.surv
Standard error of survival.
std.err
Standard error of -log(survival). Named this way for consistency with survfit
.
lower
Lower confidence limits for survival.
upper
Upper confidence limits for survival.
D. Lynden-Bell (1971) A method of allowing for known observational selection in small samples applied to 3CR quasars. Monthly Notices of the Royal Astronomical Society, 155:95–118.
Seaman, S., Presanis, A. and Jackson, C. (2020) Review of methods for estimating distribution of time to event from right-truncated data.
## simulate some event time data set.seed(1) X <- rweibull(100, 2, 10) T <- rweibull(100, 2, 10) ## truncate above tmax <- 20 obs <- X + T < tmax rtrunc <- tmax - X dat <- data.frame(X, T, rtrunc)[obs,] sf <- survrtrunc(T, rtrunc, data=dat, tmax=tmax) plot(sf, conf.int=TRUE) ## Kaplan-Meier estimate ignoring truncation is biased sfnaive <- survfit(Surv(T) ~ 1, data=dat) lines(sfnaive, conf.int=TRUE, lty=2, col="red") ## truncate above the maximum observed time tmax <- max(X + T) + 10 obs <- X + T < tmax rtrunc <- tmax - X dat <- data.frame(X, T, rtrunc)[obs,] sf <- survrtrunc(T, rtrunc, data=dat, tmax=tmax) plot(sf, conf.int=TRUE) ## estimates identical to the standard Kaplan-Meier sfnaive <- survfit(Surv(T) ~ 1, data=dat) lines(sfnaive, conf.int=TRUE, lty=2, col="red")
## simulate some event time data set.seed(1) X <- rweibull(100, 2, 10) T <- rweibull(100, 2, 10) ## truncate above tmax <- 20 obs <- X + T < tmax rtrunc <- tmax - X dat <- data.frame(X, T, rtrunc)[obs,] sf <- survrtrunc(T, rtrunc, data=dat, tmax=tmax) plot(sf, conf.int=TRUE) ## Kaplan-Meier estimate ignoring truncation is biased sfnaive <- survfit(Surv(T) ~ 1, data=dat) lines(sfnaive, conf.int=TRUE, lty=2, col="red") ## truncate above the maximum observed time tmax <- max(X + T) + 10 obs <- X + T < tmax rtrunc <- tmax - X dat <- data.frame(X, T, rtrunc)[obs,] sf <- survrtrunc(T, rtrunc, data=dat, tmax=tmax) plot(sf, conf.int=TRUE) ## estimates identical to the standard Kaplan-Meier sfnaive <- survfit(Surv(T) ~ 1, data=dat) lines(sfnaive, conf.int=TRUE, lty=2, col="red")
Probability density, distribution, quantile, random generation, hazard,
cumulative hazard, mean and restricted mean functions for the Royston/Parmar
spline model. These functions have all parameters of the distribution collected
together in a single argument gamma
. For the equivalent functions with
one argument per parameter, see Survsplinek
.
dsurvspline( x, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0, log = FALSE ) psurvspline( q, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0, lower.tail = TRUE, log.p = FALSE ) qsurvspline( p, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0, lower.tail = TRUE, log.p = FALSE ) rsurvspline( n, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0 ) Hsurvspline( x, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0 ) hsurvspline( x, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0 ) rmst_survspline( t, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0, start = 0 ) mean_survspline( gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0 )
dsurvspline( x, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0, log = FALSE ) psurvspline( q, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0, lower.tail = TRUE, log.p = FALSE ) qsurvspline( p, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0, lower.tail = TRUE, log.p = FALSE ) rsurvspline( n, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0 ) Hsurvspline( x, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0 ) hsurvspline( x, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0 ) rmst_survspline( t, gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0, start = 0 ) mean_survspline( gamma, beta = 0, X = 0, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", offset = 0 )
x , q , t
|
Vector of times. |
gamma |
Parameters describing the baseline spline function, as
described in |
beta |
Vector of covariate effects. Not supported and ignored since version 2.3, and this argument will be removed in 2.4. |
X |
Matrix of covariate values. Not supported and ignored since version 2.3, and this argument will be removed in 2.4. |
knots |
Locations of knots on the axis of log time, supplied in
increasing order. Unlike in This may in principle be supplied as a matrix, in the same way as for
|
scale |
|
timescale |
|
spline |
|
offset |
An extra constant to add to the linear predictor
|
log , log.p
|
Return log density or probability. |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
Vector of probabilities. |
n |
Number of random numbers to simulate. |
start |
Optional left-truncation time or times. The returned restricted mean survival will be conditioned on survival up to this time. |
dsurvspline
gives the density, psurvspline
gives the
distribution function, hsurvspline
gives the hazard and
Hsurvspline
gives the cumulative hazard, as described in
flexsurvspline
.
qsurvspline
gives the quantile function, which is computed by crude
numerical inversion (using qgeneric
).
rsurvspline
generates random survival times by using
qsurvspline
on a sample of uniform random numbers. Due to the
numerical root-finding involved in qsurvspline
, it is slow compared
to typical random number generation functions.
Christopher Jackson <[email protected]>
Royston, P. and Parmar, M. (2002). Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21(1):2175-2197.
Wang W, Yan J (2021). Shape-Restricted Regression Splines with R Package splines2. Journal of Data Science, 19(3), 498-517.
## reduces to the weibull regscale <- 0.786; cf <- 1.82 a <- 1/regscale; b <- exp(cf) dweibull(1, shape=a, scale=b) dsurvspline(1, gamma=c(log(1 / b^a), a)) # should be the same ## reduces to the log-normal meanlog <- 1.52; sdlog <- 1.11 dlnorm(1, meanlog, sdlog) dsurvspline(1, gamma = c(-meanlog/sdlog, 1/sdlog), scale="normal") # should be the same
## reduces to the weibull regscale <- 0.786; cf <- 1.82 a <- 1/regscale; b <- exp(cf) dweibull(1, shape=a, scale=b) dsurvspline(1, gamma=c(log(1 / b^a), a)) # should be the same ## reduces to the log-normal meanlog <- 1.52; sdlog <- 1.11 dlnorm(1, meanlog, sdlog) dsurvspline(1, gamma = c(-meanlog/sdlog, 1/sdlog), scale="normal") # should be the same
Probability density, distribution, quantile, random generation, hazard,
cumulative hazard, mean and restricted mean functions for the Royston/Parmar
spline model, with one argument per parameter. For the equivalent functions with all parameters collected together in a single argument, see Survspline
.
mean_survspline0( gamma0, gamma1, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline1( gamma0, gamma1, gamma2, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline2( gamma0, gamma1, gamma2, gamma3, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline3( gamma0, gamma1, gamma2, gamma3, gamma4, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline4( gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline5( gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline6( gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline7( gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) rmst_survspline0( t, gamma0, gamma1, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline1( t, gamma0, gamma1, gamma2, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline2( t, gamma0, gamma1, gamma2, gamma3, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline3( t, gamma0, gamma1, gamma2, gamma3, gamma4, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline4( t, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline5( t, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline6( t, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline7( t, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) dsurvspline0( x, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline1( x, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline2( x, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline3( x, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline4( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline5( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline6( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline7( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) psurvspline0( q, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline1( q, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline2( q, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline3( q, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline4( q, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline5( q, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline6( q, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline7( q, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline0( p, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline1( p, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline2( p, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline3( p, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline4( p, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline5( p, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline6( p, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline7( p, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) rsurvspline0( n, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline1( n, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline2( n, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline3( n, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline4( n, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline5( n, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline6( n, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline7( n, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline0( x, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline1( x, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline2( x, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline3( x, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline4( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline5( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline6( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline7( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline0( x, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline1( x, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline2( x, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline3( x, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline4( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline5( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline6( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline7( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp" )
mean_survspline0( gamma0, gamma1, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline1( gamma0, gamma1, gamma2, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline2( gamma0, gamma1, gamma2, gamma3, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline3( gamma0, gamma1, gamma2, gamma3, gamma4, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline4( gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline5( gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline6( gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) mean_survspline7( gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp" ) rmst_survspline0( t, gamma0, gamma1, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline1( t, gamma0, gamma1, gamma2, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline2( t, gamma0, gamma1, gamma2, gamma3, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline3( t, gamma0, gamma1, gamma2, gamma3, gamma4, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline4( t, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline5( t, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline6( t, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) rmst_survspline7( t, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots = c(-10, 10), scale = "hazard", timescale = "log", spline = "rp", start = 0 ) dsurvspline0( x, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline1( x, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline2( x, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline3( x, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline4( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline5( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline6( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) dsurvspline7( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp", log = FALSE ) psurvspline0( q, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline1( q, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline2( q, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline3( q, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline4( q, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline5( q, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline6( q, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) psurvspline7( q, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline0( p, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline1( p, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline2( p, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline3( p, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline4( p, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline5( p, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline6( p, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) qsurvspline7( p, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp", lower.tail = TRUE, log.p = FALSE ) rsurvspline0( n, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline1( n, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline2( n, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline3( n, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline4( n, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline5( n, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline6( n, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp" ) rsurvspline7( n, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline0( x, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline1( x, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline2( x, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline3( x, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline4( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline5( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline6( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp" ) hsurvspline7( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline0( x, gamma0, gamma1, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline1( x, gamma0, gamma1, gamma2, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline2( x, gamma0, gamma1, gamma2, gamma3, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline3( x, gamma0, gamma1, gamma2, gamma3, gamma4, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline4( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline5( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline6( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, knots, scale = "hazard", timescale = "log", spline = "rp" ) Hsurvspline7( x, gamma0, gamma1, gamma2, gamma3, gamma4, gamma5, gamma6, gamma7, gamma8, knots, scale = "hazard", timescale = "log", spline = "rp" )
gamma0 , gamma1 , gamma2 , gamma3 , gamma4 , gamma5 , gamma6 , gamma7 , gamma8
|
Parameters describing the baseline spline function, as
described in |
knots |
Locations of knots on the axis of log time, supplied in
increasing order. Unlike in This may in principle be supplied as a matrix, in the same way as for
|
scale |
|
timescale |
|
spline |
|
start |
Optional left-truncation time or times. The returned restricted mean survival will be conditioned on survival up to this time. |
x , q , t
|
Vector of times. |
log , log.p
|
Return log density or probability. |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
Vector of probabilities. |
n |
Number of random numbers to simulate. |
These functions go up to 7 spline knots, or 9 parameters.
Christopher Jackson <[email protected]>
Tidy summarizes information about the components of the model into a tidy data frame.
## S3 method for class 'flexsurvreg' tidy( x, conf.int = FALSE, conf.level = 0.95, pars = "all", transform = "none", ... )
## S3 method for class 'flexsurvreg' tidy( x, conf.int = FALSE, conf.level = 0.95, pars = "all", transform = "none", ... )
x |
Output from |
conf.int |
Logical. Should confidence intervals be returned? Default is |
conf.level |
The confidence level to use for the confidence interval if |
pars |
One of |
transform |
Character vector of transformations to apply to requested Users can specify one or both types of transformations:
See |
... |
Not currently used. |
flexsurvreg
models estimate two types of coefficients, baseline distribution parameters, and covariate effects which act on the baseline distribution. By design, flexsurvreg
returns distribution parameters on the same scale as is found in the relevant d/p/q/r
functions. Covariate effects are returned on the log-scale, which represents either log-time ratios (accelerated failure time models) or log-hazard ratios for proportional hazard models. By default, tidy()
will return baseline distribution parameters on their natural scale and covariate effects on the log-scale.
To transform the baseline distribution parameters to the real-value number line (the scale used for estimation), pass the character argument "baseline.real"
to transform
. To get time ratios or hazard ratios, pass "coefs.exp"
to transform
. These transformations may be done together by submitting both arguments as a character vector.
A tibble
containing the columns: term
, estimate
, std.error
, statistic
, p.value
, conf.low
, and conf.high
, by default.
statistic
and p.value
are only provided for covariate effects (NA
for baseline distribution parameters). These are computed as Wald-type test statistics with p-values from a standard normal distribution.
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "gengamma") tidy(fitg) tidy(fitg, pars = "coefs", transform = "coefs.exp")
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist = "gengamma") tidy(fitg) tidy(fitg, pars = "coefs", transform = "coefs.exp")
This function is used internally by standsurv
and tidy
data.frames are automatically returned by the function.
## S3 method for class 'standsurv' tidy(x, ...)
## S3 method for class 'standsurv' tidy(x, ...)
x |
A standsurv object. |
... |
Not currently used. |
Returns additional tidy data.frames (tibbles) stored as attributes named standpred_at and standpred_contrast.
The matrix whose entry is the expected amount of time spent in
state
for a time-inhomogeneous, continuous-time Markov multi-state
process that starts in state
, up to a maximum time
. This is
defined as the integral of the corresponding transition probability up to
that time.
totlos.fs( x, trans = NULL, t = 1, newdata = NULL, ci = FALSE, tvar = "trans", sing.inf = 1e+10, B = 1000, cl = 0.95, ... )
totlos.fs( x, trans = NULL, t = 1, newdata = NULL, ci = FALSE, tvar = "trans", sing.inf = 1e+10, B = 1000, cl = 0.95, ... )
x |
A model fitted with
|
trans |
Matrix indicating allowed transitions. See
|
t |
Time or vector of times to predict up to. Must be finite. |
newdata |
A data frame specifying the values of covariates in the
fitted model, other than the transition number. See
|
ci |
Return a confidence interval calculated by simulating from the
asymptotic normal distribution of the maximum likelihood estimates. Turned
off by default, since this is computationally intensive. If turned on,
users should increase |
tvar |
Variable in the data representing the transition type. Not
required if |
sing.inf |
If there is a singularity in the observed hazard, for
example a Weibull distribution with |
B |
Number of simulations from the normal asymptotic distribution used to calculate variances. Decrease for greater speed at the expense of accuracy. |
cl |
Width of symmetric confidence intervals, relative to 1. |
... |
Arguments passed to |
This is computed by solving a second order extension of the Kolmogorov forward differential equation numerically, using the methods in the deSolve package. The equation is expressed as a linear system
and solved for and
simultaneously, where
is the matrix of total lengths of stay,
is the transition
probability matrix for time
, and
is the transition
hazard or intensity as a function of
. The initial conditions are
and
.
Note that the package msm has a similar method totlos.msm
.
totlos.fs
should give the same results as totlos.msm
when
both of these conditions hold:
the time-to-event distribution is exponential for all
transitions, thus the flexsurvreg
model was fitted with
dist="exp"
, and is time-homogeneous.
the msm model was
fitted with exacttimes=TRUE
, thus all the event times are known, and
there are no time-dependent covariates.
msm only allows exponential or piecewise-exponential time-to-event distributions, while flexsurvreg allows more flexible models. msm however was designed in particular for panel data, where the process is observed only at arbitrary times, thus the times of transition are unknown, which makes flexible models difficult.
This function is only valid for Markov ("clock-forward") multi-state
models, though no warning or error is currently given if the model is not
Markov. See totlos.simfs
for the equivalent for semi-Markov
("clock-reset") models.
The matrix of lengths of stay , if
t
is of length
1, or a list of matrices if t
is longer.
If ci=TRUE
, each element has attributes "lower"
and
"upper"
giving matrices of the corresponding confidence limits.
These are formatted for printing but may be extracted using attr()
.
The result also has an attribute P
giving the transition probability
matrices, since these are unavoidably computed as a side effect. These are
suppressed for printing, but can be extracted with attr(...,"P")
.
Christopher Jackson [email protected].
totlos.simfs
, pmatrix.fs
,
msfit.flexsurvreg
.
# BOS example in vignette, and in msfit.flexsurvreg bexp <- flexsurvreg(Surv(Tstart, Tstop, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) # predict 4 years spent without BOS, 3 years with BOS, before death # As t increases, this should converge totlos.fs(bexp, t=10, trans=tmat) totlos.fs(bexp, t=1000, trans=tmat) totlos.fs(bexp, t=c(5,10), trans=tmat) # Answers should match results in help(totlos.simfs) up to Monte Carlo # error there / ODE solving precision here, since with an exponential # distribution, the "semi-Markov" model there is the same as the Markov # model here
# BOS example in vignette, and in msfit.flexsurvreg bexp <- flexsurvreg(Surv(Tstart, Tstop, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) # predict 4 years spent without BOS, 3 years with BOS, before death # As t increases, this should converge totlos.fs(bexp, t=10, trans=tmat) totlos.fs(bexp, t=1000, trans=tmat) totlos.fs(bexp, t=c(5,10), trans=tmat) # Answers should match results in help(totlos.simfs) up to Monte Carlo # error there / ODE solving precision here, since with an exponential # distribution, the "semi-Markov" model there is the same as the Markov # model here
The expected total time spent in each state for semi-Markov multi-state
models fitted to time-to-event data with flexsurvreg
. This
is defined by the integral of the transition probability matrix, though
this is not analytically possible and is computed by simulation.
totlos.simfs( x, trans, t = 1, start = 1, newdata = NULL, ci = FALSE, tvar = "trans", tcovs = NULL, group = NULL, M = 1e+05, B = 1000, cl = 0.95, cores = NULL )
totlos.simfs( x, trans, t = 1, start = 1, newdata = NULL, ci = FALSE, tvar = "trans", tcovs = NULL, group = NULL, M = 1e+05, B = 1000, cl = 0.95, cores = NULL )
x |
A model fitted with
|
trans |
Matrix indicating allowed transitions. See
|
t |
Maximum time to predict to. |
start |
Starting state. |
newdata |
A data frame specifying the values of covariates in the
fitted model, other than the transition number. See
|
ci |
Return a confidence interval calculated by simulating from the
asymptotic normal distribution of the maximum likelihood estimates. This
is turned off by default, since two levels of simulation are required. If
turned on, users should adjust |
tvar |
Variable in the data representing the transition type. Not
required if |
tcovs |
Predictable time-dependent covariates such as age, see
|
group |
Optional grouping for the states. For example, if there are
four states, and |
M |
Number of individuals to simulate in order to approximate the transition probabilities. Users should adjust this to obtain the required precision. |
B |
Number of simulations from the normal asymptotic distribution used to calculate confidence limits. Decrease for greater speed at the expense of accuracy. |
cl |
Width of symmetric confidence intervals, relative to 1. |
cores |
Number of processor cores used when calculating confidence limits by repeated simulation. The default uses single-core processing. |
This is computed by simulating a large number of individuals M
using
the maximum likelihood estimates of the fitted model and the function
sim.fmsm
. Therefore this requires a random sampling function
for the parametric survival model to be available: see the "Details"
section of sim.fmsm
. This will be available for all built-in
distributions, though users may need to write this for custom models.
Note the random sampling method for flexsurvspline
models is
currently very inefficient, so that looping over M
will be very
slow.
The equivalent function for time-inhomogeneous Markov models is
totlos.fs
. Note neither of these functions give errors or
warnings if used with the wrong type of model, but the results will be
invalid.
The expected total time spent in each state (or group of states
given by group
) up to time t
, and corresponding confidence
intervals if requested.
Christopher Jackson [email protected].
pmatrix.simfs
,sim.fmsm
,msfit.flexsurvreg
.
# BOS example in vignette, and in msfit.flexsurvreg bexp <- flexsurvreg(Surv(years, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) # predict 4 years spent without BOS, 3 years with BOS, before death # As t increases, this should converge totlos.simfs(bexp, t=10, trans=tmat) totlos.simfs(bexp, t=1000, trans=tmat)
# BOS example in vignette, and in msfit.flexsurvreg bexp <- flexsurvreg(Surv(years, status) ~ trans, data=bosms3, dist="exp") tmat <- rbind(c(NA,1,2),c(NA,NA,3),c(NA,NA,NA)) # predict 4 years spent without BOS, 3 years with BOS, before death # As t increases, this should converge totlos.simfs(bexp, t=10, trans=tmat) totlos.simfs(bexp, t=1000, trans=tmat)
Given a function with matrix arguments, construct an equivalent function
which takes vector arguments defined by the columns of the matrix. The new
function simply uses cbind
on the vector arguments to make a matrix,
and calls the old one.
unroll.function(mat.fn, ...)
unroll.function(mat.fn, ...)
mat.fn |
A function with any number of arguments, some of which are matrices. |
... |
A series of other arguments. Their names define which
arguments of
will make a new function Calling
should give the same answer as
|
The new function, with vector arguments.
This is used by flexsurvspline
to allow spline models, which
have an arbitrary number of parameters, to be fitted using
flexsurvreg
.
The “custom distributions” facility of flexsurvreg
expects the user-supplied probability density and distribution
functions to have one explicitly named argument for each scalar
parameter, and given R vectorisation, each of those arguments
could be supplied as a vector of alternative parameter values.
However, spline models have a varying number of scalar parameters,
determined by the number of knots in the spline.
dsurvspline
and psurvspline
have an
argument called gamma
. This can be supplied as a matrix,
with number of columns n
determined by the number of knots
(plus 2), and rows referring to alternative parameter values. The
following statements are used in the source of
flexsurvspline
:
dfn <- unroll.function(dsurvspline, gamma=0:(nk-1)) pfn <- unroll.function(psurvspline, gamma=0:(nk-1))
to convert these into functions with arguments gamma0
,
gamma1
,...,gamman
, corresponding to the columns
of gamma
, where n = nk-1
, and with other arguments
in the same format.
Christopher Jackson <[email protected]>
fn <- unroll.function(ncol, x=1:3) fn(1:3, 1:3, 1:3) # equivalent to... ncol(cbind(1:3,1:3,1:3))
fn <- unroll.function(ncol, x=1:3) fn(1:3, 1:3, 1:3) # equivalent to... ncol(cbind(1:3,1:3,1:3))
Variance-covariance matrix from a flexsurvreg model
## S3 method for class 'flexsurvreg' vcov(object, ...)
## S3 method for class 'flexsurvreg' vcov(object, ...)
object |
A fitted model object of class
|
... |
Other arguments (currently unused). |
Variance-covariance matrix of the estimated parameters, on the scale that they were estimated on (for positive parameters this is the log scale).
Density, distribution function, hazards, quantile function and random generation for the Weibull distribution in its proportional hazards parameterisation.
dweibullPH(x, shape, scale = 1, log = FALSE) pweibullPH(q, shape, scale = 1, lower.tail = TRUE, log.p = FALSE) qweibullPH(p, shape, scale = 1, lower.tail = TRUE, log.p = FALSE) hweibullPH(x, shape, scale = 1, log = FALSE) HweibullPH(x, shape, scale = 1, log = FALSE) rweibullPH(n, shape, scale = 1)
dweibullPH(x, shape, scale = 1, log = FALSE) pweibullPH(q, shape, scale = 1, lower.tail = TRUE, log.p = FALSE) qweibullPH(p, shape, scale = 1, lower.tail = TRUE, log.p = FALSE) hweibullPH(x, shape, scale = 1, log = FALSE) HweibullPH(x, shape, scale = 1, log = FALSE) rweibullPH(n, shape, scale = 1)
x , q
|
Vector of quantiles. |
shape |
Vector of shape parameters. |
scale |
Vector of scale parameters. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are |
p |
Vector of probabilities. |
n |
number of observations. If |
The Weibull distribution in proportional hazards parameterisation with ‘shape’ parameter a and ‘scale’ parameter m has density given by
cumulative distribution function , survivor
function
, cumulative hazard
and
hazard
.
dweibull
in base R has the alternative 'accelerated failure
time' (AFT) parameterisation with shape a and scale b. The shape parameter
is the same in both versions. The scale parameters are related as
, equivalently m = b^-a.
In survival modelling, covariates are typically included through a linear
model on the log scale parameter. Thus, in the proportional hazards model,
the coefficients in such a model on are interpreted as log hazard
ratios.
In the AFT model, covariates on are interpreted as time
acceleration factors. For example, doubling the value of a covariate with
coefficient
would give half the expected survival time.
These coefficients are related to the log hazard ratios
as
.
dweibullPH
gives the density, pweibullPH
gives the
distribution function, qweibullPH
gives the quantile function,
rweibullPH
generates random deviates, HweibullPH
retuns the
cumulative hazard and hweibullPH
the hazard.
Christopher Jackson <[email protected]>