Title: | Bayesian Multi-State Modelling of Chronic Disease Burden Data |
---|---|
Description: | Estimation of incidence and case fatality for a chronic disease, given partial information, using a multi-state model. Given data on age-specific mortality and either incidence or prevalence, Bayesian inference is used to estimate the posterior distributions of incidence, case fatality, and functions of these such as prevalence. The methods are described in Jackson et al. (2023) <doi:10.1093/jrsssa/qnac015>. |
Authors: | Christopher Jackson [aut, cre, cph]
|
Maintainer: | Christopher Jackson <[email protected]> |
License: | GPL-3 |
Version: | 1.1.0 |
Built: | 2025-01-24 05:02:00 UTC |
Source: | https://github.com/chjackson/disbayes |
Bayesian evidence synthesis for chronic disease epidemiology
Stan Development Team (2019). RStan: the R interface to Stan. R package version 2.19.2. https://mc-stan.org
Estimate the number of events and denominator that contain roughly equivalent information to an estimate and uncertainty interval for a proportion, by interpreting the estimate and interval as a Beta posterior arising from a vague Beta(0.5,0.5) prior updated with the data consisting of that number and denominator.
ci2num(est, lower, upper, epsilon = 0.5, denom0 = 1000)
ci2num(est, lower, upper, epsilon = 0.5, denom0 = 1000)
est |
Point estimate |
lower |
Lower 95% credible limit |
upper |
Upper 95% credible limit |
epsilon |
If any of |
denom0 |
Denominator to use as a default when the point estimate is exactly 0 or 1 (which is not compatible with the beta distribution). Should correspond to a guess of the population size used to produce the estimate,
which should be no greater than the actual population of the area, and usually less. Should be either a scalar, or a vector of the same length as |
Based on fitting a Beta distribution by least squares, using the method provided by the SHELF package.
Requires that the estimate and upper and lower limits are all distinct (except that est=0
is allowed and handled specially for convenience, see denom0
). Vectors of estimates and limits may be supplied.
A data frame with elements num
and denom
corresponding to the supplied estimate and limits.
Oakley (2020). SHELF: Tools to Support the Sheffield Elicitation Framework. R package version 1.7.0. https://CRAN.R-project.org/package=SHELF
est <- 3.00 / 100 upper <- 3.52 / 100 lower <- 2.60 / 100 ci2num(est, lower, upper)
est <- 3.00 / 100 upper <- 3.52 / 100 lower <- 2.60 / 100 ci2num(est, lower, upper)
A test of the hypothesis that the direct data on a disease outcome give the same
information about that outcome as an indirect evidence synthesis obtained from a fitted disbayes
model. The outcome may be annual incidence, mortality, remission probabilities,
or prevalence.
conflict_disbayes(x, varname)
conflict_disbayes(x, varname)
x |
A fitted |
varname |
Either |
Hierarchical models are not currently supported in this function.
A data frame with columns indicating age, gender and area.
p1
is a "one-sided" p-value for the null hypothesis that against
the alternative that
,
p2
is the two-sided p-value for the null hypothesis that against
the alternative that
is not equal to
,
where is the rate informed only by direct data, and
is the rate
informed by evidence synthesis. Therefore if the evidence synthesis excludes the
direct data, then these are interpreted as "conflict" p-values (see Presanis et al. 2013).
In each case, a small p-value favours the alternative hypothesis.
Presanis, A. M., Ohlssen, D., Spiegelhalter, D. J. and De Angelis, D. (2013) Conflict diagnostics in directed acyclic graphs, with applications in Bayesian evidence synthesis. Statistical Science, 28, 376-397.
Estimates a three-state disease model from incomplete data. It is designed to estimate case fatality and incidence, given data on mortality and at least one of incidence and prevalence. Remission may also be included in the data and modelled.
disbayes( data, inc_num = NULL, inc_denom = NULL, inc_prob = NULL, inc_lower = NULL, inc_upper = NULL, prev_num = NULL, prev_denom = NULL, prev_prob = NULL, prev_lower = NULL, prev_upper = NULL, mort_num = NULL, mort_denom = NULL, mort_prob = NULL, mort_lower = NULL, mort_upper = NULL, rem_num = NULL, rem_denom = NULL, rem_prob = NULL, rem_lower = NULL, rem_upper = NULL, age = "age", cf_model = "smooth", inc_model = "smooth", rem_model = "const", prev_zero = FALSE, inc_trend = NULL, cf_trend = NULL, cf_init = 0.01, eqage = 30, eqagehi = NULL, sprior = c(1, 1, 1), hp_fixed = NULL, rem_prior = c(1.1, 1), inc_prior = c(2, 0.1), cf_prior = c(2, 0.1), method = "opt", draws = 1000, iter = 10000, stan_control = NULL, bias_model = NULL, ... )
disbayes( data, inc_num = NULL, inc_denom = NULL, inc_prob = NULL, inc_lower = NULL, inc_upper = NULL, prev_num = NULL, prev_denom = NULL, prev_prob = NULL, prev_lower = NULL, prev_upper = NULL, mort_num = NULL, mort_denom = NULL, mort_prob = NULL, mort_lower = NULL, mort_upper = NULL, rem_num = NULL, rem_denom = NULL, rem_prob = NULL, rem_lower = NULL, rem_upper = NULL, age = "age", cf_model = "smooth", inc_model = "smooth", rem_model = "const", prev_zero = FALSE, inc_trend = NULL, cf_trend = NULL, cf_init = 0.01, eqage = 30, eqagehi = NULL, sprior = c(1, 1, 1), hp_fixed = NULL, rem_prior = c(1.1, 1), inc_prior = c(2, 0.1), cf_prior = c(2, 0.1), method = "opt", draws = 1000, iter = 10000, stan_control = NULL, bias_model = NULL, ... )
data |
Data frame containing some of the variables below. The variables below are provided as character strings naming columns in this data frame. For each disease measure available, one of the following three combinations of variables must be specified: (1) numerator and denominator (2) estimate and denominator (3) estimate with lower and upper credible limits. Mortality must be supplied, and at least one of incidence and prevalence. If remission is assumed to be possible, then remission data should also be supplied (see below). Estimates refer to the probability of having some event within a year, rather than rates. Rates per year $r$ can be converted to probabilities $p$ as $p = 1 - exp(-r)$, assuming the rate is constant within the year. For estimates based on registry data assumed to cover the whole population, then the denominator will be the population size. |
inc_num |
Numerator for the incidence data, assumed to represent the
observed number of new cases within a year among a population of size
|
inc_denom |
Denominator for the incidence data. The function Note that to include extra uncertainty beyond that implied by a published interval, the numerator and denominator could be multiplied by a constant, for example, multiplying both the numerator and denominator by 0.5 would give the data source half its original weight. |
inc_prob |
Estimate of the incidence probability |
inc_lower |
Lower credible limit for the incidence estimate |
inc_upper |
Upper credible limit for the incidence estimate |
prev_num |
Numerator for the estimate of prevalence, i.e. number of people currently with a disease. |
prev_denom |
Denominator for the estimate of prevalence (e.g. the size of the survey used to obtain the prevalence estimate) |
prev_prob |
Estimate of the prevalence probability |
prev_lower |
Lower credible limit for the prevalence estimate |
prev_upper |
Upper credible limit for the prevalence estimate |
mort_num |
Numerator for the estimate of the mortality probability, i.e number of deaths attributed to the disease under study within a year |
mort_denom |
Denominator for the estimate of the mortality probability (e.g. the population size, if the estimates were obtained from a comprehensive register) |
mort_prob |
Estimate of the mortality probability |
mort_lower |
Lower credible limit for the mortality estimate |
mort_upper |
Upper credible limit for the mortality estimate |
rem_num |
Numerator for the estimate of the remission probability, i.e number of people observed to recover from the disease within a year. Remission data should be supplied if remission is permitted in the model, either as a numerator and denominator or as an estimate and lower credible interval. Conversely, if no remission data are supplied, then remission is assumed to be impossible. These "data" may represent a prior judgement rather than observation - lower denominators or wider credible limits represent weaker prior information. |
rem_denom |
Denominator for the estimate of the remission probability |
rem_prob |
Estimate of the remission probability |
rem_lower |
Lower credible limit for the remission estimate |
rem_upper |
Upper credible limit for the remission estimate |
age |
Variable in the data indicating the year of age. This must start at age zero, but can end at any age. |
cf_model |
Model for how case fatality rate varies with age.
|
inc_model |
Model for how incidence rates vary with age.
|
rem_model |
Model for how remission rates vary with age, which are typically less well-informed by data, compared to incidence and case fatality.
|
prev_zero |
If |
inc_trend |
Matrix of constants representing trends in incidence
through calendar time by year of age. There are To produce this format from a long data frame, filter to the appropriate
outcome and subgroup, and use
|
cf_trend |
Matrix of constants representing trends in case fatality
through calendar time by year of age, in the same format as
|
cf_init |
Initial guess at a typical case fatality value, for an average age. |
eqage |
Case fatalities (and incidence and remission rates) are assumed to be equal for all ages below this age, inclusive, when using the smoothed model. |
eqagehi |
Case fatalities (and incidence and remission rates) are assumed to be equal for all ages above this age, inclusive, when using the smoothed model. |
sprior |
Rates of the exponential prior distributions used to penalise the coefficients of the spline model. The default of 1 should adapt appropriately to the data, but Higher values give stronger smoothing, or lower values give weaker smoothing, if required. This can be a named vector with names This can also be an unnamed vector of three elements, where the first refers to the spline model for incidence, the second for case fatality, the third for remission. If one of the rates (e.g. remission) is not being modelled with a spline, any number can be supplied here and it is just ignored. |
hp_fixed |
A list with one named element for each hyperparameter to be fixed. The value should be either
If the element is either The hyperparameters that can be fixed are
For example, to fix the case fatality smoothness to 1.2 and fix the incidence
smoothness to its posterior mode,
specify |
rem_prior |
Vector of two elements giving the Gamma shape and rate parameters of the
prior for the remission rate, used in both |
inc_prior |
Vector of two elements giving the Gamma shape and rate parameters of the
prior for the incidence rate. Only used if |
cf_prior |
Vector of two elements giving the Gamma shape and rate parameters of the
prior for the case fatality rate. Only used if |
method |
String indicating the inference method, defaulting to
If If If the optimisation fails to converge (non-zero return code), try increasing the
number of iterations from the default 1000, e.g. If there is an error message that mentions If |
draws |
Number of draws from the normal approximation to the posterior
when using |
iter |
Number of iterations for MCMC sampling, or maximum number of iterations for optimization. |
stan_control |
( |
bias_model |
Experimental model for bias in the incidence estimates due
to conflicting information between the different data sources. If
Otherwise there are assumed to be two alternative curves of incidence by age (denoted 2 and 1) where curve 2 is related to curve 1 via a constant hazard ratio that is estimated from the data, given a standard normal prior on the log scale. Three distinct curves would not be identifiable from the data. If
If |
... |
Further arguments passed to |
A list including the following components
call
: Function call that was used.
fit
: An object containing posterior samples from the fitted model,
in the stanfit
format returned by the stan
function in the rstan package.
method
: Optimisation method that was chosen.
nage
: Number of years of age in the data
dat
: A list containing the input data in the form of numerators
and denominators.
stan_data
: Full list of data supplied to Stan
stan_inits
: Full list of parameter initial values supplied to Stan
hp_fixed
Values of any hyperparameters that are fixed during the main model fit.
Use the tidy.disbayes
method to return summary statistics
from the fitted models, simply by calling tidy()
on the fitted model.
Jackson C, Zapata-Diomedi B, Woodcock J. (2023) "Bayesian multistate modelling of incomplete chronic disease burden data" Journal of the Royal Statistical Society, Series A, 186(1), 1-19 doi:10.1093/jrsssa/qnac015
A variant of disbayes
in which data from different areas can be
related in a hierarchical model and, optionally, the effect of gender can be
treated as additive with the effect of area. This is much more computationally
intensive than the basic model in disbayes
. Time trends are not
supported in this function.
disbayes_hier( data, group, gender = NULL, inc_num = NULL, inc_denom = NULL, inc_prob = NULL, inc_lower = NULL, inc_upper = NULL, prev_num = NULL, prev_denom = NULL, prev_prob = NULL, prev_lower = NULL, prev_upper = NULL, mort_num = NULL, mort_denom = NULL, mort_prob = NULL, mort_lower = NULL, mort_upper = NULL, rem_num = NULL, rem_denom = NULL, rem_prob = NULL, rem_lower = NULL, rem_upper = NULL, age = "age", cf_init = 0.01, eqage = 30, eqagehi = NULL, cf_model = "default", inc_model = "smooth", rem_model = "const", prev_zero = FALSE, sprior = c(1, 1, 1), hp_fixed = NULL, nfold_int_guess = 5, nfold_int_upper = 100, nfold_slope_guess = 5, nfold_slope_upper = 100, mean_int_prior = c(0, 10), mean_slope_prior = c(5, 5), gender_int_priorsd = 0.82, gender_slope_priorsd = 0.82, inc_prior = c(1.1, 0.1), rem_prior = c(1.1, 1), method = "opt", draws = 1000, iter = 10000, stan_control = NULL, ... )
disbayes_hier( data, group, gender = NULL, inc_num = NULL, inc_denom = NULL, inc_prob = NULL, inc_lower = NULL, inc_upper = NULL, prev_num = NULL, prev_denom = NULL, prev_prob = NULL, prev_lower = NULL, prev_upper = NULL, mort_num = NULL, mort_denom = NULL, mort_prob = NULL, mort_lower = NULL, mort_upper = NULL, rem_num = NULL, rem_denom = NULL, rem_prob = NULL, rem_lower = NULL, rem_upper = NULL, age = "age", cf_init = 0.01, eqage = 30, eqagehi = NULL, cf_model = "default", inc_model = "smooth", rem_model = "const", prev_zero = FALSE, sprior = c(1, 1, 1), hp_fixed = NULL, nfold_int_guess = 5, nfold_int_upper = 100, nfold_slope_guess = 5, nfold_slope_upper = 100, mean_int_prior = c(0, 10), mean_slope_prior = c(5, 5), gender_int_priorsd = 0.82, gender_slope_priorsd = 0.82, inc_prior = c(1.1, 0.1), rem_prior = c(1.1, 1), method = "opt", draws = 1000, iter = 10000, stan_control = NULL, ... )
data |
Data frame containing some of the variables below. The variables below are provided as character strings naming columns in this data frame. For each disease measure available, one of the following three combinations of variables must be specified: (1) numerator and denominator (2) estimate and denominator (3) estimate with lower and upper credible limits. Mortality must be supplied, and at least one of incidence and prevalence. If remission is assumed to be possible, then remission data should also be supplied (see below). Estimates refer to the probability of having some event within a year, rather than rates. Rates per year $r$ can be converted to probabilities $p$ as $p = 1 - exp(-r)$, assuming the rate is constant within the year. For estimates based on registry data assumed to cover the whole population, then the denominator will be the population size. |
group |
Variable in the data representing the area (or other grouping factor). |
gender |
If |
inc_num |
Numerator for the incidence data, assumed to represent the
observed number of new cases within a year among a population of size
|
inc_denom |
Denominator for the incidence data. The function Note that to include extra uncertainty beyond that implied by a published interval, the numerator and denominator could be multiplied by a constant, for example, multiplying both the numerator and denominator by 0.5 would give the data source half its original weight. |
inc_prob |
Estimate of the incidence probability |
inc_lower |
Lower credible limit for the incidence estimate |
inc_upper |
Upper credible limit for the incidence estimate |
prev_num |
Numerator for the estimate of prevalence, i.e. number of people currently with a disease. |
prev_denom |
Denominator for the estimate of prevalence (e.g. the size of the survey used to obtain the prevalence estimate) |
prev_prob |
Estimate of the prevalence probability |
prev_lower |
Lower credible limit for the prevalence estimate |
prev_upper |
Upper credible limit for the prevalence estimate |
mort_num |
Numerator for the estimate of the mortality probability, i.e number of deaths attributed to the disease under study within a year |
mort_denom |
Denominator for the estimate of the mortality probability (e.g. the population size, if the estimates were obtained from a comprehensive register) |
mort_prob |
Estimate of the mortality probability |
mort_lower |
Lower credible limit for the mortality estimate |
mort_upper |
Upper credible limit for the mortality estimate |
rem_num |
Numerator for the estimate of the remission probability, i.e number of people observed to recover from the disease within a year. Remission data should be supplied if remission is permitted in the model, either as a numerator and denominator or as an estimate and lower credible interval. Conversely, if no remission data are supplied, then remission is assumed to be impossible. These "data" may represent a prior judgement rather than observation - lower denominators or wider credible limits represent weaker prior information. |
rem_denom |
Denominator for the estimate of the remission probability |
rem_prob |
Estimate of the remission probability |
rem_lower |
Lower credible limit for the remission estimate |
rem_upper |
Upper credible limit for the remission estimate |
age |
Variable in the data indicating the year of age. This must start at age zero, but can end at any age. |
cf_init |
Initial guess at a typical case fatality value, for an average age. |
eqage |
Case fatalities (and incidence and remission rates) are assumed to be equal for all ages below this age, inclusive, when using the smoothed model. |
eqagehi |
Case fatalities (and incidence and remission rates) are assumed to be equal for all ages above this age, inclusive, when using the smoothed model. |
cf_model |
The following alternative models for case fatality are supported:
In all models, case fatality is a smooth function of age. |
inc_model |
Model for how incidence varies with age.
|
rem_model |
Model for how remission varies with age. Currently
supported models are |
prev_zero |
If |
sprior |
Rates of the exponential prior distributions used to penalise the coefficients of the spline model. The default of 1 should adapt appropriately to the data, but Higher values give stronger smoothing, or lower values give weaker smoothing, if required. This can be a named vector with names This can also be an unnamed vector of three elements, where the first refers to the spline model for incidence, the second for case fatality, the third for remission. If one of the rates (e.g. remission) is not being modelled with a spline, any number can be supplied here and it is just ignored. |
hp_fixed |
A list with one named element for each hyperparameter to be fixed. The value should be either
If the element is either The hyperparameters that can be fixed are
For example, to fix the case fatality smoothness to 1.2, fix the incidence
smoothness to its posterior mode, and estimate all the other hyperparameters,
specify |
nfold_int_guess |
Prior guess at the ratio of case fatality between a high risk (97.5% quantile) and low risk (2.5% quantile) area. |
nfold_int_upper |
Prior upper 95% credible limit for the ratio in average case fatality between a high risk (97.5% quantile) and low risk (2.5% quantile) area. |
nfold_slope_guess , nfold_slope_upper
|
This argument and the next argument define the prior distribution for the variance in the random linear effects of age on log case fatality. They define a prior guess and upper 95% credible limit for the ratio of case fatality slopes between a high trend (97.5% quantile) and low risk (2.5% quantile) area. (Note that the model is not exactly linear, since departures from linearity are defined through a spline model. See the Jackson et al. paper for details.). |
mean_int_prior |
Vector of two elements giving the prior mean and standard deviation respectively for the mean random intercept for log case fatality. |
mean_slope_prior |
Vector of two elements giving the prior mean and standard deviation respectively for the mean random slope for log case fatality. |
gender_int_priorsd |
Prior standard deviation for the additive effect of gender on log case fatality |
gender_slope_priorsd |
Prior standard deviation for the additive effect of gender on the linear age slope of log case fatality |
inc_prior |
Vector of two elements giving the Gamma shape and rate parameters of the
prior for the incidence rate. Only used if |
rem_prior |
Vector of two elements giving the Gamma shape and rate parameters of the
prior for the remission rate, used in both |
method |
String indicating the inference method, defaulting to
If If If the optimisation fails to converge (non-zero return code), try increasing the
number of iterations from the default 1000, e.g. If there is an error message that mentions If |
draws |
Number of draws from the normal approximation to the posterior
when using |
iter |
Number of iterations for MCMC sampling, or maximum number of iterations for optimization. |
stan_control |
( |
... |
Further arguments passed to |
A list including the following components
call
: Function call that was used.
fit
: An object containing posterior samples from the fitted model,
in the stanfit
format returned by the stan
function in the rstan package.
method
: Optimisation method that was chosen.
nage
: Number of years of age in the data
narea
: Number of areas (or other grouping variable that defines the hierarchical model).
ng
: Number of genders (or other categorical variable whose effect is treated as
additive with the area effect).
groups
: Names of the areas (or other grouping variable), taken from the factor levels in the
original data.
genders
: Names of the genders (or other categorical variable), taken from the factor levels in the
original data.
dat
: A list containing the input data in the form of numerators
and denominators.
stan_data
: Full list of data supplied to Stan
stan_inits
: Full list of parameter initial values supplied to Stan
trend
: Whether a time trend was modelled
hp_fixed
Values of any hyperparameters that are fixed during the main model fit.
Jackson C, Zapata-Diomedi B, Woodcock J. "Bayesian multistate modelling of incomplete chronic disease burden data" https://arxiv.org/abs/2111.14100
Ischemic heart disease in England
ihdengland
ihdengland
A data frame with columns:
sex
: "male"
or "female"
.
ageyr
. Year of age.
location
. Name of the location, which is either a city region or region in England.
num_mort
. Numerator behind the estimate of mortality
num_inc
. Numerator behind the estimate of incidence
num_prev
. Numerator behind the estimate of prevalence
denom_mort
. Denominator behind the estimate of mortality
denom_inc
. Denominator behind the estimate of incidence
denom_prev
. Denominator behind the estimate of prevalence
The data were processed to
* change the geography to refer to England city regions and the remaining English regions,
* change counts by 5-year age groups to estimated 1-year counts,
* obtain estimated numerators and denominators from the published point estimates and uncertainty intervals.
A point estimate of the risk is equivalent to the numerator divided by the denominator. The denominator is
related to the extent of uncertainty around this estimate, and obtained using the Bayesian method
implemented in ci2num
.
The script given in https://github.com/chjackson/disbayes/blob/master/data-raw/gbd_process.Rmd shows these steps.
Global Burden of Disease, 2017
Jackson C, Zapata-Diomedi B, Woodcock J. "Bayesian multistate modelling of incomplete chronic disease burden data" https://arxiv.org/abs/2111.14100.
Trends in ischemic heart disease in England
ihdtrends
ihdtrends
A data frame with columns:
gender
: "male"
or "female"
.
age
: Year of age.
year
: Calendar year.
p2017
: Estimated ratio between the outcome in the calendar
year and the outcome in 2017.
outcome
: Outcome referred to (incidence or case fatality).
The data were interpolated and smoothed to produce a matrix by year of age and calendar year, using the script at https://github.com/chjackson/disbayes/blob/master/data-raw/trends.r.
Scarborough, P., Wickramasinghe, K., Bhatnagar, P. and Rayner, M. (2011) Trends in coronary heart disease, 1961-2001. British Heart Foundation.
Smolina, K., Wright, F. L., Rayner, M. and Goldacre, M. J. (2012) Determinants of the decline in mortality from acute myocardial infarction in England between 2002 and 2010: linked national database study. BMJ, 344.
British Heart Foundation (2020) Heart and Circulatory Disease Statistics 2020. British Heart Foundation.
Extract observation-specific contributions from a disbayes leave-one-out cross validation
loo_indiv(x, agg = FALSE) looi_disbayes(x, agg = FALSE)
loo_indiv(x, agg = FALSE) looi_disbayes(x, agg = FALSE)
x |
For |
agg |
If |
A data frame with one row per observed age-specific mortality, incidence, prevalence and/or remission age-specific data-point, containing leave-one-out cross validation statistics representing how well the model would predict that observation if it were left out of the fit.
These are computed with the loo package.
loo_indiv
acts on the objects that are returned by running loo
on disbayes
objects. looi_disbayes
acts directly on disbayes
objects. Both of those functions return a data frame with LOO contributions for each data point.
looi_disbayes()
: Observation-level leave-one-out cross validation statistics for a disbayes model
Leave-one-out cross validation for disbayes models
## S3 method for class 'disbayes' loo(x, outcome = "overall", ...)
## S3 method for class 'disbayes' loo(x, outcome = "overall", ...)
x |
A model fitted by |
outcome |
Either |
... |
Other arguments (currently unused). |
An object of class "loo"
as defined by the loo package.
loo_indiv
to return tidied observation-specific contributions
to the overall model fit computed here.
Posterior medians and 95
## S3 method for class 'disbayes' plot(x, variable = "cf", ...)
## S3 method for class 'disbayes' plot(x, variable = "cf", ...)
x |
Object returned by |
variable |
Name of the variable of interest to plot against age, by default case fatality rates. |
... |
Other arguments. Currently unused |
A ggplot2
object that can be printed to show the plot, or customised by adding geom
s.
Better plots can be drawn by tidy
ing the object returned by disbayes
, and using ggplot2
directly on the tidy data frame that this produces. See the vignette for examples.
Posterior medians and 95% credible intervals for a quantity of interest are plotted against year of age.
## S3 method for class 'disbayes_hier' plot(x, variable = "cf", ci = FALSE, ...)
## S3 method for class 'disbayes_hier' plot(x, variable = "cf", ci = FALSE, ...)
x |
Object returned by |
variable |
Name of the variable of interest to plot against age, by default case fatality rates. |
ci |
Show 95% credible intervals with ribbons. |
... |
Other arguments. Currently unused |
A ggplot2
object that can be printed to show the plot, or
customised by adding geom
s.
Better plots can be drawn by tidy
ing the object returned by disbayes
, and using ggplot2
directly on the tidy data frame that this produces. See the vignette for examples.
Create tidy data for a check of observed against fitted outcome probability estimates from disbayes
plotfit_data_disbayes(x)
plotfit_data_disbayes(x)
x |
Fitted model from |
A data frame containing observed data in the form of outcome probabilities, as extracted by tidy_obsdat
, and estimates of the corresponding probability parameters from the fitted model.
The data behind the plot can be produced using plotfit_data_disbayes
,
to enable customised plots to be produced by hand with ggplot2
.
plotfit_disbayes(x, agemin = 50)
plotfit_disbayes(x, agemin = 50)
x |
Fitted model from |
agemin |
Minimum age to show on the horizontal axis. |
A ggplot2
object containing the plot.
Extract observed data from a disbayes model fit
tidy_obsdat(x)
tidy_obsdat(x)
x |
Fitted |
A data frame with columns num
and denom
giving the incidence, prevalence and mortality (and remission if used) numerators and denominators used in the model fit. The column var
indicates which of incidence, prevalence etc. the numbers refer to. The column prob
is derived as num
divided by denom
. Columns lower
and upper
define credible intervals for the "data-based" point estimate prob
, obtained from the Beta posterior assuming a Beta(0.5, 0.5) prior.
This "data-based" point estimate can be compared with estimates from the model using the functions plotfit_data_disbayes
and plotfit_disbayes
.
Simply call this after fitting disbayes, as, e.g.
res <- disbayes(...) tidy(res)
## S3 method for class 'disbayes' tidy(x, startyear = 1, ...) ## S3 method for class 'disbayes_hier' tidy(x, ...)
## S3 method for class 'disbayes' tidy(x, startyear = 1, ...) ## S3 method for class 'disbayes_hier' tidy(x, ...)
x |
Object returned by |
startyear |
Only used for models with time trends. Numeric year represented by year 1 in the data. For example, set this to 1918 to convert years 1-100 to years 1918-2017. |
... |
Other arguments (currently unused) |
A data frame with one row per model parameter, giving summary statistics for the posterior distribution for that parameter. For array parameters, e.g. those that depend on age or area, then the age and area are returned in separate columns, to make it easier to summarise and plot the results, e.g. using ggplot2.
Model parameters might include, depending on the model specification,
cf
, inc
, rem
: Case fatality, incidence, remission rates
inc_prob
, rem_prob
, mort_prob
, cf_prob
: Annual incidence, remission, mortality and case fatality risks (probabilities).
prev_prob
Prevalence (probability).
state_probs
State occupancy probabilities.
beta
, beta_inc
Coefficients of the spline basis for case fatality and incidence respectively.
lambda_cf
, lambda_inc
Smoothness parameters of the spline functions.
prevzero
Prevalence at age zero
cfbase
Case fatality at the baseline age (only in models where case fatality is increasing).
dcf
Annual increments in case fatality (only in models where case fatality is increasing).
bias_loghr
Log hazard ratio describing bias in case fatality between datasets (only in models where bias_model
has been set).
For models with time trends:
cf_yr
, inc_yr
, state_probs_yr
Case fatality rates, incidence rates and state occupancy probabilities in years prior to the current year. cf
and inc
refer to the rates for the current year, the one represented in the data.
Only for hierarchical models:
mean_inter
, mean_slope
,sd_inter
,sd_slope
. Mean and standard deviation for random effects distribution for the intercept and slope of log case fatality.
lambda_cf_male
, lambda_inc_male
. Smoothness of the additive gender effect on case fatality and incidence.
bareat
Area-level contribution to spline basis coefficients.
barea
Normalised spline basis coefficients.
tidy(disbayes_hier)
: Tidy method for hierarchical disbayes models