---
title: "Misclassification multi-state models in msmbayes"
author: "Christopher Jackson <chris.jackson@mrc-bsu.cam.ac.uk>"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_document:
    toc: true
    toc_float: true
    theme: simplex
    number_sections: true
resource_files:
  - ../man/figures/twostatephase.png
  - ../man/figures/threestatephase.png
  - ../man/figures/twostate.png
vignette: >
  %\VignetteIndexEntry{Misclassification multi-state models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

In a **hidden Markov model**, an individual moves between a set of latent, unobserved states according to a Markov process.  The observed data are generated conditionally on the latent states. 


# Multi-state models with misclassification

Hidden Markov models can be used to account for misclassification of states in a multi-state model. 

To fit a misclassification multi-state model in `msmbayes`, the structure of allowed misclassifications is supplied in the `E` argument (the "e" stands for "emission"). This is a matrix with off-diagonal entries: 

* 1 if true state [row number] can be misclassified as [column number]

* 0 if true state [row number] cannot be misclassified as [column number]

The diagonal entries of `E` are ignored (as for the `Q` argument).
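The `E` matrix only describes which misclassifications are allowed. The model itself involves a matrix of emission probabilities with the same pattern of zeros, where each row sums to 1 and the diagonal holds the probability of correct classification. The following base-R sketch illustrates how observed states are generated conditionally on a path of true states, for a four-state structure where state 1 can be misclassified as 2, state 2 as 1 or 3, and state 3 as 2. The matrix `Eprob` and the true-state path are hypothetical, made-up values for illustration, not output from `msmbayes`.

```{r}
# Hypothetical emission probabilities (illustrative values, not estimates).
# Off-diagonal entries are misclassification probabilities.
Eprob <- rbind(c(0,    0.10, 0,    0),
               c(0.15, 0,    0.05, 0),
               c(0,    0.10, 0,    0),
               c(0,    0,    0,    0))
# The probability of correct classification is 1 minus the rest of the row
diag(Eprob) <- 1 - rowSums(Eprob)
stopifnot(isTRUE(all.equal(rowSums(Eprob), rep(1, 4))))

# Simulate one observed state for each true state in a hypothetical path
set.seed(1)
true_states <- c(1, 1, 2, 2, 3, 4)
obs_states <- vapply(true_states,
                     function(r) sample(1:4, size = 1, prob = Eprob[r, ]),
                     integer(1))
rbind(true = true_states, observed = obs_states)
```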

The following example is discussed in the [`msm` user guide](https://cran.r-project.org/web/packages/msm/vignettes/msm-manual.pdf) (Section 2.14).  We model progression between three states of CAV (a disease experienced by heart transplant recipients), and allow death from any of these states.   True state 1 can be misclassified as 2, true state 2 can be misclassified as 1 or 3, and true state 3 can be misclassified as 2. 

For speed in this demo, we use Stan's `"optimize"` method, which uses the simple normal approximation to the posterior.  MCMC would probably be more sensible in a real application.

```{r,results="hide"}
library(msmbayes)
library(msm)
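# Allowed transitions: 1 -> 2 -> 3, and death (state 4) from states 1, 2 and 3
Qcav <- rbind(c(0, 1, 0, 1),
              c(0, 0, 1, 1), 
              c(0, 0, 0, 1),
              c(0, 0, 0, 0))
# Allowed misclassifications: 1 as 2, 2 as 1 or 3, and 3 as 2
Ecav <- rbind(c(0, 1, 0, 0),
              c(1, 0, 1, 0),
              c(0, 1, 0, 0),
              c(0, 0, 0, 0))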
draws <- msmbayes(data=cav, state="state", time="years", subject="PTNUM", 
                  Q=Qcav, E=Ecav, fit_method="optimize")
```

```{r}
qmatrix(draws)
```

The function `edf` extracts the misclassification (or "emission") probabilities in a tidy data frame form. 
```{r}
edf(draws)
```

The same model can be fitted by maximum likelihood using `msm()`.

> **Note**: this is different from the model fitted in the `msm` manual, since "exact death times" are not supported in `msmbayes`.  Also note that `msm` requires informative initial values for the non-zero intensities and misclassification probabilities here. For hidden Markov models, `msm` cannot determine good initial values automatically from the transition structure.

```{r}
# Initial values for msm: transition intensities in the allowed positions of Q,
# and misclassification probabilities of 0.1 in the allowed positions of E
Qinit <- rbind(c(0, 0.148, 0, 0.0171), c(0, 0, 0.202, 0.081), 
               c(0, 0, 0, 0.126), c(0, 0, 0, 0))
Einit <- rbind(c(0, 0.1, 0, 0), c(0.1, 0, 0.1, 0),
               c(0, 0.1, 0, 0), c(0, 0, 0, 0))
cav.msm <- msm(state ~ years, subject=PTNUM, data=cav, qmatrix=Qinit, ematrix=Einit)
qmatrix.msm(cav.msm, ci="none")
ematrix.msm(cav.msm, ci="none")
```

The parameter estimates from `msm` are close to those from `msmbayes`, with any differences explainable by the influence of the weak prior.


## Specifying prior distributions for misclassification probabilities

In `msmbayes`, normal prior distributions are assumed for the _log odds_ of 
misclassification.   Denote by $e_{rs}$ the misclassification probability, 
that is, the probability that an individual in true state $r$ is observed
in state $s$.  The corresponding log odds of misclassification is $\log(e_{rs} / e_{rr})$, the log odds of being misclassified as state $s$, relative to being 
correctly classified. 
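For a true state $r$, let $S_r$ be the set of states that $r$ can be misclassified as, and write $\theta_{rs} = \log(e_{rs}/e_{rr})$ for $s \in S_r$. Because the probabilities in row $r$ sum to 1, $e_{rr} + \sum_{s \in S_r} e_{rr}\exp(\theta_{rs}) = 1$, so the log odds determine the probabilities through the inverse multinomial logit transform:

$$
e_{rr} = \frac{1}{1 + \sum_{t \in S_r} \exp(\theta_{rt})}, \qquad
e_{rs} = \frac{\exp(\theta_{rs})}{1 + \sum_{t \in S_r} \exp(\theta_{rt})}, \quad s \in S_r.
$$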

The default normal(0,1) prior for these log odds parameters is intended to
give a roughly uniform distribution on the probability scale.
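One way to see what this default implies is to transform it to the probability scale. For a true state with only one possible misclassification, the log odds is the logit of $e_{rs}$, so the inverse logit (`plogis` in base R) of the prior quantiles gives the implied prior quantiles of the probability. This base-R sketch assumes the default is normal with mean 0 and SD 1, as stated above, and also summarises a simulated sample from the implied prior.

```{r}
# Implied prior median and 95% interval for a misclassification probability
# when it is the only possible misclassification from its true state
prior_q <- plogis(qnorm(c(0.025, 0.5, 0.975), mean = 0, sd = 1))
prior_q

# Share of simulated prior mass in each tenth of the probability scale
set.seed(1)
p_sim <- plogis(rnorm(100000, mean = 0, sd = 1))
round(table(cut(p_sim, breaks = seq(0, 1, by = 0.1))) / length(p_sim), 3)
```

The central 95% of this implied prior runs from about 0.12 to 0.88, so it covers most of the unit interval, though it puts less weight near 0 and 1 than an exactly uniform distribution would.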

To specify the mean and SD of these normal priors by hand, use
`msmprior` as follows, with `loe(r,s)` indicating the log odds of
misclassification of state $r$ as state $s$.

```{r}
priors <- list(msmprior("loe(1,2)", mean=-2, sd=0.2))
draws_prior <- msmbayes(data=cav, state="state", time="years", subject="PTNUM", 
                  Q=Qcav, E=Ecav, fit_method="optimize", priors=priors)
edf(draws_prior)
```

If there is only one potential misclassification $s$ for some true state $r$, 
then $e_{rr} = 1 - e_{rs}$, so the log odds of misclassification is just the 
standard logit of $e_{rs}$. In the above model, true state 1 can only be 
misclassified as state 2, so the prior median and 95% credible interval for 
$e_{12}$ implied by the normal(-2, 0.2) prior can be deduced by taking the 
inverse logit of the prior quantiles.  This prior is fairly tight, with a median 
of about 0.12 and a 95% interval of about 0.08 to 0.17, and it appears to pull 
the estimate of $e_{12}$ away from the value estimated from the data.

```{r}
plogis(qnorm(c(0.025, 0.5, 0.975), -2, 0.2))
```

With multiple misclassification possibilities per true state, a multinomial
logit transform is needed.  A simple way to deduce the prior beliefs about the 
probabilities implied by a particular set of prior means and SDs is simulation.  
For a particular true state $r$, simulate from the normal priors for the log odds 
of each potential misclassification $s$, then apply the inverse multinomial logit
transform to obtain the corresponding sample for the set of $e_{rs}$ (including 
the probability $e_{rr}$ of correct classification), which satisfy 
$\sum_s e_{rs} = 1$.
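A minimal base-R sketch of this simulation, for true state 2 of the CAV model, which can be misclassified as state 1 or 3. It assumes that the log odds `loe(2,1)` and `loe(2,3)` have independent normal priors; the means and SDs below are the default normal(0,1), and could be replaced by the values given to `msmprior`.

```{r}
set.seed(1)
nsim <- 10000
# Prior means and SDs for the log odds loe(2,1) and loe(2,3)
prior_mean <- c(loe21 = 0, loe23 = 0)
prior_sd   <- c(loe21 = 1, loe23 = 1)

# Simulate the log odds from their (assumed independent) normal priors
loe21 <- rnorm(nsim, prior_mean["loe21"], prior_sd["loe21"])
loe23 <- rnorm(nsim, prior_mean["loe23"], prior_sd["loe23"])

# Inverse multinomial logit: log odds are relative to correct classification
denom <- 1 + exp(loe21) + exp(loe23)
esim <- cbind(e21 = exp(loe21) / denom,
              e22 = 1 / denom,
              e23 = exp(loe23) / denom)
stopifnot(all(abs(rowSums(esim) - 1) < 1e-12))

# Implied prior median and 95% interval for each probability
apply(esim, 2, quantile, probs = c(0.025, 0.5, 0.975))
```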

## Fixed misclassification probabilities

Misclassification error probabilities in multi-state models for intermittently observed
data are often not identifiable from the data.  Typically, background information about the 
observation process is needed.  If there is good evidence about the error probabilities, it may be sufficient to fix them at constant values.  In `msmbayes`, this can be done with the `Efix` argument.  This is a matrix with the same dimensions as `E`, containing any 
fixed error probabilities in the appropriate places, and zero elsewhere.
The following model fixes the probability that true state 1 is misclassified as 2 at 0.1. 


```{r}
Efix <- rbind(c(0, 0.1, 0, 0), c(0, 0, 0, 0),
              c(0, 0, 0, 0), c(0, 0, 0, 0))
draws_fix <- msmbayes(data=cav, state="state", time="years", subject="PTNUM", 
                      Q=Qcav, E=Ecav, Efix=Efix, fit_method="optimize")
```
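Before fitting, it may be worth checking that `Efix` is consistent with `E`. The following base-R sketch defines a hypothetical helper, `check_efix`, which is not part of `msmbayes`. It checks that every fixed probability sits in a position that `E` allows, and that the fixed probabilities for each true state sum to less than 1, so that some probability remains for correct classification.

```{r}
# Hypothetical helper (not part of msmbayes): sanity checks on Efix
check_efix <- function(E, Efix) {
  stopifnot(identical(dim(E), dim(Efix)))
  offdiag <- row(Efix) != col(Efix)
  fixed <- offdiag & Efix != 0
  if (any(fixed & E == 0))
    stop("Efix fixes a misclassification that is not allowed by E")
  if (any(Efix[offdiag] < 0))
    stop("fixed probabilities must be non-negative")
  if (any(rowSums(Efix * offdiag) >= 1))
    stop("fixed probabilities for each true state must sum to less than 1")
  invisible(TRUE)
}

# Example with the CAV misclassification structure: 1 as 2 fixed at 0.1
E_struct <- rbind(c(0, 1, 0, 0), c(1, 0, 1, 0),
                  c(0, 1, 0, 0), c(0, 0, 0, 0))
E_fixed  <- rbind(c(0, 0.1, 0, 0), c(0, 0, 0, 0),
                  c(0, 0, 0, 0), c(0, 0, 0, 0))
check_efix(E_struct, E_fixed)
```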

Using a prior is a compromise between fixing these parameters and attempting to identify them
from data.   An advantage of the Bayesian approach is that, as long as the computational
algorithm works, we have a valid posterior, even for parameters that the data cannot identify.   If the marginal posterior for a parameter is essentially the same as its prior, we can deduce that the data contain no information about that parameter. If the prior is defensible, we still have a useful model for the data.
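One simple way to quantify whether the posterior for a single parameter differs from its prior is posterior contraction, $1 - \mathrm{Var}_{\text{post}} / \mathrm{Var}_{\text{prior}}$: values near 0 suggest the data carry little information about the parameter, and values near 1 suggest it is well identified. This is a general diagnostic, not something `msmbayes` is assumed to provide. In the base-R sketch below, `contraction` is a hypothetical helper, and the posterior draws of the log odds are assumed to have been extracted from the fitted model separately. For illustration only, it is applied to simulated draws from a normal(-2, 0.2) prior itself, mimicking a posterior that the data have not changed.

```{r}
# Hypothetical helper: posterior contraction for one parameter,
# given posterior draws on the log-odds scale and the prior SD
contraction <- function(post_draws, prior_sd) {
  1 - var(post_draws) / prior_sd^2
}

# Illustration only: "posterior" draws taken from the prior itself
set.seed(1)
fake_post <- rnorm(10000, mean = -2, sd = 0.2)
contraction(fake_post, prior_sd = 0.2)
```

Contraction only compares spreads, so it is also worth comparing the prior and posterior means: the data could shift the location of a parameter without narrowing its distribution.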