06. Sensitivity analysis and sampling: Solutions

Click here to return to the practical.

(1) One-way sensitivity analyses

First, let’s run an ODE model. Download and open up the SIRmodel.R

(a) Which functions are in here?

Answer: solveODE() and SIR_model()

(b) What are the arguments of the first function?

Answer: (1) the parameter values lists, (2) Argument to plot everything, (3) number of row to plot (4) number of cols to plot

(c) What is the output of the first function?

Answer: maximum prevalence through the epidemic

# First let's clear our workspace, remove plots and load the libraries we need
rm(list=ls())
# dev.off()
library(deSolve)
library(ggplot2)

# Let's read in these functions so we have them to hand
source("code/SIRmodel.R")

# Let's choose a beta value of 0.4 and a gamma value of 0.2
max.prevalence <- solveODE(parameters <- c(beta = 0.4, gamma = 0.2))

Warning in par(new = TRUE): calling par(new=TRUE) with no plot

print(max.prevalence)

[1] 15.84403

# Now let's look at the effect of the maximum prevalence of the epidemic across 
# gamma = 0.1 -1.0 (increment on 0.1)
gamma.vec <- seq(0.1, 1.0, by = 0.1)

# initialise max/prevalence container
max.prevalence <- vector()

# Add in a loop to make this happen
for (gamma.val in gamma.vec){
  mp = solveODE(parameters = c(beta = 0.4, gamma = gamma.val),
                                         plot.all.results = FALSE)
  max.prevalence = c(max.prevalence, mp)
}

Now we have our max.prevalence, we need to plot this against our infectiousness duration - plot max.prevalence as a function of the infectious duration.

par(new=FALSE)
par(mfrow=c(1,1))

plot(1/gamma.vec, 
     max.prevalence, 
     type = "b",
     xlab = "Infectiousness Duration (days)",
     ylab = "Maximum Prevalence",
     main = "One-way uncertainty analysis")

# Now try to increase the resolution of gamma to get a better idea of the 
# relationship but remember to clear max.prevalence first!

# You could try and replace gamma.vec = seq(0.1, 1.0, by = 0.1) with
inf.duration = 1:10
gamma.vec = 1/inf.duration

(d) Describe in words the qualitative relationship

Answer: There is no epidemic until the infectiousness duration is > 2 days (R0 > 1) after that there is a linear increase in the maximum prevalence until gamma = 6, then there is a diminishing increase in the maximum prevalence.

(2) Monte Carlo Sampling

Now suppose that we have a previous epidemiological study that suggested that R0 has a mean value of 5, but uncertainty within the range of [-1, +1]. However, we still don’t know whether the infectiousness period is 1 day or 10 days. We will now use the functions in SIRmodel_R0.R to make a similar plot as above, but this time, incorporate the uncertainty of R0 for each discrete value of gamma. We’re going to first use a direct Monte Carlo Sampling method.

# Read in our set of functions in SIRmodel_R0.R
source("code/SIRmodel_R0.R")

# First, let's set a fixed seed for the random number generator
# this will allow us to run the code again and retrieve the same 'simulation'
set.seed(2019)

# Now, draw R0 1,000 times from a suitable distribution (e.g. normal)
r0.all = rnorm(1000, 5, 0.5)
size.df = length(r0.all) * length(gamma.vec)

# initialise max.prevalence again, this time it needs to be a dataframe 
# or a matrix
max.prevalence = data.frame(r0.value = vector(mode = "numeric", 
                                            length = size.df),
                            gamma = vector(mode = "numeric", 
                                           length = size.df),
                            max.prev = vector(mode = "numeric", 
                                              length = size.df))
index = 0

# create a loop over each of these R0 values in turn
for (r0.val in r0.all){
# create a loop over each of these Gamma values in turn
    for (gamma.val in gamma.vec){
    index = index + 1
    mp = solveODE_2(parameters = c(R0 = r0.val, gamma = gamma.val))
    max.prevalence[index, "r0.value"] = r0.val
    max.prevalence[index, "gamma"] = gamma.val
    max.prevalence[index, "max.prev"] = mp
    }
}

# Take a look at max.prevalence by using the 'head() function
head(max.prevalence)

  r0.value     gamma max.prev
1 5.369261 1.0000000 42.96408
2 5.369261 0.5000000 49.88705
3 5.369261 0.3333333 49.48095
4 5.369261 0.2500000 49.90858
5 5.369261 0.2000000 50.19772
6 5.369261 0.1666667 49.91605

(e) How have we saved the output?

Answer: using ‘wide’ formatting – see the ggplot pre-course material

# Now plot this output using the R function MCplot() in SIRmodel_R0.R
MCplot(max.prevalence)

Warning: Removed 10 rows containing non-finite outside the scale range
(`stat_ydensity()`).

(f) What conclusions can you draw from the plot?

Answer: Increasing the rate of recovery reduces the max prevalence. However, the uncertainty in R0 has a larger impact on the maximum prevalence than infectious duration. In fact, until the infectiousness duration decreases below 4 days, the value of R0 is the important parameter in determining prevalence.

(3) Latin hypercube sampling (LHS) vs Monte Carlo Sampling

# First let's load in the library we'll need for later
library(lhs)

Warning: package 'lhs' was built under R version 4.3.3

# We're going to first sample directly from a full distribution uniform 
# distribution from 0 to 1. How many samples will we need to take? 
# Let's try a few options and see how well they do

par(mfrow=c(3,2))
hist(rnorm(10))
hist(rnorm(100))
hist(rnorm(1000))
hist(rnorm(10000))
hist(rnorm(20000))

# Now let's plot the sample sizes against the variance of the sample distribution

plot(
    c(10,100,1000,10000,50000,100000),
    c(var(rnorm(10)), var(rnorm(100)), var(rnorm(1000)), var(rnorm(10000)), 
      var(rnorm(50000)), var(rnorm(100000))),
    ylab = "variance", main = "Variance of sampled normal"
    )
abline(h = 1)

# Let's now use 100 samples to see the difference between a Monte Carlo 
# sampling and a LHS sampling approach.
# Pick some small number of samples
n <- 100

# First we're going to sample 100 times from a random sample
mc_unif <- runif(n)

# 100 lh samples across 1 parameter
latin_unif <- randomLHS(n, 1)

# plot these two distribution
# dev.off()
par(mfrow=c(3,2))
hist(mc_unif)
hist(latin_unif)

# You can see how the Latin Hypercube does a great job of sampling evenly 
# across the distribution. Let's now sample from a Normal distribution using 
# a random monte carlo sample across the whole distribution.
mc_norm <- rnorm(n, mean = 0, sd = 1)

# How do we sample using an LHS? We use the previous numbers generated from the 
# uniform LHS to draw samples from the Normal using the Inverse Cumulative 
# Sampling.
latin_norm <- qnorm(latin_unif, mean = 0, sd = 1)

# plot these two normal distributions
hist(mc_norm)
hist(latin_norm)

# the latin hypercube sample looks much better! Why does this work?
# first let's look at the norm probability distibution
x <- seq(-6, 6, by = 0.1) # random variable X
normdens <- dnorm(x, mean = 0, sd = 1) # prob distribution, f(X)
normcumul <- pnorm(x, mean = 0, sd = 1) # cumulative distribution, F(X)

plot(x, normdens, "l")
plot(x, normcumul, "l")

# Most of the density is in the middle range of values (-1 to 1).

# So we want a method to sample from this more often than the other areas in 
# the distribution. Specifically, we want to sample values from X proportionally 
# to the probability of those values occuring. Let's generate some samples 
# between 0-1. These can be values on our Y-axis. Then, if we ask what is the 
# value of the cumulative distribution that corresponds to these uniform 
# values we are taking the inverse for illustration let's just choose 10 points.
ex_latin <- randomLHS(10, 1)

# which X values are given by using these as the Y value (denoted by "X"s)?
abline(h = ex_latin, col = "red")
points(qnorm(ex_latin, mean = 0, sd = 1), y=rep(0,10), pc = "x")

You can see that the samples are clustered around the middle: in areas of X with higher density, the gradient of the cumluative distribution (F(X)) will be very steep, causing more values between 0 and 1 to map to this range of X with high density That is, \(F^{-1}(R) = X\) where R is a uniform random number between 0 and 1.

So, you can sample from any distribution whose cumluative function is ‘invertable’ by plugging in uniform random numbers to the inverse cumulative function of your new distribution. For more information check out: https://en.wikipedia.org/wiki/Inverse_transform_sampling

Likewise, to perform LHS on a uni- or mulitvariate non-uniform distribution, we can transform our LHS samples from a uniform distribution as above.