06. Sensitivity analysis and sampling
(1) One-way sensitivity analyses
First, let’s run an ODE model. Download and open up the SIRmodel.R
(a) Which functions are in here?
Answer:
(b) What are the arguments of the first function?
Answer:
(c) What is the output of the first function?
Answer:
# First let's clear our workspace, remove plots and load the libraries we need
rm(list=ls())
dev.off()
library(deSolve)
library(ggplot2)
# Let's read in these functions so we have them to hand
<YOUR CODE HERE>
# Let's choose a beta value of 0.4 and a gamma value of 0.2
<- solveODE(parameters <- c(beta = 0.4, gamma = 0.2))
max.prevalence print(max.prevalence)
# Now let's look at the effect of the maximum prevalence of the epidemic across
# gamma = 0.1 -1.0 (increment on 0.1)
<- <YOUR CODE HERE>
gamma.vec
# initialise max/prevalence container
<- vector()
max.prevalence
# Add in a loop to make this happen
<YOUR CODE HERE>
<- solveODE(parameters = c(beta = 0.4, gamma = gamma.val),
mp plot.all.results = FALSE)
<- c(max.prevalence, mp)
max.prevalence
<YOUR CODE HERE>
# Now we have our max.prevalence, we need to plot this against our
# infectiousness duration - plot max.prevalence as a function of the
# \*infectiousness duration\*
par(new=FALSE)
par(mfrow=c(1,1))
plot(<YOUR CODE HERE>,
<YOUR CODE HERE>,
type = "b",
xlab = "Infectiousness Duration (days)",
ylab = "Maximum Prevalence",
main = "One-way uncertainty analysis")
# Now try to increase the resolution of gamma to get a better idea of the
# relationship but remember to clear max.prevalence first!
<YOUR CODE HERE>
(d) Describe in words the qualitative relationship
Answer:
(2) Monte Carlo Sampling
Now suppose that we have a previous epidemiological study that suggested that R0 has a mean value of 5, but uncertainty within the range of [-1, +1]. However, we still don’t know whether the infectiousness period is 1 day or 10 days. We will now use the functions in SIRmodel_R0.R to make a similar plot as above, but this time, incorporate the uncertainty of R0 for each discrete value of gamma. We’re going to first use a direct Monte Carlo Sampling method.
# Read in our set of functions in SIRmodel_R0.R
<YOUR CODE HERE>
# First, let's set a fixed seed for the random number generator
# this will allow us to run the code again and retrieve the same 'simulation'
set.seed(2019)
# Now, draw R0 1,000 times from a suitable distribution (e.g. normal)
= <YOUR CODE HERE>
r0.all = length(r0.all) * length(gamma.vec)
size.df
# initialise max.prevalence again, this time it needs to be a dataframe
# or a matrix
= data.frame(r0.value = vector(mode = "numeric",
max.prevalence length = size.df),
gamma = vector(mode= "numeric", length = size.df),
max.prev = vector(mode= "numeric", length = size.df))
= 0
index
# create a loop over each of these R0 values in turn
for (r0.val in r0.all){
# create a loop over each of these Gamma values in turn
for (gamma.val in gamma.vec){
= index + 1
index = solveODE_2(parameters = c(R0 = r0.val, gamma = gamma.val))
mp "r0.value"] = r0.val
max.prevalence[index, "gamma"] = gamma.val
max.prevalence[index, "max.prev"] = mp
max.prevalence[index,
}
}
# Take a look at max.prevalence by using the 'head() function
head(max.prevalence)
(e) How have we saved the output?
Answer:
function MCplot() in SIRmodel_R0.R
Now plot this output using the R MCplot(max.prevalence)
(f) What conclusions can you draw from the plot?
Answer:
(3) Latin hypercube sampling (LHS) vs Monte Carlo Sampling
# First let's load in the library we'll need for later
library(lhs)
# We're going to first sample directly from a full distribution uniform
# distribution from 0 to 1. How many samples will we need to take?
# Let's try a few options and see how well they do
par(mfrow=c(3,2))
hist(rnorm(10))
hist(rnorm(100))
hist(rnorm(1000))
hist(rnorm(10000))
hist(rnorm(20000))
# Now let's plot the sample sizes against the variance of the sample distribution
plot(
c(10,100,1000,10000,50000,100000),
c(var(rnorm(10)), var(rnorm(100)), var(rnorm(1000)), var(rnorm(10000)),
var(rnorm(50000)), var(rnorm(100000))),
ylab = "variance", main = "Variance of sampled normal"
)abline(h = 1)
# Let's now use 100 samples to see the difference between a Monte Carlo
# sampling and a LHS sampling approach
# Pick some small number of samples
<- 100
n
# First we're going to sample 100 times from a random sample
<- runif(n)
mc_unif
# 100 lh samples across 1 parameter
<- randomLHS(n, 1)
latin_unif
# plot these two distribution
dev.off()
par(mfrow=c(3,2))
hist(mc_unif)
hist(latin_unif)
# You can see how the Latin Hypercube does a great job of sampling evenly across
# the distribution. Let's now sample from a Normal distribution using a random
# monte carlo sample across the whole distribution.
<- rnorm(n, mean = 0, sd = 1)
mc_norm
# How do we sample using an LHS? We use the previous numbers generated from the
# uniform LHS to draw samples from the Normal using the Inverse Cumulative
# Sampling
<- qnorm(latin_unif, mean = 0, sd = 1)
latin_norm
# plot these two normal distributions
hist(mc_norm)
hist(latin_norm)
# the latin hypercube sample looks much better! Why does this work?
# first let's look at the norm probability distibution
<- seq(-6,6, by =0.1) # random variable X
x <- dnorm(x, mean = 0, sd = 1) # prob distribution, f(X)
normdens <- pnorm(x, mean = 0, sd = 1) # cumulative distribution, F(X)
normcumul
plot(x, normdens, "l")
plot(x, normcumul, "l")
# Most of the density is in the middle range of values (-1 to 1).
# So we want a method to sample from this more often than the other areas in
# the distribution. Specifically, we want to sample values from X proportionally
# to the probability of those values occuring. Let's generate some samples
# between 0-1. These can be values on our Y-axis. Then, if we ask what is the
# value of the cumulative distribution that corresponds to these uniform
# values we are taking the inverse for illustration let's just choose 10 points.
<- randomLHS(10, 1)
ex_latin
# which X values are given by using these as the Y value (denoted by "X"s)?
abline(h = ex_latin, col = "red")
points(qnorm(ex_latin, mean = 0, sd = 1), y=rep(0,10), pc = "x")
# You can see that the samples are clustered around the middle: in areas of X
# with higher density, the gradient of the cumluative distribution (F(X)) will
# be very steep, causing more values between 0 and 1 to map to this range of X
# with high density That is, F\^{-1}(R) = X where R is a uniform random number
# between 0 and 1.
# So, you can sample from any distribution whose cumluative function is
# 'invertable' by plugging in uniform random numbers to the inverse cumulative
# function of your new distribution. For more information check out:
# https://en.wikipedia.org/wiki/Inverse_transform_sampling
# Likewise, to perform LHS on a uni- or mulitvariate non-uniform distribution,
# we can transform our LHS samples from a uniform distribution as above.
Solutions to this practical can be accessed here.