Process Based Models

Bayesian Methods for Ecological and Environmental Modelling

David Cameron
UKCEH Edinburgh

What we will cover

11:30 am-1 pm: Session 6b/7a Process Based Models

  • What is a process based model (PBM)?
  • Why considering uncertainties in PBM predictions may be important
  • Sources of uncertainty in the Bayesian inference of PBMs
  • Worked example: Bayesian inference of a simple PBM (VSEM) using BayesianTools

1-2 pm: Lunch break

2-3:30 pm: Session 6b/7a Process Based Models

  • Practical 7a: Process Based Models
  • Why it is important to represent all sources of uncertainty in Bayesian inference?
  • Recap and Summary

What is a process based model? (PBM)

Model structure based on mechanistic process understanding

Often deterministic

Good for extrapolation (eg forecasting) and “what if” type questions

Examples:

  • General Circulation Models GCMs (Climate modelling)
  • Numerical Weather Prediction Models (Weather forcasting)
  • Even Newton’s Laws of Motion

PBM inputs and outputs

Model Tuning Approach

Aim to understand/predict something in nature

Build a model representing our current best understanding of the processes involved

  • Model includes uncertain “tunable” parameters

Collect evidence/data to test/validate model against

Model does not agree with data/evidence

  • Try to improve model performance against data by “tuning” parameters

Problems with Model Tuning

Errors in models and data are acknowledged but often ignored

  • Data collected is considered ‘truth’ but we know this is wrong
  • Tuning an imperfect model against data where errors are ignored often leads to overfitting

Predictions are made without quantifying their reliability

  • Not useful for decision making

Does it matter that we ignore uncertainty?

Storm of 1987

“…earlier on today apparently a woman rang the BBC and said she’d heard there was a hurricane on the way. Well if you are watching don’t worry, there isn’t…”

Storm of 1987

“…earlier on today apparently a woman rang the BBC and said she’d heard there was a hurricane on the way. Well if you are watching don’t worry, there isn’t…”

18 died in Britain, four in France

Devastation costs more than one billion pounds

An estimated 15 million trees were lost

Worst storm since 1703, one in 200 year storm for southern Britain

A public enquiry was held and an internal enquiry was conducted by the Met Office

“…unlike 1987, we now run multiple forecasts (called ensembles) that help us to understand the probabilities involved in a forecast and give a better estimate of the risk of high impact weather events.”

Climate Uncertainty

Model predictions without uncertainty could miss less likely but high impact events

Updated process of analysis

Uncertainty in PBMs

Sources of uncertainty

“random” uncertainty in observations

Variability at scales that we are currently not interested in modelling/predicting.

human or instrument error in taking measurements

representative error

Do observational samples adequately represent the site/system we are interested in - experimental design

Is measurement site (a)typical of (eg pine forest) sites in Europe?

Sources of uncertainty

uncertainty in model driving/input data

Climate/weather data, N deposition, management etc.

model initial value uncertainty

For example which initial soil/tree values are consistent with each other and present day observations

missing or erroneous model processes/structure

All PBMs are wrong

lack of knowledge of model parameter values

The Very Simple Ecosystem Model (VSEM)

Toy model (designed as part of a theory paper)

Included in BayesianTools (R package that we will use)

Very simplified form of more complex models

To give a concrete example of inferring the parameters of a PBM

VSEM Photosynthesis

Gross/Net primary productivity

\[\begin{align} GPP &= PAR \times LUE \times (1 - \exp^{(-KEXT \times LAI)})\\ NPP &= (1-GAMMA) \times GPP \end{align}\]

Net Ecosystem Exchange

\[\begin{align} NEE &= (\frac{C_s}{\tau_s} + GAMMA \times GPP) - GPP \end{align}\]

PAR - Photosynthetically active radiation

KEXT - Beer’s law light extinction coeff

LUE - Light use efficiency

LAI - Leaf area index (LAR * Cv)

LAR - Leaf area ratio

Cv/Cs - Vegetative/Soil carbon

GAMMA - Fraction of GPP that is Autotrophic Respiration

VSEM state equations

\[\begin{align} \frac{dC_v}{dt} &= A_v \times NPP &- \frac{C_v}{\tau_v} \\ \frac{dC_r}{dt} &= (1.0-A_v) \times NPP &- \frac{C_r}{\tau_r}\\ \frac{dC_s}{dt} &= \frac{C_r}{\tau_r} + \frac{C_v}{\tau_v} &- \frac{C_s}{\tau_s} \end{align}\]

\(C_v\), \(C_r\) and \(C_s\) : Carbon in vegetation, root and soil pools

Run VSEM

library(BayesianTools)
refPars               <- VSEMgetDefaults()
knitr::kable(refPars)
best lower upper
KEXT 0.500 2e-01 1e+00
LAR 1.500 2e-01 3e+00
LUE 0.002 5e-04 4e-03
GAMMA 0.400 2e-01 6e-01
tauV 1440.000 5e+02 3e+03
tauS 27370.000 4e+03 5e+04
tauR 1440.000 5e+02 3e+03
Av 0.500 2e-01 1e+00
Cv 3.000 0e+00 4e+02
Cs 15.000 0e+00 1e+03
Cr 3.000 0e+00 2e+02
ndays                 <- 1000
PAR                   <- VSEMcreatePAR(1:ndays)
out <- VSEM(refPars$best, PAR)
plot(PAR,main="Photosythetically Active Radiation (PAR)",xlab="day",type="l")

Worked example Bayesian inference of VSEM

Inference of model parameters and initial conditions of carbon pools (Cv, Cs, Cr)

The model structure is fixed i.e. is not being inferred

model inputs and parameters

PAR <- VSEMcreatePAR(1:1000)

refPars <- VSEMgetDefaults()

refPars[12,] <- c(0.1, 0.01, 0.7)
rownames(refPars)[12] <- "error-sd"

parSel = c(1,3,5,6,8,9,10,11,12)

Create virtual observations

We will not use measured data from the field, but virtual data.

referenceData <- VSEM(refPars$best[1:11], PAR) 
referenceData[,1] = 1000 * referenceData[,1] 

aobs = referenceData
aobs[,1] = abs(referenceData[,1]) 
maobs = colMeans(aobs)

obs = referenceData
obs[,1] <- referenceData[,1] + rnorm(length(referenceData[,1]), sd = maobs[1]*refPars$best[12])
obs[,2] <- referenceData[,2] + rnorm(length(referenceData[,1]), sd = maobs[2]*refPars$best[12])
obs[,3] <- referenceData[,3] + rnorm(length(referenceData[,1]), sd = maobs[3]*refPars$best[12])
obs[,4] <- referenceData[,4] + rnorm(length(referenceData[,1]), sd = maobs[4]*refPars$best[12])

Why BayesianTools

PBMs often preexist in a language (R, python, FORTRAN, C) Could be costly to recode in an Bayesian package’s bespoke language

  • complex models could be hundreds/thousands of lines of code

For PBMs very useful to be able to code a bespoke likelihood function as we need control over calling the PBM

Few MCMC packages allow this (eg BayesianTools, NIMBLE)

Likelihood

Gaussian likelihood Logarithm helps avoid numerical rounding issues

likelihood <- function(par, sum = TRUE){
  x = refPars$best
  x[parSel] = par

  predicted <- VSEM(x[1:11], PAR) # replace here VSEM with your model 
  predicted[,1] = 1000 * predicted[,1] # this is just rescaling

  diff       <- c(predicted[,1] - obs[,1])
  llValues1  <- dnorm(diff, sd = x[12]*maobs[1], log = T)
  diff       <- c(predicted[,2] - obs[,2])
  llValues2  <- dnorm(diff, sd = x[12]*maobs[2], log = T)
  diff       <- c(predicted[,3] - obs[,3])
  llValues3  <- dnorm(diff, sd = x[12]*maobs[3], log = T)
  diff       <- c(predicted[,4] - obs[,4])
  llValues4  <- dnorm(diff, sd = x[12]*maobs[4], log = T)

  return(sum(llValues1,llValues2,llValues3,llValues4))
}

Prior

Should reflect prior thinking about uncertainty in parameters

prior <- createUniformPrior(lower = refPars$lower[parSel], 
                            upper = refPars$upper[parSel], 
                            best = refPars$best[parSel])

Run inference

More than one chain helps test for convergence

bayesianSetup <- createBayesianSetup(likelihood, prior, 
                       names = rownames(refPars)[parSel])
settings <- list(iterations = 10000, nrChains = 2)

out <- runMCMC(bayesianSetup = bayesianSetup, 
               sampler = "DEzs", settings = settings)

 Running DEzs-MCMC, chain  1 iteration 300 of 10002 . Current logp  -1599451 -2147310 -1640868 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 600 of 10002 . Current logp  -423972.1 -80019.52 -313745.3 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 900 of 10002 . Current logp  -25951.1 -33853.91 -159706.7 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 1200 of 10002 . Current logp  -25951.1 -21184.14 -28965.66 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 1500 of 10002 . Current logp  -19649.1 -21032.64 -28131.36 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 1800 of 10002 . Current logp  -12977.06 -18106.21 -22769.04 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 2100 of 10002 . Current logp  -11552.43 -10125.7 -14424.29 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 2400 of 10002 . Current logp  -11036.06 -10125.7 -10988.93 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 2700 of 10002 . Current logp  -10602.23 -9865.423 -9802.112 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 3000 of 10002 . Current logp  -9530.129 -9397.72 -9440.233 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 3300 of 10002 . Current logp  -7936.504 -9036.6 -8971.231 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 3600 of 10002 . Current logp  -7822.063 -8689.781 -8844.316 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 3900 of 10002 . Current logp  -7822.063 -8618.651 -7819.166 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 4200 of 10002 . Current logp  -7728.498 -8307.705 -7819.166 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 4500 of 10002 . Current logp  -7716.096 -7650.427 -7819.166 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 4800 of 10002 . Current logp  -7465.004 -7607.794 -7819.166 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 5100 of 10002 . Current logp  -7360.114 -7552.411 -7626.691 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 5400 of 10002 . Current logp  -7098.607 -7403.257 -7501.555 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 5700 of 10002 . Current logp  -6885.381 -7347.741 -7501.555 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 6000 of 10002 . Current logp  -6729.716 -6849.728 -7358.918 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 6300 of 10002 . Current logp  -6636.659 -6418.898 -6962.166 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 6600 of 10002 . Current logp  -6636.659 -6103.755 -6258.434 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 6900 of 10002 . Current logp  -6507.101 -6062.098 -6258.434 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 7200 of 10002 . Current logp  -6507.101 -5727.578 -6223.425 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 7500 of 10002 . Current logp  -6421.235 -5373.797 -6188.045 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 7800 of 10002 . Current logp  -6172.55 -5333.331 -5917.347 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 8100 of 10002 . Current logp  -6142.167 -5095.25 -5577.027 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 8400 of 10002 . Current logp  -5905.989 -5095.25 -5575.172 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 8700 of 10002 . Current logp  -5349.416 -5095.25 -5111.368 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 9000 of 10002 . Current logp  -4580.162 -4882.1 -4638.954 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 9300 of 10002 . Current logp  -4580.162 -4528.333 -4450.176 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 9600 of 10002 . Current logp  -4528.954 -4528.333 -4450.176 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 9900 of 10002 . Current logp  -4489.343 -4528.333 -4450.176 . Please wait! 

 Running DEzs-MCMC, chain  1 iteration 10002 of 10002 . Current logp  -4466.919 -4528.333 -4451.468 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 300 of 10002 . Current logp  -1188447 -1066974 -1116146 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 600 of 10002 . Current logp  -340675.3 -255932.1 -151302.4 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 900 of 10002 . Current logp  -111984 -190482.3 -135431.9 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 1200 of 10002 . Current logp  -108961.4 -190482.3 -83607 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 1500 of 10002 . Current logp  -93395.46 -91151.33 -82271.07 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 1800 of 10002 . Current logp  -78825.4 -74253.66 -82271.07 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 2100 of 10002 . Current logp  -78825.4 -74253.66 -77963.75 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 2400 of 10002 . Current logp  -72028.23 -66245.41 -71561.56 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 2700 of 10002 . Current logp  -64383.83 -45432.23 -65768.17 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 3000 of 10002 . Current logp  -58078.18 -28803.95 -45160.74 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 3300 of 10002 . Current logp  -35154.03 -27324.57 -35551.86 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 3600 of 10002 . Current logp  -13692.47 -13706.02 -13750.76 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 3900 of 10002 . Current logp  -13692.47 -12633.47 -13750.76 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 4200 of 10002 . Current logp  -13144.22 -12633.47 -13750.76 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 4500 of 10002 . Current logp  -13144.22 -12633.47 -13750.76 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 4800 of 10002 . Current logp  -13144.22 -12449.6 -13406.1 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 5100 of 10002 . Current logp  -12641.37 -12423.98 -12474.06 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 5400 of 10002 . Current logp  -12572.69 -12414.59 -12474.06 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 5700 of 10002 . Current logp  -12497.06 -12414.59 -12417 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 6000 of 10002 . Current logp  -12497.06 -12314.71 -12215.28 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 6300 of 10002 . Current logp  -12422.75 -12160 -12136.82 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 6600 of 10002 . Current logp  -12035.86 -11913.64 -11687.49 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 6900 of 10002 . Current logp  -11304.04 -10915.78 -11197.72 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 7200 of 10002 . Current logp  -10671.14 -10840.22 -11162.41 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 7500 of 10002 . Current logp  -10622.56 -10655.76 -11162.41 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 7800 of 10002 . Current logp  -10172.52 -10643.32 -11162.41 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 8100 of 10002 . Current logp  -10131.24 -10484.43 -11058.06 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 8400 of 10002 . Current logp  -10102.17 -10484.43 -11033.41 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 8700 of 10002 . Current logp  -10102.17 -10484.43 -10709.72 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 9000 of 10002 . Current logp  -9923.608 -10013.9 -10464.91 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 9300 of 10002 . Current logp  -9890.634 -10013.9 -10462.87 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 9600 of 10002 . Current logp  -9782.153 -9979.372 -10462.87 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 9900 of 10002 . Current logp  -9707.137 -9979.372 -10451.37 . Please wait! 

 Running DEzs-MCMC, chain  2 iteration 10002 of 10002 . Current logp  -9707.137 -9979.372 -10451.37 . Please wait! 

Check for convergence

summary(out)
# # # # # # # # # # # # # # # # # # # # # # # # # 
## MCMC chain summary ## 
# # # # # # # # # # # # # # # # # # # # # # # # # 
 
# MCMC sampler:  DEzs 
# Nr. Chains:  6 
# Iterations per chain:  3334 
# Rejection rate:  0.979 
# Effective sample size:  104 
# Runtime:  4.45  sec. 
 
# Parameters
            psf       MAP     2.5%    median     97.5%
KEXT      2.134     0.999    0.500     0.963     1.000
LUE      19.336     0.002    0.001     0.003     0.004
tauV     10.565   942.569  773.456   966.040  2307.896
tauS      1.216 49264.635 6595.068 41146.995 49927.328
Av        6.697     0.663    0.213     0.607     0.906
Cv        6.278     2.924    0.101     3.498    63.329
Cs        3.924    14.362    0.984    11.336   450.243
Cr        2.504     3.781    0.748     3.665    59.356
error-sd  2.362     0.122    0.122     0.402     0.626

## DIC:  1285829 
## Convergence 
 Gelman Rubin multivariate psrf:  159.98 
 
## Correlations 
           KEXT    LUE   tauV   tauS     Av     Cv     Cs     Cr error-sd
KEXT      1.000 -0.565 -0.707 -0.134  0.520 -0.155 -0.283 -0.239   -0.436
LUE      -0.565  1.000  0.751 -0.405 -0.864  0.017 -0.095  0.004    0.495
tauV     -0.707  0.751  1.000  0.103 -0.680  0.004  0.071 -0.164    0.249
tauS     -0.134 -0.405  0.103  1.000  0.339 -0.339 -0.073 -0.336   -0.575
Av        0.520 -0.864 -0.680  0.339  1.000  0.014  0.140  0.041   -0.308
Cv       -0.155  0.017  0.004 -0.339  0.014  1.000  0.589  0.448    0.349
Cs       -0.283 -0.095  0.071 -0.073  0.140  0.589  1.000  0.642    0.243
Cr       -0.239  0.004 -0.164 -0.336  0.041  0.448  0.642  1.000    0.323
error-sd -0.436  0.495  0.249 -0.575 -0.308  0.349  0.243  0.323    1.000

Check for convergence longer chain (150k)

summary(out)
# # # # # # # # # # # # # # # # # # # # # # # # # 
## MCMC chain summary ## 
# # # # # # # # # # # # # # # # # # # # # # # # # 
 
# MCMC sampler:  DEzs 
# Nr. Chains:  6 
# Iterations per chain:  50000 
# Rejection rate:  0.943 
# Effective sample size:  719 
# Runtime:  229.801  sec. 
 
# Parameters
           psf       MAP      2.5%    median     97.5%
KEXT     1.001     0.529     0.254     0.538     0.820
LUE      1.001     0.002     0.002     0.002     0.002
tauV     1.005  1394.867  1171.209  1425.719  2844.404
tauS     1.002 25672.714 21957.750 26787.986 49039.340
Av       1.008     0.508     0.355     0.503     0.578
Cv       1.002     3.020     2.849     3.008     3.460
Cs       1.007    14.915    14.503    14.952    16.571
Cr       1.009     3.037     2.648     3.025     3.371
error-sd 1.003     0.100     0.098     0.101     0.413

## DIC:  65862.58 
## Convergence 
 Gelman Rubin multivariate psrf:  1.012 
 
## Correlations 
           KEXT    LUE   tauV   tauS     Av     Cv     Cs     Cr error-sd
KEXT      1.000 -0.386 -0.621  0.035  0.588 -0.035 -0.040 -0.019   -0.271
LUE      -0.386  1.000  0.251 -0.112 -0.174 -0.079 -0.008  0.020   -0.248
tauV     -0.621  0.251  1.000  0.507 -0.821  0.072  0.089 -0.002    0.367
tauS      0.035 -0.112  0.507  1.000 -0.371  0.014  0.045 -0.004    0.109
Av        0.588 -0.174 -0.821 -0.371  1.000  0.109  0.072  0.151   -0.263
Cv       -0.035 -0.079  0.072  0.014  0.109  1.000  0.789  0.688    0.420
Cs       -0.040 -0.008  0.089  0.045  0.072  0.789  1.000  0.711    0.484
Cr       -0.019  0.020 -0.002 -0.004  0.151  0.688  0.711  1.000    0.444
error-sd -0.271 -0.248  0.367  0.109 -0.263  0.420  0.484  0.444    1.000

Remove burn-in and take sample

knitr::kable(refPars)
best lower upper
KEXT 0.500 2e-01 1e+00
LAR 1.500 2e-01 3e+00
LUE 0.002 5e-04 4e-03
GAMMA 0.400 2e-01 6e-01
tauV 1440.000 5e+02 3e+03
tauS 27370.000 4e+03 5e+04
tauR 1440.000 5e+02 3e+03
Av 0.500 2e-01 1e+00
Cv 3.000 0e+00 4e+02
Cs 15.000 0e+00 1e+03
Cr 3.000 0e+00 2e+02
error-sd 0.100 1e-02 7e-01

Model predictions

Computational Cost

VSEM is fast but even a “simple” model needed 150k runs

Much more complex “slower/expensive” models

  • may need to be sped up
  • can be replaced with statistical approximation (model emulation)

Sometime good enough to recode model in a more efficient language eg C or FORTRAN

More efficient inference schemes might be useful eg R package NIMBLE

Practical Session

We will now move onto a practical session

The web document for this can be accessed from the main page

Recap and Summary

What is a process based model PBM?

Why considering uncertainties in PBM predictions may be important

Sources of uncertainty in the Bayesian inference of PBMs

Worked example: Inference of a simple PBM (VSEM) using BayesianTools

Practical: Why it is important to represent all sources of uncertainty in Bayesian inference?

Any Questions…?