Machine Learning

Bayesian Methods for Ecological and Environmental Modelling

Peter Levy
UKCEH Edinburgh

Conventional statistical modelling:

  • we declare the model, f (the mathematical form relating the variables x)
  • the algorithm estimates the parameters, \(\theta\)

Machine Learning:

  • we declare the variables x (“features”) to use
  • the algorithm estimates both the parameters \(\theta\) and the model f
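
To make the contrast concrete, here is a minimal sketch in base R. The simulated data, the choice of a straight line as the declared model, and the use of loess() as a stand-in for a flexible machine-learning method are all assumptions made for illustration; they are not from the course material.

```r
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.2)   # simulated data, illustrative only

# Conventional: we declare the form f(x) = a + b * x;
# lm() estimates the parameters theta = (a, b)
fit_lm <- lm(y ~ x)

# ML-style: we only name the feature x;
# loess() (a local regression smoother, used here as a stand-in)
# works out the shape of f from the data
fit_ml <- loess(y ~ x, span = 0.3)

# Root-mean-square error of the fitted values on the training data
rmse <- function(fit) sqrt(mean((y - predict(fit))^2))
c(linear = rmse(fit_lm), flexible = rmse(fit_ml))
```

The flexible fit follows the curve that the straight line cannot, but it gives no interpretable parameters and, on its own, no statement of uncertainty.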

Machine Learning

https://xkcd.com

Neural Networks

  • Highly flexible, often uninterpretable
  • Still just a model, but typically with thousands of parameters
  • MCMC computation is not easy with so many parameters

Neural Networks

Ideally, we want to:

  • keep flexibility of ML
  • quantify uncertainty
  • incorporate prior knowledge

Some promising approaches

  • Bayesian Additive Regression Trees (BART)
  • Bayesian Adaptive Spline Surfaces (BASS)
  • Bayesian Gaussian Process Regression

Bayesian Additive Regression Trees (BART)

Related to random forests, gradient boosting methods, and generalised additive models (GAMs)

https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.8347

BART formal definition

\[y = \sum_{j=1}^m g(X, T_j, M_j) + \epsilon\]

  • \(m\) is the number of trees (fixed)
  • \(X\) is a matrix of covariates
  • \(T_j\) is a vector of binary split decision rules
  • \(M_j\) is a vector of terminal node weights
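  • \(g()\) is a look-up function: for each row of \(X\), it follows the rules in \(T_j\) and returns the matching terminal node weight from \(M_j\)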
  • \(\epsilon \sim N(0, \sigma^2)\) is an error term
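
The sum-of-trees structure can be sketched in a few lines of base R. The list representation of a tree, the single split per tree, and all numbers below are hypothetical and chosen only to show how \(g()\) looks up a terminal node weight; this is not how the BART package stores trees internally.

```r
# Each tree has one split rule (T_j) and two terminal node weights (M_j).
# Rule: if X[, var] < cut, use mu[1], otherwise mu[2].
tree1 <- list(var = 1, cut = 0.5, mu = c(-1.0, 2.0))
tree2 <- list(var = 2, cut = 0.0, mu = c(0.3, -0.3))
trees <- list(tree1, tree2)          # m = 2 trees

# g(): look up the weight of the terminal node each row of X falls into
g <- function(X, tree) {
  ifelse(X[, tree$var] < tree$cut, tree$mu[1], tree$mu[2])
}

# Two observations, two covariates
X <- rbind(c(0.2, -1),
           c(0.9,  1))

# Sum over trees gives the mean of y; the error term epsilon is added on top
f_hat <- Reduce(`+`, lapply(trees, function(tr) g(X, tr)))
f_hat   # -0.7  1.7
```

Each tree contributes only a small piece of the prediction, which is why BART uses many shallow trees rather than one deep one.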

Bayesian Additive Regression Trees (BART)

Very simple to implement:

library(BART)
# x.train: training covariates; y.train: continuous response; x.test: covariates to predict for
bart_model <- wbart(x.train, y.train, x.test)

Try this in the next practical (time permitting) …
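
As a possible starting point for the call above, the sketch below fits wbart() to simulated data and summarises the posterior predictions. It assumes the BART package is installed from CRAN; the data-generating function, sample sizes and train/test split are invented for illustration.

```r
library(BART)   # assumes the BART package is installed

set.seed(42)
n <- 200
x <- matrix(runif(n * 2), ncol = 2)
y <- sin(2 * pi * x[, 1]) + x[, 2] + rnorm(n, sd = 0.3)   # simulated response

train   <- 1:150
x.train <- x[train, ]
y.train <- y[train]
x.test  <- x[-train, ]

bart_model <- wbart(x.train, y.train, x.test)

# yhat.test holds posterior draws: one row per draw, one column per test point
pred_mean <- bart_model$yhat.test.mean
pred_ci   <- apply(bart_model$yhat.test, 2, quantile, probs = c(0.025, 0.975))

head(cbind(mean = pred_mean, lower = pred_ci[1, ], upper = pred_ci[2, ]))
```

Because the output is a set of posterior draws rather than a single prediction, the 95% credible interval for each test point comes directly from the quantiles of those draws.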