Hierarchical Modelling - Practical 1

Author

Lindsay F Banin

More tree allometry

As trees grow, they increase both horizontally in girth (and diameter) and vertically in height. But as trees age and gain size, the rate of height growth declines; the relationship is non-linear (e.g. Banin et al. 2012). This is thought to happen because as trees gain height they become less limited by light but more limited by other growth resources (water, nutrients…), and may put more energy into reproductive processes. There are remarkable similarities in the nature of these non-linear relationships (so-called ‘rules of ecology’), yet they still vary across sites and across species. These grouping variables add ‘structure’ to our data, i.e. the grouping creates non-independence.

We will use the relationships between height and diameter within the UK tree dataset to explore the application of hierarchical models in rstanarm.

Explore the data

Load the “trees.csv” data we have already been using.

As usual, the best first step of an analysis is to get a feel for the data by plotting. We will first plot tree height against diameter (diameter at breast height; DBH), with points coloured according to the different species in the dataset. What do you observe about this plot?

Next, we will natural-log transform both the variables tree_height and dbh. This transforms a non-linear power-law function y = aX^b into a linear one: log(y) = log(a) + b*log(X). What do you observe now?
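To see why the log transform works, here is a minimal sketch in base R (the values of a and b are hypothetical, chosen just for illustration): heights simulated exactly from a power law fall on a perfectly straight line on the log-log scale, and lm() recovers log(a) as the intercept and b as the slope.

```r
# Power law y = a * x^b becomes linear on the log scale:
# log(y) = log(a) + b * log(x)
a <- 1.5; b <- 0.7              # hypothetical allometric parameters
x <- seq(5, 50, by = 5)         # hypothetical DBH values (cm)
y <- a * x^b                    # heights generated exactly from the power law

fit <- lm(log(y) ~ log(x))      # fit on the log-log scale
coef(fit)                       # intercept ~ log(a), slope ~ b
```

With noise-free data the recovered coefficients match log(a) and b to numerical precision; with real data (as in the trees dataset) the points scatter around the line instead.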

df <- read.csv(url("https://raw.githubusercontent.com/NERC-CEH/beem_data/main/trees.csv"))
head(df)
  tree_number age species cultivation soil_type tree_height timber_height   dbh
1           1  44      SS           1         1       24.69         22.25 35.56
2           2  44      SS           1         1       21.64         20.12 26.67
3           3  44      SS           1         1       24.38         21.03 31.50
4           4  44      SS           1         1       23.47         18.90 21.84
5           5  23      SS           2        4b        7.62          6.40 21.08
6           6  23      SS           2        4b        8.84          5.33 12.95
  tree_mass stem_mass taper crown_mass root_mass
1    1325.2    1134.0 1.500      191.2     466.2
2     787.0     693.0 0.911       94.0     415.8
3    1055.2     893.2 1.237      162.0     327.6
4     675.0     617.4 0.629       57.6     194.4
5     185.4     130.5 2.000       54.9      65.2
6      77.4      47.7 0.772       29.7      22.9
library(ggplot2)
# Plot the raw data
p <- ggplot(df, aes(dbh, tree_height, colour = species))
p <- p + geom_point()
p

# Now natural-log transform the height and diameter at breast height (DBH)
p <- ggplot(df, aes(log(dbh), log(tree_height), colour = species))
p <- p + geom_point()
p <- p + stat_smooth(method = "lm")
p
`geom_smooth()` using formula = 'y ~ x'

The natural-log transformation will not always be the best for a non-linear relationship (for example, if your relationship exhibits an asymptote), but this is good enough for now!
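As a hedged illustration of the asymptote point (entirely simulated, not the trees data): a saturating form such as the Michaelis-Menten curve y = Vm*x/(K + x) levels off at Vm, which a power law cannot do. The self-starting model SSmicmen() in the stats package makes fitting it with nls() straightforward.

```r
# Simulated height-diameter data with an asymptote near 30 m (hypothetical values)
set.seed(1)
dbh    <- runif(100, 5, 80)
height <- 30 * dbh / (15 + dbh) + rnorm(100, sd = 1)

# Fit an asymptotic (Michaelis-Menten) curve:
# Vm = asymptotic maximum height, K = diameter at half the maximum
fit_asym <- nls(height ~ SSmicmen(dbh, Vm, K))
coef(fit_asym)
```

For the UK trees data in this practical, the log-log linearisation is adequate, but it is worth plotting the residuals to check for curvature before trusting it.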

The package rstanarm uses the model formula syntax of the lme4 package, which fits hierarchical models by maximum likelihood: linear mixed-effects models use the function ‘lmer’, and non-linear models use ‘nlmer’. Before we move to the Bayesian formulation in rstanarm, let’s explore a simple model using lme4. (If this package is not familiar, we could easily dedicate a whole training session to it! But we will focus on the essential elements for now.)

Creating a model using lme4

Here we will model the relationship between tree height and diameter using the lme4 function lmer. As above, we take the log of both variables so that the relationship is linear. Much of this code will look familiar from linear regression (lm), but here we have an additional term which specifies how the grouping variable ‘species’ is used in the model. In the first model (m1_lme) below, (1|species) specifies that the intercept of the height-diameter relationship varies by species. The second formulation, (diam|species), allows both the intercept and slope to vary by species. In each case, the species-level intercepts and slopes are deviations from the whole-dataset parameter estimates (the fixed effects), and are treated as draws from a common random distribution.

## Assess Tree Height and Diameter Allometric relationship using mixed effects model with lme4 package
library(lme4)
Loading required package: Matrix
height <- log(df$tree_height)
diam <- log(df$dbh)
m1_lme <- lmer(height ~ diam + (1|species), data = df, na.action = na.exclude)
summary(m1_lme)
Linear mixed model fit by REML ['lmerMod']
Formula: height ~ diam + (1 | species)
   Data: df

REML criterion at convergence: -1527.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9978 -0.6109  0.0439  0.6544  3.5501 

Random effects:
 Groups   Name        Variance Std.Dev.
 species  (Intercept) 0.008072 0.08984 
 Residual             0.025564 0.15989 
Number of obs: 1901, groups:  species, 13

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.44100    0.05137   8.584
diam         0.72257    0.01473  49.041

Correlation of Fixed Effects:
     (Intr)
diam -0.860
ranef(m1_lme)
$species
    (Intercept)
CP -0.065595914
DF  0.074487937
EL  0.062835193
GF  0.163073601
JL  0.037563658
LC -0.181808088
LP  0.003878361
NF  0.006101305
NS -0.044023529
RC -0.082740006
SP -0.037135707
SS  0.026345056
WH  0.037018135

with conditional variances for "species" 
m2_lme <- lmer(height ~ diam + (diam|species), data = df, na.action = na.exclude)
summary(m2_lme)
Linear mixed model fit by REML ['lmerMod']
Formula: height ~ diam + (diam | species)
   Data: df

REML criterion at convergence: -1551.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.0397 -0.6159  0.0329  0.6479  3.5857 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 species  (Intercept) 0.26186  0.5117        
          diam        0.02559  0.1600   -0.98
 Residual             0.02503  0.1582        
Number of obs: 1901, groups:  species, 13

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.63809    0.17749   3.595
diam         0.65920    0.05674  11.618

Correlation of Fixed Effects:
     (Intr)
diam -0.988
ranef(m2_lme)
$species
   (Intercept)         diam
CP  0.69731911 -0.249574016
DF  0.27465251 -0.064409200
EL -0.21312524  0.090547610
GF  0.79823959 -0.205792952
JL  0.17060288 -0.045197538
LC -0.54859640  0.120019866
LP -0.54348420  0.184344324
NF  0.08496274 -0.028042799
NS -0.16409454  0.037731355
RC -0.11114522  0.008136213
SP  0.02984401 -0.023580739
SS -0.16943106  0.062926980
WH -0.30574417  0.112890897

with conditional variances for "species" 

As we expected from the plotting, the relationships between tree_height and dbh are ‘shifted’ along the y-axis according to species. Let’s first proceed with our simpler model in rstanarm. Here we run it using the function stan_glmer (generalised linear mixed-effects model), explicitly specifying the family as Gaussian (Normal); in this case we could equally use stan_lmer. We will return to more examples with generalised linear models later.

We can see that there are lots of similarities between the lmer models above and the stan_glmer model below, but we also add in some extra arguments to prescribe how our Bayesian analysis is run - notice the chains and iter (iterations) arguments. As these numbers get higher, the model takes longer to run, and you might want to check things are behaving before you set it off on lots of iterations. Let’s start small-ish. If these samples are inadequate and the chains are not converging you will get some useful warnings about effective sample size and R-hat (https://mc-stan.org/rstanarm/articles/rstanarm.html). If chains have not mixed well (so that between- and within-chain estimates do not agree), R-hat is larger than 1.
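To make the R-hat warning less mysterious, here is a minimal base-R sketch of the (non-split) R-hat calculation on simulated ‘chains’ rather than real MCMC output: when the chains agree, R-hat is close to 1; when they have settled in different places, it is clearly larger. (rstanarm computes a more refined split-chain version for you, so this is for intuition only.)

```r
# Minimal (non-split) R-hat: compare between-chain and within-chain variance
rhat <- function(chains) {                 # chains: matrix, one column per chain
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))         # average within-chain variance
  B <- n * var(colMeans(chains))           # between-chain variance
  var_plus <- (n - 1) / n * W + B / n      # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(42)
good <- cbind(rnorm(1000), rnorm(1000))            # two well-mixed "chains"
bad  <- cbind(rnorm(1000), rnorm(1000, mean = 3))  # chains stuck in different places
rhat(good)   # very close to 1
rhat(bad)    # well above 1
```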

## Re-run using rstanarm
library(rstanarm)
Loading required package: Rcpp
This is rstanarm version 2.26.1
- See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
- Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
- For execution on a local, multicore CPU with excess RAM we recommend calling
  options(mc.cores = parallel::detectCores())
height <- log(df$tree_height)
diam <- log(df$dbh)
m1_stan <- stan_glmer(height ~ diam + (1|species), data = df, family = gaussian(link = "identity"), chains = 2, iter = 500, seed = 12345)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000204 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.04 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:   1 / 500 [  0%]  (Warmup)
Chain 1: Iteration:  50 / 500 [ 10%]  (Warmup)
Chain 1: Iteration: 100 / 500 [ 20%]  (Warmup)
Chain 1: Iteration: 150 / 500 [ 30%]  (Warmup)
Chain 1: Iteration: 200 / 500 [ 40%]  (Warmup)
Chain 1: Iteration: 250 / 500 [ 50%]  (Warmup)
Chain 1: Iteration: 251 / 500 [ 50%]  (Sampling)
Chain 1: Iteration: 300 / 500 [ 60%]  (Sampling)
Chain 1: Iteration: 350 / 500 [ 70%]  (Sampling)
Chain 1: Iteration: 400 / 500 [ 80%]  (Sampling)
Chain 1: Iteration: 450 / 500 [ 90%]  (Sampling)
Chain 1: Iteration: 500 / 500 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 4.363 seconds (Warm-up)
Chain 1:                0.969 seconds (Sampling)
Chain 1:                5.332 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 0.000117 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.17 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:   1 / 500 [  0%]  (Warmup)
Chain 2: Iteration:  50 / 500 [ 10%]  (Warmup)
Chain 2: Iteration: 100 / 500 [ 20%]  (Warmup)
Chain 2: Iteration: 150 / 500 [ 30%]  (Warmup)
Chain 2: Iteration: 200 / 500 [ 40%]  (Warmup)
Chain 2: Iteration: 250 / 500 [ 50%]  (Warmup)
Chain 2: Iteration: 251 / 500 [ 50%]  (Sampling)
Chain 2: Iteration: 300 / 500 [ 60%]  (Sampling)
Chain 2: Iteration: 350 / 500 [ 70%]  (Sampling)
Chain 2: Iteration: 400 / 500 [ 80%]  (Sampling)
Chain 2: Iteration: 450 / 500 [ 90%]  (Sampling)
Chain 2: Iteration: 500 / 500 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 3.946 seconds (Warm-up)
Chain 2:                0.828 seconds (Sampling)
Chain 2:                4.774 seconds (Total)
Chain 2: 
Warning: The largest R-hat is 1.08, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess
## try editing chains and iterations and look at warnings.
m1_stan <- stan_glmer(height ~ diam + (1|species), data = df, family = gaussian(link = "identity"), chains = 4, iter = 2500, seed = 12345)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000114 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.14 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 1: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 1: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 1: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 1: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 1: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 1: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 1: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 1: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 1: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 1: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 1: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 8.176 seconds (Warm-up)
Chain 1:                5.107 seconds (Sampling)
Chain 1:                13.283 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 0.00012 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.2 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 2: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 2: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 2: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 2: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 2: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 2: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 2: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 2: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 2: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 2: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 2: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 7.898 seconds (Warm-up)
Chain 2:                6.169 seconds (Sampling)
Chain 2:                14.067 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 0.00014 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 1.4 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 3: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 3: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 3: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 3: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 3: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 3: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 3: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 3: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 3: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 3: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 3: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 9.547 seconds (Warm-up)
Chain 3:                4.645 seconds (Sampling)
Chain 3:                14.192 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 8.9e-05 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.89 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 4: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 4: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 4: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 4: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 4: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 4: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 4: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 4: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 4: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 4: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 4: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 8.97 seconds (Warm-up)
Chain 4:                5.859 seconds (Sampling)
Chain 4:                14.829 seconds (Total)
Chain 4: 

In the code chunk above, we have not explicitly stated our priors, which means rstanarm has used the defaults. There is some really useful information about how these defaults are chosen in this vignette: https://mc-stan.org/rstanarm/articles/priors.html, and on the priors for the covariance matrix here: https://mc-stan.org/rstanarm/articles/glmer.html#priors-on-covariance-matrices-1. The defaults are intended to be weakly informative: vague, but ruling out odd behaviour in the model. Let’s take a look at what these defaults are for our model.

prior_summary(m1_stan, digits = 2)
Priors for model 'm1_stan' 
------
Intercept (after predictors centered)
  Specified prior:
    ~ normal(location = 2.6, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 2.6, scale = 0.62)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 2.4)

Auxiliary (sigma)
  Specified prior:
    ~ exponential(rate = 1)
  Adjusted prior:
    ~ exponential(rate = 4)

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
------
See help('prior_summary.stanreg') for more details

Notice we have the same priors as for the linear regression model - for the intercept, the slope coefficient and the auxiliary parameter (sigma) - plus a prior for the covariance matrix, which relates to our random-effect term (species).

To explicitly state our priors, we could write the model formula as:

m1_stan_wpriors <- stan_glmer(height ~ diam + (1|species), data = df, family = gaussian(link = "identity"), prior_intercept = normal(location = 2.6, scale = 0.62), prior = normal(location = 0, scale = 2.4), prior_aux = exponential(rate = 4), prior_covariance = decov(shape = 1, scale = 1), chains = 4, iter = 2500, seed = 12345)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000101 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.01 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 1: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 1: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 1: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 1: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 1: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 1: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 1: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 1: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 1: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 1: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 1: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 7.716 seconds (Warm-up)
Chain 1:                5.214 seconds (Sampling)
Chain 1:                12.93 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 9e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.9 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 2: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 2: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 2: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 2: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 2: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 2: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 2: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 2: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 2: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 2: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 2: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 9.063 seconds (Warm-up)
Chain 2:                4.274 seconds (Sampling)
Chain 2:                13.337 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 9.7e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.97 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 3: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 3: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 3: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 3: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 3: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 3: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 3: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 3: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 3: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 3: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 3: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 8.827 seconds (Warm-up)
Chain 3:                5.243 seconds (Sampling)
Chain 3:                14.07 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0.000119 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1.19 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 4: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 4: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 4: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 4: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 4: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 4: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 4: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 4: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 4: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 4: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 4: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 9.641 seconds (Warm-up)
Chain 4:                6.206 seconds (Sampling)
Chain 4:                15.847 seconds (Total)
Chain 4: 

Now let’s explore and interpret our model.

The Bayesian uncertainty interval indicates that, having seen the data, we believe there is a 0.95 probability that the slope parameter for log(DBH) (or diam) lies between the lower bound ci95[1,1] and the upper bound ci95[1,2] - a convenient interpretation for the parameters of interest, as we heard on day one!

print(m1_stan) # for further information on interpretation: ?print.stanreg
stan_glmer
 family:       gaussian [identity]
 formula:      height ~ diam + (1 | species)
 observations: 1901
------
            Median MAD_SD
(Intercept) 0.4    0.1   
diam        0.7    0.0   

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.2    0.0   

Error terms:
 Groups   Name        Std.Dev.
 species  (Intercept) 0.10    
 Residual             0.16    
Num. levels: species 13 

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
summary(m1_stan, digits = 5)

Model Info:
 function:     stan_glmer
 family:       gaussian [identity]
 formula:      height ~ diam + (1 | species)
 algorithm:    sampling
 sample:       5000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 1901
 groups:       species (13)

Estimates:
                                         mean     sd       10%      50%   
(Intercept)                             0.44180  0.05321  0.37250  0.44234
diam                                    0.72244  0.01449  0.70387  0.72243
b[(Intercept) species:CP]              -0.06582  0.03432 -0.10851 -0.06567
b[(Intercept) species:DF]               0.07386  0.03767  0.02704  0.07274
b[(Intercept) species:EL]               0.06258  0.04151  0.00986  0.06227
b[(Intercept) species:GF]               0.16245  0.03861  0.11333  0.16152
b[(Intercept) species:JL]               0.03737  0.03712 -0.00930  0.03718
b[(Intercept) species:LC]              -0.18568  0.06433 -0.26865 -0.18326
b[(Intercept) species:LP]               0.00339  0.03234 -0.03649  0.00319
b[(Intercept) species:NF]               0.00578  0.04536 -0.05054  0.00578
b[(Intercept) species:NS]              -0.04433  0.03315 -0.08564 -0.04474
b[(Intercept) species:RC]              -0.08337  0.05396 -0.15293 -0.08274
b[(Intercept) species:SP]              -0.03686  0.03314 -0.07800 -0.03692
b[(Intercept) species:SS]               0.02598  0.03087 -0.01279  0.02609
b[(Intercept) species:WH]               0.03669  0.03708 -0.01032  0.03670
sigma                                   0.15993  0.00256  0.15671  0.15992
Sigma[species:(Intercept),(Intercept)]  0.01022  0.00607  0.00458  0.00873
                                         90%   
(Intercept)                             0.51087
diam                                    0.74129
b[(Intercept) species:CP]              -0.02237
b[(Intercept) species:DF]               0.12227
b[(Intercept) species:EL]               0.11579
b[(Intercept) species:GF]               0.21232
b[(Intercept) species:JL]               0.08405
b[(Intercept) species:LC]              -0.10507
b[(Intercept) species:LP]               0.04411
b[(Intercept) species:NF]               0.06423
b[(Intercept) species:NS]              -0.00197
b[(Intercept) species:RC]              -0.01499
b[(Intercept) species:SP]               0.00484
b[(Intercept) species:SS]               0.06463
b[(Intercept) species:WH]               0.08391
sigma                                   0.16323
Sigma[species:(Intercept),(Intercept)]  0.01730

Fit Diagnostics:
           mean    sd      10%     50%     90%  
mean_PPD 2.62411 0.00527 2.61732 2.62427 2.63075

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                                       mcse    Rhat    n_eff
(Intercept)                            0.00139 1.00091 1458 
diam                                   0.00021 0.99974 4696 
b[(Intercept) species:CP]              0.00119 1.00092  826 
b[(Intercept) species:DF]              0.00116 1.00048 1061 
b[(Intercept) species:EL]              0.00115 1.00031 1309 
b[(Intercept) species:GF]              0.00118 1.00184 1062 
b[(Intercept) species:JL]              0.00117 1.00122 1009 
b[(Intercept) species:LC]              0.00139 1.00026 2155 
b[(Intercept) species:LP]              0.00120 1.00193  732 
b[(Intercept) species:NF]              0.00118 1.00076 1486 
b[(Intercept) species:NS]              0.00119 1.00234  777 
b[(Intercept) species:RC]              0.00112 0.99977 2324 
b[(Intercept) species:SP]              0.00116 1.00192  817 
b[(Intercept) species:SS]              0.00115 1.00196  717 
b[(Intercept) species:WH]              0.00116 1.00125 1014 
sigma                                  0.00004 1.00047 4580 
Sigma[species:(Intercept),(Intercept)] 0.00017 1.00202 1280 
mean_PPD                               0.00007 0.99993 5015 
log-posterior                          0.13737 1.00239  910 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
ci95 <- posterior_interval(m1_stan, prob = 0.95)
round(ci95, 2)
                                        2.5% 97.5%
(Intercept)                             0.34  0.54
diam                                    0.69  0.75
b[(Intercept) species:CP]              -0.14  0.00
b[(Intercept) species:DF]               0.00  0.15
b[(Intercept) species:EL]              -0.02  0.14
b[(Intercept) species:GF]               0.09  0.24
b[(Intercept) species:JL]              -0.04  0.11
b[(Intercept) species:LC]              -0.32 -0.07
b[(Intercept) species:LP]              -0.06  0.07
b[(Intercept) species:NF]              -0.08  0.10
b[(Intercept) species:NS]              -0.11  0.02
b[(Intercept) species:RC]              -0.19  0.02
b[(Intercept) species:SP]              -0.10  0.03
b[(Intercept) species:SS]              -0.03  0.09
b[(Intercept) species:WH]              -0.04  0.11
sigma                                   0.15  0.17
Sigma[species:(Intercept),(Intercept)]  0.00  0.03
# For specific parameters:
ci95 <- posterior_interval(m1_stan, prob = 0.95, pars = "diam")
round(ci95, 2)
     2.5% 97.5%
diam 0.69  0.75

What do you notice about the posterior medians in comparison to the lmer model results?

See Section 3 of this case study for more documentation on interpreting the model: https://mc-stan.org/users/documentation/case-studies/tutorial_rstanarm.html#prior-distributions.
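One quick way to answer the question above (assuming the m1_lme and m1_stan objects fitted earlier are still in your workspace) is to put the two sets of fixed-effect estimates side by side; with weakly informative priors and nearly 2000 observations, the likelihood dominates and the posterior medians should land very close to the maximum-likelihood estimates.

```r
# Compare lme4 maximum-likelihood estimates with rstanarm posterior medians
# (assumes m1_lme and m1_stan have been fitted as above)
cbind(lme4 = fixef(m1_lme), rstanarm = fixef(m1_stan))
```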

Now let’s take a look at how our modelling process has performed - are our chains behaving, and is there any odd behaviour in our histograms of the MCMC samples? Notice we have a plot for every parameter and grouping level.

For more plotting options, look at the documentation here: https://mc-stan.org/rstanarm/reference/plot.stanreg.html

plot(m1_stan, "trace") # nice hairy caterpillars!

plot(m1_stan, "hist")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot(m1_stan, "intervals")

bayesplot::mcmc_areas_ridges(m1_stan, regex_pars = "species", prob = 0.95)

launch_shinystan(m1_stan, ppd = FALSE)

Now let’s compare our predictions from the model to our observations using some plotting options:

y_rep <- posterior_predict(m1_stan) # ?posterior_predict
dim(y_rep)
[1] 5000 1901
pp_check(m1_stan, plotfun = "boxplot", notch = FALSE) #?pp_check

pp_check(m1_stan, plotfun = "hist", nreps = 3)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

pp_check(m1_stan)

Now let’s apply leave-one-out cross-validation using the loo package - are there any problematic influential points?

library(loo)
This is loo version 2.6.0
- Online documentation and vignettes at mc-stan.org/loo
- As of v2.0.0 loo defaults to 1 core but we recommend using as many as possible. Use the 'cores' argument or set options(mc.cores = NUM_CORES) for an entire session. 
- Windows 10 users: loo may be very slow if 'mc.cores' is set in your .Rprofile file (see https://github.com/stan-dev/loo/issues/94).
loo_m1_stan <- loo(m1_stan)
plot(loo_m1_stan, label_points = TRUE)

Additional resources

https://www.bayesrulesbook.com/chapter-15