Integration of Metals Data across Environmental Compartments#

Notebook Repository alt text

Primary Contact: Dr. Michael Tso

Challenge

While polltants know no boundaries, pollution analysis are often limited to a single compartment. By integrating data across compartments, we can better understand their effects and responses holistically.

However, data from many different sources are difficult to discover and integrate. The purpose of this notebook and the UK Environmental Exposure (UK-EEX) Hub is to make it easier to discover, process, analyse, and visualize the wealth of data regarding environmental pressures and responses, particularly related to chemical substances.

Approach

This notebook demonstrates various ways to integrate data from various compartments from a single substance. Cadmium (Cd), a heavy metal, is used as an example. It shows various methods of combining gridded and point data, and make them ready for statistical and machine learning analysis. All maps in this notebook are generated by the tmap R package.

Downloading data

The following datasets are used in the is notebook:

In addition, the following portals are mentioned:

Computational demands

No heavy computation is included in this notebook.

Metals Pollution#

Heavy metals are natural constituents of the Earth’s crust. They are stable and cannot be degraded or destroyed, and therefore they tend to accumulate in soils and sediments. However, human activities have drastically altered the biochemical and geochemical cycles and balance of some heavy metals. The principal man-made sources of heavy metals are industrial point sources, e.g. mines, foundries and smelters, and diffuse sources such as combustion by-products, traffic, etc. Relatively volatile heavy metals and those that become attached to air-borne particles (particulates) can be widely dispersed throughout the atmosphere, often being deposited thousands of miles from the site of initial release. In general, the smaller and lighter a particle is, the longer it will stay in the air. Larger particles (greater than 10 micrometers (µm) in diameter) tend to settle to the ground by gravity in a matter of hours whereas the smallest particles (less than 1µm in diameter ) can stay in the atmosphere for weeks and are mostly removed by precipitation.

Causal loop diagram#

Metals pollution were typically studied in compartments (e.g. air, soil, biota, water). Metals pollution is a complex issue with impacts that span across different environmental compartments and influence ecosystems and human health through the food chain. Systems thinking helps to map these connections and understand that problems like industrial runoff, waste disposal, and agricultural use are not isolated issues but are interconnected parts of a larger system.

By considering the whole system holistically, policymakers and scientists can better identify key leverage points for more effective, integrated strategies that avoid unintended consequences.

To adopt a systems thinking view, a causal loop diagram is often constructed to describe all the potential linkages between environmental pressures and responses. Here is an example:

The difficulty to study across compartments#

Using cadmium (Cd) as an example chemical, if one would like to visualize Cd pollution across air, land, water, and biota, they will need to go to different portals for each dataset, or downloading the individual datasets.

  • Air

  • Soil

  • Predatory Birds

  • River water

Here are some example screenshots of web portals:

NAEI portal.{.lightbox}

Soil and PBMS data for metals pollution.{.lightbox}

River flow data from portal{.lightbox}

The following code shows how to plot air and soil pollution maps, individually.

library(tmap)
library(terra)
library(viridis)
library(readr)
library(tidyr)
library(dplyr)
library(sf)
library(cols4all)  # for flipping colour bar in tmap

dataset_path <- 'temp/datasets'
tmap_mode("plot")
# tm_shape(cd_NAEI_raster ) +
#   tm_raster(
#     col.scale = tm_scale_continuous_log10(
#       ticks = c(5e-5,5e-4, 5e-3,5e-2) # Define log increments for the legend
#     )
#   )

cd_NAEI_raster <-rast(file.path(dataset_path,'NAEI_maps/naei_cd_2023/GeoTIFF_layers/totalcd23_2023.tif'))
cd_EUSO_raster <-rast(file.path('data/topsoil_Cd_UK_small.tif'))

NAEI_tmap <- 
  tm_shape(cd_NAEI_raster ) +
  tm_raster(
    col.scale = tm_scale_intervals(
        
      style = 'fixed',
      breaks = c(1e-6,1e-5,1e-4,3e-4,1e-3,3e-3,1e-2, 3e-2),
      values = "viridis"
      #labels =  c("low", "medium", "high")
      
    ),
    col.legend = tm_legend(title = "UK emissions\n of Cd (tonnes/km)",
                           labels.fomat = 'scientific'),
    
  )

EUSO_tmap <- 
  tmap::tm_shape(cd_EUSO_raster) +
      tmap::tm_raster(
          col.scale = tmap::tm_scale_continuous_sqrt(
              values = "brewer.rd_yl_gn",
              midpoint = NA),
          # col.legend = tm_legend(title = "Modelled Cd in soil",
          #                  labels.fomat = 'scientific')
    
  ) + 
  tm_title("Top soil Cd (mg/L) -- Sqrt. transformed")

NAEI_tmap

EUSO_tmap
 tmap modes "plot" - "view"
Warning messages:
1: Values have found that are less than the lowest break. They are assigned to the lowest interval 
2: Values have found that are higher than the highest break. They are assigned to the highest interval 
../../_images/340958b8b55c0d8d8594d06837409b940b7de78650f63fd07a5743038254aa8f.png ../../_images/05e1b7672b10e95e40e11ba15ece6b69420252911dc294e1f8730e64cc307e03.png
#bivariate map
# raster1_projected <- project(cd_EUSO_raster, cd_NAEI_raster , method = "bilinear") # no need if scale matches
raster1_projected <- cd_EUSO_raster

r_stack <- c(raster1_projected , cd_NAEI_raster )
names(r_stack) <- c("var1", "var2")

r_stack$var1[r_stack$var1 == 0] <- 1e-3  # abitrary, just for quantiles

r_stack$var1 = log10(r_stack$var1) # soil


r_stack$var2[r_stack$var2 < 1e-6] <- NA # remove off shore

r_stack$var2 = log10(r_stack$var2) #NAEI



tm_shape(r_stack) + 
    tm_raster(
        col = tm_vars(c("var1", "var2"), multivariate = FALSE),
        col.scale = tm_scale_intervals(style = "quantile", n = 3, labels = c("L", "M", "H"),
        values = "-carto.earth"))
../../_images/7226b51780fc4a9500414c45bd5d655df0aed40c6f342c9f4f56dca43d63765d.png

But how are air and soil pollution hotspots co-located? Here we can use an integration tool called bivariate plots. Essentially, this is to colour the pixels on the map with a two-dimensional colour map.

stevens.bluered <- matrix(c("#E8E8E8", "#E4ACAC", "#C85A5A",
                 "#B0D5DF", "#AD9EA5", "#985356",
                 "#64ACBE", "#627F8C", "#574249"),
                 byrow = TRUE, ncol = 3)

tm_shape(r_stack) + 
    tm_raster(
        col = tm_vars(c("var2", "var1"), multivariate = TRUE),
        col.scale = tm_scale_bivariate(
        scale1 = tm_scale_intervals(style = "quantile", n = 3, labels = c("L", "M", "H")),
        scale2 = tm_scale_intervals(style = "quantile", n = 3, labels = c("L", "M", "H")),
        values =  "stevens.pinkgreen"), # 97.94% colour-blind friendly
        col.legend = tm_legend_bivariate(
            position = c("top", "right"),
            xlab = "Soil",
            ylab = "Air",
            xlab.size = 1,
            ylab.size = 1,
            item.r = 0,
            item.width = 1,
            item.height = 1,
            text.size = 1,
            ))+
    #tm_credits("UKCEH") +
    tm_compass(type = "arrow") +
    tm_scalebar()+
    tm_components(frame = FALSE)+
    tm_layout(inner.margins = c(0, 0.15, 0.02, 0.02))
../../_images/f09cfb4da9571595710ad3e5078644bfca23ef9ed58000009ccb0afa781ac63e.png

Predatory birds and otters data#

# otter_metals <- read_csv('/assets/bio-xter-liver-metal-habitat-uk-otters-2006-2017-v1/data/Concentrations_of_inorganic_elements_in_UK_otter_livers_2006–2017.csv')
otter_metals <- read_csv('data/Concentrations_of_inorganic_elements_in_UK_otter_livers_2006–2017.csv')
otter_metals[,c('long','lat')] <-sf_project(from = st_crs(27700), to = st_crs(4326),  otter_metals[,c('X','Y')])


tm_shape(st_as_sf(otter_metals, coords = c("long", "lat"), crs = 4326) )+
    tm_basemap() +
  tm_symbols(
    size = "Cd",
    fill = "maroon",
    col = "black",
    lwd = 1,
    size.scale = tm_scale(values.scale = 2)
  )+ 
  tm_title("Otter Cd (mg/L)")
Rows: 278 Columns: 57
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): Sex, AgeClass, Al, Sb, V, Mine
dbl (51): UWCRef, Year, Batch, Weight, Length, sSMI, X, Y, DM, Mn, Fe, Co, N...

 Use `spec()` to retrieve the full column specification for this data.
 Specify the column types or set `show_col_types = FALSE` to quiet this message.
../../_images/412c64c4a7b8b26a4efdd9e3038596676f44b88c6d80a87a76b737cd2689284a.png

River water quality data#

riv_metals <- read_csv('data/Cd_mean_2025.csv')

tm_shape(st_as_sf(riv_metals, coords = c("lon", "lat"), crs = 4326) )+
    tm_basemap() +
  tm_symbols(
    size = "Cd_mean",
    fill = "steelblue",
    # fill.legend = tm_legend(
    #   title = "Mean Cd (mg/L)",
    #   show = TRUE
    # ),
    col = "black",
    lwd = 1,
    size.scale = tm_scale(values.scale = 2) 
  ) + 
  tm_title("River mean Cd (mg/L)"
    ) 
New names:
 `` -> `...1`
Rows: 404 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): site_id, site_name, region, G_ID, S_ID, wb_id, organisation, mater...
dbl (12): ...1, easting, northing, min_year, max_year, HA_NUM, rbd_id, mncat...

 Use `spec()` to retrieve the full column specification for this data.
 Specify the column types or set `show_col_types = FALSE` to quiet this message.
../../_images/b6fbb8d13d60b9848b9769b900230b7ce3fb44f27cd7308343ad9eba1aabf854.png

The power of integration#

(1) Displaying the data together#

# tmap_mode("view")
# tm_basemap(c(StreetMap = "OpenStreetMap", TopoMap = "OpenTopoMap")) +
# NAEI_tmap
#| fig-cap: "Bivariate chloropleth plot for air and soil cadmium levels, overlaid by otter and river water cadmium levels"

# tmap_mode("view")

tm_shape(r_stack) + 
    tm_raster(
        col = tm_vars(c("var2", "var1"), multivariate = TRUE),
        col.scale = tm_scale_bivariate(
        scale1 = tm_scale_intervals(style = "quantile", n = 3, labels = c("L", "M", "H")),
        scale2 = tm_scale_intervals(style = "quantile", n = 3, labels = c("L", "M", "H")),
        values =  "stevens.pinkgreen"), # 97.94% colour-blind friendly
        col.legend = tm_legend_bivariate(
            position = c("top", "right"),
            xlab = "Soil",
            ylab = "Air",
            xlab.size = 1,
            ylab.size = 1,
            item.r = 0,
            item.width = 1,
            item.height = 1,
            text.size = 1,
            ))+
tm_shape(st_as_sf(riv_metals, coords = c("lon", "lat"), crs = 4326) )+
  tm_symbols(
    size = "Cd_mean",
    col = "black",
    fill = "#FEFE62",
  #  shape = 1,
    lwd = 1,
    size.scale = tm_scale(values.scale = 2) 
  ) +
  tm_shape(st_as_sf(otter_metals, coords = c("long", "lat"), crs = 4326) )+
  tm_symbols(
    size = "Cd",
    fill = "black", #"#CC79A7",
    col = "black",
    lwd = 1,
    size.scale = tm_scale(values.scale = 2)
  )+
    #tm_credits("UKCEH") +
    tm_compass(type = "arrow") +
    tm_scalebar()+
    tm_components(frame = FALSE)+
    tm_layout(inner.margins = c(0, 0.15, 0.02, 0.02))
../../_images/3c92d3cb9573d751d0dd181268c17014f673072d9a1a75bee90b7eb0183b6000.png

(2) Getting data ready for statistical or machine learning analysis#

  • Point process modelling[1]

library(maps)
uk_nuts1 <- terra::vect('data/NUTS_Level_1_January_2018_FCB_in_the_United_Kingdom.shp')
eng_wales <- uk_nuts1[1:10,]
# eng_wales <- terra::aggregate(uk_nuts1[1:10,])



metals_map <- st_as_sf(x = otter_metals, coords = c("X", "Y"), crs = st_crs(eng_wales))

# qtm(metals_map)



# Prediction
grid <- terra::rast(eng_wales , nrows = 50, ncols = 50)
xy <- terra::xyFromCell(grid, 1:ncell(grid))

Pred_loc <- st_as_sf(as.data.frame(xy), coords = c("x", "y"),
                 crs = st_crs(eng_wales))
Pred_loc <- st_filter(Pred_loc, st_as_sf(eng_wales)) # filter out points outside domain

# qtm(Pred_loc)


### method (1) closest obs
# Voronoi
v <-terra::voronoi(x = terra::vect(metals_map[1:250,'Cd']), bnd = eng_wales) # some bug: can't deal with last few lines
# plot(v)
# points(vect(metals_map), cex = 0.5)

# # voronoi map
tm_shape(st_as_sf(eng_wales)) + tm_polygons(alpha = 0.3) + tm_shape(metals_map) +
  tm_dots("Cd", palette = "viridis")

# tm_shape(v) +
#   tm_fill(col = "Cd", palette = "viridis", alpha = 0.6) + ### need changing>> what is the 'vble' column
#  tm_shape(metals_map) +
#   tm_dots("Cd", palette = "viridis")+
#   tm_shape(eng_wales) +
#    tm_borders(lwd = 2)

# raster map

resp <- st_intersection(st_as_sf(v), Pred_loc)
resp$pred <- log10(resp$Cd)
pred <- terra::rasterize(resp, grid, field = "pred", fun = "mean")
Cd_grid <- tm_shape(eng_wales) +
   tm_borders(lwd = 2) +
   tm_shape(pred) + tm_raster(alpha = 0.6, palette = "viridis")+
   tm_title('Estimated values of (log10) Cd in otters (closest observation)
            ')

Cd_grid
── tmap v3 code detected ───────────────────────────────────────────────────────
[v3->v4] `tm_polygons()`: use `fill_alpha` instead of `alpha`.
Warning message:
attribute variables are assumed to be spatially constant throughout all geometries 
── tmap v3 code detected ───────────────────────────────────────────────────────
[v3->v4] `tm_tm_raster()`: migrate the argument(s) related to the scale of the
visual variable `col` namely 'palette' (rename to 'values') to col.scale =
tm_scale(<HERE>).
[v3->v4] `tm_raster()`: use `col_alpha` instead of `alpha`.
../../_images/65830135d42942e7ee26479ad51c84d8707385eacf72c46a79db8284c083cf2d.png ../../_images/93755c81b45a4cd495c23fb0b5efa0f0d16fafed32555b4d04bb8fb4ea277126.png
#| fig-cap: Repeat for water quality

### method (1) closest obs
# Voronoi
# v <-terra::voronoi(x = terra::vect(st_as_sf( riv_metals[c('lon','lat', 'Cd_mean')], 
#                   coords = c("lon", "lat"), crs = 4326) ), bnd = eng_wales) 
v <-terra::voronoi(x = terra::vect(st_as_sf( riv_metals[c('easting','northing', 'Cd_mean')], 
                  coords = c("easting", "northing"), crs = 27700) ), bnd = eng_wales) 

resp <- st_intersection(st_as_sf(v), Pred_loc)
resp$pred <- log10(resp$Cd)
pred_riv <- terra::rasterize(resp, grid, field = "pred", fun = "mean")
Warning message:
attribute variables are assumed to be spatially constant throughout all geometries 
#| fig-cap: Data cube for cadmium data across compartments
#| eval: true
#| fig-height: 6
#| fig-width: 3


# tmap_grob(Cd_grid + tm_layout(legend.show=FALSE, frame = FALSE))
sm <- matrix(c(2.0, 1.2, 0, 1), 2, 2)


pred_sf <- st_as_sf(as.polygons(pred))
pred_sf_tilt <- pred_sf
pred_sf_tilt$geometry <- pred_sf$geometry * sm
st_crs(pred_sf_tilt) <- st_crs(pred_sf)

pred_sf_tilt2 <- pred_sf
pred_sf_tilt2$geometry <- pred_sf_tilt$geometry + c(0,-0 )
st_crs(pred_sf_tilt2) <- st_crs(pred_sf)

## River
pred_riv_sf <- st_as_sf(as.polygons(pred_riv))
pred_sf_tilt3 <- pred_riv_sf 
pred_sf_tilt3$geometry <- pred_riv_sf$geometry * sm + c(0,-1e6 )
st_crs(pred_sf_tilt3) <- st_crs(pred_riv_sf)

# coarse, if not too slow
r_coarse <- resample(r_stack, pred, method="near")
# crop r_stack to England and Wales

r_coarse <- crop(r_coarse, eng_wales, mask=TRUE)


r_coarse_sf <- st_as_sf(as.polygons(r_coarse))
r_coarse_sf2 <- st_as_sf(as.polygons(r_coarse$var2))

r_coarse_sf_tilt <- r_coarse_sf
r_coarse_sf_tilt$geometry <- r_coarse_sf$geometry * sm  + c(0,-1.5e6 )
st_crs(r_coarse_sf_tilt) <- st_crs(r_coarse_sf)


r_coarse_sf2_tilt <- r_coarse_sf2
r_coarse_sf2_tilt$geometry <- r_coarse_sf2$geometry * sm  + c(0,-0.5e6 )
st_crs(r_coarse_sf2_tilt) <- st_crs(r_coarse_sf2)




tm_shape(pred_sf_tilt3 ) + ## everything relative to this
  tm_fill(c('mean'),alpha = 0.6, palette = "brewer.pi_yg") + 
tm_shape(pred_sf_tilt2) + 
  tm_fill(c('mean'),alpha = 0.6, palette = "brewer.prgn") + 
tm_shape(r_coarse_sf_tilt) +
  tm_fill(c('var1'),alpha = 0.6, palette = "viridis") + 
tm_shape(r_coarse_sf2_tilt) +
  tm_fill(c('var2'),alpha = 0.6, palette = "brewer.br_bg") +
  tm_credits('Water', position = c(0.02, 0.88) , size = 1) +
  tm_credits('Air', position = c(0.02, 0.6) , size = 1) +
  tm_credits('Otter', position = c(0.02, 0.33) , size = 1) +
  tm_credits('Soil', position = c(0.02, 0.05) , size = 1) +
  tm_layout(legend.show=FALSE, frame = TRUE, asp = 0.5)
── tmap v3 code detected ───────────────────────────────────────────────────────
[v3->v4] `tm_tm_fill()`: migrate the argument(s) related to the scale of the
visual variable `fill` namely 'palette' (rename to 'values') to fill.scale =
tm_scale(<HERE>).
[v3->v4] `tm_fill()`: use `fill_alpha` instead of `alpha`.
[v3->v4] `tm_fill()`: use `fill_alpha` instead of `alpha`.
[v3->v4] `tm_fill()`: use `fill_alpha` instead of `alpha`.
[v3->v4] `tm_fill()`: use `fill_alpha` instead of `alpha`.
../../_images/d49df82ab1d71e543ad19fe6a112707a71e444d278cc8d2c9069e1735b34f7a7.png