Environmental Data Science Toolbox



This is a prototype version of the National Capability UK (NC-UK) Environmental Data Science Toolbox, hosted by the UK Centre for Ecology & Hydrology (UKCEH). The aim is to apply the FAIR principles (Findable, Accessible, Interoperable, and Reusable) to a collection of data science methods that are generalizable across different environmental applications, with a focus on integrative modelling. The hope is that this will encourage cross-disciplinary use of methods and so enhance national environmental research.

If you’re interested in contributing to this project, it would be great to hear from you. Details of how to contribute are on the CONTRIBUTING.md page in the root of the repository. 🌞

The current recommended workflow for engaging interactively with the code in the methodology notebooks is to clone the Notebook Repository linked at the top of each notebook to get the relevant files, then create a virtual environment and try running different sections of the code in your favourite IDE, such as VS Code.
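The clone-and-virtual-environment steps of this workflow can be sketched in Python as below. This is a minimal sketch, not an official setup script: the function names `build_setup_commands` and `run_setup`, the `.venv` folder name, and the example URL are all hypothetical, and it assumes `git` is installed and on your `PATH`. Replace the URL with the Notebook Repository link shown at the top of the notebook you want to run.

```python
import subprocess
import sys
from pathlib import Path


def build_setup_commands(repo_url: str, clone_dir: str, env_name: str = ".venv") -> list:
    """Return the commands that clone a repository and create a virtual environment.

    All arguments are illustrative: `repo_url` should be the Notebook Repository
    link from the notebook, and `env_name` is an assumed folder name.
    """
    env_path = str(Path(clone_dir) / env_name)
    return [
        # Step 1: clone the notebook repository to get the relevant files.
        ["git", "clone", repo_url, clone_dir],
        # Step 2: create a virtual environment inside the cloned folder,
        # using the standard-library `venv` module of the current interpreter.
        [sys.executable, "-m", "venv", env_path],
    ]


def run_setup(repo_url: str, clone_dir: str, env_name: str = ".venv") -> None:
    """Run each setup command in order, stopping at the first failure."""
    for command in build_setup_commands(repo_url, clone_dir, env_name):
        subprocess.run(command, check=True)


# Example call (placeholder URL, not a real repository):
# run_setup("https://example.org/your-notebook-repository.git", "notebook-repo")
```

After the commands finish, open the cloned folder in your IDE and select the interpreter inside the new environment before running sections of the notebook code. Any dependencies the notebook needs still have to be installed as described in that notebook's own repository.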

| Methods | Key Concepts | Key Datasets |
| --- | --- | --- |
| Bias Correction of Climate Models (Ongoing Development) | Gaussian Processes, Bayesian Hierarchical Modelling | Climate Model Output, In-situ Weather Station Measurements |
| Calculating Risk to Terrestrial Carbon Pool (Ongoing Development) | Data Access, Data Integration | MODIS Land Cover and Net Primary Production Products, European Space Agency (ESA) Climate Change Initiative (CCI) Soil Moisture Dataset, Global Standardized Precipitation-Evapotranspiration Index (SPEI) Dataset |
| Understanding the Error of Multispecies Biodiversity Indicators (Ongoing Development) | Bias, Uncertainty | Simulated Dataset (Multispecies Occupancy) |
| Joint Species Distribution Models with jsdmstan | Stochastic Partial Differential Equations, Integrated Nested Laplace Approximations | Simulated Dataset (Multispecies Populations) |
| Non-target Analysis of Environmental Mass Spectrometry Data (Ongoing Development) | Cheminformatics, Data Access, Non-target Analysis, Large Language Models, Principal Component Analysis, UpSet Analysis | Processed LC-MS and GC-MS Data hosted on the NORMAN Digital Sample Freezing Platform (DSFP) |
| RO-Crate Tutorial (Ongoing Development) | Data Access, Metadata, Data Integrity | COSMOS Dataset from EIDC |
| EEX-placebased-exposure (Ongoing Development) | Place-based Data Exploration, Data Integration, Data Visualisation, Data Analysis | Air Quality Data from Defra and Water Quality Data from Environment Agency |
| Multivariate Modelling of Censored Chemicals using jsdmstan | Multivariate, Censored Data, Joint Species Distribution Modelling, Bayesian Inference | Environment Agency PFAS River Monitoring & PAH Estuary Datasets, Simulated Lognormal Censored Dataset |
| Extracting Evidence-linked Abundance Drivers from Species-account Text | LLM Assisted Knowledge Extraction, Evidence-linked Causal Relationships, Structured JSON Extraction, Causal DAG Generation | Plant Atlas-style Species Account Text, Synthetic Biodiversity Species-account Examples, Extracted Ecological Driver Relationship Table |