2025 Week 15
There was no update last week as I was at KubeCon Europe. It was a great event and there are some rough notes below; I'll do a proper write-up/talk at some point. This means this week's post covers two weeks! Also, guest weeknoter Matt Brown is back again for gridded data insights, so this post might be a bit longer than usual. What a treat!
⛵ KubeCon Europe 2025
- Sustainability in cloud computing: Tools like Kepler and kube-green to monitor energy consumption
- Edge Computing and IoT: K8s on IoT devices (k0s), NATS as an alternative protocol to MQTT
- Science and Research K8s: Talks about managing satellite imagery (~600PB/year). CERN now talks about exabytes (petabytes were early 2000s). Label Studio for tagging (could be used for phenocam?)
- Catalogue and Datalabs: https://www.skao.int had a really sleek catalogue-to-Jupyter integration
- Lots of AI/ML infra: GPUs on K8s are becoming very common
- WebAssembly (WASM): Early days but lots of excitement around WASM and Unikernels
- AWS EKS Auto Mode: Allows better and simpler management of our AWS K8s cluster, with some potential cost savings from scaling down further when usage is low
- OpenTelemetry: More and more adoption of OpenTelemetry, something to keep an eye on as we improve our observability
- Regulatory Compliance: EU DORA and the Cyber Resilience Act are coming into effect over the next few years, likely to be more applicable if we take on private work
- Apple leaning into open source: A number of talks from Apple about open source; the main thing was Swift on the server, which some people are raving about.
- lots more…
⏲️Announcing Time Stream
(This is still a work in progress and not ready for production use)
https://github.com/NERC-CEH/time-stream This is the core time series model work from our data processing pipeline, which we have extracted into its own repo. Not only did we get the code extracted and ready to be reused, we were able to get it reused in our API!
Test out the time series model aggregation via our data API now (using the aggregate query param):
curl -X 'GET' \
'https://dri-api.staging.eds.ceh.ac.uk/v1/cosmos/collections/30M/sites/ALIC1?variables=WS&start_date=2025-03-01T13%3A10%3A32.106Z&end_date=2025-04-30T13%3A10%3A32.106Z&aggregate=max%3AP1D' \
-H 'accept: application/json'
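If you'd rather poke at it from Python, here's roughly the same request using requests. It's only a sketch; nothing beyond the curl example above is assumed (aggregate=max:P1D asks for the daily maximum, with the period given as an ISO 8601 duration):
# Python equivalent of the curl example above, against the staging API.
import requests

BASE = "https://dri-api.staging.eds.ceh.ac.uk/v1/cosmos"

resp = requests.get(
    f"{BASE}/collections/30M/sites/ALIC1",
    params={
        "variables": "WS",
        "start_date": "2025-03-01T13:10:32.106Z",
        "end_date": "2025-04-30T13:10:32.106Z",
        "aggregate": "max:P1D",  # daily maximum, period as an ISO 8601 duration
    },
    headers={"accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())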
📰Time stream docs
As the time series model is such a core focus of what we are building in the time series product, we have put effort into documenting it. There's been some great progress on this; read the docs here: https://nerc-ceh.github.io/time-stream/
❇️ DRI UI updates
Huge progress on the UI
- Y-axis management is now driven by the units of the variables being plotted (variables with the same units are plotted together)
- When plotting more than two variables on the same chart we now split into multiple charts (there's a toy sketch of this rule after the list)
- Improved experience around the legend
- Try it out here
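For the curious, the sketch below is a toy illustration of the grouping and splitting rules described in the list above. The real UI isn't Python, and the variable names and units are made up purely for illustration.
# Toy illustration of the chart-grouping rules: variables sharing a unit are
# plotted together, and any group with more than two variables is split
# across multiple charts. Not the real UI code; names/units are invented.
from itertools import groupby

MAX_PER_CHART = 2

def plan_charts(variables: dict[str, str]) -> list[list[str]]:
    """variables maps variable name -> unit; returns lists of variables per chart."""
    charts = []
    by_unit = sorted(variables.items(), key=lambda kv: kv[1])
    for _unit, group in groupby(by_unit, key=lambda kv: kv[1]):
        names = [name for name, _ in group]
        for i in range(0, len(names), MAX_PER_CHART):
            charts.append(names[i:i + MAX_PER_CHART])
    return charts

print(plan_charts({"WS": "m/s", "WD": "deg", "TA": "degC", "TDEW": "degC"}))
# -> [['WD'], ['TA', 'TDEW'], ['WS']]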
🏗️ FDRI dataset structure
We are continuing to ingest FDRI sensor data. Last week we split it into three datasets: one_minute, fifteen_minute and thirty_minute.
This can be seen here: https://dri-ui.staging.eds.ceh.ac.uk/fdri
Select QSF5-STKX-4YVB and a date before 9th April; we have open bugs for it not fully working 😁.
🧑🚀 Metadata integration
We are continuing to hook our processing up to the metadata API; our main focus is driving the processing pipeline based on the config in the metadata service. The bit circled in red.
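As a very rough illustration of what "config-driven" means here, the sketch below fetches a processing config for a site and uses it to decide which steps to run. The endpoint URL, config shape and step names are all invented; this is not the real metadata API.
# Hypothetical sketch of config-driven processing: fetch a site's processing
# config from a metadata service and let it decide which steps run.
import requests

METADATA_API = "https://example.org/metadata"  # placeholder, not the real service

def fetch_processing_config(site_id: str) -> dict:
    resp = requests.get(f"{METADATA_API}/sites/{site_id}/processing-config", timeout=30)
    resp.raise_for_status()
    return resp.json()

def run_pipeline(site_id: str) -> None:
    config = fetch_processing_config(site_id)
    # e.g. steps = [{"name": "qc"}, {"name": "aggregate", "period": "P1D"}]
    for step in config.get("steps", []):
        print(f"running {step['name']} for {site_id} with options {step}")

run_pipeline("ALIC1")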
🦠IoT LoRaWAN
We’ve been investigating options to give FDRI Work Package 3 (Innovation) easy ways for citizen scientists to connect their own devices into the FDRI ecosystem. This is very early days; the LoRaWAN investigation write-up is available here.
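To give a flavour of what connecting a device involves, here's a tiny sketch of decoding a LoRaWAN uplink payload in Python. The payload layout (a temperature and humidity reading packed as 16-bit integers) is entirely invented for illustration; real devices each define their own format.
# Sketch of the kind of payload decoding a LoRaWAN integration involves:
# uplinks arrive as a few bytes which need unpacking into readings.
import base64
import struct

def decode_uplink(payload_b64: str) -> dict:
    raw = base64.b64decode(payload_b64)
    temp_centi, humidity_centi = struct.unpack(">hH", raw[:4])
    return {"temperature_c": temp_centi / 100, "humidity_pct": humidity_centi / 100}

# Example: 21.50 degC and 64.00 %RH packed as big-endian 16-bit values
example = base64.b64encode(struct.pack(">hH", 2150, 6400)).decode()
print(decode_uplink(example))  # {'temperature_c': 21.5, 'humidity_pct': 64.0}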
🌐Geospatial Postgres PostGIS database
We have started setting up an Aurora Serverless Postgres DB to manage some geospatial use cases, with the first step being to get it set up and load in some existing geospatial data. We are setting up the database with the idea of reusing it across other projects, potentially consolidating our tech stack and removing DynamoDB. We are looking at Aurora Serverless since it can “magically” scale up and down; importantly, it can scale to zero when not in use, which would save us 💸.
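As a rough sketch of the "load some existing geospatial data in" step, assuming GeoPandas plus SQLAlchemy and a PostGIS-enabled database; the connection string, file name and table name are placeholders, not our actual setup:
# Sketch of loading existing geospatial data into a PostGIS-enabled Postgres
# database using GeoPandas + SQLAlchemy. All names below are placeholders.
import geopandas as gpd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@aurora-host:5432/geodb")

sites = gpd.read_file("sites.gpkg")   # any OGR-readable source works
sites = sites.to_crs(epsg=4326)       # normalise CRS before loading
sites.to_postgis("sites", engine, if_exists="replace", index=False)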
🪟 Gridded (by Matt Brown)
Some more Gridded Goodness 🍰 (sponsored by the fact that Matt B was hungry whilst writing this) 🍰
Some of us had a really productive cross-centre meeting in Liverpool with the National Oceanography Centre, to help unify our efforts towards developing digital research platforms and products for accessing and working with large gridded datasets that are stored remotely. It started off brilliantly, with us starting the meeting late after being caught off guard by the ginormous portions at a nearby Kurdish restaurant :D
It’s making me hungry whilst writing this, not helpful, ahem. Anyway, whilst digesting that we were then given plenty more food (for thought) by being shown through the NOC Data Science Platform, or DSP (another acronym for y’all ;) ), which bears many similarities - in design, UI and technical difficulties - to our DataLabs platform. Some cool features I saw were the ability to use QGIS/ArcGIS/MATLAB directly within the Jupyter environment with an embedded graphical desktop, the ability to load custom Docker containers for setting up your environment, and the ability to access data stored across the organisation easily (something that’s a struggle with DataLabs due to it being hosted on JASMIN). The platform is available on GitHub and a test version is currently deployed on JASMIN, which I’m looking forward to having a play with and sharing with the wider FDRI & UKCEH community.
Next we had a look at how we are both (UKCEH, NOC) trying to convert gridded NetCDF datasets to Zarr for use on object storage and then making those datasets available for easy visualisation over the web; both are key parts of the Gridded Data product. Key takeaways (🍕):
- I’ve gone down the route of using Pangeo-forge-recipes and Apache Beam for the conversion part, which abstracts away a lot of the complexity of these types of workflows,
- Whereas NOC use Xarray backed with Dask directly, something I’d like to compare the performance of (there's a rough sketch of that approach after this list)
- Using the TiTiler tile server as the backend for a React-based web GUI for visualising maps of Zarr datasets seemed promising, but performance depends strongly on the chunk size of the data, which we intend to benchmark
- Alternatively, a very simple Streamlit static website can also be used to generate quick plots of the Zarr datasets
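Here's a minimal sketch of the Xarray + Dask flavour of that conversion (the approach I'd like to benchmark against), with an explicit chunking choice since chunk size seems to be what drives the tile-serving performance. Paths, dimension names and chunk sizes are illustrative placeholders:
# Minimal NetCDF -> Zarr conversion using Xarray backed by Dask.
# Paths, dimension names and chunk sizes are placeholders; the chunking
# choice is what we want to benchmark for tile-serving performance.
import xarray as xr

ds = xr.open_mfdataset(
    "data/*.nc",
    combine="by_coords",
    chunks={"time": 24, "latitude": 256, "longitude": 256},
)

ds.to_zarr("output.zarr", mode="w", consolidated=True)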
Then it was dinner time, which was so good I forgot to take a picture, soz 🤷♂️ (well, apart from the receipt…)
Do get in touch if any of this is of interest! You can follow and contribute to our joint work here
Finally, here’s one last picture of Liverpool, until next time 👋: