2025 Week 19
It might be a short week with the bank holiday on Monday and the FDRI annual meeting today, but still loads happening!
🐘 Postgres, PostGIS and GeoServer setup
We’re expanding our technology stack with our first relational database implementation for the project. Given our team’s extensive experience with relational databases, this was a natural choice for supporting our GIS operations. The infrastructure has been deployed in our staging environment, but we’ve maintained production-ready standards by fully implementing it through Terraform and including a bastion instance for secure access management. It’s good to be adding a SQL database to our tech stack, as we already have other small use cases that can piggyback on it.
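As a flavour of what the PostGIS side enables, here’s a minimal sketch of a spatial query against a PostGIS-enabled Postgres instance. It isn’t our actual schema: the connection details, table and column names are hypothetical, and in practice access would go through the bastion (e.g. an SSH tunnel) rather than a direct connection.

```python
# Minimal sketch: querying a PostGIS-enabled Postgres database.
# Connection details and table/column names are hypothetical, not our real setup.
import psycopg2

conn = psycopg2.connect(
    host="localhost",   # e.g. a local port forwarded through the bastion
    port=5432,
    dbname="gis",
    user="fdri",
    password="***",
)

with conn, conn.cursor() as cur:
    # Find sites within 10 km of a point (lon/lat, EPSG:4326);
    # the geography casts make ST_DWithin work in metres.
    cur.execute(
        """
        SELECT site_id, name
        FROM monitoring_sites
        WHERE ST_DWithin(
            geom::geography,
            ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
            10000
        )
        """,
        (-1.5, 51.6),
    )
    for site_id, name in cur.fetchall():
        print(site_id, name)

conn.close()
```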
🌀 Merging Timeseries and Phenocam Dev teams
We are merging the Phenocam and Timeseries development teams. This restructuring makes practical sense: the Phenocam team had effectively become a single-person operation, and we’ve already identified several workflow overlaps between the two areas. We anticipate that this integration will enhance knowledge sharing and lead to better software delivery!
🧑🤝🧑 UI Community Workshop
This week we had the first of the FDRI User Interface community workshops, where we are starting more focused engagement with the community. The plan is to feed this into the development process and keep the feedback loop going throughout development. If this sounds interesting to you, register for the FDRI UX Workshops on 14 May.
🗺️ FDRI Annual Meeting
Today is the FDRI annual meeting, reflecting on year 1 across all work packages and planning/kicking off year 2. Unfortunately I wasn’t able to attend in Wallingford, so I’ll provide an update next week.
🔢 Time series data processing
This week we had a technical deep dive mostly focused on the recent updates to time series data processing. I find these extremely useful and the engagement from the team is brilliant. We used to do these a lot at the start of the project, but they dropped off a bit; now is the time to bring them back.
The main change to the time series pipeline is the ability to configure how missing data is handled during aggregation. For example, if we only have two 1-minute data points on a given day, a mean aggregation would return the mean of those two values, even though we’d expect 1440 values for a full day. There are a few different ways users might want this handled, which can be configured via options (a rough sketch of the logic follows below):
{"missing": 30}
Aggregate only if there are no more than 30 values missing in the period.{"available": 30}
Aggregate only if there are at least 30 input values in the period.{"percent": 30}
Aggregate only if the data in the period is at least 30 percent complete (accepts integers or floats).
Read more about this on our documentation site: https://nerc-ceh.github.io/time-stream/user_guide/timeseries_basics.html#aggregating-data
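To make those options concrete, here’s a rough sketch of the logic in plain pandas. This is not the time-stream API (the docs linked above cover that); the function name, signature and example data are invented purely for illustration.

```python
# Sketch of the missing-data criteria described above, in plain pandas.
# NOT the time-stream API; the function and its arguments are hypothetical.
import pandas as pd


def daily_mean(series: pd.Series, expected_per_day: int = 1440,
               missing: int | None = None,
               available: int | None = None,
               percent: float | None = None) -> pd.Series:
    """Daily mean, returned only for days meeting the chosen completeness criterion."""
    counts = series.resample("1D").count()    # values actually present each day
    means = series.resample("1D").mean()

    valid = pd.Series(True, index=counts.index)
    if missing is not None:                   # no more than N values missing
        valid &= (expected_per_day - counts) <= missing
    if available is not None:                 # at least N values present
        valid &= counts >= available
    if percent is not None:                   # at least N percent complete
        valid &= (counts / expected_per_day * 100) >= percent

    return means.where(valid)                 # incomplete days become NaN


# Example: day 1 is complete, day 2 has only 10% of its minutes.
idx = pd.date_range("2025-05-01", periods=2 * 1440, freq="1min")
data = pd.Series(range(2 * 1440), index=idx, dtype=float)
sparse = pd.concat([data.iloc[:1440], data.iloc[1440::10]])
print(daily_mean(sparse, percent=30))         # day 1 aggregated, day 2 NaN
```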
📱 Asset Management Service Requirements Gathering
This week we had a summary meeting covering all the requirements that have been gathered so far for this essential part of our systems. It’s been a mammoth task to collect this information from everybody and get it all in one place. This is being run by an external company (ARUP).
♟️ Gridded Data Update
This week in Gridded land it was the CHESS-Met dataset’s turn to get the pangeo-forge treatment and be converted to Zarr, ready for upload to the steadily growing object store of cloud-optimized datasets ready for analysis. It’s been a bit of a chess match, with the dataset putting up some interesting fights, but like any good champion I’ve learnt from these challenges and put some improvements into the data converter to handle more complex datasets, namely multi-variable datasets. As ever you can see my progress over on GitHub.
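For anyone curious what that conversion step looks like in the abstract, here’s a minimal xarray sketch of turning a multi-variable NetCDF dataset into a chunked Zarr store. It isn’t the project’s pangeo-forge recipe or converter, and the file paths, chunk sizes and dimension names are placeholders.

```python
# Minimal sketch of a NetCDF -> Zarr conversion with xarray.
# Not the FDRI data converter; paths, chunking and dimension names are placeholders.
import xarray as xr

# Open a (possibly multi-file) multi-variable dataset lazily with dask.
ds = xr.open_mfdataset("chess-met_*.nc", combine="by_coords", chunks={})

# Rechunk to something sensible for cloud object storage:
# moderate chunks along time, tiled chunks in space.
ds = ds.chunk({"time": 120, "y": 200, "x": 200})

# Write every variable into a single consolidated Zarr store.
ds.to_zarr("chess-met.zarr", mode="w", consolidated=True)
```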