1. Tutorial overview

RainfallQC contains five modules:

  1. gauge_checks - For detecting abnormalities in summary and descriptive statistics.

  2. comparison_checks - For detecting abnormalities by comparing to benchmark data.

  3. timeseries_checks - For detecting abnormalities in patterns of the data record.

  4. neighbourhood_checks - For detecting abnormalities based on measurements in neighbouring gauges.

  5. pypwsqc_filters - For applying quality assurance protocols and filters for rainfall data from pyPWSQC.

Each of these modules contains individual QC check methods, whose names begin with check_. For example, to run a QC check that detects streaks of repeating values in your data, you can run: timeseries_checks.check_streaks(data, **kwargs).

Various examples of errors in rainfall data can be viewed on this web map.
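To make the idea of a streak check concrete, here is a minimal sketch in plain Python. It is not RainfallQC's implementation: the function name find_streaks and the min_length parameter are hypothetical, chosen only for illustration, and the rainfall values are made up. For the real arguments of timeseries_checks.check_streaks, consult the RainfallQC API documentation.

```python
from itertools import groupby


def find_streaks(values, min_length=3):
    """Return (start_index, length, value) for each run of identical non-zero values.

    Hypothetical helper for illustration only; not part of RainfallQC.
    """
    streaks = []
    index = 0
    for value, run in groupby(values):
        length = len(list(run))
        # Zero rainfall repeats legitimately during dry spells, so skip it.
        if value != 0 and length >= min_length:
            streaks.append((index, length, value))
        index += length
    return streaks


# Made-up rainfall record (mm per time step).
rain_mm = [0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.0, 1.4, 1.4, 0.0]
streaks = find_streaks(rain_mm, min_length=3)
print(streaks)
```

Here the four consecutive readings of 0.2 mm are flagged, while the two readings of 1.4 mm fall below the minimum streak length and the zero runs are ignored.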

1.1. Getting started

1.1.1. What’s the format of your data?

How you use RainfallQC will depend on the format of your data. The table below outlines a few potential formats and how to use RainfallQC with them.

| Data format | See… | Notes |
| --- | --- | --- |
| Single rain gauge (e.g. 1 CSV) | Example 1 | All RainfallQC checks were built to run on tabular data. |
| Rain gauge network data (e.g. 1 CSV with multiple columns) | Example 2 | You will need to define which gauges are considered neighbouring to a target gauge. Therefore, you also need metadata with gauge locations. |
| Rain gauge network data (multiple file paths) | Example 3 | Load in metadata with gauge locations, then read in only the gauges near a given target. |
| Rain gauge data as xarray Dataset | Example 6 | Use this if your data is in NetCDF format, for example. Be careful, as you will lose metadata. |
| Tabular data you want to convert to xarray for pyPWSQC | Example 7 | Required if you want to run pyPWSQC methods but your data is in CSV files. Sets your data’s time format and projection using defaults to create metadata. |
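For network data, the notes above say you need gauge locations to decide which gauges count as neighbours of a target. One common way to do that is a great-circle distance cut-off, sketched below in plain Python. This is an assumption about how neighbours might be chosen, not a description of what neighbourhood_checks does internally: the names haversine_km and nearby_gauges, the max_distance_km threshold, and the gauge coordinates are all hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby_gauges(metadata, target_id, max_distance_km=50.0):
    """Return IDs of gauges within max_distance_km of the target, nearest first."""
    target_lat, target_lon = metadata[target_id]
    distances = {
        gauge_id: haversine_km(target_lat, target_lon, lat, lon)
        for gauge_id, (lat, lon) in metadata.items()
        if gauge_id != target_id
    }
    within_range = [g for g, d in distances.items() if d <= max_distance_km]
    return sorted(within_range, key=distances.get)


# Made-up metadata: gauge ID -> (latitude, longitude) in decimal degrees.
gauge_metadata = {
    "gauge_a": (51.50, -0.12),
    "gauge_b": (51.75, -1.26),
    "gauge_c": (53.48, -2.24),
    "gauge_d": (51.45, -0.97),
}
neighbours = nearby_gauges(gauge_metadata, "gauge_a", max_distance_km=100.0)
print(neighbours)
```

With a list like this in hand, you would read in only the files or columns for the selected gauges before running a neighbourhood check.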

1.1.2. Which scenario best suits you?

Do you have a single rain gauge, or a whole network? Do you want to run a single check or use RainfallQC as part of a data processing pipeline? The table below outlines some common scenarios and advice on how to proceed.

| Scenario | Advice |
| --- | --- |
| Running a single QC check | See Examples 1, 2 and 3. |
| Running multiple QC checks on a single gauge | Use the .apply_qc_framework() method. See Example 4 below. |
| Running multiple QC checks on multiple gauges | Use the .apply_qc_framework() method in a loop and store a summary. See Example 5 below. |
| Defining your own sensitivity analysis | You will need to create your own qc_framework specs. See Example 8 below. |
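The "loop and store a summary" pattern for multiple gauges can be sketched generically as follows. This sketch does not call .apply_qc_framework(), whose arguments are not described in this overview; instead it uses two hypothetical stand-in checks (count_negative and count_missing) and made-up data, purely to show the shape of a per-gauge summary.

```python
def count_negative(values):
    """Stand-in check: number of negative (physically impossible) readings."""
    return sum(1 for v in values if v is not None and v < 0)


def count_missing(values):
    """Stand-in check: number of missing readings."""
    return sum(1 for v in values if v is None)


def summarise_network(network, checks):
    """Run every check on every gauge and collect the results per gauge."""
    summary = {}
    for gauge_id, values in network.items():
        summary[gauge_id] = {name: check(values) for name, check in checks.items()}
    return summary


# Hypothetical checks and made-up rainfall records (mm per time step).
checks = {"negative_values": count_negative, "missing_values": count_missing}
network = {
    "gauge_a": [0.0, 1.2, None, 0.4],
    "gauge_b": [0.0, -0.1, 0.0, None, None],
}
summary = summarise_network(network, checks)
for gauge_id, results in summary.items():
    print(gauge_id, results)
```

Keeping the results in one dictionary keyed by gauge makes it straightforward to compare gauges afterwards or to export the summary as a table.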


Real coding examples for RainfallQC are available as Jupyter Notebooks at: https://github.com/Thomasjkeel/RainfallQC-notebooks/tree/main