rainfallqc.checks.pypwsqc_filters

Quality control checks translated from the pyPWSQC framework (https://pypwsqc.readthedocs.io/en/latest/).

The PWSQC framework includes filters originally developed for automated personal weather stations (PWS) within the COST Action OPENSENSE.

Functions prefixed with ‘run_’ and ‘check_’ relate to the algorithms from pyPWSQC.

Functions are ordered alphabetically.

rainfallqc.checks.pypwsqc_filters.check_faulty_zeros(neighbour_data, neighbour_metadata, neighbouring_gauge_ids, neighbour_metadata_gauge_id_col, time_res, projection, nint, n_stat, max_distance_for_neighbours=10000.0, time_units='seconds since 1970-01-01 00:00:00', rainfall_attributes={'coverage_contant_type': 'physicalMeasurement', 'long_name': 'rainfall amount per time unit', 'name': 'rainfall', 'units': 'mm'}, lat_lon_attributes={'unit': 'degrees in WGS84 projection'}, global_attributes=None)[source]

Will flag faulty zeros based on neighbours …

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • neighbour_metadata (DataFrame) – Metadata for the rainfall data with ‘latitude’ and ‘longitude’

  • neighbour_metadata_gauge_id_col (str) – Column with the gauge ID

  • target_gauge_col – Target gauge column

  • neighbouring_gauge_ids (List[str]) – List of IDs of neighbouring gauges

  • time_res (str) – Time resolution of data

  • projection (str) – Cartesian/metric coordinate system

  • nint (int) – Number of intervals

  • n_stat (int) – Number of stations

  • max_distance_for_neighbours (int | float) – Maximum distance to consider for neighbours

  • time_units (str) – Units and encoding of the ‘time’ column

  • rainfall_attributes (dict) – Attributes for rainfall in the xarray Dataset

  • lat_lon_attributes (dict) – Attributes for lat and lon in the xarray Dataset

  • global_attributes (dict) – Global attributes for xarray Dataset

Return type:

Dataset

Returns:

neighbour_data_ds_filtered – Data with flags for faulty zeros

Examples

Available at: https://pypwsqc.readthedocs.io/en/latest/notebooks/merged_filters.html
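The idea behind a faulty-zero check can be sketched in plain Python. This is a simplified illustration, not the pyPWSQC or rainfallqc implementation: the helper name flag_faulty_zeros, the list-based inputs and the flag values (1, 0, -1) are assumptions made for the example. Here nint is read as the number of consecutive intervals and n_stat as the minimum number of reporting neighbours.

```python
from statistics import median


def flag_faulty_zeros(target, neighbours, nint, n_stat):
    """Flag intervals where the target reports zero while its neighbours report rain.

    Hypothetical helper for illustration only.
    target: list of rainfall values for the gauge under test.
    neighbours: list of rainfall series (same length as target); None marks a gap.
    Returns one flag per interval: 1 = suspected faulty zero, 0 = not flagged,
    -1 = fewer than n_stat neighbours reported, so no decision is made.
    """
    flags = []
    run = 0  # consecutive intervals with target == 0 and neighbour median > 0
    for t, value in enumerate(target):
        valid = [series[t] for series in neighbours if series[t] is not None]
        if len(valid) < n_stat:
            flags.append(-1)
            run = 0
            continue
        if value == 0 and median(valid) > 0:
            run += 1
        else:
            run = 0
        # Only flag once the disagreement has lasted at least nint intervals.
        flags.append(1 if run >= nint else 0)
    return flags


target = [0, 0, 0, 0, 1.2]
neighbours = [
    [0.5, 0.4, 0.6, 0.0, 1.0],
    [0.3, 0.2, 0.5, 0.0, 0.9],
    [0.0, 0.6, 0.4, 0.0, 1.1],
]
print(flag_faulty_zeros(target, neighbours, nint=2, n_stat=3))
```

In this sketch the run counter resets as soon as the neighbours stop reporting rain; a production filter may keep a gauge flagged for longer.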

rainfallqc.checks.pypwsqc_filters.check_high_influx_filter(neighbour_data)[source]

High influx filter.

Parameters:

neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

Return type:

None

Returns:

neighbour_data – todo

rainfallqc.checks.pypwsqc_filters.check_station_outlier(neighbour_data, neighbour_metadata, neighbouring_gauge_ids, neighbour_metadata_gauge_id_col, time_res, projection, evaluation_period, mmatch, gamma, n_stat, max_distance_for_neighbours=10000.0, time_units='seconds since 1970-01-01 00:00:00', rainfall_attributes={'coverage_contant_type': 'physicalMeasurement', 'long_name': 'rainfall amount per time unit', 'name': 'rainfall', 'units': 'mm'}, lat_lon_attributes={'unit': 'degrees in WGS84 projection'}, global_attributes=None)[source]

Station outlier.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • neighbour_metadata (DataFrame) – Metadata for the rainfall data with ‘latitude’ and ‘longitude’

  • neighbour_metadata_gauge_id_col (str) – Column with the gauge ID

  • target_gauge_col – Target gauge column

  • neighbouring_gauge_ids (List[str]) – List of IDs of neighbouring gauges

  • time_res (str) – Time resolution of data

  • projection (str) – Cartesian/metric coordinate system

  • evaluation_period (int) – Length of the (rolling) window for the correlation calculation

  • mmatch (int) – Threshold for the number of matching rainy intervals in the evaluation period

  • gamma (float) – Threshold for the rolling median Pearson correlation

  • n_stat (int) – Number of stations

  • max_distance_for_neighbours (int | float) – Maximum distance to consider for neighbours

  • time_units (str) – Units and encoding of the ‘time’ column

  • rainfall_attributes (dict) – Attributes for rainfall in the xarray Dataset

  • lat_lon_attributes (dict) – Attributes for lat and lon in the xarray Dataset

  • global_attributes (dict) – Global attributes for xarray Dataset

Return type:

Dataset

Returns:

neighbour_data_ds_filtered – Data with flags for station outliers

Examples

Available at: https://pypwsqc.readthedocs.io/en/latest/notebooks/merged_filters.html
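The roles of mmatch, gamma and n_stat can be illustrated for a single evaluation window. The sketch below is an assumption-laden simplification, not the pyPWSQC or rainfallqc code: the helper names pearson and station_outlier_flag, the list inputs and the flag values (1, 0, -1) are hypothetical, and the rolling window over evaluation_period is left out.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def station_outlier_flag(target, neighbours, mmatch, gamma, n_stat):
    """Return 1 (outlier), 0 (not flagged) or -1 (too few usable neighbours).

    Hypothetical helper: evaluates one window only.
    """
    correlations = []
    for series in neighbours:
        # Count intervals in which both gauges recorded rain.
        matches = sum(1 for a, b in zip(target, series) if a > 0 and b > 0)
        if matches < mmatch:
            continue  # not enough overlapping rainy intervals to trust a correlation
        r = pearson(target, series)
        if r is not None:
            correlations.append(r)
    if len(correlations) < n_stat:
        return -1
    return 1 if median(correlations) < gamma else 0


neighbours = [
    [0.0, 1.1, 2.2, 0.0, 2.9, 0.9],
    [0.1, 0.9, 1.8, 0.0, 3.2, 1.2],
]
consistent = [0.0, 1.0, 2.0, 0.0, 3.0, 1.0]
suspicious = [3.0, 0.5, 0.2, 2.0, 0.1, 4.0]
print(station_outlier_flag(consistent, neighbours, mmatch=3, gamma=0.5, n_stat=2))
print(station_outlier_flag(suspicious, neighbours, mmatch=3, gamma=0.5, n_stat=2))
```

A gauge that tracks its neighbours closely is left unflagged, while one whose series is anti-correlated with them falls below gamma and is flagged.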

rainfallqc.checks.pypwsqc_filters.compute_distance_matrix(neighbour_data_ds)[source]

Compute a distance matrix.

Parameters:

neighbour_data_ds (Dataset) – xarray dataset of neighbour data

Return type:

Dataset

Returns:

distance_matrix – A distance matrix of all neighbouring gauges
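As a rough illustration of what a gauge-to-gauge distance matrix contains, the sketch below computes pairwise Euclidean distances from projected (metric) coordinates. It is not the function's actual implementation: the helper name pairwise_distances, the dictionary layout and the gauge IDs are assumptions, and the real function works on an xarray Dataset.

```python
from math import hypot


def pairwise_distances(coords):
    """Build a nested dict of Euclidean distances between all gauges.

    Hypothetical helper. coords maps gauge ID -> (x, y) in a metric
    projection, so the resulting distances are in the projection's units.
    """
    ids = list(coords)
    return {
        a: {
            b: hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])
            for b in ids
        }
        for a in ids
    }


coords = {"A": (0.0, 0.0), "B": (3000.0, 4000.0), "C": (0.0, 12000.0)}
matrix = pairwise_distances(coords)
print(matrix["A"]["B"])  # 5000.0
print(matrix["A"]["C"])  # 12000.0
```

The matrix is symmetric with zeros on the diagonal, which is why a later subsetting step needs to exclude each gauge's distance to itself.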

rainfallqc.checks.pypwsqc_filters.convert_neighbour_data_to_xarray(neighbour_data, neighbour_metadata, projection, time_units='seconds since 1970-01-01 00:00:00', rainfall_attributes={'coverage_contant_type': 'physicalMeasurement', 'long_name': 'rainfall amount per time unit', 'name': 'rainfall', 'units': 'mm'}, lat_lon_attributes={'unit': 'degrees in WGS84 projection'}, global_attributes=None)[source]

Convert neighbour data in polars format to xarray dataset.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • neighbour_metadata (DataFrame) – Metadata for the rainfall data with ‘latitude’ and ‘longitude’

  • projection (str) – cartesian/metric coordinate system

  • time_units (str) – Units and encoding of the ‘time’ column

  • rainfall_attributes (dict) – Attributes for rainfall in the xarray Dataset

  • lat_lon_attributes (dict) – Attributes for lat and lon in the xarray Dataset

  • global_attributes (dict) – Global attributes for xarray Dataset

Return type:

Dataset

Returns:

neighbour_data_ds – xarray dataset with assigned attributes
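The default time_units value, 'seconds since 1970-01-01 00:00:00', describes timestamps stored as a numeric offset from a reference date. The standard-library sketch below shows what that encoding means for two 15-minute timestamps. The helper name encode_time and the choice of UTC are assumptions for illustration; the function itself may handle the encoding differently.

```python
from datetime import datetime, timezone

# Reference date taken from the default time_units string; UTC is assumed here.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_time(timestamps):
    """Encode timezone-aware datetimes as seconds since 1970-01-01 00:00:00 (hypothetical helper)."""
    return [(ts - EPOCH).total_seconds() for ts in timestamps]


stamps = [
    datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2020, 1, 1, 0, 15, tzinfo=timezone.utc),
]
print(encode_time(stamps))  # [1577836800.0, 1577837700.0]
```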

rainfallqc.checks.pypwsqc_filters.run_bias_correction(neighbour_data)[source]

Bias correction.

Parameters:

neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

Return type:

None

Returns:

neighbour_data – todo

rainfallqc.checks.pypwsqc_filters.run_event_based_filter(neighbour_data)[source]

Event based filter (EBF).

Parameters:

neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

Return type:

None

Returns:

neighbour_data – todo

rainfallqc.checks.pypwsqc_filters.run_indicator_correlation(neighbour_data)[source]

Run indicator correlation.

Parameters:

neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

Return type:

None

Returns:

neighbour_data – todo

rainfallqc.checks.pypwsqc_filters.run_peak_removal(neighbour_data)[source]

Peak removal.

Parameters:

neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

Return type:

None

Returns:

neighbour_data – todo

rainfallqc.checks.pypwsqc_filters.subset_distance_matrix(neighbour_data_ds, distance_matrix, max_distance_for_neighbours)[source]

Subset a distance matrix by the maximum distance for neighbours.

Parameters:
  • neighbour_data_ds (Dataset) – xarray dataset of neighbour data

  • distance_matrix (Dataset) – A distance matrix of all neighbouring gauges

  • max_distance_for_neighbours (int | float) – Maximum distance to consider for neighbours

Return type:

Dataset

Returns:

neighbour_data_ds – A distance matrix of all neighbouring gauges
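The effect of max_distance_for_neighbours can be pictured as a filter over a distance matrix: for each gauge, only the other gauges within the threshold remain. The sketch below is a plain-Python illustration under assumed names (neighbours_within, a nested-dict matrix, gauge IDs A to C), not the xarray-based implementation.

```python
def neighbours_within(distances, max_distance):
    """Return, per gauge, the sorted IDs of other gauges no further than max_distance.

    Hypothetical helper. distances is a nested dict: distances[a][b] is the
    distance between gauges a and b, in the same units as max_distance.
    """
    return {
        gauge: sorted(
            other
            for other, dist in row.items()
            if other != gauge and dist <= max_distance  # skip the gauge itself
        )
        for gauge, row in distances.items()
    }


distances = {
    "A": {"A": 0.0, "B": 5000.0, "C": 12000.0},
    "B": {"A": 5000.0, "B": 0.0, "C": 8544.0},
    "C": {"A": 12000.0, "B": 8544.0, "C": 0.0},
}
print(neighbours_within(distances, max_distance=10000.0))
# {'A': ['B'], 'B': ['A', 'C'], 'C': ['B']}
```

With the default threshold of 10000.0, gauges A and C are not treated as neighbours of each other because they are 12000 units apart.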

Functions

check_faulty_zeros(neighbour_data, ...[, ...])

Will flag faulty zeros based on neighbours ...

check_high_influx_filter(neighbour_data)

High influx filter.

check_station_outlier(neighbour_data, ...[, ...])

Station outlier.

compute_distance_matrix(neighbour_data_ds)

Compute a distance matrix.

convert_neighbour_data_to_xarray(...[, ...])

Convert neighbour data in polars format to xarray dataset.

run_bias_correction(neighbour_data)

Bias correction.

run_event_based_filter(neighbour_data)

Event based filter (EBF).

run_indicator_correlation(neighbour_data)

Run indicator correlation.

run_peak_removal(neighbour_data)

Peak removal.

subset_distance_matrix(neighbour_data_ds, ...)

Subset a distance matrix by the maximum distance for neighbours.