rainfallqc.checks.gauge_checks

Quality control checks examining suspicious rain gauges.

Gauge checks are defined as QC checks that: “detect abnormalities in summary and descriptive statistics of rain gauges.”

Classes and functions ordered by appearance in IntenseQC framework.

rainfallqc.checks.gauge_checks.check_breakpoints(data, target_gauge_col, p_threshold=0.01)[source]

Use a Pettitt test on rainfall data to check for breakpoints.

This is QC6 from the IntenseQC framework.

Parameters:
  • data (DataFrame) – Rainfall data.

  • target_gauge_col (str) – Column with rainfall data.

  • p_threshold (float) – Significance level for the test.

Return type:

int

Returns:

flag (int) – 1 if breakpoint is detected (p < p_threshold), 0 otherwise.
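To illustrate the kind of test this check relies on, here is a minimal pure-Python sketch of a Pettitt test on a plain list of values. It is not the rainfallqc implementation: the helper names pettitt_test and breakpoint_flag are hypothetical, the input is a list rather than a DataFrame column, and the p-value uses the commonly quoted approximation p ≈ 2 exp(−6K² / (n³ + n²)).

```python
import math


def pettitt_test(values):
    """Return (change_index, approx_p_value) for a sequence of numbers.

    Hypothetical helper for illustration only.
    change_index is the number of values in the first segment.
    """
    n = len(values)
    u = 0          # running statistic U_t
    best_k = 0     # K = max |U_t|
    best_t = 0     # position where |U_t| is largest
    for t in range(n - 1):
        # U_t = U_{t-1} + sum_j sign(x_t - x_j)
        u += sum((values[t] > x) - (values[t] < x) for x in values)
        if abs(u) > best_k:
            best_k, best_t = abs(u), t + 1
    # Approximate two-sided p-value, capped at 1
    p_value = min(1.0, 2.0 * math.exp(-6.0 * best_k**2 / (n**3 + n**2)))
    return best_t, p_value


def breakpoint_flag(values, p_threshold=0.01):
    """Return 1 if a breakpoint is detected (p < p_threshold), 0 otherwise."""
    _, p_value = pettitt_test(values)
    return int(p_value < p_threshold)


# A series whose level shifts halfway through, and a constant series
shifted = [1.0] * 20 + [5.0] * 20
constant = [2.0] * 40

change_index, p_value = pettitt_test(shifted)
print(change_index, breakpoint_flag(shifted), breakpoint_flag(constant))
```

The flag follows the convention documented above: 1 when p < p_threshold, 0 otherwise.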

rainfallqc.checks.gauge_checks.check_intermittency(data, target_gauge_col, no_data_threshold=2, annual_count_threshold=5)[source]

Return years where more than five periods of missing data are bounded by zeros.

TODO: split into multiple sub-functions and write more tests! This is QC5 from the IntenseQC framework.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • no_data_threshold (int) – Number of missing values needed to be counted as a no data period (default: 2 (days))

  • annual_count_threshold (int) – Number of missing data periods above no_data_threshold per year (default: 5)

Return type:

list

Returns:

years_w_intermittency – List of years with intermittency issues.
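The counting logic described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: the function name years_with_intermittency and the build helper are hypothetical, the input is a list of (date, value) pairs with None marking missing data, each gap is attributed to the year in which it starts, and a year is returned when its count is strictly greater than annual_count_threshold (following the "more than five" wording).

```python
from datetime import date, timedelta


def years_with_intermittency(records, no_data_threshold=2, annual_count_threshold=5):
    """Return years with many missing-data periods bounded by zeros.

    Hypothetical helper; records is a list of (date, value-or-None) pairs.
    """
    counts = {}
    i, n = 0, len(records)
    while i < n:
        if records[i][1] is not None:
            i += 1
            continue
        start = i
        while i < n and records[i][1] is None:
            i += 1
        # The gap covers records[start:i]
        long_enough = (i - start) >= no_data_threshold
        bounded_by_zeros = (
            start > 0 and i < n
            and records[start - 1][1] == 0
            and records[i][1] == 0
        )
        if long_enough and bounded_by_zeros:
            year = records[start][0].year
            counts[year] = counts.get(year, 0) + 1
    # Assumption: "more than" means a strict comparison
    return sorted(y for y, c in counts.items() if c > annual_count_threshold)


def build(year, n_gaps):
    """Build a short daily series with n_gaps two-day gaps between zeros."""
    values = [0.0, None, None] * n_gaps + [0.0, 1.2, 0.4]
    first = date(year, 1, 1)
    return [(first + timedelta(days=i), v) for i, v in enumerate(values)]


records = build(2020, 6) + build(2021, 2)
flagged = years_with_intermittency(records)
print(flagged)
```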

rainfallqc.checks.gauge_checks.check_min_val_change(data, target_gauge_col, expected_min_val)[source]

Return years when the minimum recorded value changes.

Used to determine whether there are possible changes to the measuring equipment. This is QC7 from the IntenseQC framework.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_gauge_col (str) – Column with rainfall data.

  • expected_min_val (float) – Expected minimum rainfall value, i.e. the resolution of the data.

Return type:

list

Returns:

yr_list – List of years with minimum value changes.
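One way to read this check is sketched below in plain Python. It is only an interpretation: the function name years_with_min_val_change is hypothetical, the input is a list of (date, value) pairs, and the sketch assumes that zeros and missing values are ignored so that the yearly minimum is the smallest non-zero reading, which is then compared with expected_min_val.

```python
import math
from datetime import date


def years_with_min_val_change(records, expected_min_val):
    """Return years whose smallest non-zero value differs from expected_min_val.

    Hypothetical helper; records is a list of (date, value-or-None) pairs.
    """
    yearly_min = {}
    for day, value in records:
        # Assumption: dry days and missing values say nothing about resolution
        if value is None or value <= 0:
            continue
        year = day.year
        yearly_min[year] = min(value, yearly_min.get(year, value))
    return sorted(
        year
        for year, smallest in yearly_min.items()
        if not math.isclose(smallest, expected_min_val)
    )


records = [
    (date(2018, 3, 1), 0.1), (date(2018, 3, 2), 0.3), (date(2018, 3, 3), 0.0),
    (date(2019, 3, 1), 0.2), (date(2019, 3, 2), 0.4),
    (date(2020, 3, 1), 0.1), (date(2020, 3, 2), None),
]
changed_years = years_with_min_val_change(records, expected_min_val=0.1)
print(changed_years)
```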

rainfallqc.checks.gauge_checks.check_temporal_bias(data, target_gauge_col, time_granularity, p_threshold=0.01)[source]

Perform a two-sided t-test on the distribution of mean rainfall over time slices.

This check is less reliable when less data is available.

This is QC3 (day-of-week bias) and QC4 (hour-of-day bias) from the IntenseQC framework.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • time_granularity (str) – Temporal grouping, either ‘weekday’ or ‘hour’

  • p_threshold (float) – Significance level for the test

Return type:

int

Returns:

flag (int) – 1 if bias is detected (p < p_threshold), 0 otherwise.
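The time-slice grouping that precedes the t-test can be sketched as below. This covers only the grouping step and deliberately leaves out the significance test; the function name mean_by_time_slice is hypothetical and the input is a list of (datetime, value) pairs rather than a DataFrame.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def mean_by_time_slice(records, time_granularity):
    """Return {slice: mean rainfall} for 'weekday' (0 = Monday) or 'hour'.

    Hypothetical helper; records is a list of (datetime, value-or-None) pairs.
    """
    if time_granularity == "weekday":
        key = lambda ts: ts.weekday()
    elif time_granularity == "hour":
        key = lambda ts: ts.hour
    else:
        raise ValueError("time_granularity must be 'weekday' or 'hour'")

    groups = defaultdict(list)
    for ts, value in records:
        if value is not None:  # skip missing readings
            groups[key(ts)].append(value)
    return {k: fmean(v) for k, v in sorted(groups.items())}


# Two weeks of noon readings starting on a Monday, with wetter Sundays
start = datetime(2024, 1, 1, 12)
records = []
for i in range(14):
    ts = start + timedelta(days=i)
    records.append((ts, 8.0 if ts.weekday() == 6 else 1.0))

weekday_means = mean_by_time_slice(records, "weekday")
hour_means = mean_by_time_slice(records, "hour")
print(weekday_means)
```

A gauge with a day-of-week bias would show one slice mean standing apart from the others, as Sunday does in this made-up series.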

rainfallqc.checks.gauge_checks.check_years_where_annual_kth_largest_value_is_zero(data, target_gauge_col, k)[source]

Return list of years where the k-th largest value is 0.

This is QC2 from the IntenseQC framework.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • k (int) – Number of the largest values to take for a given year, i.e. k==5 is the top 5

Return type:

list

Returns:

year_list – List of years where the k-th largest value is zero.
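A minimal plain-Python sketch of this idea follows. It is not the library's implementation: the function name years_where_kth_largest_is_zero is hypothetical, the input is a list of (date, value) pairs, missing values are dropped, and years with fewer than k readings are skipped (an assumption).

```python
import heapq
from collections import defaultdict
from datetime import date, timedelta


def years_where_kth_largest_is_zero(records, k):
    """Return years in which the k-th largest recorded value equals zero.

    Hypothetical helper; records is a list of (date, value-or-None) pairs.
    """
    by_year = defaultdict(list)
    for day, value in records:
        if value is not None:
            by_year[day.year].append(value)

    flagged = []
    for year, values in sorted(by_year.items()):
        # Assumption: a year with fewer than k readings cannot be assessed
        if len(values) >= k and heapq.nlargest(k, values)[-1] == 0:
            flagged.append(year)
    return flagged


def daily(year, values):
    """Attach consecutive dates in the given year to a list of values."""
    first = date(year, 1, 1)
    return [(first + timedelta(days=i), v) for i, v in enumerate(values)]


records = (
    daily(2019, [0.0] * 10 + [3.0, 1.0])   # only two wet days
    + daily(2020, [0.2] * 10 + [0.0] * 2)  # ten wet days
)
flagged_years = years_where_kth_largest_is_zero(records, k=5)
print(flagged_years)
```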

rainfallqc.checks.gauge_checks.check_years_where_nth_percentile_is_zero(data, target_gauge_col, percentile)[source]

Return years where the n-th percentile is zero.

This is QC1 from the IntenseQC framework.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • percentile (float) – Percentile to evaluate, between 1 and 100

Return type:

list

Returns:

year_list – List of years where the n-th percentile is zero.
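The percentile check can be sketched in plain Python as shown below. The names percentile_of and years_where_percentile_is_zero are hypothetical, the input is a list of (date, value) pairs, and the percentile is computed by linear interpolation between sorted values, which is an assumption about the method rather than a statement of what rainfallqc does.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def percentile_of(sorted_values, pct):
    """Linear-interpolation percentile of an ascending list (pct in 0..100)."""
    pos = (len(sorted_values) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def years_where_percentile_is_zero(records, percentile):
    """Return years whose n-th percentile of recorded values is zero.

    Hypothetical helper; records is a list of (date, value-or-None) pairs.
    """
    by_year = defaultdict(list)
    for day, value in records:
        if value is not None:
            by_year[day.year].append(value)
    return [
        year
        for year, values in sorted(by_year.items())
        if percentile_of(sorted(values), percentile) == 0
    ]


def daily(year, values):
    """Attach consecutive dates in the given year to a list of values."""
    first = date(year, 1, 1)
    return [(first + timedelta(days=i), v) for i, v in enumerate(values)]


records = daily(2019, [0.0] * 9 + [5.0]) + daily(2020, [1.0] * 10)
zero_median_years = years_where_percentile_is_zero(records, 50)
print(zero_median_years)
```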

Functions

check_breakpoints(data, target_gauge_col[, ...])

Use a Pettitt test on rainfall data to check for breakpoints.

check_intermittency(data, target_gauge_col)

Return years where more than five periods of missing data are bounded by zeros.

check_min_val_change(data, target_gauge_col, ...)

Return years when the minimum recorded value changes.

check_temporal_bias(data, target_gauge_col, ...)

Perform a two-sided t-test on the distribution of mean rainfall over time slices.

check_years_where_annual_kth_largest_value_is_zero(...)

Return list of years where the k-th largest value is 0.

check_years_where_nth_percentile_is_zero(...)

Return years where the n-th percentile is zero.