rainfallqc.checks.neighbourhood_checks

Quality control checks using neighbouring gauges to identify suspicious data.

Neighbourhood checks are QC checks that “detect abnormalities in a gauge given measurements in neighbouring gauges.”

Classes and functions ordered by appearance in IntenseQC framework.

rainfallqc.checks.neighbourhood_checks.add_wet_flags_to_data(neighbour_data_diff, target_gauge_col, nearest_neighbour, expon_percentiles, wet_threshold)[source]

Add flags to data based on when target gauge is wetter than neighbour above certain exponential thresholds.

Parameters:
  • neighbour_data_diff (DataFrame) – Data with normalised diff to neighbour

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

  • expon_percentiles (dict) – Thresholds at percentile of fitted distribution (needs 0.95, 0.99 & 0.999)

  • wet_threshold (float) – Threshold for rainfall intensity in given time period

Return type:

DataFrame

Returns:

neighbour_data_wet_flags – Data with wet flags applied
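
The threshold-to-flag mapping described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `wet_flag_from_thresholds` is hypothetical, and it assumes that `expon_percentiles` is keyed by 0.95, 0.99 and 0.999 and that exceeding those thresholds maps to flags 1, 2 and 3, as in the wet neighbour checks.

```python
def wet_flag_from_thresholds(norm_diff, expon_percentiles):
    """Return 3, 2, 1 or 0 for the highest percentile threshold exceeded."""
    if norm_diff > expon_percentiles[0.999]:
        return 3
    if norm_diff > expon_percentiles[0.99]:
        return 2
    if norm_diff > expon_percentiles[0.95]:
        return 1
    return 0


# Illustrative thresholds only; real values come from a fitted distribution.
thresholds = {0.95: 0.3, 0.99: 0.5, 0.999: 0.8}
flags = [wet_flag_from_thresholds(d, thresholds) for d in (0.1, 0.4, 0.6, 0.9)]
```

Checking the highest threshold first means each value receives only the most severe flag it qualifies for.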

rainfallqc.checks.neighbourhood_checks.check_daily_factor(neighbour_data, target_gauge_col, nearest_neighbour, averaging_method='mean')[source]

Daily factor difference between target and neighbouring gauge.

Flag: Scalar factor difference.

This is QC24 from the IntenseQC framework.

Parameters:
  • neighbour_data (DataFrame) – Daily rainfall data with target and neighbouring gauge and time col

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

  • averaging_method (str) – Method to use to get average i.e. mean or median (default mean)

Return type:

float

Returns:

daily_factor – Average factor diff between target and neighbour

Raises:

ValueError – If averaging_method is not ‘mean’ or ‘median’
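
As a rough sketch of the idea, the average factor can be computed from per-day ratios. The helper name `average_daily_factor` is hypothetical, and the sketch assumes the factor is the target-to-neighbour ratio taken only on days when both gauges record rain; the library may define it differently.

```python
from statistics import mean, median


def average_daily_factor(target, neighbour, averaging_method="mean"):
    """Average ratio of target to neighbour rainfall on jointly wet days."""
    if averaging_method not in ("mean", "median"):
        raise ValueError("averaging_method must be 'mean' or 'median'")
    # Assumption: only days where both gauges are wet contribute a ratio.
    factors = [
        t / n
        for t, n in zip(target, neighbour)
        if t is not None and n is not None and t > 0 and n > 0
    ]
    if not factors:
        return float("nan")
    return mean(factors) if averaging_method == "mean" else median(factors)
```

The median is less sensitive to a single extreme day, which may matter when one of the gauges has isolated spikes.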

rainfallqc.checks.neighbourhood_checks.check_dry_neighbours_daily(neighbour_data, target_gauge_col, list_of_nearest_stations, min_n_neighbours, dry_period_days=15, n_neighbours_ignored=0)[source]

Identify suspicious dry periods by comparison to neighbour for daily data.

Flags (majority voting where flag is the highest value across all neighbours):
  • 3, if >= 3 average number of wet days in neighbours during a dry period in target

  • 2, …if 2 days

  • 1, …if 1 day

  • 0, if not neighbours on average dry during dry target gauge period

This is QC18 from the IntenseQC framework.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • target_gauge_col (str) – Target gauge column

  • list_of_nearest_stations (List[str]) – List of columns with neighbouring gauges

  • min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag

  • dry_period_days (int) – Length of a “dry spell” (default: 15 days)

  • n_neighbours_ignored (int) – Number of zero flags allowed for majority voting (default: 0)

Return type:

DataFrame

Returns:

data_w_dry_flags – Target data with dry flags
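
The flag levels listed for this check can be expressed as a small lookup. This is only a sketch: the helper name `dry_flag_from_wet_days` is hypothetical, and treating the average number of wet neighbour days with `>=` cut-offs at 1, 2 and 3 is an assumption about how fractional averages are handled.

```python
def dry_flag_from_wet_days(avg_neighbour_wet_days):
    """Map average neighbour wet days during a target dry spell to a flag."""
    if avg_neighbour_wet_days >= 3:
        return 3
    if avg_neighbour_wet_days >= 2:
        return 2
    if avg_neighbour_wet_days >= 1:
        return 1
    return 0  # neighbours were, on average, dry as well


dry_flags = [dry_flag_from_wet_days(n) for n in (0, 1, 2.5, 4)]
```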

rainfallqc.checks.neighbourhood_checks.check_dry_neighbours_hourly(neighbour_data, target_gauge_col, list_of_nearest_stations, time_res, min_n_neighbours, dry_period_days=15, n_neighbours_ignored=0, hour_offset=0, min_count=None)[source]

Identify suspicious dry periods by comparison to neighbour for hourly or 15-min data.

Flags (majority voting where flag is the highest value across all neighbours):
  • 3, if >= 3 average number of wet days in neighbours during a dry period in target

  • 2, …if 2 days

  • 1, …if 1 day

  • 0, if not neighbours on average dry during dry target gauge period

This is QC19 from the IntenseQC framework.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • target_gauge_col (str) – Target gauge column

  • list_of_nearest_stations (List[str]) – List of columns with neighbouring gauges

  • time_res (str) – Time resolution of data (hourly or 15m)

  • min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag

  • dry_period_days (int) – Length of a “dry spell” (default: 15 days)

  • n_neighbours_ignored (int) – Number of zero flags allowed for majority voting (default: 0)

  • hour_offset (int) – Time offset of hourly data in hours (e.g. for 7am-7am days, set this to 7) (default: 0)

  • min_count (int) – Minimum number of time steps needed per time period (default: 1)

Return type:

DataFrame

Returns:

data_w_dry_flags – Target data with dry flags
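
To illustrate what `hour_offset` and `min_count` control, the sketch below aggregates hourly values into days that start at a given hour. It is not the library's resampling code: the helper name `daily_totals_with_offset` is hypothetical, and labelling each day by the calendar date on which it starts is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_totals_with_offset(hourly_records, hour_offset=0, min_count=1):
    """Sum (timestamp, value) pairs into days that start at `hour_offset`."""
    buckets = defaultdict(list)
    for timestamp, value in hourly_records:
        if value is None:
            continue  # skip missing observations
        # Shift the clock back so a 7am-7am day maps onto one calendar date.
        day = (timestamp - timedelta(hours=hour_offset)).date()
        buckets[day].append(value)
    # Days with too few valid time steps are reported as None, not summed.
    return {
        day: (sum(values) if len(values) >= min_count else None)
        for day, values in sorted(buckets.items())
    }


records = [
    (datetime(2024, 1, 1, 6), 1.0),  # before 7am: belongs to the previous day
    (datetime(2024, 1, 1, 7), 2.0),
    (datetime(2024, 1, 2, 6), 3.0),  # still part of the day starting 1 Jan, 7am
]
totals = daily_totals_with_offset(records, hour_offset=7)
```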

rainfallqc.checks.neighbourhood_checks.check_monthly_factor(neighbour_data, target_gauge_col, nearest_neighbour)[source]

Monthly factor difference between target and neighbouring gauge.

Flags:
  • 1, when ~10 x greater than neighbour monthly total

  • 2, when ~25.4 x greater …

  • 3, when ~2.54 x greater …

  • 4, when ~10 x smaller than neighbour monthly total

  • 5, when ~25.4 x smaller …

  • 6, when ~2.54 x smaller …

  • 0, otherwise

This is QC25 from the IntenseQC framework.

Parameters:
  • neighbour_data (DataFrame) – Daily rainfall data with target and neighbouring gauge and time col

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

Return type:

DataFrame

Returns:

monthly_factor_flag – Factor diff flags between target and neighbour
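
The flag table above can be sketched as a tolerance-based lookup. The helper name `monthly_factor_flag` and the 10% relative tolerance used to interpret “~” are assumptions made for illustration; the library's actual tolerance is not stated here. The factors 25.4 and 2.54 plausibly correspond to unit-conversion errors (there are 25.4 mm in an inch).

```python
import math

# Expected target/neighbour ratio mapped to its flag.
FACTOR_FLAGS = {
    10.0: 1,
    25.4: 2,
    2.54: 3,
    1 / 10.0: 4,
    1 / 25.4: 5,
    1 / 2.54: 6,
}


def monthly_factor_flag(target_total, neighbour_total, rel_tol=0.1):
    """Flag a monthly total whose ratio to the neighbour is near a known factor."""
    if target_total <= 0 or neighbour_total <= 0:
        return 0  # no meaningful ratio for dry or missing months
    factor = target_total / neighbour_total
    for expected, flag in FACTOR_FLAGS.items():
        if math.isclose(factor, expected, rel_tol=rel_tol):
            return flag
    return 0
```

With a 10% tolerance the six acceptance windows do not overlap, so the lookup order does not affect the result.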

rainfallqc.checks.neighbourhood_checks.check_monthly_neighbours(neighbour_data, target_gauge_col, list_of_nearest_stations, time_res, min_n_neighbours, n_neighbours_ignored=0, hour_offset=0, min_count=None)[source]

Identify suspicious monthly totals by comparison to neighbouring monthly gauges.

Flags (majority voting where flag is the highest value across all neighbours). Flags -3 to 3 are based on percentage difference:
  • -3, -100% (i.e. gauge dry but neighbours not)

  • -2, <= -50%

  • -1, <= -25%

  • 1, >= 25%

  • 2, >= 50%

  • 3, >= 100%

Flags equal to 3 may be upgraded to:
  • 4, >= 1.25 x record maximum for all neighbours

  • 5, >= 2 x record maximum for all neighbours

Or:
  • 0, if not in extreme exceedance of neighbours

This is QC20 from the IntenseQC framework.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • target_gauge_col (str) – Target gauge column

  • list_of_nearest_stations (List[str]) – List of columns with neighbouring gauges

  • time_res (str) – Time resolution of data (‘monthly’, ‘daily’, ‘hourly’ or ‘15m’; sub-monthly data will be resampled to monthly)

  • min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag

  • n_neighbours_ignored (int) – Number of zero flags allowed for majority voting (default: 0)

  • hour_offset (int) – Time offset of hourly data in hours (e.g. for 7am-7am days, set this to 7) (default: 0)

  • min_count (int) – Minimum number of time steps needed per time period (default: will be half of possible time steps)

Return type:

DataFrame

Returns:

data_w_monthly_flags – Target data with monthly flags

rainfallqc.checks.neighbourhood_checks.check_nearest_neighbour_columns(neighbour_data, target_gauge_col, list_of_nearest_stations)[source]

Run checks of neighbouring gauge columns to check if there are any columns and if the target gauge is there.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of all neighbouring gauges with time col

  • target_gauge_col (str) – Target gauge column

  • list_of_nearest_stations (list) – List of columns with neighbouring gauges

Raises:
  • ValueError – If there are no neighbouring gauges in the ‘list_of_nearest_stations’ list

  • AssertionError – If ‘target_gauge_col’ not in neighbour_data

Return type:

None
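
The two documented failure modes can be sketched as follows. The helper name `validate_neighbour_columns` is hypothetical and takes a plain list of column names rather than a DataFrame, so this shows only the validation logic, not the library's code.

```python
def validate_neighbour_columns(columns, target_gauge_col, list_of_nearest_stations):
    """Raise if no neighbours are listed or the target column is missing."""
    if len(list_of_nearest_stations) == 0:
        raise ValueError("No neighbouring gauges in 'list_of_nearest_stations'")
    assert target_gauge_col in columns, f"'{target_gauge_col}' not in data columns"


# Passes silently when both conditions are met.
validate_neighbour_columns(["time", "gauge_a", "gauge_b"], "gauge_a", ["gauge_b"])
```

Note that `assert` statements are skipped when Python runs with the `-O` flag, so callers should not rely on the `AssertionError` path for control flow.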

rainfallqc.checks.neighbourhood_checks.check_neighbour_affinity_index(neighbour_data, target_gauge_col, nearest_neighbour)[source]

Pre-QC Affinity index calculated between target and nearest neighbouring gauge.

Flag: Between 0-1 for affinity index

This is QC22 from the IntenseQC framework.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data with target and neighbouring gauge and time col

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

Return type:

float

Returns:

affinity_index – Between 0 and 1
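
This section does not define the affinity index. The sketch below assumes it is the fraction of jointly observed time steps on which both gauges agree about being wet or dry, which yields a value between 0 and 1. The helper name `affinity_index` and its `wet_threshold` argument are hypothetical.

```python
def affinity_index(target, neighbour, wet_threshold=0.0):
    """Fraction of jointly valid time steps where wet/dry states match."""
    pairs = [
        (t, n) for t, n in zip(target, neighbour) if t is not None and n is not None
    ]
    if not pairs:
        return float("nan")
    # A time step matches when both gauges are wet or both are dry.
    matches = sum((t > wet_threshold) == (n > wet_threshold) for t, n in pairs)
    return matches / len(pairs)


index = affinity_index([0.0, 1.2, 0.0, 3.0], [0.0, 0.8, 2.0, None])
```

In this example three time steps are jointly valid and two of them agree, giving an index of two thirds.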

rainfallqc.checks.neighbourhood_checks.check_neighbour_correlation(neighbour_data, target_gauge_col, nearest_neighbour)[source]

Pre-QC Pearson correlation calculated between target and neighbouring gauge.

Flag: Pearson correlation coefficient between -1 and +1

This is QC23 from the IntenseQC framework.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data with target and neighbouring gauge and time col

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

Return type:

float

Returns:

r_squared – Between -1 and 1

rainfallqc.checks.neighbourhood_checks.check_timing_offset(neighbour_data, target_gauge_col, nearest_neighbour, time_res, offsets_to_check=(-1, 0, 1))[source]

Identify suspicious data offset using Affinity Index and correlation (r^2) between target and nearest neighbour.

Flags:
  • -1, -1 day offset

  • 0, no offset

  • 1, +1 day offset

This is QC21 from the IntenseQC framework.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data with target and neighbouring gauge and time col

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

  • time_res (str) – Time resolution of data

  • offsets_to_check (Iterable[int]) – Offset values to check (default: -1, 0, 1)

Return type:

int

Returns:

offset_flag – e.g. -1, 0 or 1
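
One way to picture the offset search is to shift the target series by each candidate offset and keep the offset with the best agreement. The sketch below uses Pearson correlation alone, whereas the check described above also uses the affinity index. The helper names are hypothetical, and the sign convention (+1 meaning the target records events one step later than the neighbour) is an assumption.

```python
import math


def _pearson(x, y):
    """Pearson correlation of two equal-length lists, or NaN if undefined."""
    n = len(x)
    if n < 2:
        return float("nan")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def best_timing_offset(target, neighbour, offsets_to_check=(-1, 0, 1)):
    """Return the offset (in time steps) that maximises the correlation."""
    best_offset, best_r = 0, -math.inf
    for offset in offsets_to_check:
        # Align target[i + offset] with neighbour[i] over the overlapping part.
        if offset > 0:
            shifted, reference = target[offset:], neighbour[:-offset]
        elif offset < 0:
            shifted, reference = target[:offset], neighbour[-offset:]
        else:
            shifted, reference = target, neighbour
        r = _pearson(shifted, reference)
        if not math.isnan(r) and r > best_r:
            best_offset, best_r = offset, r
    return best_offset


neighbour_series = [0, 5, 0, 0, 3, 0, 0, 1]
late_target = [0, 0, 5, 0, 0, 3, 0, 0]  # same events, one step later
offset = best_timing_offset(late_target, neighbour_series)
```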

rainfallqc.checks.neighbourhood_checks.check_wet_neighbours_daily(neighbour_data, target_gauge_col, list_of_nearest_stations, wet_threshold, min_n_neighbours, n_neighbours_ignored=0)[source]

Identify suspicious large values by comparison to neighbour for daily data.

Flags (majority voting where flag is the highest value across all neighbours):
  • 3, if normalised difference between target gauge and neighbours is above the 99.9th percentile

  • 2, …if above 99th percentile

  • 1, …if above 95th percentile

  • 0, if not in extreme exceedance of neighbours

This is QC16 from the IntenseQC framework.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • target_gauge_col (str) – Target gauge column

  • list_of_nearest_stations (List[str]) – List of columns with neighbouring gauges

  • wet_threshold (int | float) – Threshold for rainfall intensity in given time period

  • min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag

  • n_neighbours_ignored (int) – Number of zero flags allowed for majority voting (default: 0)

Return type:

DataFrame

Returns:

data_w_wet_flags – Target data with wet flags

rainfallqc.checks.neighbourhood_checks.check_wet_neighbours_hourly(neighbour_data, target_gauge_col, list_of_nearest_stations, time_res, wet_threshold, min_n_neighbours, n_neighbours_ignored=0, hour_offset=0, min_count=None)[source]

Identify suspicious large values by comparison to neighbour for hourly or 15-min data.

Flags (majority voting where flag is the highest value across all neighbours):
  • 3, if normalised difference between target gauge and neighbours is above the 99.9th percentile

  • 2, …if above 99th percentile

  • 1, …if above 95th percentile

  • 0, if not in extreme exceedance of neighbours

This is QC17 from the IntenseQC framework.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • target_gauge_col (str) – Target gauge column

  • list_of_nearest_stations (List[str]) – List of columns with neighbouring gauges

  • time_res (str) – Time resolution of data

  • wet_threshold (int | float) – Threshold for rainfall intensity in given time period

  • min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag

  • n_neighbours_ignored (int) – Number of zero flags allowed for majority voting (default: 0)

  • hour_offset (int) – Time offset of hourly data in hours (e.g. for 7am-7am days, set this to 7) (default: 0)

  • min_count (int) – Minimum number of time steps needed per time period (default: 2)

Return type:

DataFrame

Returns:

data_w_wet_flags – Target data with wet flags

rainfallqc.checks.neighbourhood_checks.filter_data_based_on_unusual_wetness(neighbour_data_diff, target_gauge_col, nearest_neighbour, wet_threshold)[source]

Filter data based on wet threshold.

Parameters:
  • neighbour_data_diff (DataFrame) – Data with normalised diff to neighbour

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

  • wet_threshold (float) – Threshold for rainfall intensity in given time period

Return type:

DataFrame

Returns:

filtered_diff – Data filtered to wet threshold and where diff is positive (thus more wet)
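
The filter described above keeps only time steps where the target is wet and wetter than its neighbour. A minimal sketch on a list of row dictionaries follows; the helper name `filter_unusually_wet`, the `diff_col` argument and the use of `>=` for the wet threshold are assumptions made for illustration.

```python
def filter_unusually_wet(rows, target_gauge_col, diff_col, wet_threshold):
    """Keep rows where the target is wet and wetter than its neighbour."""
    kept = []
    for row in rows:
        rain, diff = row[target_gauge_col], row[diff_col]
        if rain is None or diff is None:
            continue  # missing values cannot be compared
        if rain >= wet_threshold and diff > 0:
            kept.append(row)
    return kept


rows = [
    {"target": 0.2, "diff": 0.1},   # below the wet threshold
    {"target": 5.0, "diff": -0.3},  # wet, but drier than the neighbour
    {"target": 8.0, "diff": 0.6},   # wet and wetter than the neighbour
]
filtered = filter_unusually_wet(rows, "target", "diff", wet_threshold=1.0)
```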

rainfallqc.checks.neighbourhood_checks.flag_dry_spell_fractions(one_neighbour_data, target_gauge_col, nearest_neighbour, proportion_of_dry_day_for_flags)[source]

Flag dry spell fractions.

Parameters:
  • one_neighbour_data (DataFrame) – Rainfall data of one neighbouring gauge with time col

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

  • proportion_of_dry_day_for_flags (dict) – Proportion of dry days needed to be flagged 1, 2, or 3

Return type:

DataFrame

Returns:

data_w_dry_spell_fraction – Target data with dry spell fractions

rainfallqc.checks.neighbourhood_checks.flag_monthly_factor_differences(monthly_factor)[source]

Flag monthly factor differences following the IntenseQC framework (QC25).

Flags:
  • 1, when ~10 x greater than neighbour monthly total

  • 2, when ~25.4 x greater …

  • 3, when ~2.54 x greater …

  • 4, when ~10 x smaller than neighbour monthly total

  • 5, when ~25.4 x smaller …

  • 6, when ~2.54 x smaller …

  • 0, otherwise

Parameters:
  • monthly_factor (DataFrame) – Rainfall data with ‘factor_diff’ and gauge_col

  • target_gauge_col – Rain column

Return type:

DataFrame

Returns:

monthly_factor_w_flag – Rainfall data with flags based on monthly factor difference

rainfallqc.checks.neighbourhood_checks.flag_percentage_diff_of_neighbour(neighbour_data, nearest_neighbour)[source]

Flag percentage difference between target gauge and neighbouring gauge.

Flags -3 to 3 based on percentage difference:
  • -3, -100% (i.e. gauge dry but neighbours not)

  • -2, <= -50%

  • -1, <= -25%

  • 1, >= 25%

  • 2, >= 50%

  • 3, >= 100%

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of all neighbouring gauges with time col

  • nearest_neighbour (str) – Neighbouring gauge column

Return type:

DataFrame

Returns:

neighbour_data_w_flags – Data with perc_diff flags
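
The percentage-difference flags can be sketched as below. The helper names are hypothetical. The sketch assumes the percentage difference is taken relative to the neighbour's total and that the negative flags apply to negative differences (at or below -25% and -50%), with 0 returned when no threshold is met.

```python
def percentage_diff(target_total, neighbour_total):
    """Percentage difference of the target relative to the neighbour."""
    if neighbour_total is None or target_total is None or neighbour_total == 0:
        return None  # undefined when the neighbour is dry or data are missing
    return (target_total - neighbour_total) / neighbour_total * 100.0


def percentage_diff_flag(perc_diff):
    """Map a percentage difference onto a flag from -3 to 3."""
    if perc_diff is None:
        return None
    if perc_diff <= -100:
        return -3  # target dry while the neighbour is not
    if perc_diff <= -50:
        return -2
    if perc_diff <= -25:
        return -1
    if perc_diff >= 100:
        return 3
    if perc_diff >= 50:
        return 2
    if perc_diff >= 25:
        return 1
    return 0


perc_flags = [percentage_diff_flag(p) for p in (-100, -60, -30, 0, 30, 60, 150)]
```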

rainfallqc.checks.neighbourhood_checks.flag_wet_day_errors_based_on_neighbours(neighbour_data, target_gauge_col, nearest_neighbour, wet_threshold)[source]

Flag wet days with errors based on the percentile difference with neighbouring gauge.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of all neighbouring gauges with time col

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

  • wet_threshold (float) – Threshold for rainfall intensity in given time period

Return type:

DataFrame

Returns:

neighbour_data_wet_flags – Data with wet flags

rainfallqc.checks.neighbourhood_checks.get_dry_spell_fraction_col(neighbour_data, target_gauge_col, nearest_neighbour, dry_period_days)[source]

Get dry spell fraction column.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

  • dry_period_days (int) – Length of a “dry spell” (default: 15 days)

Return type:

DataFrame

Returns:

data_w_dry_spell_fraction – Target data with dry spell fractions

rainfallqc.checks.neighbourhood_checks.get_majority_positive_or_negative_flags(monthly_neighbour_data, list_of_nearest_stations, min_n_neighbours, n_neighbours_ignored)[source]

Get majority voted positive or negative flags i.e. get minimum positive flag, or maximum negative flag.

Parameters:
  • monthly_neighbour_data (DataFrame) – Monthly rainfall data of neighbouring gauges with time col

  • list_of_nearest_stations (list) – List of columns with neighbouring gauges

  • min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag

  • n_neighbours_ignored (int) – Number of zero flags allowed for majority voting

Return type:

DataFrame

Returns:

data_w_monthly_flag – Data with majority_monthly_flag

rainfallqc.checks.neighbourhood_checks.get_majority_voting_flag(neighbour_data, list_of_nearest_stations, min_n_neighbours, n_zeros_allowed, flag_col_prefix, new_flag_col_name, aggregation)[source]

Get the highest flag that is in all neighbours.

For this function, we introduce the ‘n_zeros_allowed’ parameter to allow some leeway for problematic neighbours. This prevents a problematic neighbour that resembles a problematic target from suppressing a flag.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • list_of_nearest_stations (list[str]) – List of columns with neighbouring gauges

  • min_n_neighbours (int) – Minimum number of neighbours online that will be considered

  • n_zeros_allowed (int) – Number of zero flags allowed (default: 0)

  • flag_col_prefix (str) – Prefix for flag column e.g. “wet_flag_”

  • new_flag_col_name (str) – New flag column name

  • aggregation (str) – “min” or “max”

Return type:

DataFrame

Returns:

neighbour_data_w_majority_wet_flag – Data with majority wet flag
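
The voting rule can be sketched for a single time step. This is one interpretation, not the library's code: the helper name `majority_voting_flag` is hypothetical, offline neighbours are represented as `None`, and the choices to count online neighbours before discarding zeros and to return `None` when too few are online are assumptions.

```python
def majority_voting_flag(neighbour_flags, min_n_neighbours, n_zeros_allowed=0, aggregation="min"):
    """Combine per-neighbour flags for one time step into a single flag."""
    if aggregation not in ("min", "max"):
        raise ValueError("aggregation must be 'min' or 'max'")
    online = [flag for flag in neighbour_flags if flag is not None]
    if len(online) < min_n_neighbours:
        return None  # too few neighbours to vote
    # Discard up to `n_zeros_allowed` zero flags so that a few unflagged
    # neighbours cannot veto the flag on their own.
    n_zeros = online.count(0)
    zeros_kept = n_zeros - min(n_zeros_allowed, n_zeros)
    kept = [flag for flag in online if flag != 0] + [0] * zeros_kept
    if not kept:
        return 0
    return min(kept) if aggregation == "min" else max(kept)
```

With “min” aggregation the result is the highest flag level that every remaining neighbour supports: flags of 3, 2 and 3 give 2, while a single zero gives 0 unless `n_zeros_allowed` is at least 1.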

rainfallqc.checks.neighbourhood_checks.make_neighbour_monthly_max_climatology(monthly_neighbour_data, list_of_nearest_stations)[source]

Make neighbourhood monthly max climatology.

Parameters:
  • monthly_neighbour_data (DataFrame) – Monthly rainfall data of neighbouring gauges with time col

  • list_of_nearest_stations (list) – List of columns with neighbouring gauges

Return type:

DataFrame

Returns:

data_w_monthly_flags – Target data with monthly flags

rainfallqc.checks.neighbourhood_checks.make_num_neighbours_online_col(neighbour_data, list_of_nearest_stations)[source]

Get number of neighbours online column.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col

  • list_of_nearest_stations (list[str]) – Neighbouring columns to check if not null

Return type:

DataFrame

Returns:

neighbour_data_online_neighbours – Data with column for number of online neighbours

rainfallqc.checks.neighbourhood_checks.normalised_diff_between_target_neighbours(neighbour_data, target_gauge_col, nearest_neighbour)[source]

Normalised difference between target rain col and neighbouring rain col.

Parameters:
  • neighbour_data (DataFrame) – Rainfall data of all neighbouring gauges with time col

  • target_gauge_col (str) – Target gauge column

  • nearest_neighbour (str) – Neighbouring gauge column

Return type:

DataFrame

Returns:

neighbour_data_w_diff – Data with normalised diff to each neighbour

rainfallqc.checks.neighbourhood_checks.upgrade_monthly_flag_using_neighbour_max_climatology(monthly_neighbour_data_w_flags, target_gauge_col, min_n_neighbours)[source]

Upgrade flags to 4 and 5 flags for monthly neighbours in excess of neighbourhood monthly climatological max.

Parameters:
  • monthly_neighbour_data_w_flags (DataFrame) – Monthly rainfall data of neighbouring gauges with time col and ‘majority_monthly_flag’

  • target_gauge_col (str) – Target gauge column

  • min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag

Return type:

DataFrame

Returns:

data_w_monthly_flags – Target data with monthly flags
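
The upgrade rule from QC20 (flags of 3 become 4 or 5 when the target exceeds the neighbours' record maximum by a factor of 1.25 or 2) can be sketched for a single month. The helper name `upgrade_monthly_flag` and its scalar arguments are hypothetical; the library operates on DataFrame columns.

```python
def upgrade_monthly_flag(flag, target_total, neighbour_record_max):
    """Upgrade a flag of 3 to 4 or 5 based on the neighbours' record maximum."""
    if flag != 3 or neighbour_record_max is None:
        return flag  # only flags equal to 3 are eligible for an upgrade
    if target_total >= 2 * neighbour_record_max:
        return 5
    if target_total >= 1.25 * neighbour_record_max:
        return 4
    return flag


upgraded = [
    upgrade_monthly_flag(3, 260.0, 100.0),
    upgrade_monthly_flag(3, 130.0, 100.0),
    upgrade_monthly_flag(3, 110.0, 100.0),
    upgrade_monthly_flag(2, 500.0, 100.0),
]
```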

Functions

add_wet_flags_to_data(neighbour_data_diff, ...)

Add flags to data based on when target gauge is wetter than neighbour above certain exponential thresholds.

check_daily_factor(neighbour_data, ...[, ...])

Daily factor difference between target and neighbouring gauge.

check_monthly_factor(neighbour_data, ...)

Monthly factor difference between target and neighbouring gauge.

check_monthly_neighbours(neighbour_data, ...)

Identify suspicious monthly totals by comparison to neighbouring monthly gauges.

check_nearest_neighbour_columns(...)

Run checks of neighbouring gauge columns to check if there are any columns and if the target gauge is there.

check_neighbour_affinity_index(...)

Pre-QC Affinity index calculated between target and nearest neighbouring gauge.

check_neighbour_correlation(neighbour_data, ...)

Pre-QC Pearson correlation calculated between target and neighbouring gauge.

check_timing_offset(neighbour_data, ...[, ...])

Identify suspicious data offset using Affinity Index and correlation (r^2) between target and nearest neighbour.

filter_data_based_on_unusual_wetness(...)

Filter data based on wet threshold.

flag_dry_spell_fractions(one_neighbour_data, ...)

Flag dry spell fractions.

flag_monthly_factor_differences(monthly_factor)

Flag monthly factor differences following the IntenseQC framework (QC25).

flag_percentage_diff_of_neighbour(...)

Flag percentage difference between target gauge and neighbouring gauge.

flag_wet_day_errors_based_on_neighbours(...)

Flag wet days with errors based on the percentile difference with neighbouring gauge.

get_dry_spell_fraction_col(neighbour_data, ...)

Get dry spell fraction column.

get_majority_positive_or_negative_flags(...)

Get majority voted positive or negative flags i.e. get minimum positive flag, or maximum negative flag.

get_majority_voting_flag(neighbour_data, ...)

Get the highest flag that is in all neighbours.

make_neighbour_monthly_max_climatology(...)

Make neighbourhood monthly max climatology.

make_num_neighbours_online_col(...)

Get number of neighbours online column.

normalised_diff_between_target_neighbours(...)

Normalised difference between target rain col and neighbouring rain col.

upgrade_monthly_flag_using_neighbour_max_climatology(...)

Upgrade flags to 4 and 5 flags for monthly neighbours in excess of neighbourhood monthly climatological max.