rainfallqc.checks.neighbourhood_checks
Quality control checks using neighbouring gauges to identify suspicious data.
Neighbourhood checks are QC checks that “detect abnormalities in a gauge given measurements in neighbouring gauges.”
Classes and functions ordered by appearance in IntenseQC framework.
- rainfallqc.checks.neighbourhood_checks.add_wet_flags_to_data(neighbour_data_diff, target_gauge_col, nearest_neighbour, expon_percentiles, wet_threshold)
Add flags to data where the target gauge is wetter than the neighbour above certain exponential thresholds.
- Parameters:
neighbour_data_diff (DataFrame) – Data with normalised diff to neighbour
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
expon_percentiles (dict) – Thresholds at percentile of fitted distribution (needs 0.95, 0.99 & 0.999)
wet_threshold (float) – Threshold for rainfall intensity in given time period
- Return type:
DataFrame
- Returns:
neighbour_data_wet_flags – Data with wet flags applied
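The thresholding described above can be illustrated with a minimal pure-Python sketch. It is not the rainfallqc implementation: the helper name `wet_flag`, the strict "greater than" comparison, and the threshold values are assumptions made for illustration, and the flag values 1 to 3 follow the scheme documented for check_wet_neighbours_daily.

```python
# Illustrative sketch only, not the rainfallqc implementation.
def wet_flag(normalised_diff, expon_percentiles):
    """Return the flag for the highest percentile threshold exceeded, else 0."""
    for percentile, flag in ((0.999, 3), (0.99, 2), (0.95, 1)):
        # Assumption: "above" means strictly greater than the threshold.
        if normalised_diff > expon_percentiles[percentile]:
            return flag
    return 0


# Made-up threshold values keyed by the three required percentiles.
thresholds = {0.95: 0.4, 0.99: 0.6, 0.999: 0.8}
flags = [wet_flag(diff, thresholds) for diff in (0.1, 0.45, 0.7, 0.9)]
print(flags)
```

Checking the highest percentile first means each value receives only the most severe flag it qualifies for.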
- rainfallqc.checks.neighbourhood_checks.check_daily_factor(neighbour_data, target_gauge_col, nearest_neighbour, averaging_method='mean')
Daily factor difference between target and neighbouring gauge.
Flag: Scalar factor difference.
This is QC24 from the IntenseQC framework.
- Parameters:
neighbour_data (DataFrame) – Daily rainfall data with target and neighbouring gauge and time col
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
averaging_method (str) – Method used to get the average, i.e. mean or median (default: mean)
- Return type:
float
- Returns:
daily_factor – Average factor diff between target and neighbour
- Raises:
ValueError – If averaging method is not ‘mean’ or ‘median’
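As a rough illustration of an averaged factor difference, the sketch below divides target by neighbour values and averages the result. It is a stand-in under stated assumptions, not the library's code: the helper name `daily_factor`, the use of plain lists instead of a DataFrame, and the choice to use only time steps where both gauges are wet are all hypothetical.

```python
from statistics import mean, median


# Illustrative sketch only, not the rainfallqc implementation.
def daily_factor(target, neighbour, averaging_method="mean"):
    """Average ratio of target to neighbour rainfall."""
    if averaging_method not in ("mean", "median"):
        raise ValueError("averaging_method must be 'mean' or 'median'")
    # Assumption: only time steps where both gauges are wet contribute,
    # which also avoids division by zero.
    factors = [t / n for t, n in zip(target, neighbour) if t > 0 and n > 0]
    if not factors:
        return float("nan")
    return mean(factors) if averaging_method == "mean" else median(factors)


target = [10.0, 0.0, 5.0, 20.0]
neighbour = [1.0, 2.0, 0.5, 0.0]
factor = daily_factor(target, neighbour)
print(factor)
```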
- rainfallqc.checks.neighbourhood_checks.check_dry_neighbours_daily(neighbour_data, target_gauge_col, list_of_nearest_stations, min_n_neighbours, dry_period_days=15, n_neighbours_ignored=0)
Identify suspicious dry periods by comparison to neighbours for daily data.
Flags (majority voting where flag is the highest value across all neighbours):
3, if the average number of wet days in neighbours is >= 3 during a dry period in the target
2, …if 2 days
1, …if 1 day
0, otherwise, i.e. neighbours are on average dry during the dry target gauge period
This is QC18 from the IntenseQC framework.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col
target_gauge_col (str) – Target gauge column
list_of_nearest_stations (List[str]) – List of columns with neighbouring gauges
min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag
dry_period_days (int) – Length of a “dry_spell” (default: 15 days)
n_neighbours_ignored (int) – Number of zero flags allowed for majority voting (default: 0)
- Return type:
DataFrame
- Returns:
data_w_dry_flags – Target data with dry flags
- rainfallqc.checks.neighbourhood_checks.check_dry_neighbours_hourly(neighbour_data, target_gauge_col, list_of_nearest_stations, time_res, min_n_neighbours, dry_period_days=15, n_neighbours_ignored=0, hour_offset=0, min_count=None)
Identify suspicious dry periods by comparison to neighbours for hourly or 15-min data.
Flags (majority voting where flag is the highest value across all neighbours):
3, if the average number of wet days in neighbours is >= 3 during a dry period in the target
2, …if 2 days
1, …if 1 day
0, otherwise, i.e. neighbours are on average dry during the dry target gauge period
This is QC19 from the IntenseQC framework.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col
target_gauge_col (str) – Target gauge column
list_of_nearest_stations (List[str]) – List of columns with neighbouring gauges
time_res (str) – Time resolution of data (hourly or 15m)
min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag
dry_period_days (int) – Length of a “dry_spell” (default: 15 days)
n_neighbours_ignored (int) – Number of zero flags allowed for majority voting (default: 0)
hour_offset (int) – Time offset of hourly data in hours (i.e. if 7am-7am, then set this to 7) (default: 0)
min_count (int) – Minimum number of time steps needed per time period (default: 1)
- Return type:
DataFrame
- Returns:
data_w_dry_flags – Target data with dry flags
- rainfallqc.checks.neighbourhood_checks.check_monthly_factor(neighbour_data, target_gauge_col, nearest_neighbour)
Monthly factor difference between target and neighbouring gauge.
Flags:
1, when ~10 x greater than neighbour monthly total
2, when ~25.4 x greater …
3, when ~2.54 x greater …
4, when ~10 x smaller than neighbour monthly total
5, when ~25.4 x smaller …
6, when ~2.54 x smaller …
else, 0
This is QC25 from the IntenseQC framework.
- Parameters:
neighbour_data (DataFrame) – Daily rainfall data with target and neighbouring gauge and time col
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
- Return type:
DataFrame
- Returns:
monthly_factor_flag – Factor diff flags between target and neighbour
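The factors 10, 25.4 and 2.54 match common unit conversions (mm per cm, mm per inch, cm per inch). The sketch below shows one way the flag table could be applied to a single target/neighbour ratio. The documentation does not say how close "~" has to be, so the `rel_tol` tolerance and the helper name `monthly_factor_flag` are purely hypothetical, and this is not the rainfallqc implementation.

```python
import math

# (factor, flag) pairs taken from the flag table above.
FACTOR_FLAGS = (
    (10.0, 1), (25.4, 2), (2.54, 3),
    (1 / 10.0, 4), (1 / 25.4, 5), (1 / 2.54, 6),
)


# Illustrative sketch only, not the rainfallqc implementation.
def monthly_factor_flag(factor_diff, rel_tol=0.1):
    """Return the flag whose factor is approximately equal to factor_diff."""
    for factor, flag in FACTOR_FLAGS:
        # Assumption: "~" is read as "within a relative tolerance".
        if math.isclose(factor_diff, factor, rel_tol=rel_tol):
            return flag
    return 0


example_flags = [monthly_factor_flag(f) for f in (10.2, 25.0, 2.5, 0.1, 0.04, 0.4, 1.0)]
print(example_flags)
```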
- rainfallqc.checks.neighbourhood_checks.check_monthly_neighbours(neighbour_data, target_gauge_col, list_of_nearest_stations, time_res, min_n_neighbours, n_neighbours_ignored=0, hour_offset=0, min_count=None)
Identify suspicious monthly totals by comparison to neighbouring monthly gauges.
Flags (majority voting where flag is the highest value across all neighbours):
Flags -3 to 3 based on percentage difference:
-3, -100% (i.e. gauge dry but neighbours not)
-2, <= -50%
-1, <= -25%
1, >= 25%
2, >= 50%
3, >= 100%
Flags equal to 3 may be upgraded to:
4, >= 1.25 x record maximum for all neighbours
5, >= 2 x record maximum for all neighbours
Or:
0, if not in extreme exceedance of neighbours
This is QC20 from the IntenseQC framework.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col
target_gauge_col (str) – Target gauge column
list_of_nearest_stations (List[str]) – List of columns with neighbouring gauges
time_res (str) – Time resolution of data (e.g. ‘monthly’, ‘daily’, ‘hourly’ or ‘15m’; will be resampled to monthly)
min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag
n_neighbours_ignored (int) – Number of zero flags allowed for majority voting (default: 0)
hour_offset (int) – Time offset of hourly data in hours (i.e. if 7am-7am, then set this to 7) (default: 0)
min_count (int) – Minimum number of time steps needed per time period (default: will be half of possible time steps)
- Return type:
DataFrame
- Returns:
data_w_monthly_flags – Target data with monthly flags
- rainfallqc.checks.neighbourhood_checks.check_nearest_neighbour_columns(neighbour_data, target_gauge_col, list_of_nearest_stations)
Check that the list of neighbouring gauge columns is not empty and that the target gauge column is present.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of all neighbouring gauges with time col
target_gauge_col (str) – Target gauge column
list_of_nearest_stations (list) – List of columns with neighbouring gauges
- Raises:
ValueError – If there are no neighbouring gauges in the ‘list_of_nearest_stations’ list
AssertionError – If ‘target_gauge_col’ not in neighbour_data
- Return type:
None
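The two documented failure modes can be sketched as follows. This is a simplified stand-in, not the library's code: the helper name `check_columns` is hypothetical, and it takes a plain list of column names where the real function takes a DataFrame.

```python
# Illustrative sketch only, not the rainfallqc implementation.
def check_columns(columns, target_gauge_col, list_of_nearest_stations):
    """Validate neighbour inputs; returns None when both checks pass."""
    if len(list_of_nearest_stations) == 0:
        raise ValueError("No neighbouring gauges in 'list_of_nearest_stations'")
    assert target_gauge_col in columns, f"'{target_gauge_col}' not in data"


# Hypothetical column names for illustration.
columns = ["time", "gauge_a", "gauge_b"]
check_columns(columns, "gauge_a", ["gauge_b"])  # passes silently
```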
- rainfallqc.checks.neighbourhood_checks.check_neighbour_affinity_index(neighbour_data, target_gauge_col, nearest_neighbour)
Pre-QC Affinity index calculated between target and nearest neighbouring gauge.
Flag: Between 0-1 for affinity index
This is QC22 from the IntenseQC framework.
- Parameters:
neighbour_data (DataFrame) – Rainfall data with target and neighbouring gauge and time col
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
- Return type:
float
- Returns:
affinity_index – Between 0 and 1
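This page does not define the affinity index. The sketch below assumes a common definition: the fraction of time steps at which target and neighbour agree on being wet or dry. Both that definition and the helper name `affinity_index` are assumptions, so treat this as an illustration of a value bounded by 0 and 1 rather than the library's formula.

```python
# Illustrative sketch only; the definition used here is an assumption.
def affinity_index(target, neighbour):
    """Fraction of time steps where both gauges are wet or both are dry."""
    # Skip time steps where either gauge has no data.
    pairs = [(t, n) for t, n in zip(target, neighbour)
             if t is not None and n is not None]
    if not pairs:
        return float("nan")
    matches = sum((t > 0) == (n > 0) for t, n in pairs)
    return matches / len(pairs)


target = [0.0, 1.2, 0.0, 3.0, None]
neighbour = [0.0, 0.8, 0.5, 0.0, 1.0]
index = affinity_index(target, neighbour)
print(index)
```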
- rainfallqc.checks.neighbourhood_checks.check_neighbour_correlation(neighbour_data, target_gauge_col, nearest_neighbour)
Pre-QC Pearson correlation calculated between target and neighbouring gauge.
Flag: Between -1 and +1 for Pearson correlation coefficient
This is QC23 from the IntenseQC framework.
- Parameters:
neighbour_data (DataFrame) – Rainfall data with target and neighbouring gauge and time col
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
- Return type:
float
- Returns:
r_squared – Between -1 and 1
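For reference, the Pearson correlation coefficient can be computed from two equal-length series as in the sketch below. The helper name `pearson` and the plain-list inputs are illustrative assumptions; the library may handle missing values and compute the statistic differently.

```python
import math


# Illustrative sketch only, not the rainfallqc implementation.
def pearson(xs, ys):
    """Pearson correlation; returns NaN when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


r_same = pearson([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0])
r_opposite = pearson([0.0, 1.0, 2.0, 3.0], [6.0, 4.0, 2.0, 0.0])
print(r_same, r_opposite)
```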
- rainfallqc.checks.neighbourhood_checks.check_timing_offset(neighbour_data, target_gauge_col, nearest_neighbour, time_res, offsets_to_check=(-1, 0, 1))
Identify suspicious data offset using Affinity Index and correlation (r^2) between target and nearest neighbour.
Flags:
-1, -1 day offset
0, no offset
1, +1 day offset
This is QC21 from the IntenseQC framework.
- Parameters:
neighbour_data (DataFrame) – Rainfall data with target and neighbouring gauge and time col
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
time_res (str) – Time resolution of data
offsets_to_check (Iterable[int]) – Offset values to check (default: -1, 0, 1)
- Return type:
int
- Returns:
offset_flag – e.g. -1, 0 or 1
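The general idea of an offset check is to shift one series against the other and keep the offset that agrees best. The sketch below scores each candidate offset by Pearson correlation only. It is a simplification of the documented check, which also uses the Affinity Index; the helper names and the sign convention (a positive offset means the target records rain later than the neighbour) are assumptions.

```python
import math


# Illustrative sketch only, not the rainfallqc implementation.
def pearson(xs, ys):
    """Pearson correlation; returns NaN when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def best_offset(target, neighbour, offsets_to_check=(-1, 0, 1)):
    """Return the offset k that maximises correlation of target[i] with neighbour[i - k]."""
    n = len(target)
    scores = {}
    for k in offsets_to_check:
        # Keep only index pairs that stay inside both series after shifting.
        pairs = [(target[i], neighbour[i - k]) for i in range(n) if 0 <= i - k < n]
        score = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        scores[k] = -math.inf if math.isnan(score) else score
    return max(scores, key=scores.get)


neighbour = [0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 0.0]  # neighbour delayed by one step
offset = best_offset(target, neighbour)
print(offset)
```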
- rainfallqc.checks.neighbourhood_checks.check_wet_neighbours_daily(neighbour_data, target_gauge_col, list_of_nearest_stations, wet_threshold, min_n_neighbours, n_neighbours_ignored=0)
Identify suspicious large values by comparison to neighbours for daily data.
Flags (majority voting where flag is the highest value across all neighbours):
3, if normalised difference between target gauge and neighbours is above the 99.9th percentile
2, …if above 99th percentile
1, …if above 95th percentile
0, if not in extreme exceedance of neighbours
This is QC16 from the IntenseQC framework.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col
target_gauge_col (str) – Target gauge column
list_of_nearest_stations (List[str]) – List of columns with neighbouring gauges
wet_threshold (int|float) – Threshold for rainfall intensity in given time period
min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag
n_neighbours_ignored (int) – Number of zero flags allowed for majority voting (default: 0)
- Return type:
DataFrame
- Returns:
data_w_wet_flags – Target data with wet flags
- rainfallqc.checks.neighbourhood_checks.check_wet_neighbours_hourly(neighbour_data, target_gauge_col, list_of_nearest_stations, time_res, wet_threshold, min_n_neighbours, n_neighbours_ignored=0, hour_offset=0, min_count=None)
Identify suspicious large values by comparison to neighbours for hourly or 15-min data.
Flags (majority voting where flag is the highest value across all neighbours):
3, if normalised difference between target gauge and neighbours is above the 99.9th percentile
2, …if above 99th percentile
1, …if above 95th percentile
0, if not in extreme exceedance of neighbours
This is QC17 from the IntenseQC framework.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col
target_gauge_col (str) – Target gauge column
list_of_nearest_stations (List[str]) – List of columns with neighbouring gauges
time_res (str) – Time resolution of data
wet_threshold (int|float) – Threshold for rainfall intensity in given time period
min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag
n_neighbours_ignored (int) – Number of zero flags allowed for majority voting (default: 0)
hour_offset (int) – Time offset of hourly data in hours (i.e. if 7am-7am, then set this to 7) (default: 0)
min_count (int) – Minimum number of time steps needed per time period (default: 2)
- Return type:
DataFrame
- Returns:
data_w_wet_flags – Target data with wet flags
- rainfallqc.checks.neighbourhood_checks.filter_data_based_on_unusual_wetness(neighbour_data_diff, target_gauge_col, nearest_neighbour, wet_threshold)
Filter data based on wet threshold.
- Parameters:
neighbour_data_diff (DataFrame) – Data with normalised diff to neighbour
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
wet_threshold (float) – Threshold for rainfall intensity in given time period
- Return type:
DataFrame
- Returns:
filtered_diff – Data filtered to wet threshold and where diff is positive (thus more wet)
- rainfallqc.checks.neighbourhood_checks.flag_dry_spell_fractions(one_neighbour_data, target_gauge_col, nearest_neighbour, proportion_of_dry_day_for_flags)
Flag dry spell fractions.
- Parameters:
one_neighbour_data (DataFrame) – Rainfall data of one neighbouring gauge with time col
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
proportion_of_dry_day_for_flags (dict) – Proportion of dry days needed to be flagged 1, 2, or 3
- Return type:
DataFrame
- Returns:
data_w_dry_spell_fraction – Target data with dry spell fractions
- rainfallqc.checks.neighbourhood_checks.flag_monthly_factor_differences(monthly_factor)
Flag monthly factor differences following the IntenseQC framework (QC25).
Flags:
1, when ~10 x greater than neighbour monthly total
2, when ~25.4 x greater …
3, when ~2.54 x greater …
4, when ~10 x smaller than neighbour monthly total
5, when ~25.4 x smaller …
6, when ~2.54 x smaller …
else, 0
- Parameters:
monthly_factor (DataFrame) – Rainfall data with ‘factor_diff’ and gauge_col
target_gauge_col – Rain column
- Return type:
DataFrame
- Returns:
monthly_factor_w_flag – Rainfall data with flags based on monthly factor difference
- rainfallqc.checks.neighbourhood_checks.flag_percentage_diff_of_neighbour(neighbour_data, nearest_neighbour)
Flag percentage difference between target gauge and neighbouring gauge.
Flags -3 to 3 based on percentage difference:
-3, -100% (i.e. gauge dry but neighbours not)
-2, <= -50%
-1, <= -25%
1, >= 25%
2, >= 50%
3, >= 100%
- Parameters:
neighbour_data (DataFrame) – Rainfall data of all neighbouring gauges with time col
nearest_neighbour (str) – Neighbouring gauge column
- Return type:
DataFrame
- Returns:
neighbour_data_w_flags – Data with perc_diff flags
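The flag table can be read as a simple threshold ladder, sketched below for a single percentage difference. This is an illustration rather than the library's code: the helper name `percentage_diff_flag` is hypothetical, and it assumes the negative flags use negative thresholds (-25%, -50%, -100%) that mirror the positive ones.

```python
# Illustrative sketch only, not the rainfallqc implementation.
def percentage_diff_flag(perc_diff):
    """Map a percentage difference (target vs neighbour) to a flag from -3 to 3."""
    # Assumption: negative flags mirror the positive thresholds.
    if perc_diff <= -100:
        return -3
    if perc_diff <= -50:
        return -2
    if perc_diff <= -25:
        return -1
    if perc_diff >= 100:
        return 3
    if perc_diff >= 50:
        return 2
    if perc_diff >= 25:
        return 1
    return 0


perc_flags = [percentage_diff_flag(p) for p in (-100, -60, -30, 0, 30, 60, 150)]
print(perc_flags)
```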
- rainfallqc.checks.neighbourhood_checks.flag_wet_day_errors_based_on_neighbours(neighbour_data, target_gauge_col, nearest_neighbour, wet_threshold)
Flag wet days with errors based on the percentile difference with neighbouring gauge.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of all neighbouring gauges with time col
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
wet_threshold (float) – Threshold for rainfall intensity in given time period
- Return type:
DataFrame
- Returns:
neighbour_data_wet_flags – Data with wet flags
- rainfallqc.checks.neighbourhood_checks.get_dry_spell_fraction_col(neighbour_data, target_gauge_col, nearest_neighbour, dry_period_days)
Get dry spell fraction column.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
dry_period_days (int) – Length of a “dry_spell” (default: 15 days)
- Return type:
DataFrame
- Returns:
data_w_dry_spell_fraction – Target data with dry spell fractions
- rainfallqc.checks.neighbourhood_checks.get_majority_positive_or_negative_flags(monthly_neighbour_data, list_of_nearest_stations, min_n_neighbours, n_neighbours_ignored)
Get majority-voted positive or negative flags, i.e. get minimum positive flag, or maximum negative flag.
- Parameters:
monthly_neighbour_data (DataFrame) – Monthly rainfall data of neighbouring gauges with time col
list_of_nearest_stations (list) – List of columns with neighbouring gauges
min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag
n_neighbours_ignored (int) – Number of zero flags allowed for majority voting
- Return type:
DataFrame
- Returns:
data_w_monthly_flag – Data with majority_monthly_flag
- rainfallqc.checks.neighbourhood_checks.get_majority_voting_flag(neighbour_data, list_of_nearest_stations, min_n_neighbours, n_zeros_allowed, flag_col_prefix, new_flag_col_name, aggregation)
Get the highest flag that is present in all neighbours.
This function introduces the ‘n_zeros_allowed’ parameter to allow some leeway for problematic neighbours. This prevents a problematic neighbour that is similar to a problematic target from blocking the flag.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col
list_of_nearest_stations (list[str]) – List of columns with neighbouring gauges
min_n_neighbours (int) – Minimum number of neighbours online that will be considered
n_zeros_allowed (int) – Number of zero flags allowed (default: 0)
flag_col_prefix (str) – Prefix for flag column e.g. “wet_flag_”
new_flag_col_name (str) – New flag column name
aggregation (str) – “min” or “max”
- Return type:
DataFrame
- Returns:
neighbour_data_w_majority_wet_flag – Data with majority wet flag
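One plausible reading of the voting rule, for a single time step with non-negative flags, is sketched below. The highest flag shared by all neighbours is the minimum of their flags, and up to `n_zeros_allowed` zero flags are ignored before taking that minimum. The helper name `majority_flag`, the use of `None` for offline neighbours, and the handling of edge cases are assumptions, not the library's confirmed behaviour.

```python
# Illustrative sketch only, not the rainfallqc implementation.
def majority_flag(neighbour_flags, min_n_neighbours, n_zeros_allowed=0):
    """Highest flag shared by all online neighbours, tolerating a few zeros."""
    # Assumption: None marks a neighbour with no data at this time step.
    online = [f for f in neighbour_flags if f is not None]
    if len(online) < min_n_neighbours:
        return None  # too few neighbours to vote
    zeros = [f for f in online if f == 0]
    non_zeros = [f for f in online if f != 0]
    if len(zeros) > n_zeros_allowed or not non_zeros:
        return 0
    # The minimum is the highest flag level that every remaining neighbour reaches.
    return min(non_zeros)


print(majority_flag([3, 2, 3], min_n_neighbours=3))
print(majority_flag([3, 0, 3], min_n_neighbours=3))
print(majority_flag([3, 0, 3], min_n_neighbours=3, n_zeros_allowed=1))
```

For negative flags the same idea would use the maximum instead of the minimum, which is presumably what the `aggregation` parameter selects.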
- rainfallqc.checks.neighbourhood_checks.make_neighbour_monthly_max_climatology(monthly_neighbour_data, list_of_nearest_stations)
Make neighbourhood monthly max climatology.
- Parameters:
monthly_neighbour_data (DataFrame) – Monthly rainfall data of neighbouring gauges with time col
list_of_nearest_stations (list) – List of columns with neighbouring gauges
- Return type:
DataFrame
- Returns:
data_w_monthly_flags – Target data with monthly flags
- rainfallqc.checks.neighbourhood_checks.make_num_neighbours_online_col(neighbour_data, list_of_nearest_stations)
Get number of neighbours online column.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of neighbouring gauges with time col
list_of_nearest_stations (list[str]) – Neighbouring columns to check if not null
- Return type:
DataFrame
- Returns:
neighbour_data_online_neighbours – Data with column for number of online neighbours
- rainfallqc.checks.neighbourhood_checks.normalised_diff_between_target_neighbours(neighbour_data, target_gauge_col, nearest_neighbour)
Normalised difference between target rain col and neighbouring rain col.
- Parameters:
neighbour_data (DataFrame) – Rainfall data of all neighbouring gauges with time col
target_gauge_col (str) – Target gauge column
nearest_neighbour (str) – Neighbouring gauge column
- Return type:
DataFrame
- Returns:
neighbour_data_w_diff – Data with normalised diff to each neighbour
- rainfallqc.checks.neighbourhood_checks.upgrade_monthly_flag_using_neighbour_max_climatology(monthly_neighbour_data_w_flags, target_gauge_col, min_n_neighbours)
Upgrade flags to 4 or 5 for monthly totals in excess of the neighbourhood monthly climatological max.
- Parameters:
monthly_neighbour_data_w_flags (DataFrame) – Monthly rainfall data of neighbouring gauges with time col and ‘majority_monthly_flag’
target_gauge_col (str) – Target gauge column
min_n_neighbours (int) – Minimum number of neighbours needed to be checked for flag
- Return type:
DataFrame
- Returns:
data_w_monthly_flags – Target data with monthly flags
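The upgrade rule documented for check_monthly_neighbours (flags equal to 3 become 4 at >= 1.25 x, or 5 at >= 2 x, the record maximum of all neighbours) can be sketched for a single month as below. The helper name `upgrade_flag` and its scalar arguments are hypothetical; the real function works on a DataFrame and also takes `min_n_neighbours` into account.

```python
# Illustrative sketch only, not the rainfallqc implementation.
def upgrade_flag(flag, target_total, neighbour_record_max):
    """Upgrade a flag of 3 to 4 or 5 when the target far exceeds the neighbour record."""
    if flag != 3:
        return flag  # only flags equal to 3 are eligible for an upgrade
    if target_total >= 2 * neighbour_record_max:
        return 5
    if target_total >= 1.25 * neighbour_record_max:
        return 4
    return 3


upgraded = [
    upgrade_flag(3, 260.0, 100.0),
    upgrade_flag(3, 130.0, 100.0),
    upgrade_flag(3, 110.0, 100.0),
    upgrade_flag(2, 500.0, 100.0),
]
print(upgraded)
```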