rainfallqc.checks.timeseries_checks

Quality control checks based on suspicious time-series artefacts.

Time-series checks are defined as QC checks that: “detect abnormalities in patterns of the data record.”

Classes and functions ordered by appearance in IntenseQC framework.

rainfallqc.checks.timeseries_checks.check_daily_accumulations(data, target_gauge_col, gauge_lat, gauge_lon, wet_day_threshold=1.0, accumulation_multiplying_factor=2.0, accumulation_threshold=None)

Identify suspicious periods where an hour of rainfall is preceded by 23 hours with no rain.

Uses a simple precipitation intensity index (SDII) from ETCCDI.

This is QC13 from the IntenseQC framework.

Please see ‘Notes’ below for any additional information about the implementation of this method.

Parameters:
  • data (DataFrame) – Hourly or 15-min rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • gauge_lat (int | float) – latitude of the rain gauge

  • gauge_lon (int | float) – longitude of the rain gauge

  • wet_day_threshold (int | float) – Threshold for rainfall intensity in one day (default is 1 mm)

  • accumulation_multiplying_factor (int | float) – Factor by which to multiply the SDII value to identify an accumulation of rain recordings

  • accumulation_threshold (float) – Rain accumulation for detecting possible daily accumulations

Return type:

DataFrame

Returns:

data_w_daily_accumulation_flags

Data with daily accumulation flags

Notes

This method returns only 0 and 1 flags. This differs from the description of the daily accumulation check in IntenseQC. The decision was taken because the IntenseQC Python package itself only returns 0 and 1 flags.
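As a rough illustration of the idea behind this check (not the package's implementation), the sketch below scans hourly values in pure Python and marks a 24-hour window when 23 zero readings are followed by a value above the accumulation threshold. The function name `flag_daily_accumulation_sketch`, the plain-list input and the choice to flag the whole window are assumptions made for the example.

```python
def flag_daily_accumulation_sketch(hourly_rain, accumulation_threshold, window=24):
    """Return a 0/1 flag per hour. Hypothetical helper, not part of rainfallqc."""
    flags = [0] * len(hourly_rain)
    for end in range(window - 1, len(hourly_rain)):
        start = end - window + 1
        dry_hours = hourly_rain[start:end]  # the 23 readings before the last one
        # Suspicious pattern: 23 dry hours, then one reading above the threshold
        if all(v == 0 for v in dry_hours) and hourly_rain[end] > accumulation_threshold:
            for i in range(start, end + 1):
                flags[i] = 1
    return flags


hourly = [0] * 23 + [30.0] + [0.5, 0.2]
flags = flag_daily_accumulation_sketch(hourly, accumulation_threshold=10.0)
```

In this toy series the first 24 values are flagged and the two readings that follow are not.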

rainfallqc.checks.timeseries_checks.check_dry_period_cdd(data, target_gauge_col, time_res, gauge_lat, gauge_lon)

Identify suspiciously long dry periods in time-series using the ETCCDI Consecutive Dry Days (CDD) index.

This is QC12 from the IntenseQC framework.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • time_res (str) – Temporal resolution of the time series either ‘15m’, ‘daily’ or ‘hourly’

  • gauge_lat (int | float) – latitude of the rain gauge

  • gauge_lon (int | float) – longitude of the rain gauge

Return type:

DataFrame

Returns:

data_w_dry_spell_flags

Data with dry spell flags

rainfallqc.checks.timeseries_checks.check_monthly_accumulations(data, target_gauge_col, gauge_lat, gauge_lon, min_dry_spell_duration_in_days=28, wet_day_threshold=1.0, accumulation_multiplying_factor=2.0, accumulation_threshold=None)

Identify suspicious periods when an hour of rainfall is preceded by 1 month with no rain.

Flags two different types of accumulations: 1) dry, when the isolated high value is followed by no further rain; 2) wet, when the isolated value is followed by a few more wet values.

Uses a simple precipitation intensity index (SDII) from ETCCDI.

This is QC14 from the IntenseQC framework.

Parameters:
  • data (DataFrame) – Daily, hourly or 15-min rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • gauge_lat (int | float) – latitude of the rain gauge

  • gauge_lon (int | float) – longitude of the rain gauge

  • min_dry_spell_duration_in_days (int) – Minimum number of days in dry spell preceding monthly accumulation (default is 28, i.e. the length of February)

  • wet_day_threshold (int | float) – Threshold for rainfall intensity in one day (default is 1 mm)

  • accumulation_multiplying_factor (int | float) – Factor by which to multiply the SDII value to identify an accumulation of rain recordings (default is 2)

  • accumulation_threshold (float) – Rain accumulation for detecting possible monthly accumulations

Return type:

DataFrame

Returns:

data_w_monthly_accumulation_flags

Data with monthly accumulation flags

Notes

The original method filters out dry spells less than

rainfallqc.checks.timeseries_checks.check_streaks(data, target_gauge_col, gauge_lat, gauge_lon, smallest_measurable_rainfall_amount, accumulation_threshold=None)

Check for suspected repeated values.

Flags (TODO: could change numbers as original includes unhelpful 2):
  • 1, if streaks of 2 or more repeated values exceeding 2 * mean wet day rainfall

  • 3, if streaks of 12 or more greater than smallest measurable rainfall amount

  • 4, if streaks of 24 or more greater than zero

  • 5, if period of zeros bounded by streaks of >= 24

This is QC15 from the IntenseQC framework.

Parameters:
  • data (DataFrame) – Hourly or 15-min data with rainfall.

  • target_gauge_col (str) – Column with rainfall data.

  • gauge_lat (int | float) – latitude of the rain gauge.

  • gauge_lon (int | float) – longitude of the rain gauge.

  • smallest_measurable_rainfall_amount (float) – Resolution of rainfall data (i.e. minimum rainfall recording).

  • accumulation_threshold (float) – Rain accumulation for detecting possible monthly accumulations

Return type:

DataFrame

Returns:

data_w_streak_flags

Data with streak flags.
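To make the flag rules listed above concrete, here is a small, hedged sketch that classifies a single streak from its repeated value and its length. The function name `streak_flags_sketch`, its arguments, and the choice to return every matching flag as a list are assumptions for illustration; the package may resolve overlapping flags differently. Flag 5 is left out because it depends on neighbouring streaks, and `accumulation_threshold` stands in for 2 * mean wet day rainfall.

```python
def streak_flags_sketch(value, length, accumulation_threshold,
                        smallest_measurable_rainfall_amount):
    """Return the streak flags (1, 3, 4) that one streak would match.

    Hypothetical helper, not part of rainfallqc.
    """
    flags = []
    # Flag 1: 2 or more repeats of a value above the accumulation threshold
    if length >= 2 and value > accumulation_threshold:
        flags.append(1)
    # Flag 3: 12 or more repeats above the gauge resolution
    if length >= 12 and value > smallest_measurable_rainfall_amount:
        flags.append(3)
    # Flag 4: 24 or more repeats of any non-zero value
    if length >= 24 and value > 0:
        flags.append(4)
    return flags


long_low_streak = streak_flags_sketch(0.2, 30, 10.0, 0.1)
short_high_streak = streak_flags_sketch(15.0, 2, 10.0, 0.1)
```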

rainfallqc.checks.timeseries_checks.compute_dry_spell_days(dry_spell_data)

Compute dry spells in days from ETCCDI Consecutive Dry Days data.

Parameters:

dry_spell_data (Dataset) – ETCCDI CDD index data

Return type:

Dataset

Returns:

dry_spell_days

ETCCDI CDD index data with CDD_days variable

rainfallqc.checks.timeseries_checks.fill_in_monthly_accumulation_flags(monthly_accumulation_flags, time_step, min_dry_spell_duration, max_dry_spell_duration)

Fill in flags preceding monthly accumulation.

Parameters:
  • monthly_accumulation_flags (DataFrame) – Rainfall data with monthly accumulation flag and dry spell info

  • time_step (str) – Time step of data i.e. ‘1h’, ‘1d’, ‘15m’.

  • min_dry_spell_duration (int | float) – Minimum dry spell duration

  • max_dry_spell_duration (int | float) – Maximum dry spell duration

Return type:

DataFrame

Returns:

monthly_accumulation_flags

Data with accumulation flag filled in

rainfallqc.checks.timeseries_checks.flag_accumulation_based_on_next_dry_spell_duration(data, min_dry_spell_duration, accumulation_col_name)

Flag possible accumulation based on subsequent minimum dry spell duration.

Flags:
  • 3, if dry spell followed with high value then wet period (wet)

  • 1, if dry spell followed with high value then no rain for next 23 hours (dry)

  • 0, if neither

Parameters:
  • data (DataFrame) – Rainfall data with dry spell info and possible accumulation label

  • min_dry_spell_duration (int | float) – Minimum dry spell duration

  • accumulation_col_name (str) – Name for accumulation column

Return type:

DataFrame

Returns:

data_w_flag

Data with accumulation flag

rainfallqc.checks.timeseries_checks.flag_accumulation_periods(data, target_gauge_col, accumulation_threshold, accumulation_period_in_hours)

Flag accumulation in a given period of hourly data.

TODO: make work for daily using: DAILY_DIVIDING_FACTOR

Parameters:
  • data (DataFrame) – Hourly rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • accumulation_threshold (float) – Rain accumulation for detecting possible period accumulations

  • accumulation_period_in_hours (int) – Accumulation period in hours

Return type:

ndarray

Returns:

pa_flags

Accumulation flags

rainfallqc.checks.timeseries_checks.flag_dry_spell_duration(dry_spell_lengths, ref_dry_spell_length, time_res)

Flag the dry spell duration using reference local dry spell length.

Parameters:
  • dry_spell_lengths (DataFrame) – Data with dry spell lengths

  • ref_dry_spell_length (int | float) – Reference dry spell length

  • time_res (str) – Temporal resolution of the time series either ‘daily’ or ‘hourly’

Return type:

DataFrame

Returns:

dry_spell_lengths_flags

Data with dry spell flags

rainfallqc.checks.timeseries_checks.flag_n_hours_accumulation_based_on_threshold(period_rain_vals, accumulation_threshold, n_hours)

Flag a period as accumulation if a value is preceded by n hourly recordings of 0.

Parameters:
  • period_rain_vals (Series) – One period of rain values

  • accumulation_threshold (float) – Reference SDII threshold

  • n_hours (int) – Number of hours in reference period

Return type:

int | float

Returns:

flag

1 if period accumulation, otherwise 0

rainfallqc.checks.timeseries_checks.flag_streaks_exceeding_smallest_measurable_rainfall_amount(data, target_gauge_col, streak_length, smallest_measurable_rainfall_amount)

Flag streaks exceeding smallest measurable rainfall amount in data.

Parameters:
  • data (DataFrame) – Rainfall data with streak_id.

  • target_gauge_col (str) – Column with rainfall data.

  • streak_length (int) – Only streaks longer than this will be considered.

  • smallest_measurable_rainfall_amount (float) – Resolution of rainfall data (i.e. minimum rainfall recording).

Return type:

DataFrame

Returns:

data_w_flags

Data with streak flag 3

rainfallqc.checks.timeseries_checks.flag_streaks_exceeding_wet_day_rainfall_threshold(data, target_gauge_col, streak_length, accumulation_threshold)

Flag values exceeding wet day rainfall accumulation threshold.

Parameters:
  • data (DataFrame) – Rainfall data with streak_id.

  • target_gauge_col (str) – Column with rainfall data.

  • streak_length (int) – Only streaks longer than this will be considered

  • accumulation_threshold (float) – Threshold for rain accumulation.

Return type:

DataFrame

Returns:

data_w_flags

Data with streak flag 1

rainfallqc.checks.timeseries_checks.flag_streaks_exceeding_zero(data, target_gauge_col, streak_length)

Flag streaks of values exceeding zero.

Parameters:
  • data (DataFrame) – Rainfall data with streak_id.

  • target_gauge_col (str) – Column with rainfall data.

  • streak_length (int) – Only streaks longer than this will be considered.

Return type:

DataFrame

Returns:

data_w_flags

Data with streak flag 4

rainfallqc.checks.timeseries_checks.flag_streaks_of_zero_bounded_by_days(data, target_gauge_col, time_res)

Flag streaks of zeros bounded by records that are a multiple of 24 hours.

Parameters:
  • data (DataFrame) – Hourly, 15-min or daily data with rainfall.

  • target_gauge_col (str) – Column with rainfall data.

  • time_res (str) – Time resolution: “1h”, “15m”, “1d”, or “hourly”, “daily”

Return type:

DataFrame

Returns:

streaks_w_flag5

Data with streak flag 5.

rainfallqc.checks.timeseries_checks.get_accumulation_threshold(etccdi_sdii, gauge_sdii, accumulation_multiplying_factor)

Get rainfall accumulation threshold based on the ETCCDI or rain gauge simple precipitation intensity index (SDII).

Parameters:
  • etccdi_sdii (float) – SDII value from ETCCDI

  • gauge_sdii (float) – SDII value from rain gauge

  • accumulation_multiplying_factor (int | float) – Factor by which to multiply the SDII value to identify an accumulation of rain recordings

Return type:

float

Returns:

accumulation_threshold

Reference SDII threshold
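One plausible reading of this function, sketched below, is that the ETCCDI SDII is used when it is available and the gauge SDII is the fallback, with the chosen value scaled by the multiplying factor. The fallback-on-NaN rule and the name `accumulation_threshold_sketch` are assumptions for illustration, not confirmed behaviour of the package.

```python
import math


def accumulation_threshold_sketch(etccdi_sdii, gauge_sdii, accumulation_multiplying_factor):
    """Scale an SDII value into an accumulation threshold.

    Hypothetical helper, not part of rainfallqc.
    """
    # Assumed rule: prefer the gridded ETCCDI value, fall back to the gauge when it is NaN
    sdii = gauge_sdii if math.isnan(etccdi_sdii) else etccdi_sdii
    return sdii * accumulation_multiplying_factor


from_etccdi = accumulation_threshold_sketch(5.0, 4.0, 2.0)
from_gauge = accumulation_threshold_sketch(float("nan"), 4.0, 2.0)
```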

rainfallqc.checks.timeseries_checks.get_accumulation_threshold_from_etccdi(data, target_gauge_col, time_res, gauge_lat, gauge_lon, wet_day_threshold, accumulation_multiplying_factor)

Get rain accumulation threshold from ETCCDI data.

Parameters:
  • data (DataFrame) – Rainfall data.

  • target_gauge_col (str) – Column with rainfall data.

  • time_res (str) – Temporal resolution of the time series either ‘15m’, ‘daily’ or ‘hourly’

  • gauge_lat (int | float) – latitude of the rain gauge.

  • gauge_lon (int | float) – longitude of the rain gauge.

  • wet_day_threshold (float) – Threshold for rainfall intensity in one day (whether it is a wet day or not)

  • accumulation_multiplying_factor (float) – Factor by which to multiply the SDII value to identify an accumulation of rain recordings

Return type:

float

Returns:

accumulation_threshold

Rain accumulation threshold, e.g. 2 * SDII (simple precipitation intensity index)
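For reference, ETCCDI defines SDII as the total precipitation on wet days divided by the number of wet days. The sketch below computes it from a list of daily totals and scales it into an accumulation threshold. The helper name `sdii_sketch` and the list input are assumptions for the example; this is not the package's own routine.

```python
def sdii_sketch(daily_totals, wet_day_threshold=1.0):
    """Mean rainfall on wet days. Hypothetical helper, not part of rainfallqc."""
    # A wet day is one whose total reaches the wet-day threshold (1 mm by default)
    wet_days = [v for v in daily_totals if v >= wet_day_threshold]
    if not wet_days:
        return float("nan")  # undefined when the record has no wet days
    return sum(wet_days) / len(wet_days)


daily = [0.0, 0.4, 6.0, 10.0, 0.0, 2.0]
sdii = sdii_sketch(daily)        # (6 + 10 + 2) / 3 wet days
threshold = 2.0 * sdii           # multiplying factor of 2
```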

rainfallqc.checks.timeseries_checks.get_consecutive_dry_days(gauge_dry_spells)

Get consecutive groups of 0 rainfall days.

Parameters:

gauge_dry_spells (DataFrame) – Data with ‘is_dry’ column

Return type:

DataFrame

Returns:

gauge_dry_spell_groups

Data with group ids for consecutive dry days
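A minimal sketch of the grouping idea, assuming the ‘is_dry’ column is available as a list of booleans: each run of consecutive dry days gets its own integer id and wet days get `None`. The function name and the id scheme are hypothetical and may not match the package's output.

```python
def dry_day_group_ids_sketch(is_dry):
    """Assign one id per run of consecutive dry days. Hypothetical helper."""
    group_ids = []
    current_id = 0
    previous_dry = False
    for dry in is_dry:
        if dry and not previous_dry:
            current_id += 1  # a new dry run starts here
        group_ids.append(current_id if dry else None)
        previous_dry = dry
    return group_ids


ids = dry_day_group_ids_sketch([True, True, False, True, False, False, True, True])
```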

rainfallqc.checks.timeseries_checks.get_daily_non_wr_data(data, target_gauge_col, time_res)

Get daily non-world record data.

Parameters:
  • data (DataFrame) – Hourly rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • time_res (str) – Temporal resolution of the time series either ‘15m’, ‘daily’ or ‘hourly’

Return type:

DataFrame

Returns:

daily_data_not_wr

Daily rainfall data with world records filtered out

rainfallqc.checks.timeseries_checks.get_dry_spell_duration(data, target_gauge_col)

Get consecutive dry spell duration.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_gauge_col (str) – Column with rainfall data

Return type:

DataFrame

Returns:

gauge_dry_spell_lengths

Data with dry spell start, end and duration
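The start, end and duration of each dry spell can be illustrated with a run-length pass over the values. This sketch uses positional indices rather than timestamps and treats a value of exactly 0 as dry; both choices, along with the name `dry_spells_sketch`, are assumptions for the example.

```python
from itertools import groupby


def dry_spells_sketch(rain):
    """List each dry spell with its start, end and duration. Hypothetical helper."""
    spells = []
    position = 0
    # groupby yields consecutive runs of dry (True) and wet (False) values
    for is_dry, run in groupby(rain, key=lambda v: v == 0):
        length = len(list(run))
        if is_dry:
            spells.append(
                {"start": position, "end": position + length - 1, "duration": length}
            )
        position += length
    return spells


spells = dry_spells_sketch([0, 0, 1.2, 0, 0, 0, 0.4])
```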

rainfallqc.checks.timeseries_checks.get_dry_spell_info(data, target_gauge_col)

Get summary of dry spells (i.e. duration, first wet value after the dry spell, and durations of the previous and next dry spells).

Parameters:
  • data (DataFrame) – Hourly rainfall data

  • target_gauge_col (str) – Column with rainfall data

Return type:

DataFrame

Returns:

gauge_dry_spell_info

Data with dry spell information

rainfallqc.checks.timeseries_checks.get_first_wet_after_dry_spell(data, target_gauge_col)

Get first non-zero rainfall value after dry spell.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_gauge_col (str) – Column with rainfall data

Return type:

DataFrame

Returns:

data_w_first_wet

Data with binary column denoting first wet after dry spell
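As a hedged illustration, the binary column can be thought of as marking every non-zero value whose immediate predecessor is zero. The sketch below works on a plain list; the name `first_wet_after_dry_sketch` and the decision never to mark the first record (it has no predecessor) are assumptions.

```python
def first_wet_after_dry_sketch(rain):
    """Return 1 where a wet value directly follows a dry one, else 0. Hypothetical helper."""
    flags = [0] * len(rain)
    for i in range(1, len(rain)):
        if rain[i] > 0 and rain[i - 1] == 0:
            flags[i] = 1
    return flags


first_wet = first_wet_after_dry_sketch([0, 0, 3.1, 0.2, 0, 1.0])
```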

rainfallqc.checks.timeseries_checks.get_local_etccdi_sdii_mean(gauge_lat, gauge_lon)

Get the nearby ETCCDI simple precipitation intensity index (SDII) mean.

Parameters:
  • gauge_lat (int | float) – latitude of the rain gauge

  • gauge_lon (int | float) – longitude of the rain gauge

Return type:

float

Returns:

nearby_etccdi_sdii_mean

Local mean SDII value

rainfallqc.checks.timeseries_checks.get_possible_accumulations(gauge_dry_spell_info, target_gauge_col, accumulation_threshold)

Get possible accumulations as 0 or 1 based on dry spell info.

Parameters:
  • gauge_dry_spell_info (DataFrame) – Rainfall data with columns with dry spell info (durations, first_wet_after_dry, etc.)

  • target_gauge_col (str) – Column with rainfall data

  • accumulation_threshold (float) – Threshold of rainfall intensity

Return type:

DataFrame

Returns:

gauge_data_possible_accumulations

Data with 1 if possible accumulation, otherwise 0.

rainfallqc.checks.timeseries_checks.get_streaks_above_threshold(data, target_gauge_col, streak_length, value_threshold)

Get streak groups above given threshold.

Parameters:
  • data (DataFrame) – Rainfall data with streak_id.

  • target_gauge_col (str) – Column with rainfall data.

  • streak_length (int) – Minimum length of streaks.

  • value_threshold (int | float) – Threshold to check.

Return type:

DataFrame

Returns:

streaks_above_accumulation

All streaks above the given value

rainfallqc.checks.timeseries_checks.get_streaks_of_repeated_values(data, data_col)

Get streaks of repeated values in time series.

Parameters:
  • data (DataFrame) – Data with time column.

  • data_col (str) – Column with values to check streaks in.

Return type:

DataFrame

Returns:

streak_data

Data with streak column.
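One common way to label streaks of repeated values is to start a new id whenever a value differs from the one before it. The sketch below does this for a plain list and then counts the length of each streak; the name `streak_ids_sketch` and the 1-based ids are assumptions and may not match the column the package produces.

```python
from collections import Counter


def streak_ids_sketch(values):
    """Give consecutive equal values the same streak id. Hypothetical helper."""
    streak_ids = []
    current = 0
    for i, value in enumerate(values):
        if i == 0 or value != values[i - 1]:
            current += 1  # value changed, so a new streak begins
        streak_ids.append(current)
    return streak_ids


streak_ids = streak_ids_sketch([0.2, 0.2, 0.2, 0.0, 0.4, 0.4])
streak_lengths = Counter(streak_ids)  # maps streak id to number of records
```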

rainfallqc.checks.timeseries_checks.get_surrounding_dry_spell_lengths(data)

Make prev_dry_spell and next_dry_spell columns from dry_spell_lengths.

Parameters:

data (DataFrame) – Data with dry_spell_lengths

Return type:

DataFrame

Returns:

data

Data with columns of previous and next dry spell durations

rainfallqc.checks.timeseries_checks.join_dry_spell_data_back_to_original(data, dry_spell_lengths_flags)

Flag dry spell data using dry spell lengths.

Parameters:
  • data (DataFrame) – Rainfall data

  • dry_spell_lengths_flags (DataFrame) – Data with dry spell flags

Return type:

DataFrame

Returns:

dry_spell_flag_data

Data with dry spell flags

Functions

check_daily_accumulations(data, ...[, ...])

Identify suspicious periods where an hour of rainfall is preceded by 23 hours with no rain.

check_dry_period_cdd(data, target_gauge_col, ...)

Identify suspiciously long dry periods in time-series using the ETCCDI Consecutive Dry Days (CDD) index.

check_monthly_accumulations(data, ...[, ...])

Identify suspicious periods when an hour of rainfall is preceded by 1 month with no rain.

check_streaks(data, target_gauge_col, ...[, ...])

Check for suspected repeated values.

compute_dry_spell_days(dry_spell_data)

Compute dry spells in days from ETCCDI Consecutive Dry Days data.

fill_in_monthly_accumulation_flags(...)

Fill in flags preceding monthly accumulation.

flag_accumulation_based_on_next_dry_spell_duration(...)

Flag possible accumulation based on subsequent minimum dry spell duration.

flag_accumulation_periods(data, ...)

Flag accumulation in a given period of hourly data.

flag_dry_spell_duration(dry_spell_lengths, ...)

Flag the dry spell duration using reference local dry spell length.

flag_n_hours_accumulation_based_on_threshold(...)

Flag a period as accumulation if a value is preceded by n hourly recordings of 0.

flag_streaks_exceeding_smallest_measurable_rainfall_amount(...)

Flag streaks exceeding smallest measurable rainfall amount in data.

flag_streaks_exceeding_wet_day_rainfall_threshold(...)

Flag values exceeding wet day rainfall accumulation threshold.

flag_streaks_exceeding_zero(data, ...)

Flag streaks of values exceeding zero.

flag_streaks_of_zero_bounded_by_days(data, ...)

Flag streaks of zeros bounded by records that are a multiple of 24 hours.

get_accumulation_threshold(etccdi_sdii, ...)

Get rainfall accumulation threshold based on the ETCCDI or rain gauge simple precipitation intensity index (SDII).

get_accumulation_threshold_from_etccdi(data, ...)

Get rain accumulation threshold from ETCCDI data.

get_consecutive_dry_days(gauge_dry_spells)

Get consecutive groups of 0 rainfall days.

get_daily_non_wr_data(data, ...)

Get daily non-world record data.

get_dry_spell_duration(data, target_gauge_col)

Get consecutive dry spell duration.

get_dry_spell_info(data, target_gauge_col)

Get summary of dry spells (i.e. duration, first wet value after the dry spell, and durations of the previous and next dry spells).

get_first_wet_after_dry_spell(data, ...)

Get first non-zero rainfall value after dry spell.

get_local_etccdi_sdii_mean(gauge_lat, gauge_lon)

Get the nearby ETCCDI simple precipitation intensity index (SDII) mean.

get_possible_accumulations(...)

Get possible accumulations as 0 or 1 based on dry spell info.

get_streaks_above_threshold(data, ...)

Get streak groups above given threshold.

get_streaks_of_repeated_values(data, data_col)

Get streaks of repeated values in time series.

get_surrounding_dry_spell_lengths(data)

Make prev_dry_spell and next_dry_spell columns from dry_spell_lengths.

join_dry_spell_data_back_to_original(data, ...)

Flag dry spell data using dry spell lengths.