rainfallqc.checks.timeseries_checks
Quality control checks based on suspicious time-series artefacts.
Time-series checks are defined as QC checks that “detect abnormalities in patterns of the data record”.
Classes and functions are ordered by their appearance in the IntenseQC framework.
- rainfallqc.checks.timeseries_checks.check_daily_accumulations(data, target_gauge_col, gauge_lat, gauge_lon, wet_day_threshold=1.0, accumulation_multiplying_factor=2.0, accumulation_threshold=None)
Identify suspicious periods where an hour of rainfall is preceded by 23 hours with no rain.
Uses the simple precipitation intensity index (SDII) from ETCCDI.
This is QC13 from the IntenseQC framework.
See ‘Notes’ below for additional information about the implementation of this method.
- Parameters:
data (DataFrame) – Hourly or 15-min rainfall data
target_gauge_col (str) – Column with rainfall data
gauge_lat (int|float) – Latitude of the rain gauge
gauge_lon (int|float) – Longitude of the rain gauge
wet_day_threshold (int|float) – Threshold for rainfall intensity in one day (default is 1 mm)
accumulation_multiplying_factor (int|float) – Factor by which the SDII value is multiplied to identify an accumulation of rain recordings (default is 2)
accumulation_threshold (float) – Rain accumulation threshold for detecting possible daily accumulations
- Return type:
DataFrame
- Returns:
data_w_daily_accumulation_flags – Data with daily accumulation flags
Notes
This method returns only 0 and 1 flags. This differs from the description of the daily accumulation check in IntenseQC. This decision was taken because the IntenseQC Python package only returns 0 and 1 flags.
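The idea behind the daily accumulation check can be illustrated with a minimal, pure-Python sketch. It is not the rainfallqc implementation: the helper name `flag_daily_accumulations`, the use of a plain list instead of a DataFrame, and the choice to flag the whole 24-hour window are all assumptions made for illustration.

```python
def flag_daily_accumulations(hourly_rain, accumulation_threshold, n_hours=24):
    """Return one 0/1 flag per hour (hypothetical helper, not the library API).

    A window of n_hours is flagged when its first n_hours - 1 values are zero
    and its last value exceeds accumulation_threshold.
    """
    flags = [0] * len(hourly_rain)
    for end in range(n_hours - 1, len(hourly_rain)):
        start = end - (n_hours - 1)
        window = hourly_rain[start:end + 1]
        dry_before = all(value == 0 for value in window[:-1])
        if dry_before and window[-1] > accumulation_threshold:
            # Assumption: mark every hour of the suspicious window.
            for i in range(start, end + 1):
                flags[i] = 1
    return flags


# 23 dry hours, one large value, then ordinary rainfall.
rain = [0.0] * 23 + [30.0] + [0.2, 0.0, 1.4]
# Assumed threshold: multiplying factor 2.0 times an SDII of 6.0 mm/day.
flags = flag_daily_accumulations(rain, accumulation_threshold=2.0 * 6.0)
```

Here the first 24 hours are flagged with 1 and the remaining three hours keep flag 0.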
- rainfallqc.checks.timeseries_checks.check_dry_period_cdd(data, target_gauge_col, time_res, gauge_lat, gauge_lon)
Identify suspiciously long dry periods in time-series using the ETCCDI Consecutive Dry Days (CDD) index.
This is QC12 from the IntenseQC framework.
- Parameters:
data (DataFrame) – Rainfall data
target_gauge_col (str) – Column with rainfall data
time_res (str) – Temporal resolution of the time series, either ‘15m’, ‘daily’ or ‘hourly’
gauge_lat (int|float) – Latitude of the rain gauge
gauge_lon (int|float) – Longitude of the rain gauge
- Return type:
DataFrame
- Returns:
data_w_dry_spell_flags – Data with dry spell flags
- rainfallqc.checks.timeseries_checks.check_monthly_accumulations(data, target_gauge_col, gauge_lat, gauge_lon, min_dry_spell_duration_in_days=28, wet_day_threshold=1.0, accumulation_multiplying_factor=2.0, accumulation_threshold=None)
Identify suspicious periods when an hour of rainfall is preceded by 1 month with no rain.
Flags two different types of accumulation: 1) dry, when the isolated high value is followed by no rain; 2) wet, when the isolated value is followed by a few more wet values.
Uses the simple precipitation intensity index (SDII) from ETCCDI.
This is QC14 from the IntenseQC framework.
- Parameters:
data (DataFrame) – Daily, hourly or 15-min rainfall data
target_gauge_col (str) – Column with rainfall data
gauge_lat (int|float) – Latitude of the rain gauge
gauge_lon (int|float) – Longitude of the rain gauge
min_dry_spell_duration_in_days (int) – Minimum number of days in the dry spell preceding a monthly accumulation (default is 28, i.e. the length of February)
wet_day_threshold (int|float) – Threshold for rainfall intensity in one day (default is 1 mm)
accumulation_multiplying_factor (int|float) – Factor by which the SDII value is multiplied to identify an accumulation of rain recordings (default is 2)
accumulation_threshold (float) – Rain accumulation threshold for detecting possible monthly accumulations
- Return type:
DataFrame
- Returns:
data_w_monthly_accumulation_flags – Data with monthly accumulation flags
Notes
The original method filters out dry spells less than
- rainfallqc.checks.timeseries_checks.check_streaks(data, target_gauge_col, gauge_lat, gauge_lon, smallest_measurable_rainfall_amount, accumulation_threshold=None)
Check for suspected repeated values.
Flags (TODO: could change numbers as original includes unhelpful 2):
- 1, if streaks of 2 or more repeated values exceeding 2 * mean wet day rainfall
- 3, if streaks of 12 or more greater than smallest measurable rainfall amount
- 4, if streaks of 24 or more greater than zero
- 5, if period of zeros bounded by streaks of >= 24
This is QC15 from the IntenseQC framework.
- Parameters:
data (DataFrame) – Hourly or 15-min data with rainfall.
target_gauge_col (str) – Column with rainfall data.
gauge_lat (int|float) – Latitude of the rain gauge.
gauge_lon (int|float) – Longitude of the rain gauge.
smallest_measurable_rainfall_amount (float) – Resolution of rainfall data (i.e. minimum rainfall recording).
accumulation_threshold (float) – Rain accumulation for detecting possible monthly accumulations
- Return type:
DataFrame
- Returns:
data_w_streak_flags – Data with streak flags.
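The streak rules for flags 1, 3 and 4 can be sketched with run-length grouping. This is an illustrative sketch only, not the rainfallqc implementation: the helper name `flag_streaks`, the single combined flag per value, and the precedence order (1, then 3, then 4) are assumptions, and flag 5 is left out because its exact rule is not spelled out here.

```python
from itertools import groupby


def flag_streaks(values, accumulation_threshold, smallest_measurable_rainfall_amount):
    """Return one flag per value based on runs of identical values (hypothetical helper)."""
    flags = []
    for value, run in groupby(values):
        length = len(list(run))
        if length >= 2 and value > accumulation_threshold:
            flag = 1  # short streak of very high repeated values
        elif length >= 12 and value > smallest_measurable_rainfall_amount:
            flag = 3  # long streak above the gauge resolution
        elif length >= 24 and value > 0:
            flag = 4  # very long streak of any non-zero value
        else:
            flag = 0
        flags.extend([flag] * length)
    return flags


values = [0.0, 15.0, 15.0, 0.0] + [0.2] * 12 + [0.1] * 24
streak_flags = flag_streaks(
    values,
    accumulation_threshold=12.0,  # assumed to be 2 * mean wet day rainfall
    smallest_measurable_rainfall_amount=0.1,
)
```

In this example the two repeated 15.0 values receive flag 1, the twelve 0.2 values receive flag 3, and the twenty-four 0.1 values receive flag 4.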
- rainfallqc.checks.timeseries_checks.compute_dry_spell_days(dry_spell_data)
Compute dry spells in days from ETCCDI Consecutive Dry Days data.
- Parameters:
dry_spell_data (Dataset) – ETCCDI CDD index data
- Return type:
Dataset
- Returns:
dry_spell_days – ETCCDI CDD index data with CDD_days variable
- rainfallqc.checks.timeseries_checks.fill_in_monthly_accumulation_flags(monthly_accumulation_flags, time_step, min_dry_spell_duration, max_dry_spell_duration)
Fill in flags preceding monthly accumulation.
- Parameters:
monthly_accumulation_flags (DataFrame) – Rainfall data with monthly accumulation flag and dry spell info
time_step (str) – Time step of data, i.e. ‘1h’, ‘1d’, ‘15m’.
min_dry_spell_duration (int|float) – Minimum dry spell duration
max_dry_spell_duration (int|float) – Maximum dry spell duration
- Return type:
DataFrame
- Returns:
monthly_accumulation_flags – Data with accumulation flag filled in
- rainfallqc.checks.timeseries_checks.flag_accumulation_based_on_next_dry_spell_duration(data, min_dry_spell_duration, accumulation_col_name)
Flag possible accumulation based on subsequent minimum dry spell duration.
Flags:
- 3, if a dry spell is followed by a high value and then a wet period (wet)
- 1, if a dry spell is followed by a high value and then no rain for the next 23 hours (dry)
- 0, if neither
- Parameters:
data (DataFrame) – Rainfall data with dry spell info and possible accumulation label
min_dry_spell_duration (int|float) – Minimum dry spell duration
accumulation_col_name (str) – Name for accumulation column
- Return type:
DataFrame
- Returns:
data_w_flag – Data with accumulation flag
- rainfallqc.checks.timeseries_checks.flag_accumulation_periods(data, target_gauge_col, accumulation_threshold, accumulation_period_in_hours)
Flag accumulation in a given period of hourly data.
TODO: make work for daily using: DAILY_DIVIDING_FACTOR
- Parameters:
data (DataFrame) – Hourly rainfall data
target_gauge_col (str) – Column with rainfall data
accumulation_threshold (float) – Rain accumulation for detecting possible period accumulations
accumulation_period_in_hours (int) – Accumulation period in hours
- Return type:
ndarray
- Returns:
pa_flags – Accumulation flags
- rainfallqc.checks.timeseries_checks.flag_dry_spell_duration(dry_spell_lengths, ref_dry_spell_length, time_res)
Flag the dry spell duration using reference local dry spell length.
- Parameters:
dry_spell_lengths (DataFrame) – Data with dry spell lengths
ref_dry_spell_length (int|float) – Reference dry spell length
time_res (str) – Temporal resolution of the time series, either ‘daily’ or ‘hourly’
- Return type:
DataFrame
- Returns:
dry_spell_lengths_flags – Data with dry spell flags
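Comparing a dry spell against a reference length depends on converting time steps to days first. The sketch below shows that conversion under stated assumptions: the helper name `flag_dry_spell`, the `STEPS_PER_DAY` mapping, and the binary 0/1 outcome are hypothetical, and the flag levels actually used by rainfallqc may differ.

```python
# Assumed number of time steps per day for each supported resolution.
STEPS_PER_DAY = {"hourly": 24, "daily": 1}


def flag_dry_spell(length_in_steps, ref_dry_spell_length_days, time_res):
    """Return 1 if the dry spell is longer than the reference, else 0 (hypothetical helper)."""
    length_in_days = length_in_steps / STEPS_PER_DAY[time_res]
    return 1 if length_in_days > ref_dry_spell_length_days else 0


# A 40-day dry spell in hourly data against a 30-day reference.
long_spell_flag = flag_dry_spell(24 * 40, ref_dry_spell_length_days=30, time_res="hourly")
# A 10-day dry spell in daily data against the same reference.
short_spell_flag = flag_dry_spell(10, ref_dry_spell_length_days=30, time_res="daily")
```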
- rainfallqc.checks.timeseries_checks.flag_n_hours_accumulation_based_on_threshold(period_rain_vals, accumulation_threshold, n_hours)
Flag a period as accumulation if a value is preceded by n hourly recordings of 0.
- Parameters:
period_rain_vals (Series) – One period of rain values
accumulation_threshold (float) – Reference SDII threshold
n_hours (int) – Number of hours in reference period
- Return type:
int|float
- Returns:
flag – 1 if period accumulation, otherwise 0
- rainfallqc.checks.timeseries_checks.flag_streaks_exceeding_smallest_measurable_rainfall_amount(data, target_gauge_col, streak_length, smallest_measurable_rainfall_amount)
Flag streaks exceeding smallest measurable rainfall amount in data.
- Parameters:
data (DataFrame) – Rainfall data with streak_id.
target_gauge_col (str) – Column with rainfall data.
streak_length (int) – Only streaks longer than this will be considered.
smallest_measurable_rainfall_amount (float) – Resolution of rainfall data (i.e. minimum rainfall recording).
- Return type:
DataFrame
- Returns:
data_w_flags – Data with streak flag 3
- rainfallqc.checks.timeseries_checks.flag_streaks_exceeding_wet_day_rainfall_threshold(data, target_gauge_col, streak_length, accumulation_threshold)
Flag values exceeding wet day rainfall accumulation threshold.
- Parameters:
data (DataFrame) – Rainfall data with streak_id.
target_gauge_col (str) – Column with rainfall data.
streak_length (int) – Only streaks longer than this will be considered.
accumulation_threshold (float) – Threshold for rain accumulation.
- Return type:
DataFrame
- Returns:
data_w_flags – Data with streak flag 1
- rainfallqc.checks.timeseries_checks.flag_streaks_exceeding_zero(data, target_gauge_col, streak_length)
Flag streaks of values exceeding zero.
- Parameters:
data (DataFrame) – Rainfall data with streak_id.
target_gauge_col (str) – Column with rainfall data.
streak_length (int) – Only streaks longer than this will be considered.
- Return type:
DataFrame
- Returns:
data_w_flags – Data with streak flag 4
- rainfallqc.checks.timeseries_checks.flag_streaks_of_zero_bounded_by_days(data, target_gauge_col, time_res)
Flag streaks of zeros bounded by records that are a multiple of 24 hours.
- Parameters:
data (DataFrame) – Hourly, 15-min or daily data with rainfall.
target_gauge_col (str) – Column with rainfall data.
time_res (str) – Time resolution: “1h”, “15m”, “1d”, or “hourly”, “daily”
- Return type:
DataFrame
- Returns:
streaks_w_flag5 – Data with streak flag 5.
- rainfallqc.checks.timeseries_checks.get_accumulation_threshold(etccdi_sdii, gauge_sdii, accumulation_multiplying_factor)
Get the rainfall accumulation threshold based on the ETCCDI or rain gauge simple precipitation intensity index (SDII).
- Parameters:
etccdi_sdii (float) – SDII value from ETCCDI
gauge_sdii (float) – SDII value from rain gauge
accumulation_multiplying_factor (int|float) – Factor by which the SDII value is multiplied to identify an accumulation of rain recordings
- Return type:
float
- Returns:
accumulation_threshold – Reference SDII threshold
- rainfallqc.checks.timeseries_checks.get_accumulation_threshold_from_etccdi(data, target_gauge_col, time_res, gauge_lat, gauge_lon, wet_day_threshold, accumulation_multiplying_factor)
Get rain accumulation threshold from ETCCDI data.
- Parameters:
data (DataFrame) – Rainfall data.
target_gauge_col (str) – Column with rainfall data.
time_res (str) – Temporal resolution of the time series, either ‘15m’, ‘daily’ or ‘hourly’
gauge_lat (int|float) – Latitude of the rain gauge.
gauge_lon (int|float) – Longitude of the rain gauge.
wet_day_threshold (float) – Threshold for rainfall intensity in one day (whether it is a wet day or not)
accumulation_multiplying_factor (float) – Factor by which the SDII value is multiplied to identify an accumulation of rain recordings
- Return type:
float
- Returns:
accumulation_threshold – Rain accumulation threshold, e.g. 2 * SDII
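The relationship between SDII and the accumulation threshold can be shown with a short worked example. SDII is the total rainfall on wet days divided by the number of wet days, where a wet day has at least `wet_day_threshold` mm. The two helper names below are hypothetical, and the sketch does not reproduce how rainfallqc reads ETCCDI data or chooses between the ETCCDI and gauge values.

```python
def simple_daily_intensity_index(daily_totals, wet_day_threshold=1.0):
    """Mean rainfall on wet days (hypothetical helper); NaN if there are no wet days."""
    wet_days = [total for total in daily_totals if total >= wet_day_threshold]
    if not wet_days:
        return float("nan")
    return sum(wet_days) / len(wet_days)


def accumulation_threshold_from_sdii(sdii, accumulation_multiplying_factor=2.0):
    """Scale an SDII value into an accumulation threshold (hypothetical helper)."""
    return sdii * accumulation_multiplying_factor


daily_totals = [0.0, 0.4, 5.0, 12.0, 0.0, 7.0]  # mm per day
sdii = simple_daily_intensity_index(daily_totals)  # (5 + 12 + 7) / 3
threshold = accumulation_threshold_from_sdii(sdii)
```

With three wet days totalling 24 mm, the SDII is 8.0 mm/day and the threshold with the default factor of 2 is 16.0 mm.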
- rainfallqc.checks.timeseries_checks.get_consecutive_dry_days(gauge_dry_spells)
Get consecutive groups of 0 rainfall days.
- Parameters:
gauge_dry_spells (DataFrame) – Data with ‘is_dry’ column
- Return type:
DataFrame
- Returns:
gauge_dry_spell_groups – Data with group ids for consecutive dry days
- rainfallqc.checks.timeseries_checks.get_daily_non_wr_data(data, target_gauge_col, time_res)
Get daily data with world-record values filtered out.
- Parameters:
data (DataFrame) – Hourly rainfall data
target_gauge_col (str) – Column with rainfall data
time_res (str) – Temporal resolution of the time series, either ‘15m’, ‘daily’ or ‘hourly’
- Return type:
DataFrame
- Returns:
daily_data_not_wr – Daily rainfall data with world records filtered out
- rainfallqc.checks.timeseries_checks.get_dry_spell_duration(data, target_gauge_col)
Get consecutive dry spell duration.
- Parameters:
data (DataFrame) – Rainfall data
target_gauge_col (str) – Column with rainfall data
- Return type:
DataFrame
- Returns:
gauge_dry_spell_lengths – Data with dry spell start, end and duration
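Finding the start, end and duration of each dry spell amounts to scanning for runs of zero rainfall. The following is a minimal sketch on a plain list, not the rainfallqc implementation: the helper name `get_dry_spells` and the use of positional indices in place of timestamps are assumptions.

```python
def get_dry_spells(rain):
    """Return (start_index, end_index, duration) for each run of zeros (hypothetical helper)."""
    spells = []
    start = None
    for i, value in enumerate(rain):
        if value == 0:
            if start is None:
                start = i  # a new dry spell begins here
        elif start is not None:
            spells.append((start, i - 1, i - start))
            start = None
    if start is not None:
        # The series ends while still dry.
        spells.append((start, len(rain) - 1, len(rain) - start))
    return spells


rain = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.4, 0.0]
spells = get_dry_spells(rain)
```

For this series the sketch finds three dry spells, with durations of 2, 3 and 1 time steps.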
- rainfallqc.checks.timeseries_checks.get_dry_spell_info(data, target_gauge_col)
Get a summary of dry spells (i.e. duration, first wet value after the dry spell, and durations of the previous and next dry spells).
- Parameters:
data (DataFrame) – Hourly rainfall data
target_gauge_col (str) – Column with rainfall data
- Return type:
DataFrame
- Returns:
gauge_dry_spell_info – Data with dry spell information
- rainfallqc.checks.timeseries_checks.get_first_wet_after_dry_spell(data, target_gauge_col)
Get first non-zero rainfall value after dry spell.
- Parameters:
data (DataFrame) – Rainfall data
target_gauge_col (str) – Column with rainfall data
- Return type:
DataFrame
- Returns:
data_w_first_wet – Data with binary column denoting first wet after dry spell
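A binary "first wet after dry" marker can be derived by comparing each value with the one before it. This is an illustrative sketch, not the rainfallqc implementation: the helper name `first_wet_after_dry` is hypothetical, and it assumes a wet value at the very start of the record is not marked because no dry spell precedes it.

```python
def first_wet_after_dry(rain):
    """Return 1 where a non-zero value directly follows a zero, else 0 (hypothetical helper)."""
    markers = []
    previous_was_dry = False  # assumption: nothing is known before the record starts
    for value in rain:
        is_wet = value > 0
        markers.append(1 if (is_wet and previous_was_dry) else 0)
        previous_was_dry = not is_wet
    return markers


rain = [0.0, 0.0, 3.1, 0.5, 0.0, 2.0]
markers = first_wet_after_dry(rain)
```

Only the values 3.1 and 2.0 are marked, since each one immediately follows a zero.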
- rainfallqc.checks.timeseries_checks.get_local_etccdi_sdii_mean(gauge_lat, gauge_lon)
Get the mean simple precipitation intensity index (SDII) from nearby ETCCDI data.
- Parameters:
gauge_lat (int|float) – Latitude of the rain gauge
gauge_lon (int|float) – Longitude of the rain gauge
- Return type:
float
- Returns:
nearby_etccdi_sdii_mean – Local mean SDII value
- rainfallqc.checks.timeseries_checks.get_possible_accumulations(gauge_dry_spell_info, target_gauge_col, accumulation_threshold)
Get possible accumulations as 0 or 1 based on dry spell info.
- Parameters:
gauge_dry_spell_info (DataFrame) – Rainfall data with columns with dry spell info (durations, first_wet_after_dry, etc.)
target_gauge_col (str) – Column with rainfall data
accumulation_threshold (float) – Threshold of rainfall intensity
- Return type:
DataFrame
- Returns:
gauge_data_possible_accumulations – Data with 1 where an accumulation is possible, otherwise 0.
- rainfallqc.checks.timeseries_checks.get_streaks_above_threshold(data, target_gauge_col, streak_length, value_threshold)
Get streak groups above given threshold.
- Parameters:
data (DataFrame) – Rainfall data with streak_id.
target_gauge_col (str) – Column with rainfall data.
streak_length (int) – Minimum length of streaks.
value_threshold (int|float) – Threshold to check values against.
- Return type:
DataFrame
- Returns:
streaks_above_accumulation – All streaks above the given value
- rainfallqc.checks.timeseries_checks.get_streaks_of_repeated_values(data, data_col)
Get streaks of repeated values in time series.
- Parameters:
data (DataFrame) – Data with time column.
data_col (str) – Column with values to check streaks in.
- Return type:
DataFrame
- Returns:
streak_data – Data with streak column.
- rainfallqc.checks.timeseries_checks.get_surrounding_dry_spell_lengths(data)
Make prev_dry_spell and next_dry_spell columns from dry_spell_lengths.
- Parameters:
data (DataFrame) – Data with dry_spell_lengths
- Return type:
DataFrame
- Returns:
data – Data with columns of previous and next dry spell durations
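Previous and next dry spell lengths can be obtained by shifting the sequence of lengths by one position in each direction. The sketch below is a hypothetical illustration with one entry per dry spell and `None` at the edges; the actual function works on DataFrame columns and may handle rows and missing values differently.

```python
def surrounding_dry_spell_lengths(dry_spell_lengths):
    """Return (length, prev_dry_spell, next_dry_spell) per dry spell (hypothetical helper)."""
    prev_lengths = [None] + dry_spell_lengths[:-1]  # shift forward by one
    next_lengths = dry_spell_lengths[1:] + [None]   # shift backward by one
    return list(zip(dry_spell_lengths, prev_lengths, next_lengths))


rows = surrounding_dry_spell_lengths([2, 3, 1])
```

The first dry spell has no predecessor and the last has no successor, so those entries are `None`.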
- rainfallqc.checks.timeseries_checks.join_dry_spell_data_back_to_original(data, dry_spell_lengths_flags)
Flag dry spell data using dry spell lengths.
- Parameters:
data (DataFrame) – Rainfall data
dry_spell_lengths_flags (DataFrame) – Data with dry spell flags
- Return type:
DataFrame
- Returns:
dry_spell_flag_data – Data with dry spell flags
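Functions
| Function | Description |
| --- | --- |
| check_daily_accumulations | Identify suspicious periods where an hour of rainfall is preceded by 23 hours with no rain. |
| check_dry_period_cdd | Identify suspiciously long dry periods in time-series using the ETCCDI Consecutive Dry Days (CDD) index. |
| check_monthly_accumulations | Identify suspicious periods when an hour of rainfall is preceded by 1 month with no rain. |
| check_streaks | Check for suspected repeated values. |
| compute_dry_spell_days | Compute dry spells in days from ETCCDI Consecutive Dry Days data. |
| fill_in_monthly_accumulation_flags | Fill in flags preceding monthly accumulation. |
| flag_accumulation_based_on_next_dry_spell_duration | Flag possible accumulation based on subsequent minimum dry spell duration. |
| flag_accumulation_periods | Flag accumulation in a given period of hourly data. |
| flag_dry_spell_duration | Flag the dry spell duration using reference local dry spell length. |
| flag_n_hours_accumulation_based_on_threshold | Flag a period as accumulation if a value is preceded by n hourly recordings of 0. |
| flag_streaks_exceeding_smallest_measurable_rainfall_amount | Flag streaks exceeding smallest measurable rainfall amount in data. |
| flag_streaks_exceeding_wet_day_rainfall_threshold | Flag values exceeding wet day rainfall accumulation threshold. |
| flag_streaks_exceeding_zero | Flag streaks of values exceeding zero. |
| flag_streaks_of_zero_bounded_by_days | Flag streaks of zeros bounded by records that are a multiple of 24 hours. |
| get_accumulation_threshold | Get the rainfall accumulation threshold based on the ETCCDI or rain gauge simple precipitation intensity index (SDII). |
| get_accumulation_threshold_from_etccdi | Get rain accumulation threshold from ETCCDI data. |
| get_consecutive_dry_days | Get consecutive groups of 0 rainfall days. |
| get_daily_non_wr_data | Get daily data with world-record values filtered out. |
| get_dry_spell_duration | Get consecutive dry spell duration. |
| get_dry_spell_info | Get a summary of dry spells (i.e. duration, first wet value after the dry spell, and durations of the previous and next dry spells). |
| get_first_wet_after_dry_spell | Get first non-zero rainfall value after dry spell. |
| get_local_etccdi_sdii_mean | Get the mean simple precipitation intensity index (SDII) from nearby ETCCDI data. |
| get_possible_accumulations | Get possible accumulations as 0 or 1 based on dry spell info. |
| get_streaks_above_threshold | Get streak groups above given threshold. |
| get_streaks_of_repeated_values | Get streaks of repeated values in time series. |
| get_surrounding_dry_spell_lengths | Make prev_dry_spell and next_dry_spell columns from dry_spell_lengths. |
| join_dry_spell_data_back_to_original | Flag dry spell data using dry spell lengths. |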