rainfallqc.checks.gauge_checks
Quality control checks examining suspicious rain gauges.
Gauge checks are defined as QC checks that: “detect abnormalities in summary and descriptive statistics of rain gauges.”
Classes and functions ordered by appearance in IntenseQC framework.
- rainfallqc.checks.gauge_checks.check_breakpoints(data, target_gauge_col, p_threshold=0.01)
Use a Pettitt test on rainfall data to check for breakpoints.
This is QC6 from the IntenseQC framework.
- Parameters:
  - data (DataFrame) – Rainfall data.
  - target_gauge_col (str) – Column with rainfall data.
  - p_threshold (float) – Significance level for the test.
- Return type: int
- Returns:
  - flag (int) – 1 if a breakpoint is detected (p < p_threshold), 0 otherwise.
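The Pettitt statistic behind a check of this kind can be sketched in plain Python. This is an illustration of the general technique, not the rainfallqc implementation: the function names `pettitt_test` and `breakpoint_flag` are hypothetical, the input is assumed to be a plain list of numbers, and the p-value uses the standard approximation p ≈ 2·exp(−6K² / (n³ + n²)).

```python
import math


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def pettitt_test(values):
    """Return (split, K, approx_p) for a sequence of numbers.

    `split` is the number of observations before the most likely change point.
    Hypothetical helper for illustration only.
    """
    n = len(values)
    if n < 2:
        return 0, 0, 1.0
    u = 0        # running statistic U_t
    k_stat = 0   # K = max |U_t|
    split = 0
    for t in range(n - 1):
        # U_t = U_{t-1} + sum_j sign(x_t - x_j)
        u += sum(sign(values[t] - other) for other in values)
        if abs(u) > k_stat:
            k_stat = abs(u)
            split = t + 1
    # Approximate two-sided p-value, capped at 1
    p_value = min(1.0, 2.0 * math.exp(-6.0 * k_stat**2 / (n**3 + n**2)))
    return split, k_stat, p_value


def breakpoint_flag(values, p_threshold=0.01):
    """Return 1 if the approximate p-value is below p_threshold, else 0."""
    _, _, p_value = pettitt_test(values)
    return 1 if p_value < p_threshold else 0


shifted = [1.0] * 20 + [10.0] * 20   # clear step change after 20 values
constant = [5.0] * 30                # no change at all
shifted_flag = breakpoint_flag(shifted)
constant_flag = breakpoint_flag(constant)
```

The quadratic loop keeps the sketch short; a rank-based formulation would be preferable for long records.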
- rainfallqc.checks.gauge_checks.check_intermittency(data, target_gauge_col, no_data_threshold=2, annual_count_threshold=5)
Return years in which more than annual_count_threshold (default: five) periods of missing data are bounded by zeros.
This is QC5 from the IntenseQC framework.
TODO: split into multiple sub-functions and write more tests.
- Parameters:
  - data (DataFrame) – Rainfall data.
  - target_gauge_col (str) – Column with rainfall data.
  - no_data_threshold (int) – Number of consecutive missing values needed to count as a no-data period (default: 2 (days)).
  - annual_count_threshold (int) – Number of missing-data periods above no_data_threshold per year (default: 5).
- Return type: list
- Returns:
  - years_w_intermittency – List of years with intermittency issues.
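One way to read the description above is sketched below in plain Python. It is not the rainfallqc implementation: the function name `years_with_intermittency` is hypothetical, the input is assumed to be a time-ordered list of `(year, value)` pairs with `None` marking missing data, a gap is attributed to the year in which it starts, and a year is flagged when its count strictly exceeds the threshold.

```python
from collections import Counter


def years_with_intermittency(records, no_data_threshold=2, annual_count_threshold=5):
    """Return years with too many missing-data runs bounded by zeros.

    Hypothetical helper for illustration only.
    """
    gap_counts = Counter()
    i, n = 0, len(records)
    while i < n:
        if records[i][1] is not None:
            i += 1
            continue
        # Walk to the end of this run of missing values
        start = i
        while i < n and records[i][1] is None:
            i += 1
        run_length = i - start
        # The run must have a zero immediately before and after it
        bounded_by_zeros = (
            start > 0
            and i < n
            and records[start - 1][1] == 0
            and records[i][1] == 0
        )
        if run_length >= no_data_threshold and bounded_by_zeros:
            gap_counts[records[start][0]] += 1
    return sorted(
        year for year, count in gap_counts.items() if count > annual_count_threshold
    )


# Six zero-bounded gaps in 2000, a single one in 2001
records = []
for _ in range(6):
    records += [(2000, 0.0), (2000, None), (2000, None)]
records.append((2000, 0.0))
records += [(2001, 0.0), (2001, None), (2001, None), (2001, 0.0), (2001, 1.2)]

flagged_default = years_with_intermittency(records)
flagged_strict = years_with_intermittency(records, annual_count_threshold=0)
```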
- rainfallqc.checks.gauge_checks.check_min_val_change(data, target_gauge_col, expected_min_val)
Return years when the minimum recorded value changes.
Used to determine whether there are possible changes to the measuring equipment. This is QC7 from the IntenseQC framework.
- Parameters:
  - data (DataFrame) – Rainfall data.
  - target_gauge_col (str) – Column with rainfall data.
  - expected_min_val (float) – Expected minimum rainfall value, i.e. the resolution of the data.
- Return type: list
- Returns:
  - yr_list – List of years with minimum value changes.
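The idea can be sketched as follows, under stated assumptions rather than as the library's actual logic: the function name `years_with_min_val_change` is hypothetical, the input is a list of `(year, value)` pairs, and "minimum recorded value" is taken to mean the smallest non-zero value in each year, which is then compared with `expected_min_val`.

```python
import math


def years_with_min_val_change(records, expected_min_val):
    """Return years whose smallest non-zero value differs from expected_min_val.

    Hypothetical helper for illustration only.
    """
    yearly_min = {}
    for year, value in records:
        # Skip missing values and dry periods
        if value is None or value <= 0:
            continue
        if year not in yearly_min or value < yearly_min[year]:
            yearly_min[year] = value
    return sorted(
        year
        for year, minimum in yearly_min.items()
        if not math.isclose(minimum, expected_min_val)
    )


records = [
    (2000, 0.0), (2000, 0.1), (2000, 0.5),
    (2001, 0.2), (2001, 0.0), (2001, 1.0),   # resolution appears to be 0.2 here
    (2002, 0.1),
]
changed_years = years_with_min_val_change(records, expected_min_val=0.1)
```

In this example only 2001 is reported, because its smallest non-zero value is 0.2 rather than the expected 0.1.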
- rainfallqc.checks.gauge_checks.check_temporal_bias(data, target_gauge_col, time_granularity, p_threshold=0.01)
Perform a two-sided t-test on the distribution of mean rainfall over time slices.
This check is less reliable when less data is available.
This is QC3 (day-of-week bias) and QC4 (hour-of-day bias) from the IntenseQC framework.
- Parameters:
  - data (DataFrame) – Rainfall data.
  - target_gauge_col (str) – Column with rainfall data.
  - time_granularity (str) – Temporal grouping, either ‘weekday’ or ‘hour’.
  - p_threshold (float) – Significance level for the test.
- Return type: int
- Returns:
  - flag (int) – 1 if bias is detected (p < p_threshold), 0 otherwise.
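A rough sketch of the statistic involved, assuming the test is a one-sample t-test of the per-slice means against the overall mean (this page does not confirm that detail). The function name `slice_mean_t_statistic` is hypothetical and the input is assumed to be a list of `(datetime, value)` pairs. Only the t statistic and degrees of freedom are computed here; turning them into a p-value needs the t distribution from a statistics library, for example `scipy.stats.ttest_1samp`.

```python
import math
import statistics
from collections import defaultdict
from datetime import datetime, timedelta


def slice_mean_t_statistic(records, time_granularity="weekday"):
    """Return (t_statistic, degrees_of_freedom) for slice means vs overall mean.

    Hypothetical helper for illustration only.
    """
    if time_granularity not in ("weekday", "hour"):
        raise ValueError("time_granularity must be 'weekday' or 'hour'")
    groups = defaultdict(list)
    for timestamp, value in records:
        if value is None:
            continue
        key = timestamp.weekday() if time_granularity == "weekday" else timestamp.hour
        groups[key].append(value)
    if len(groups) < 2:
        raise ValueError("need at least two time slices with data")
    slice_means = [statistics.mean(vals) for vals in groups.values()]
    overall_mean = statistics.mean([v for vals in groups.values() for v in vals])
    spread = statistics.stdev(slice_means)
    dof = len(slice_means) - 1
    if spread == 0:
        return 0.0, dof  # all slices identical: no evidence of bias
    standard_error = spread / math.sqrt(len(slice_means))
    return (statistics.mean(slice_means) - overall_mean) / standard_error, dof


# Eight daily values starting on Monday 2024-01-01: rain only on the two Mondays
start = datetime(2024, 1, 1)
daily = [(start + timedelta(days=d), 7.0 if d % 7 == 0 else 0.0) for d in range(8)]
t_stat, dof = slice_mean_t_statistic(daily, "weekday")
```

With equally sized, complete slices the mean of the slice means equals the overall mean, so this statistic is zero; it only departs from zero when slices are unevenly populated, as in the eight-day example.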
- rainfallqc.checks.gauge_checks.check_years_where_annual_kth_largest_value_is_zero(data, target_gauge_col, k)
Return list of years where the k-th largest value is 0.
This is QC2 from the IntenseQC framework.
- Parameters:
  - data (DataFrame) – Rainfall data.
  - target_gauge_col (str) – Column with rainfall data.
  - k (int) – Number of largest values to take for a given year, e.g. k=5 takes the top 5.
- Return type: list
- Returns:
  - year_list – List of years where the k-th largest value is zero.
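As a minimal sketch of this check (not the rainfallqc implementation), the k-th largest value per year can be found by sorting each year's values in descending order. The function name `years_where_kth_largest_is_zero` is hypothetical, the input is assumed to be a list of `(year, value)` pairs, and years with fewer than k values are skipped by assumption.

```python
from collections import defaultdict


def years_where_kth_largest_is_zero(records, k):
    """Return years whose k-th largest value is exactly zero.

    Hypothetical helper for illustration only.
    """
    by_year = defaultdict(list)
    for year, value in records:
        if value is not None:
            by_year[year].append(value)
    flagged = []
    for year in sorted(by_year):
        top_values = sorted(by_year[year], reverse=True)
        # k is 1-based: k=1 is the annual maximum
        if len(top_values) >= k and top_values[k - 1] == 0:
            flagged.append(year)
    return flagged


records = (
    [(2000, v) for v in [0.0, 0.0, 0.0, 3.0, 1.0]]     # only two wet values
    + [(2001, v) for v in [2.0, 4.0, 1.0, 0.0, 5.0]]   # four wet values
)
flagged_k3 = years_where_kth_largest_is_zero(records, k=3)
flagged_k2 = years_where_kth_largest_is_zero(records, k=2)
```

Here `flagged_k3` contains 2000 because that year has only two non-zero values, while `flagged_k2` is empty.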
- rainfallqc.checks.gauge_checks.check_years_where_nth_percentile_is_zero(data, target_gauge_col, percentile)
Return years where the n-th percentile is zero.
This is QC1 from the IntenseQC framework.
- Parameters:
  - data (DataFrame) – Rainfall data.
  - target_gauge_col (str) – Column with rainfall data.
  - percentile (float) – Percentile to evaluate, between 1 and 100.
- Return type: list
- Returns:
  - year_list – List of years where the n-th percentile is zero.
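The percentile check can be sketched in a similar way. This illustration assumes a nearest-rank percentile (no interpolation); the library may use a different percentile definition. The function name `years_where_percentile_is_zero` is hypothetical and the input is again a list of `(year, value)` pairs.

```python
import math
from collections import defaultdict


def years_where_percentile_is_zero(records, percentile):
    """Return years whose nearest-rank percentile value is exactly zero.

    Hypothetical helper for illustration only.
    """
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    by_year = defaultdict(list)
    for year, value in records:
        if value is not None:
            by_year[year].append(value)
    flagged = []
    for year in sorted(by_year):
        ordered = sorted(by_year[year])
        # Nearest-rank method: smallest value with at least p% of data at or below it
        rank = math.ceil(percentile * len(ordered) / 100)
        if ordered[rank - 1] == 0:
            flagged.append(year)
    return flagged


records = (
    [(2000, v) for v in [0.0] * 8 + [1.0, 2.0]]   # mostly dry year
    + [(2001, v) for v in [1.0, 2.0, 3.0, 4.0]]   # no zeros at all
)
flagged_p50 = years_where_percentile_is_zero(records, percentile=50)
flagged_p95 = years_where_percentile_is_zero(records, percentile=95)
```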
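Functions
| Function | Description |
| --- | --- |
| check_breakpoints | Use a Pettitt test on rainfall data to check for breakpoints. |
| check_intermittency | Return years in which more than annual_count_threshold (default: five) periods of missing data are bounded by zeros. |
| check_min_val_change | Return years when the minimum recorded value changes. |
| check_temporal_bias | Perform a two-sided t-test on the distribution of mean rainfall over time slices. |
| check_years_where_annual_kth_largest_value_is_zero | Return list of years where the k-th largest value is 0. |
| check_years_where_nth_percentile_is_zero | Return years where the n-th percentile is zero. |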