rainfallqc.utils.stats
Statistical tests and other indices for rainfall data quality control.
Classes and functions ordered alphabetically.
- rainfallqc.utils.stats.affinity_index(data, binary_col, return_match_and_diff=False)
Calculate affinity index from binary column.
- Parameters:
data (DataFrame) – Rainfall data
binary_col (str) – Column with binary data
return_match_and_diff (bool) – Whether to return the match and difference counts as well as the affinity index.
- Return type:
tuple | float
- Returns:
affinity – Affinity index.
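The following is a minimal pure-Python sketch of the idea, not the library's implementation. It assumes the affinity index is the fraction of valid entries in the binary column that equal 1 (a "match"), with 0 counted as a "difference". The helper name `affinity_sketch` is hypothetical.

```python
def affinity_sketch(flags):
    """Return (affinity, n_match, n_diff) for a list of 0/1 flags.

    Assumption: 1 marks a match, 0 marks a difference, None is ignored.
    """
    valid = [f for f in flags if f is not None]
    n_match = sum(1 for f in valid if f == 1)
    n_diff = sum(1 for f in valid if f == 0)
    total = n_match + n_diff
    # An empty column has no defined affinity, so return NaN instead of raising.
    affinity = n_match / total if total else float("nan")
    return affinity, n_match, n_diff


flags = [1, 1, 0, 1, None, 0, 1, 1]
affinity, n_match, n_diff = affinity_sketch(flags)
```

Returning the counts alongside the ratio mirrors what `return_match_and_diff=True` is documented to provide.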
- rainfallqc.utils.stats.dry_spell_fraction(rain_daily, target_gauge_col, dry_period_days)
Make dry spell fraction column.
- Parameters:
rain_daily (DataFrame) – Single time-step of rainfall data with ‘dry_day’ column
target_gauge_col (str) – Column with rainfall data
dry_period_days (int) – Dry periods window in days
- Return type:
Series
- Returns:
rain_daily_w_dry_spell_fraction – Single row with dry spell fraction column
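As a rough illustration only, a dry spell fraction can be thought of as the share of dry days inside a trailing window. The sketch below assumes a day is dry when its rainfall is exactly 0 and that the window looks back `dry_period_days` days including the current day; the library may define either differently. `dry_spell_fraction_sketch` is a hypothetical name.

```python
def dry_spell_fraction_sketch(daily_rain, dry_period_days):
    """Fraction of dry days in a trailing window ending at each day.

    Assumptions: dry means rainfall == 0; missing values (None) count as
    not dry; days without a full window get None.
    """
    is_dry = [1 if v == 0 else 0 for v in daily_rain]
    fractions = []
    for end in range(len(is_dry)):
        start = end - dry_period_days + 1
        if start < 0:
            fractions.append(None)  # window not yet full
        else:
            fractions.append(sum(is_dry[start:end + 1]) / dry_period_days)
    return fractions


fractions = dry_spell_fraction_sketch([0.0, 0.0, 1.2, 0.0, 0.0, 0.0], 3)
```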
- rainfallqc.utils.stats.factor_diff(data, target_col, other_col)
Compute factor diff for polars.
- Parameters:
data (DataFrame) – Rainfall data
target_col (str) – Target column to compute factor diff for
other_col (str) – Other column to compute factor diff for
- Return type:
DataFrame
- Returns:
data_w_factor_diff – Data with factor diff
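The docstring does not spell out the formula. One plausible reading, sketched here as an assumption, is the ratio of the target value to the other value wherever both are positive. The helper `factor_ratio_sketch` is hypothetical and works on plain lists, not polars columns.

```python
def factor_ratio_sketch(target, other):
    """Element-wise target / other, or None where the ratio is undefined.

    Assumption: the factor is only meaningful when both values are
    present and strictly positive.
    """
    factors = []
    for t, o in zip(target, other):
        if t is None or o is None or t <= 0 or o <= 0:
            factors.append(None)
        else:
            factors.append(t / o)
    return factors


factors = factor_ratio_sketch([2.0, 0.0, 3.0, None], [1.0, 1.0, 6.0, 2.0])
```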
- rainfallqc.utils.stats.filter_out_rain_world_records(data, target_gauge_col, time_res)
Filter out rain world records based on time resolution.
- Parameters:
data (DataFrame) – Rainfall data
target_gauge_col (str) – Column with rainfall data
time_res (str) – Temporal resolution of the time series, either ‘daily’ or ‘hourly’
- Return type:
DataFrame
- Returns:
data_not_wr – Data without rain world records
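A minimal sketch of this kind of filter is shown below. The thresholds are illustrative values based on commonly cited records (in mm) and are an assumption; the library's actual values are set in stats.py. Whether the comparison is strict and how missing values are treated are also assumptions, and `drop_above_record` is a hypothetical helper.

```python
# Illustrative thresholds in mm (assumed, not read from the library).
ILLUSTRATIVE_RECORDS_MM = {"hourly": 401.0, "daily": 1825.0}


def drop_above_record(values, time_res):
    """Keep values that do not exceed the record for the given resolution.

    Assumptions: values equal to the record are kept; None is kept as-is.
    """
    if time_res not in ILLUSTRATIVE_RECORDS_MM:
        raise ValueError("time_res must be 'daily' or 'hourly'")
    limit = ILLUSTRATIVE_RECORDS_MM[time_res]
    return [v for v in values if v is None or v <= limit]


hourly_kept = drop_above_record([0.0, 12.5, 500.0, 401.0], "hourly")
daily_kept = drop_above_record([0.0, 12.5, 500.0, 2000.0], "daily")
```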
- rainfallqc.utils.stats.fit_expon_and_get_percentile(series, percentiles)
Fit exponential to data series and then get percentile using PPF.
- Parameters:
series (Series) – Data series to fit exponential distribution.
percentiles (list[float]) – Percentiles (between 0-1) to evaluate on the fitted exponential distribution
- Return type:
dict[float, float]
- Returns:
expon_percentiles – Threshold at percentile of fitted distribution
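To make the fit-then-PPF step concrete, here is a standard-library sketch. It assumes the location parameter is fixed at zero, in which case the maximum-likelihood scale is the sample mean and the percent-point function is `-scale * ln(1 - q)`. A fit that also estimates a location parameter would give different thresholds, so results may not match the library. `expon_percentiles_sketch` is a hypothetical name.

```python
import math


def expon_percentiles_sketch(values, percentiles):
    """Fit an exponential (location fixed at 0) and evaluate its PPF.

    Returns a dict mapping each percentile q (0 <= q < 1) to its threshold.
    """
    if not values:
        raise ValueError("values must not be empty")
    scale = sum(values) / len(values)  # MLE of the scale when loc = 0
    result = {}
    for q in percentiles:
        if not 0.0 <= q < 1.0:
            raise ValueError("percentiles must lie in [0, 1)")
        result[q] = -scale * math.log(1.0 - q)  # inverse CDF of the exponential
    return result


thresholds = expon_percentiles_sketch([1.0, 2.0, 3.0, 6.0], [0.5, 0.95])
```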
- rainfallqc.utils.stats.gauge_correlation(data, target_col, other_col)
Calculate correlation between rain gauge data columns.
- Parameters:
data (DataFrame) – Rainfall data
target_col (str) – Target rainfall column
other_col (str) – Other rainfall column
- Return type:
float
- Returns:
corr_coef – Correlation coefficient.
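The docstring does not name the correlation type. Assuming it is the Pearson coefficient, a plain-Python sketch over two gauge series could look like this; pairs with a missing value are dropped, which is also an assumption. `pearson_sketch` is a hypothetical helper.

```python
import math


def pearson_sketch(x, y):
    """Pearson correlation of two equal-length lists, skipping None pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


r_pos = pearson_sketch([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0])
r_neg = pearson_sketch([0.0, 1.0, None, 3.0], [6.0, 4.0, 2.0, 0.0])
```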
- rainfallqc.utils.stats.get_rainfall_world_records()
Return rainfall world record as of 29/04/25.
See:
- http://www.nws.noaa.gov/oh/hdsc/record_precip/record_precip_world.html
- http://www.bom.gov.au/water/designRainfalls/rainfallEvents/worldRecRainfall.shtml
- https://wmo.asu.edu/content/world-meteorological-organization-global-weather-climate-extremes-archive
- Return type:
dict[str, float]
- Returns:
rwr – rainfall world records set in stats.py
- rainfallqc.utils.stats.percentage_diff(target, other)
Percentage difference between target and other column.
- Parameters:
target (Expr) – Target data to compare against other
other (Expr) – Other data
- Return type:
Series
- Returns:
perc_diff – Percentage difference
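The exact formula is not stated here. A common convention, used in this sketch as an assumption, expresses the difference relative to the other value: `(target - other) * 100 / other`. The scalar helper `percentage_diff_sketch` is hypothetical; the documented function operates on polars expressions.

```python
def percentage_diff_sketch(target, other):
    """Percentage difference of target relative to other.

    Assumption: the reference (denominator) is `other`; a zero reference
    yields None rather than a division error.
    """
    if other == 0:
        return None
    return (target - other) * 100.0 / other


higher = percentage_diff_sketch(12.0, 10.0)
lower = percentage_diff_sketch(5.0, 10.0)
```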
- rainfallqc.utils.stats.pettitt_test(arr)
Pettitt test for detecting a change point in a time series.
Calculated following Pettitt (1979): https://www.jstor.org/stable/2346729?seq=4#metadata_info_tab_contents.
TAKEN FROM: https://stackoverflow.com/questions/58537876/how-to-run-standard-normal-homogeneity-test-for-a-time-series-data.
- Parameters:
arr (Series | ndarray) – The input time series data.
- Return type:
(int | float, int | float)
- Returns:
tau (int) – Index of the change point (first point of the second segment).
p (float) – p-value for the test statistic.
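For orientation, the textbook form of the test can be sketched in pure Python. For each split index k, the statistic U_k sums sign(x_i - x_j) over all i before the split and all j from the split onward; the change point is where |U_k| is largest, and the p-value uses the approximation 2·exp(-6K² / (n³ + n²)). This is a sketch of the general method and may differ from the library's code in indexing details. `pettitt_sketch` is a hypothetical name.

```python
import math


def _sign(v):
    return (v > 0) - (v < 0)


def pettitt_sketch(x):
    """Return (tau, p): tau is the first index of the second segment."""
    n = len(x)
    best_k, best_abs_u = 0, 0
    for k in range(1, n):
        # Compare every point before the split with every point after it.
        u = sum(_sign(x[i] - x[j]) for i in range(k) for j in range(k, n))
        if abs(u) > best_abs_u:
            best_k, best_abs_u = k, abs(u)
    # Approximate two-sided p-value; capped at 1 because the formula can exceed it.
    p = 2.0 * math.exp(-6.0 * best_abs_u ** 2 / (n ** 3 + n ** 2))
    return best_k, min(p, 1.0)


series = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 5.2, 4.9, 5.1, 5.0]
tau, p = pettitt_sketch(series)
```

The double loop is quadratic in the series length for each split, which is acceptable for a sketch but slow for long records.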
- rainfallqc.utils.stats.simple_precip_intensity_index(data, target_gauge_col, wet_threshold)
Calculate simple precipitation intensity index.
- Parameters:
data (DataFrame) – Rainfall data
target_gauge_col (str) – Column with rainfall data
wet_threshold (int | float) – Threshold for rainfall intensity in given time period
- Return type:
float
- Returns:
sdii_val – Simple precipitation intensity index
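The simple precipitation intensity index is usually defined as total rainfall on wet periods divided by the number of wet periods. The sketch below assumes a period is wet when its value is greater than or equal to `wet_threshold`; the library may use a strict comparison. `sdii_sketch` is a hypothetical helper on plain lists.

```python
def sdii_sketch(values, wet_threshold):
    """Mean rainfall over wet periods (value >= wet_threshold).

    Missing values (None) are ignored; NaN is returned if nothing is wet.
    """
    wet = [v for v in values if v is not None and v >= wet_threshold]
    return sum(wet) / len(wet) if wet else float("nan")


sdii_val = sdii_sketch([0.0, 0.5, 2.0, 4.0, None, 6.0], wet_threshold=1.0)
```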
Functions
| affinity_index | Calculate affinity index from binary column. |
| dry_spell_fraction | Make dry spell fraction column. |
| factor_diff | Compute factor diff for polars. |
| filter_out_rain_world_records | Filter out rain world records based on time resolution. |
| fit_expon_and_get_percentile | Fit exponential to data series and then get percentile using PPF. |
| gauge_correlation | Calculate correlation between rain gauge data columns. |
| get_rainfall_world_records | Return rainfall world record as of 29/04/25. |
| percentage_diff | Percentage difference between target and other column. |
| pettitt_test | Pettitt test for detecting a change point in a time series. |
| simple_precip_intensity_index | Calculate simple precipitation intensity index. |