rainfallqc.utils.stats

Statistical tests and other indices for rainfall data quality control.

Classes and functions ordered alphabetically.

rainfallqc.utils.stats.affinity_index(data, binary_col, return_match_and_diff=False)

Calculate affinity index from binary column.

Parameters:
  • data (DataFrame) – Rainfall data

  • binary_col (str) – Column with binary data

  • return_match_and_diff (bool) – Whether to return the counts of matches and differences in addition to the affinity index.

Return type:

tuple | float

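Returns:

affinity – Affinity index. When return_match_and_diff is True, the match and difference counts are returned with it as a tuple.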
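A minimal sketch of the ratio, assuming the binary column holds 1 where the compared gauges agree and 0 where they differ, and that the affinity index is matches divided by matches plus differences. These semantics, the plain-list input (the real function takes a polars DataFrame), the (match, diff, affinity) tuple order and the name affinity_index_sketch are assumptions for illustration, not the library's implementation.

```python
def affinity_index_sketch(binary_values, return_match_and_diff=False):
    """Hypothetical affinity index: share of 1s among the non-missing 0/1 values.

    Assumes 1 marks a match and 0 marks a difference; None values are skipped.
    """
    observed = [v for v in binary_values if v is not None]
    match = sum(1 for v in observed if v == 1)
    diff = sum(1 for v in observed if v == 0)
    total = match + diff
    # Guard against an empty column: no comparisons means no defined index.
    affinity = match / total if total else float("nan")
    if return_match_and_diff:
        # Tuple order is an assumption of this sketch.
        return match, diff, affinity
    return affinity


print(affinity_index_sketch([1, 1, 0, 1, None]))
```

With three matches and one difference, the example prints 0.75.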

rainfallqc.utils.stats.dry_spell_fraction(rain_daily, target_gauge_col, dry_period_days)

Make dry spell fraction column.

Parameters:
  • rain_daily (DataFrame) – Single time-step of rainfall data with ‘dry_day’ column

  • target_gauge_col (str) – Column with rainfall data

  • dry_period_days (int) – Length of the dry-period window in days

Return type:

Series

Returns:

rain_daily_w_dry_spell_fraction – Single row with dry spell fraction column
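One possible reading of a dry spell fraction, offered only as a hypothetical sketch: for each day, the share of the trailing dry_period_days-day window flagged as dry (1) in a 'dry_day' sequence. The windowing rule, returning None for incomplete windows, the plain-list input and the name dry_spell_fraction_sketch are all assumptions, not taken from rainfallqc.

```python
def dry_spell_fraction_sketch(dry_day, dry_period_days):
    """Hypothetical trailing-window fraction of dry days.

    dry_day is a sequence of 0/1 flags (1 = dry). Each output element is the
    fraction of dry days in the window ending on that day, or None while the
    window is not yet full.
    """
    if dry_period_days < 1:
        raise ValueError("dry_period_days must be >= 1")
    fractions = []
    running = 0  # number of dry days currently inside the window
    for i, flag in enumerate(dry_day):
        running += flag
        if i >= dry_period_days:
            # Drop the day that has just left the window.
            running -= dry_day[i - dry_period_days]
        if i + 1 >= dry_period_days:
            fractions.append(running / dry_period_days)
        else:
            fractions.append(None)
    return fractions


print(dry_spell_fraction_sketch([1, 1, 1, 0, 0], 2))
```

The example prints [None, 1.0, 1.0, 0.5, 0.0]. On a polars column, a similar trailing fraction could be computed with pl.col("dry_day").rolling_mean(window_size=dry_period_days).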

rainfallqc.utils.stats.factor_diff(data, target_col, other_col)

Compute the factor difference between two columns of a polars DataFrame.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_col (str) – Target column to compute factor diff for

  • other_col (str) – Other column to compute factor diff for

Return type:

DataFrame

Returns:

data_w_factor_diff – Data with factor diff

rainfallqc.utils.stats.filter_out_rain_world_records(data, target_gauge_col, time_res)

Filter out rainfall values that exceed the world record for the given time resolution.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • time_res (str) – Temporal resolution of the time series, either ‘daily’ or ‘hourly’

Return type:

DataFrame

Returns:

data_not_wr – Data with values exceeding the rainfall world records removed
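A minimal sketch of the filtering step on a plain list, assuming values are kept when they are at or below the record for the chosen resolution. The thresholds in PLACEHOLDER_RECORDS are placeholders for illustration, not actual world records; the library's own values come from get_rainfall_world_records(), whose keys may differ. The name filter_out_world_records_sketch is hypothetical.

```python
# Placeholder thresholds (mm) for illustration only; NOT real world records.
PLACEHOLDER_RECORDS = {"hourly": 100.0, "daily": 500.0}


def filter_out_world_records_sketch(values, time_res, records):
    """Keep only values at or below the record for time_res ('daily' or 'hourly')."""
    if time_res not in ("daily", "hourly"):
        raise ValueError("time_res must be 'daily' or 'hourly'")
    limit = records[time_res]
    return [v for v in values if v <= limit]


print(filter_out_world_records_sketch([0.0, 12.5, 400.0], "hourly", PLACEHOLDER_RECORDS))
```

Against the placeholder hourly threshold, the 400.0 value is dropped and [0.0, 12.5] is printed.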

rainfallqc.utils.stats.fit_expon_and_get_percentile(series, percentiles)

Fit an exponential distribution to a data series, then evaluate the requested percentiles using its PPF (percent point function).

Parameters:
  • series (Series) – Data series to fit the exponential distribution to.

  • percentiles (list[float]) – Percentiles (between 0 and 1) to evaluate on the fitted exponential distribution

Return type:

dict[float, float]

Returns:

expon_percentiles – Mapping from each requested percentile to its threshold on the fitted distribution
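A dependency-free sketch of the fit-then-PPF idea, assuming the exponential location is fixed at 0. Under that assumption the maximum-likelihood scale is the sample mean, and the PPF at percentile q is -scale * ln(1 - q). Whether rainfallqc fixes the location or fits it is not stated here; the name fit_expon_percentiles_sketch is hypothetical.

```python
import math


def fit_expon_percentiles_sketch(series, percentiles):
    """Fit Exp(loc=0, scale) by maximum likelihood and evaluate its PPF."""
    data = [x for x in series if x is not None]
    if not data:
        raise ValueError("series has no values")
    scale = sum(data) / len(data)  # MLE of the scale when loc is fixed at 0
    result = {}
    for q in percentiles:
        if not 0 <= q < 1:
            raise ValueError("percentiles must lie in [0, 1)")
        # Exponential PPF: -scale * ln(1 - q); log1p keeps precision for small q.
        result[q] = -scale * math.log1p(-q)
    return result


print(fit_expon_percentiles_sketch([1.0, 2.0, 3.0], [0.5, 0.99]))
```

With SciPy, the equivalent under the same assumption is loc, scale = scipy.stats.expon.fit(data, floc=0) followed by scipy.stats.expon.ppf(q, loc=loc, scale=scale).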

rainfallqc.utils.stats.gauge_correlation(data, target_col, other_col)

Calculate correlation between rain gauge data columns.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_col (str) – Target rainfall column

  • other_col (str) – Other rainfall column

Return type:

float

Returns:

corr_coef – Correlation coefficient.
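The page does not say which correlation coefficient is used. The sketch below assumes Pearson correlation over time steps where both gauges have a value; the pairwise handling of missing data and the name pearson_sketch are assumptions for illustration.

```python
import math


def pearson_sketch(target, other):
    """Pearson correlation over pairs where neither value is None."""
    pairs = [(t, o) for t, o in zip(target, other) if t is not None and o is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(t for t, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    if var_t == 0 or var_o == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_t * var_o)


print(pearson_sketch([1, 2, 3, None], [2, 4, 6, 8]))
```

The last pair is skipped because the target value is missing, and the remaining values are perfectly linearly related.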

rainfallqc.utils.stats.get_rainfall_world_records()

Return rainfall world records as of 29/04/25.

See:

  • http://www.nws.noaa.gov/oh/hdsc/record_precip/record_precip_world.html

  • http://www.bom.gov.au/water/designRainfalls/rainfallEvents/worldRecRainfall.shtml

  • https://wmo.asu.edu/content/world-meteorological-organization-global-weather-climate-extremes-archive

Return type:

dict[str, float]

Returns:

rwr – Rainfall world records defined in stats.py

rainfallqc.utils.stats.percentage_diff(target, other)

Percentage difference between target and other column.

Parameters:
  • target (Expr) – Target data to compare other to

  • other (Expr) – Other data

Return type:

Series

Returns:

perc_diff – Percentage difference

rainfallqc.utils.stats.pettitt_test(arr)

Pettitt test for detecting a change point in a time series.

Calculated following Pettitt (1979): https://www.jstor.org/stable/2346729?seq=4#metadata_info_tab_contents.

Taken from: https://stackoverflow.com/questions/58537876/how-to-run-standard-normal-homogeneity-test-for-a-time-series-data.

Parameters:

arr (Series | ndarray) – The input time series data.

Return type:

(int | float, int | float)

Returns:

  • tau (int) – Index of the change point (first point of the second segment).

  • p (float) – p-value for the test statistic.
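A sketch of the standard Pettitt statistic, written independently of the library code. For a split where the first segment holds the first k values, U_k sums sgn(x_i - x_j) over every i in the first segment and j in the second; it can be updated incrementally as U_k = U_{k-1} + sum_j sgn(x_{k-1} - x_j). The change point is the k maximising |U_k| (here the 0-based index of the first point of the second segment, matching the description above), and the p-value uses Pettitt's approximation p ≈ 2 exp(-6K² / (T³ + T²)), with K = max |U_k| and T the series length. The name pettitt_sketch and the capping of p at 1 are choices of this sketch.

```python
import math


def pettitt_sketch(values):
    """Return (tau, p): change-point index and approximate p-value (Pettitt, 1979)."""
    x = list(values)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")

    def sign(a, b):
        return (a > b) - (a < b)

    u = 0
    best_k, k_stat = 1, -1
    # k is the size of the first segment, so x[k] starts the second segment.
    for k in range(1, n):
        # Incremental update: U_k = U_{k-1} + sum_j sgn(x[k-1] - x[j]).
        u += sum(sign(x[k - 1], xj) for xj in x)
        if abs(u) > k_stat:  # strict '>' keeps the first maximum on ties
            best_k, k_stat = k, abs(u)
    p = 2.0 * math.exp(-6.0 * k_stat**2 / (n**3 + n**2))
    return best_k, min(1.0, p)


print(pettitt_sketch([1, 1, 1, 1, 5, 5, 5, 5]))
```

For this step series the change point is at index 4, the first value of the second level. The incremental update makes the loop O(T²) rather than the O(T³) of a direct triple sum.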

rainfallqc.utils.stats.simple_precip_intensity_index(data, target_gauge_col, wet_threshold)

Calculate simple precipitation intensity index.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_gauge_col (str) – Column with rainfall data

  • wet_threshold (int | float) – Rainfall threshold used to classify a time step as wet

Return type:

float

Returns:

sdii_val – Simple precipitation intensity index (SDII)
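A minimal sketch following the ETCCDI definition of SDII: total rainfall on wet time steps divided by the number of wet time steps. It assumes a step is wet when its value is at or above wet_threshold, that missing values are ignored, and that NaN is returned when no step is wet; these choices and the name sdii_sketch are assumptions, not the library's code.

```python
def sdii_sketch(values, wet_threshold):
    """Mean rainfall over wet time steps (value >= wet_threshold)."""
    wet = [v for v in values if v is not None and v >= wet_threshold]
    if not wet:
        return float("nan")  # no wet steps, so the index is undefined
    return sum(wet) / len(wet)


print(sdii_sketch([0.0, 0.5, 2.0, 4.0, None], 1.0))
```

Only 2.0 and 4.0 reach the 1.0 threshold, so the example prints 3.0.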

Functions

  • affinity_index(data, binary_col[, ...]) – Calculate affinity index from binary column.

  • dry_spell_fraction(rain_daily, ...) – Make dry spell fraction column.

  • factor_diff(data, target_col, other_col) – Compute factor diff for polars.

  • filter_out_rain_world_records(data, ...) – Filter out rain world records based on time resolution.

  • fit_expon_and_get_percentile(series, percentiles) – Fit exponential to data series and then get percentile using PPF.

  • gauge_correlation(data, target_col, other_col) – Calculate correlation between rain gauge data columns.

  • get_rainfall_world_records() – Return rainfall world records as of 29/04/25.

  • percentage_diff(target, other) – Percentage difference between target and other column.

  • pettitt_test(arr) – Pettitt test for detecting a change point in a time series.

  • simple_precip_intensity_index(data, ...) – Calculate simple precipitation intensity index.