rainfallqc.utils.data_utils
All data operations for polars including datetime and calendar functionality.
Classes and functions ordered alphabetically.
- rainfallqc.utils.data_utils.back_propagate_daily_data_flags(data, flag_column, num_days)
Back-fill flags over a number of preceding days.
Where flags overlap, the higher flag value takes priority.
- Parameters:
  data (DataFrame) – Daily data with flag_column
  flag_column (str) – Column with flags
  num_days (int) – Number of days to back-propagate
- Return type:
  DataFrame
- Returns:
  data – Data with flags back-propagated
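The back-propagation idea can be illustrated with a minimal plain-Python sketch. This is not rainfallqc's polars implementation: the helper name `back_propagate_flags` is hypothetical, and it assumes that each day takes the highest flag found on that day or on any of the following `num_days` days.

```python
def back_propagate_flags(flags, num_days):
    """Spread each flag back over the preceding num_days days, keeping the highest value."""
    propagated = []
    for i in range(len(flags)):
        # Assumed window: the day itself plus the next num_days days.
        window = flags[i : i + num_days + 1]
        propagated.append(max(window))
    return propagated


daily_flags = [0, 0, 0, 1, 0, 0, 3, 0]
print(back_propagate_flags(daily_flags, 2))  # [0, 1, 1, 1, 3, 3, 3, 0]
```

Taking the maximum over the window is what gives higher flags priority when two flagged days lie within `num_days` of each other.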
- rainfallqc.utils.data_utils.calculate_dry_spell_fraction(data, target_gauge_col, dry_period_days)
Calculate dry spell fraction.
- Parameters:
  data (DataFrame) – Data with time column
  target_gauge_col (str) – Column with rainfall data
  dry_period_days (int) – Length of a “dry_spell”
- Return type:
  Series
- Returns:
  rain_daily_dry_day – Data with dry spell fraction
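The exact definition of the dry spell fraction is not given here. One plausible reading, sketched below under stated assumptions, is the share of dry days in each trailing window of `dry_period_days` days. The helper name `rolling_dry_fraction`, the "dry means exactly zero rainfall" rule, and the trailing window are all assumptions for illustration, not the library's confirmed behaviour.

```python
def rolling_dry_fraction(rain, window_days):
    """Fraction of dry days (rain == 0) in each trailing window of window_days days."""
    # Assumption: a day counts as dry only when its rainfall is exactly zero.
    is_dry = [1 if value == 0 else 0 for value in rain]
    fractions = []
    for i in range(len(is_dry)):
        if i + 1 < window_days:
            fractions.append(None)  # not enough history for a full window yet
        else:
            window = is_dry[i + 1 - window_days : i + 1]
            fractions.append(sum(window) / window_days)
    return fractions


daily_rain = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
print(rolling_dry_fraction(daily_rain, 3))
```

A value of 1.0 marks a window that is dry throughout, which is the case for the last three days in this example.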
- rainfallqc.utils.data_utils.check_data_has_consistent_time_step(data)
Check that the data has a single, consistent time step, e.g. ‘1h’.
- Parameters:
  data (DataFrame) – Data with time column
- Raises:
  ValueError – If data has more than one time step
- Return type:
  None
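A consistency check of this kind can be sketched in plain Python by collecting the distinct gaps between consecutive timestamps. The helper names `unique_time_steps` and `check_consistent_time_step` are hypothetical, and the sketch works on a list of `datetime` objects rather than a polars DataFrame.

```python
from datetime import datetime, timedelta


def unique_time_steps(times):
    """Return the sorted distinct gaps between consecutive timestamps."""
    return sorted({later - earlier for earlier, later in zip(times, times[1:])})


def check_consistent_time_step(times):
    """Raise ValueError when more than one distinct gap is present."""
    steps = unique_time_steps(times)
    if len(steps) > 1:
        raise ValueError(f"Data has more than one time step: {steps}")


start = datetime(2020, 1, 1)
regular = [start + timedelta(hours=h) for h in range(4)]
gappy = regular + [start + timedelta(hours=6)]  # introduces a 3-hour jump

check_consistent_time_step(regular)  # passes silently
try:
    check_consistent_time_step(gappy)
except ValueError as err:
    print(err)
```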
- rainfallqc.utils.data_utils.check_data_is_monthly(data)
Check data is monthly.
- Parameters:
  data (DataFrame) – Data with time column
- Raises:
  ValueError – If data does not have monthly time steps
- Return type:
  None
- rainfallqc.utils.data_utils.check_data_is_specific_time_res(data, time_res)
Check data has an hourly or daily time step.
Does not work for monthly data; use ‘check_data_is_monthly’ instead.
- Parameters:
  data (DataFrame) – Data with time column.
  time_res (str|list) – Time resolution(s), either a single string or a list of strings
- Raises:
  ValueError – If data is not hourly or daily.
- Return type:
  None
- rainfallqc.utils.data_utils.check_for_negative_values(df, target_gauge_col)
Check if the target column contains any negative values.
- Parameters:
  df (DataFrame) – DataFrame to check.
  target_gauge_col (str) – Column to check for negative values.
- Raises:
  ValueError – If negative values are found in the target column.
- Return type:
  bool
- rainfallqc.utils.data_utils.convert_daily_data_to_monthly(daily_data, rain_cols, perc_for_valid_month=95)
Convert daily data to monthly, setting a month to NaN if fewer than a given percentage of its days are present.
- Parameters:
  daily_data (DataFrame) – Daily data to convert to monthly
  rain_cols (list) – Columns with rainfall data
  perc_for_valid_month (int|float) – Percentage of the month that must be present for it to count as a valid month in the monthly group by
- Return type:
  DataFrame
- Returns:
  monthly_data – Monthly data
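The validity threshold can be illustrated with a small plain-Python sketch. It is not the library's polars code: the helper name `daily_to_monthly`, the use of a monthly sum, and the `>=` comparison against `perc_for_valid_month` are assumptions made for illustration.

```python
import calendar
import math
from collections import defaultdict
from datetime import date, timedelta


def daily_to_monthly(daily, perc_for_valid_month=95):
    """Sum daily totals per month; months with too few valid days become NaN.

    daily maps a date to a rainfall value, or to None when the value is missing.
    """
    totals = defaultdict(float)
    valid_days = defaultdict(int)
    for day, value in daily.items():
        key = (day.year, day.month)
        totals[key] += 0.0  # register the month even if every value is missing
        if value is not None:
            totals[key] += value
            valid_days[key] += 1

    monthly = {}
    for (year, month), total in totals.items():
        expected = calendar.monthrange(year, month)[1]  # days in this calendar month
        perc_valid = 100 * valid_days[(year, month)] / expected
        monthly[(year, month)] = total if perc_valid >= perc_for_valid_month else math.nan
    return monthly


# January 2021 is complete; February 2021 is missing 5 of its 28 days.
january = {date(2021, 1, 1) + timedelta(days=i): 1.0 for i in range(31)}
february = {date(2021, 2, 1) + timedelta(days=i): (None if i < 5 else 2.0) for i in range(28)}
monthly = daily_to_monthly({**january, **february})
print(monthly)
```

With the default threshold of 95, February has only about 82% of its days and is set to NaN, while January keeps its total of 31.0.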
- rainfallqc.utils.data_utils.convert_datarray_seconds_to_days(series_seconds)
Convert xarray series from seconds to days. The CDD data from ETCCDI is provided in seconds rather than days.
- Parameters:
  series_seconds (DataArray) – Data in series to convert to days.
- Return type:
  ndarray
- Returns:
  series_days – Data array converted to days.
- rainfallqc.utils.data_utils.downsample_and_fill_columns(high_res_data, low_res_data, data_cols, fill_limit, fill_method='backward', time_col='time')
Join columns from lower resolution data to higher resolution data and fill gaps.
- Parameters:
  high_res_data (DataFrame) – Higher resolution data (e.g., 15-min)
  low_res_data (DataFrame) – Lower resolution data with columns to join (e.g., hourly)
  data_cols (str|list[str]) – Column name(s) to join and fill. Can be:
    - Single column name: “rainfall”
    - List of columns: [“rain1”, “rain2”]
    - Regex pattern: “^rain.*$”
  fill_limit (int) – Maximum number of intervals to fill
  fill_method (str) – “forward”, “backward”, or “none”
  time_col (str) – Name of time column (default: ‘time’)
- Return type:
  DataFrame
- Returns:
  high_res_data_filled – High resolution data with filled columns
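The join-then-fill behaviour with the default “backward” method can be sketched in plain Python as follows. The helper name `join_and_backward_fill` is hypothetical, and the sketch assumes that `fill_limit` caps how many consecutive gaps before each original value are filled.

```python
from datetime import datetime, timedelta


def join_and_backward_fill(high_res_times, low_res_values, fill_limit):
    """Place low-resolution values on a high-resolution time axis, then back-fill.

    low_res_values maps a timestamp to a value. Each value is copied to at most
    fill_limit of the high-resolution steps immediately before it.
    """
    joined = [low_res_values.get(t) for t in high_res_times]
    filled = list(joined)
    for i in range(len(joined) - 1, -1, -1):
        if joined[i] is None:
            continue
        # Walk backwards from an original value, filling gaps up to the limit.
        for step in range(1, fill_limit + 1):
            j = i - step
            if j < 0 or joined[j] is not None:
                break
            filled[j] = joined[i]
    return filled


start = datetime(2020, 1, 1, 0, 0)
quarter_hours = [start + timedelta(minutes=15 * i) for i in range(9)]  # 00:00 to 02:00
hourly = {start + timedelta(hours=1): 4.0, start + timedelta(hours=2): 8.0}

print(join_and_backward_fill(quarter_hours, hourly, fill_limit=3))
# [None, 4.0, 4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0]
```

With 15-minute data and hourly values, a `fill_limit` of 3 covers exactly the three steps leading up to each hourly timestamp, so the 00:00 step, which has no hourly value of its own, stays empty.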
- rainfallqc.utils.data_utils.downsample_monthly_data(sub_monthly_data, monthly_data, data_cols, time_col='time')
Join monthly data to sub-monthly (e.g., hourly) data and fill only within the same month.
- Parameters:
  sub_monthly_data (DataFrame) – Sub-monthly data (e.g., hourly)
  monthly_data (DataFrame) – Monthly data with columns to join
  data_cols (str|list[str]) – Column name(s) to join and fill. Can be:
    - Single column name: “rainfall”
    - List of columns: [“rain1”, “rain2”]
  time_col (str) – Name of time column (default: ‘time’)
- Return type:
  DataFrame
- Returns:
  result – Sub-monthly data with monthly columns joined and filled within month
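Filling "only within the same month" can be pictured as a lookup keyed on year and month, as in this plain-Python sketch. The helper name `fill_monthly_onto_hourly` is hypothetical and the approach is an assumption about the intent, not the library's implementation.

```python
from datetime import datetime, timedelta


def fill_monthly_onto_hourly(hourly_times, monthly_values):
    """Give every hourly timestamp the value of the month it falls in.

    monthly_values maps (year, month) to a value; months without an entry
    stay None, so nothing is carried across a month boundary.
    """
    return [monthly_values.get((t.year, t.month)) for t in hourly_times]


start = datetime(2020, 1, 31, 22)
hours = [start + timedelta(hours=h) for h in range(4)]  # 22:00 on 31 Jan to 01:00 on 1 Feb
print(fill_monthly_onto_hourly(hours, {(2020, 1): 50.0}))  # [50.0, 50.0, None, None]
```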
- rainfallqc.utils.data_utils.extract_negative_values_from_data(data, cols_to_extract_from)
Extract negative values from data.
- Parameters:
  data (DataFrame) – Rainfall data.
  cols_to_extract_from (list) – Columns to extract negative values from
- Return type:
  DataFrame
- Returns:
  data – Data with only negative values or 0.
- rainfallqc.utils.data_utils.extract_positive_values_from_data(data, cols_to_extract_from)
Extract positive values from data.
- Parameters:
  data (DataFrame) – Rainfall data.
  cols_to_extract_from (list) – Columns to extract positive values from
- Return type:
  DataFrame
- Returns:
  data – Data with only positive values or 0.
- rainfallqc.utils.data_utils.format_timedelta_duration(td)
Convert timedelta to custom strings.
- Parameters:
  td (timedelta) – Time delta to convert.
- Return type:
  str
- Returns:
  td – Human-readable timedelta string using largest unit (d, h, m, s).
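A formatter along these lines might look like the sketch below. The helper name `format_duration` is hypothetical, and the rule "use the largest unit that divides the duration exactly" is an assumption about what "largest unit" means here.

```python
from datetime import timedelta


def format_duration(td):
    """Format a timedelta with the largest unit (d, h, m, s) that divides it exactly."""
    total_seconds = int(td.total_seconds())
    for unit, seconds_per_unit in (("d", 86400), ("h", 3600), ("m", 60)):
        if total_seconds >= seconds_per_unit and total_seconds % seconds_per_unit == 0:
            return f"{total_seconds // seconds_per_unit}{unit}"
    return f"{total_seconds}s"


print(format_duration(timedelta(days=1)))      # 1d
print(format_duration(timedelta(hours=1)))     # 1h
print(format_duration(timedelta(minutes=15)))  # 15m
print(format_duration(timedelta(minutes=90)))  # 90m
```

Under this rule a 90-minute step is reported as ‘90m’ rather than a fractional number of hours.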
- rainfallqc.utils.data_utils.get_data_timestep_as_str(data)
Get time step of data.
- Parameters:
  data (DataFrame) – Data with time column
- Return type:
  str
- Returns:
  time_step – Time step of data, e.g. ‘1h’, ‘1d’, ‘15m’.
- rainfallqc.utils.data_utils.get_data_timesteps(data)
Get data timesteps. Ideally the data should have only one.
- Parameters:
  data (DataFrame) – Data with time column.
- Return type:
  Series
- Returns:
  unique_timesteps – All unique time steps in data (timedelta).
- rainfallqc.utils.data_utils.get_dry_period_proportions(dry_period_days)
Get dry period proportions.
- Parameters:
  dry_period_days (int) – Length of a “dry_spell” (default: 15 days)
- Return type:
  dict
- Returns:
  fraction_dry_days – Dictionary with keys “1”, “2”, “3” with dry spell fractions
- rainfallqc.utils.data_utils.get_dry_spells(data, target_gauge_col)
Get dry spell column.
- Parameters:
  data (DataFrame) – Rainfall data
  target_gauge_col (str) – Column with rainfall data
- Return type:
  DataFrame
- Returns:
  data_w_dry_spells – Data with is_dry binary column
- rainfallqc.utils.data_utils.get_expected_days_in_month(data)
Get the expected number of days in each month within the data.
- Parameters:
  data (DataFrame) – Data with ‘year’ and ‘month’ columns
- Return type:
  DataFrame
- Returns:
  data – Data with ‘expected_days_in_month’ column
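The expected day count depends only on the year and month, which the standard library can supply. The sketch below uses plain dictionaries in place of a polars DataFrame; the helper name `expected_days_in_month` is hypothetical.

```python
import calendar


def expected_days_in_month(year, month):
    """Number of calendar days in the given month, accounting for leap years."""
    return calendar.monthrange(year, month)[1]


rows = [{"year": 2020, "month": 2}, {"year": 2021, "month": 2}, {"year": 2021, "month": 12}]
for row in rows:
    row["expected_days_in_month"] = expected_days_in_month(row["year"], row["month"])

print([row["expected_days_in_month"] for row in rows])  # [29, 28, 31]
```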
- rainfallqc.utils.data_utils.get_normalised_diff(data, target_col, other_col, diff_col_name)
Get normalised difference between two columns in data.
- Parameters:
  data (DataFrame) – Data with columns
  target_col (str) – Target column
  other_col (str) – Other column.
  diff_col_name (str) – New column name for difference column
- Return type:
  DataFrame
- Returns:
  data_w_norm_diff – Data with normalised diff
- rainfallqc.utils.data_utils.make_month_and_year_col(data)
Make year and month columns for polars dataframe.
- Parameters:
  data (DataFrame) – Data with time column
- Return type:
  DataFrame
- Returns:
  data – Data with year and month columns
- rainfallqc.utils.data_utils.normalise_data(data)
Normalise data to [0, 1].
- Parameters:
  data (Series|Expr) – Data with time column.
- Return type:
  Series
- Returns:
  norm_data – Normalised data.
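Scaling to [0, 1] is commonly done with min-max normalisation, sketched here on a plain list. Whether the library uses exactly this formula is an assumption; the helper name `min_max_normalise` is hypothetical.

```python
def min_max_normalise(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    value_range = highest - lowest
    if value_range == 0:
        raise ValueError("Cannot normalise constant data: max equals min")
    return [(value - lowest) / value_range for value in values]


print(min_max_normalise([2.0, 4.0, 6.0, 10.0]))  # [0.0, 0.25, 0.5, 1.0]
```

Constant input has no range to scale by, so this sketch raises an error instead of dividing by zero.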
- rainfallqc.utils.data_utils.offset_data_by_time(data, target_col, offset_in_time, time_res)
Shift/offset data either backwards or forwards in time.
- Parameters:
  data (DataFrame) – Data with column to offset in ‘time’
  target_col (str) – Column of data to offset
  offset_in_time (int) – Amount to offset data by, e.g. 1 for 1 day if time_res is set to ‘1d’
  time_res (str) – Time resolution like ‘hourly’, ‘daily’, ‘1h’ or ‘1d’
- Return type:
  DataFrame
- Returns:
  data – Data offset by the ‘offset_in_time’ amount
- rainfallqc.utils.data_utils.replace_missing_vals_with_nan(data, target_gauge_col, missing_val=None)
Replace the no-data value with numpy.nan.
- Parameters:
  data (DataFrame) – Rainfall data
  target_gauge_col (str) – Column of rainfall
  missing_val (int) – Missing value identifier
- Return type:
  DataFrame
- Returns:
  gsdr_data – GSDR data with missing values replaced
- rainfallqc.utils.data_utils.resample_data_by_time_step(data, rain_cols, time_col, time_step, min_count, hour_offset)
Group data into a coarser time step (e.g. hourly into daily) and require a minimum number of time steps per period (e.g. 24 hourly steps per day).
- Parameters:
  data (DataFrame) – Rainfall data to resample
  rain_cols (List[str]) – List of columns with rainfall data
  time_col (str) – Name of time column
  time_step (str) – Time step to resample into (e.g. ‘1d’ for daily, ‘1h’ for hourly, ‘15m’ for 15 minute)
  min_count (int) – Minimum number of time steps needed per time period
  hour_offset (int) – Time offset in hours (needed if data is not aligned to midnight)
- Return type:
  DataFrame
- Returns:
  resampled_data – Rainfall data grouped into a given time step
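The roles of `min_count` and `hour_offset` can be illustrated for the hourly-to-daily case with a plain-Python sketch. The helper name `resample_to_daily` is hypothetical, and summing values and returning NaN for incomplete days are assumptions rather than confirmed library behaviour.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def resample_to_daily(records, min_count=24, hour_offset=0):
    """Sum (time, value) records per day; days with too few values become NaN.

    hour_offset shifts the day boundary, e.g. 9 for days running 09:00 to 09:00.
    """
    buckets = defaultdict(list)
    for time, value in records:
        day_start = (time - timedelta(hours=hour_offset)).date()
        bucket = buckets[day_start]  # registers the day even if the value is missing
        if value is not None:
            bucket.append(value)
    return {
        day: (sum(values) if len(values) >= min_count else math.nan)
        for day, values in sorted(buckets.items())
    }


start = datetime(2020, 1, 1)
# Two days of hourly data with one missing value on the second day.
records = [(start + timedelta(hours=h), None if h == 30 else 0.5) for h in range(48)]
daily = resample_to_daily(records)
print(daily)
```

The first day has all 24 values and sums to 12.0, while the second has only 23 and is therefore set to NaN.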
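Functions
- back_propagate_daily_data_flags – Back-fill flags over a number of preceding days.
- calculate_dry_spell_fraction – Calculate dry spell fraction.
- check_data_has_consistent_time_step – Check that the data has a single, consistent time step, e.g. '1h'.
- check_data_is_monthly – Check data is monthly.
- check_data_is_specific_time_res – Check data has an hourly or daily time step.
- check_for_negative_values – Check if the target column contains any negative values.
- convert_daily_data_to_monthly – Convert daily data to monthly, setting a month to NaN if fewer than a given percentage of its days are present.
- convert_datarray_seconds_to_days – Convert xarray series from seconds to days.
- downsample_and_fill_columns – Join columns from lower resolution data to higher resolution data and fill gaps.
- downsample_monthly_data – Join monthly data to sub-monthly (e.g., hourly) data and fill only within the same month.
- extract_negative_values_from_data – Extract negative values from data.
- extract_positive_values_from_data – Extract positive values from data.
- format_timedelta_duration – Convert timedelta to custom strings.
- get_data_timestep_as_str – Get time step of data.
- get_data_timesteps – Get data timesteps.
- get_dry_period_proportions – Get dry period proportions.
- get_dry_spells – Get dry spell column.
- get_expected_days_in_month – Get the expected number of days in each month within the data.
- get_normalised_diff – Get normalised difference between two columns in data.
- make_month_and_year_col – Make year and month columns for polars dataframe.
- normalise_data – Normalise data to [0, 1].
- offset_data_by_time – Shift/offset data either backwards or forwards in time.
- replace_missing_vals_with_nan – Replace the no-data value with numpy.nan.
- resample_data_by_time_step – Group data into a coarser time step (e.g. hourly into daily) and require a minimum number of time steps per period.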