rainfallqc.utils.neighbourhood_utils

Utilities for neighbourhood and nearby-gauge operations.

rainfallqc.utils.neighbourhood_utils.compute_km_distances_from_target_id(gauge_network_metadata, target_id, station_id_col)

Compute kilometre distances between gauges in the network and the target gauge.

Parameters:
  • gauge_network_metadata (DataFrame) – Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.

  • target_id (str) – Target gauge to compare against.

  • station_id_col (str) – Column name for station ID in gauge_network_metadata

Return type:

DataFrame

Returns:

neighbour_distances_df

Distances to the target gauge in kilometres
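
The distance method is not stated here. As a minimal sketch, assuming great-circle (haversine) distances on a spherical Earth, the computation might look like the following. Plain dicts stand in for DataFrame rows, and `haversine_km`, `distances_from_target` and the sample network are hypothetical names, not part of rainfallqc.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distances_from_target(gauges, target_id, station_id_col="station_id"):
    """Return {station_id: distance_km} from the target gauge to every gauge."""
    target = next(g for g in gauges if g[station_id_col] == target_id)
    return {
        g[station_id_col]: haversine_km(
            target["latitude"], target["longitude"], g["latitude"], g["longitude"]
        )
        for g in gauges
    }


# Hypothetical three-gauge network; each gauge has 'latitude' and 'longitude'.
network = [
    {"station_id": "A", "latitude": 51.5, "longitude": -0.1},
    {"station_id": "B", "latitude": 51.5, "longitude": 0.9},
    {"station_id": "C", "latitude": 52.5, "longitude": -0.1},
]
distances = distances_from_target(network, "A")
```

In this sketch the target gauge appears in its own result with a distance of 0, which is why a later step that selects neighbours needs to exclude zero distances.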

rainfallqc.utils.neighbourhood_utils.compute_temporal_overlap_days(start_1, end_1, start_2, end_2)

Compute temporal overlap in days.

Note: assumes that the data is contiguous.

Parameters:
  • start_1 (datetime) – Start time of period 1

  • end_1 (datetime) – End time of period 1

  • start_2 (datetime) – Start time of period 2

  • end_2 (datetime) – End time of period 2

Return type:

int

Returns:

overlap_days

Number of days by which the two periods overlap
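
A minimal sketch of one way to compute such an overlap, assuming each record is a single contiguous period: take the later of the two starts and the earlier of the two ends, and clamp negative spans to zero. The name `overlap_days_sketch` is hypothetical, and whether the library counts the end day inclusively is not stated here.

```python
from datetime import datetime


def overlap_days_sketch(start_1, end_1, start_2, end_2):
    """Whole days shared by two contiguous periods (0 if they are disjoint)."""
    latest_start = max(start_1, start_2)
    earliest_end = min(end_1, end_2)
    # A negative span means the periods do not overlap at all.
    return max(0, (earliest_end - latest_start).days)


overlapping = overlap_days_sketch(
    datetime(2020, 1, 1), datetime(2020, 12, 31),
    datetime(2020, 6, 1), datetime(2021, 6, 1),
)
disjoint = overlap_days_sketch(
    datetime(2000, 1, 1), datetime(2000, 12, 31),
    datetime(2010, 1, 1), datetime(2010, 12, 31),
)
```

Because only the start and end are compared, gaps inside either record are invisible to this calculation, which is what the contiguity note above warns about.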

rainfallqc.utils.neighbourhood_utils.compute_temporal_overlap_days_from_target_id(gauge_network_metadata, target_id, station_id_col, start_datetime_col, end_datetime_col)

Compute overlap in days between the target gauge and its neighbours.

Note: assumes that the data is contiguous.

Parameters:
  • gauge_network_metadata (DataFrame) – Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.

  • target_id (str) – Target gauge to compare against.

  • station_id_col (str) – Column name for station ID in gauge_network_metadata

  • start_datetime_col (str) – Column name for start datetime in gauge_network_metadata

  • end_datetime_col (str) – Column name for end datetime in gauge_network_metadata

Return type:

DataFrame

Returns:

neighbour_overlap_days_df

Neighbouring gauges with overlap days to target gauge.

rainfallqc.utils.neighbourhood_utils.get_ids_of_n_nearest_overlapping_neighbouring_gauges(gauge_network_metadata, target_id, distance_threshold, n_closest, min_overlap_days, station_id_col='station_id', start_datetime_col='start_datetime', end_datetime_col='end_datetime')

Get gauge IDs of nearest n time-overlapping neighbouring gauges.

Parameters:
  • gauge_network_metadata (DataFrame) – Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.

  • target_id (str) – Target gauge to compare against.

  • distance_threshold (int | float) – Threshold for maximum distance considered

  • n_closest (int) – Number of closest neighbours.

  • min_overlap_days (int) – Minimum overlap between target and neighbouring gauges

  • station_id_col (str) – Column name for station ID in gauge_network_metadata (default ‘station_id’)

  • start_datetime_col (str) – Column name for start datetime in gauge_network_metadata (default ‘start_datetime’)

  • end_datetime_col (str) – Column name for end datetime in gauge_network_metadata (default ‘end_datetime’)

Return type:

list

Returns:

neighbouring_gauge_id

IDs of neighbouring gauges that are within distance_threshold of the target and have at least min_overlap_days of overlap with it
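
As a rough sketch of how the distance and overlap criteria could combine, the following filters on both and then keeps the nearest `n_closest` IDs. It takes precomputed distances and overlaps as plain dicts. The function name, the order of the filters, the inclusive threshold, and the lack of tie handling are all assumptions made for illustration, not the library's documented behaviour.

```python
def nearest_overlapping_ids(
    distances_km, overlap_days, distance_threshold, n_closest, min_overlap_days
):
    """IDs of the nearest gauges that pass both the distance and overlap filters."""
    eligible = sorted(
        (dist, station_id)
        for station_id, dist in distances_km.items()
        # Skip the target itself (distance 0) and anything beyond the threshold.
        if 0 < dist <= distance_threshold
        # Gauges with no overlap entry are treated as having zero overlap.
        and overlap_days.get(station_id, 0) >= min_overlap_days
    )
    return [station_id for _, station_id in eligible[:n_closest]]


# Hypothetical inputs: "T" is the target gauge.
distances_km = {"T": 0.0, "A": 5.0, "B": 10.0, "C": 15.0, "D": 200.0}
overlap_days = {"A": 400, "B": 100, "C": 900, "D": 2000}
ids = nearest_overlapping_ids(
    distances_km, overlap_days,
    distance_threshold=50, n_closest=2, min_overlap_days=365,
)
```

Here gauge B is close enough but overlaps for too few days, and gauge D overlaps for long enough but is too far away, so A and C are selected.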

rainfallqc.utils.neighbourhood_utils.get_n_closest_neighbours(neighbour_distances_df, distance_threshold, n_closest)

Get closest neighbours from neighbour distances data.

Returns more than n_closest neighbours if several gauges are tied at the n_closest-th distance. Gauges at a distance of 0 are not returned.

Parameters:
  • neighbour_distances_df (DataFrame) – Data of distances to a target gauge

  • distance_threshold (int | float) – Threshold for maximum distance considered

  • n_closest (int) – Number of closest neighbours.

Return type:

DataFrame

Returns:

n_closest_neighbour_df

Data of n_closest neighbours
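
The tie and zero-distance behaviour described above can be sketched as follows. This is an illustration over a plain dict rather than a DataFrame. `n_closest_with_ties` is a hypothetical name, and treating the threshold as inclusive is an assumption.

```python
def n_closest_with_ties(distances_km, distance_threshold, n_closest):
    """Return (station_id, distance) pairs for the n closest gauges, keeping ties."""
    if n_closest <= 0:
        return []
    candidates = sorted(
        (
            (station_id, dist)
            for station_id, dist in distances_km.items()
            # Drop zero distances (the target itself) and far-away gauges.
            if 0 < dist <= distance_threshold
        ),
        key=lambda item: item[1],
    )
    if len(candidates) <= n_closest:
        return candidates
    # Keep every gauge whose distance equals the n-th smallest distance.
    cutoff = candidates[n_closest - 1][1]
    return [(station_id, dist) for station_id, dist in candidates if dist <= cutoff]


# Hypothetical distances from a target gauge "T".
distances_km = {"T": 0.0, "A": 5.0, "B": 10.0, "C": 10.0, "D": 20.0, "E": 80.0}
closest = n_closest_with_ties(distances_km, distance_threshold=50, n_closest=2)
```

With `n_closest=2`, gauges B and C are tied for second place, so three neighbours come back instead of two.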

rainfallqc.utils.neighbourhood_utils.get_nearest_non_nan_etccdi_val_to_gauge(etccdi_data, etccdi_name, gauge_lat, gauge_lon, max_distance_km=500)

Get the value at the nearest non-nan ETCCDI grid cell to the gauge coordinates.

Parameters:
  • etccdi_data (Dataset) – ETCCDI data with given variable to check

  • etccdi_name (str) – ETCCDI variable name to check

  • gauge_lat (int | float) – latitude of the rain gauge

  • gauge_lon (int | float) – longitude of the rain gauge

  • max_distance_km (int | float) – Maximum distance in km to search for a non-nan value (default 500 km)

Return type:

Dataset

Returns:

nearby_etccdi_data

ETCCDI data at the nearest grid cell with non-nan values
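
To illustrate the idea of a nearest non-nan lookup, here is a sketch over a flat list of `(lat, lon, value)` grid cells instead of a Dataset. The helper names, the haversine distance, and returning `None` when nothing lies within `max_distance_km` are assumptions for this example. The library's own search strategy and failure behaviour are not described here.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_non_nan_value(cells, gauge_lat, gauge_lon, max_distance_km=500):
    """Value of the closest non-nan cell within max_distance_km, else None."""
    best = None  # (distance_km, value) of the closest valid cell so far
    for lat, lon, value in cells:
        if math.isnan(value):
            continue  # skip cells with no data, e.g. ocean or masked cells
        dist = haversine_km(gauge_lat, gauge_lon, lat, lon)
        if dist <= max_distance_km and (best is None or dist < best[0]):
            best = (dist, value)
    return None if best is None else best[1]


# Hypothetical grid: the cell closest to the gauge has no data.
cells = [(50.0, 0.0, float("nan")), (51.0, 0.0, 12.5), (55.0, 0.0, 30.0)]
value = nearest_non_nan_value(cells, gauge_lat=50.1, gauge_lon=0.0)
too_far = nearest_non_nan_value(cells, gauge_lat=50.1, gauge_lon=0.0, max_distance_km=50)
```

The gauge sits almost on top of the nan cell, so the lookup falls through to the next cell about 100 km away. Tightening the search radius to 50 km leaves no valid cell.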

rainfallqc.utils.neighbourhood_utils.get_neighbours_with_min_overlap_days(neighbour_overlap_days_df, min_overlap_days)

Get neighbours around the gauge with at least min_overlap_days of overlapping time steps.

Note: assumes that the data is contiguous.

Parameters:
  • neighbour_overlap_days_df (DataFrame) – Neighbouring gauges with overlap days to target gauge.

  • min_overlap_days (int) – Minimum overlap between target and neighbouring gauges

Return type:

DataFrame

Returns:

neighbour_overlap_days_df

Neighbouring gauges with at least min_overlap_days overlap days.

rainfallqc.utils.neighbourhood_utils.get_rain_not_minima_column(data, target_col, other_col)

Get rain not equal to minima column.

Combines two functions: one gets the non-zero minima (e.g. 0.1) and the other makes the ‘rain_not_minima’ column.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_col (str) – Target rainfall column

  • other_col (str) – Other rainfall column

Return type:

DataFrame

Returns:

data_w_minima_col

Rainfall data with the ‘rain_not_minima’ column

rainfallqc.utils.neighbourhood_utils.get_target_neighbour_non_zero_minima(data, target_col, other_col, default_minima=0.1)

Get minimum non-zero value in rainfall data between target and neighbour.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_col (str) – Target rainfall column

  • other_col (str) – Other rainfall column

  • default_minima (float) – Default minimum to use for non-zero value

Return type:

float

Returns:

non_zero_minima

Minimum non-zero value.
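
A small sketch of the idea, using plain lists in place of DataFrame columns: collect the positive values from both series and take the smallest. Falling back to `default_minima` when neither series has a non-zero value is an assumption about how the default is used, and `non_zero_minima_sketch` is a hypothetical name.

```python
def non_zero_minima_sketch(target_values, other_values, default_minima=0.1):
    """Smallest value above zero across both series, else default_minima."""
    positives = [
        value
        for value in list(target_values) + list(other_values)
        # None and NaN both fail this test, so missing data is ignored.
        if value is not None and value > 0
    ]
    return min(positives) if positives else default_minima


# Hypothetical rainfall series in millimetres.
minima = non_zero_minima_sketch([0.0, 0.2, 1.4], [0.0, 0.0, 0.6])
all_dry = non_zero_minima_sketch([0.0, 0.0], [0.0, None])
```

The first call finds 0.2 as the smallest recorded non-zero amount. The second has no rain in either series, so it falls back to the default.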

rainfallqc.utils.neighbourhood_utils.make_rain_not_minima_column_target_or_neighbour(data, target_col, other_col, data_minima)

Get rain values that are not minima rainfall for target or neighbour.

Parameters:
  • data (DataFrame) – Rainfall data

  • target_col (str) – Target rainfall column

  • other_col (str) – Other rainfall column

  • data_minima (float) – Data minimum (i.e. lowest non-zero value)

Return type:

DataFrame

Returns:

data

Rainfall data with “rain_not_minima” column

Functions

compute_km_distances_from_target_id(...)

Compute kilometre distances between gauges in the network and the target gauge.

compute_temporal_overlap_days(start_1, ...)

Compute temporal overlap in days.

compute_temporal_overlap_days_from_target_id(...)

Compute overlap in days between the target gauge and its neighbours.

get_ids_of_n_nearest_overlapping_neighbouring_gauges(...)

Get gauge IDs of nearest n time-overlapping neighbouring gauges.

get_n_closest_neighbours(...)

Get closest neighbours from neighbour distances data.

get_nearest_non_nan_etccdi_val_to_gauge(...)

Get the value at the nearest non-nan ETCCDI grid cell to the gauge coordinates.

get_neighbours_with_min_overlap_days(...)

Get neighbours around the gauge with at least min_overlap_days of overlapping time steps.

get_rain_not_minima_column(data, target_col, ...)

Get rain not equal to minima column.

get_target_neighbour_non_zero_minima(data, ...)

Get minimum non-zero value in rainfall data between target and neighbour.

make_rain_not_minima_column_target_or_neighbour(...)

Get rain values that are not minima rainfall for target or neighbour.