rainfallqc.utils.data_readers

Data loading tools.

Classes for reading rain gauge network data are defined at the bottom of the source file.

class rainfallqc.utils.data_readers.GPCCNetworkReader(path_to_gpcc_dir, time_res, file_format='.zip', unzipped_file_format='.dat')[source]

Bases: GaugeNetworkReader

GPCC rain gauge network reader.

Methods

get_nearest_overlapping_neighbours_to_target(...)

Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.

load_network_data(data_paths, target_gauge_col)

Load GPCC network data based on file paths.

Parameters:
  • path_to_gpcc_dir (str)

  • time_res (str)

  • file_format (str)

  • unzipped_file_format (str)

load_network_data(data_paths, target_gauge_col, missing_val=-999.9)[source]

Load GPCC network data based on file paths.

Parameters:
  • data_paths (Union[List[str], ndarray[str]]) – Paths to load network data from.

  • target_gauge_col (str) – Rainfall data column

  • missing_val (int | float) – Missing value (default: -999.9)

Return type:

DataFrame

Returns:

network_data – Dataframe of GPCC gauges.

class rainfallqc.utils.data_readers.GSDRNetworkReader(path_to_gsdr_dir, file_format='.txt')[source]

Bases: GaugeNetworkReader

GSDR rain gauge network reader.

Methods

get_nearest_overlapping_neighbours_to_target(...)

Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.

load_network_data(rain_col_prefix, data_paths)

Load GSDR network data based on file paths.

Parameters:
  • path_to_gsdr_dir (str)

  • file_format (str)

load_network_data(rain_col_prefix, data_paths, suffix_only=False, gsdr_header_rows=20)[source]

Load GSDR network data based on file paths.

Parameters:
  • data_paths (Union[List[str], ndarray[str]]) – Paths to load network data from.

  • rain_col_prefix (str) – Prefix for rain column name (e.g. ‘rain’)

  • suffix_only (bool) – Override to include only the suffix (e.g. if the column name is the ID)

  • gsdr_header_rows (int) – Number of rows to skip in the header of the GSDR data (default=20)

Return type:

DataFrame

Returns:

network_data – Dataframe of GSDR gauges.

class rainfallqc.utils.data_readers.GaugeNetworkReader(path_to_gauge_network)[source]

Bases: ABC

Base class for reading rain gauge networks.

Methods

get_nearest_overlapping_neighbours_to_target(...)

Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.

Parameters:

path_to_gauge_network (str)

get_nearest_overlapping_neighbours_to_target(target_id, distance_threshold, n_closest, min_overlap_days)[source]

Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.

Parameters:
  • target_id (str) – Target gauge to get neighbour IDs of

  • distance_threshold (int | float) – Distance threshold to check for neighbours

  • n_closest (int) – Number of nearest neighbours to return

  • min_overlap_days (int) – Minimum time overlap between neighbours to return

Return type:

set

Returns:

neighbouring_gauge_id – IDs of neighbouring gauges within a given distance to target and min overlapping days
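
The selection described above (filter by distance, filter by overlap, keep the closest few) can be sketched in plain Python. This is an illustrative sketch only, not the rainfallqc implementation: the helper names, the `gauges` dictionary layout, the use of haversine distance, and kilometres as the unit of `distance_threshold` are all assumptions made for the example.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def overlap_days(start_a, end_a, start_b, end_b):
    """Number of days shared by two inclusive date ranges (0 if disjoint)."""
    shared = (min(end_a, end_b) - max(start_a, start_b)).days + 1
    return max(shared, 0)


def nearest_overlapping_neighbours(gauges, target_id, distance_threshold, n_closest, min_overlap_days):
    """Hypothetical stand-in: return a set of up to n_closest neighbour IDs."""
    target = gauges[target_id]
    candidates = []
    for gauge_id, meta in gauges.items():
        if gauge_id == target_id:
            continue
        dist = haversine_km(target["lat"], target["lon"], meta["lat"], meta["lon"])
        shared = overlap_days(target["start"], target["end"], meta["start"], meta["end"])
        # Keep only gauges that are close enough AND share enough of the record.
        if dist <= distance_threshold and shared >= min_overlap_days:
            candidates.append((dist, gauge_id))
    candidates.sort()  # nearest first
    return {gauge_id for _, gauge_id in candidates[:n_closest]}


# Toy metadata (invented for this example).
gauges = {
    "T": {"lat": 0.0, "lon": 0.0, "start": date(2000, 1, 1), "end": date(2010, 12, 31)},
    "A": {"lat": 0.0, "lon": 0.1, "start": date(2000, 1, 1), "end": date(2010, 12, 31)},
    "B": {"lat": 0.0, "lon": 0.2, "start": date(2005, 1, 1), "end": date(2015, 12, 31)},
    "C": {"lat": 0.0, "lon": 5.0, "start": date(2000, 1, 1), "end": date(2010, 12, 31)},  # too far away
    "D": {"lat": 0.0, "lon": 0.05, "start": date(2011, 1, 1), "end": date(2012, 12, 31)},  # no overlap
}
neighbours = nearest_overlapping_neighbours(
    gauges, "T", distance_threshold=50, n_closest=2, min_overlap_days=365
)
print(neighbours)
```

Gauge "D" is the nearest in space but is dropped because its record does not overlap the target's, which is the reason the overlap check is applied before the closest gauges are picked.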

rainfallqc.utils.data_readers.add_datetime_to_gsdr_data(gsdr_data, gsdr_metadata, multiplying_factor)[source]

Add datetime column to GSDR gauge data using metadata from that gauge.

NOTE: This could be extended to locate the metadata itself when it is not provided.

Parameters:
  • gsdr_data (DataFrame) – GSDR data

  • gsdr_metadata (dict) – Metadata from GSDR file

  • multiplying_factor (int | float) – Factor to multiply the data by.

Return type:

DataFrame

Returns:

gsdr_data – GSDR data with datetime column added

rainfallqc.utils.data_readers.convert_gsdr_metadata_dates_to_datetime(gsdr_metadata)[source]

Convert GSDR metadata date string column to datetime.

Parameters:

gsdr_metadata (dict) – Metadata from GSDR file

Return type:

dict

Returns:

gsdr_metadata (dict) – Metadata from GSDR file with the start and end dates converted to datetime
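
A minimal sketch of this kind of conversion, using only the standard library. The key names `start_datetime` and `end_datetime`, the `%Y%m%d%H` date format, and the helper name are assumptions for illustration; the actual keys and format used by rainfallqc may differ.

```python
from datetime import datetime


def convert_metadata_dates(metadata, date_format="%Y%m%d%H"):
    """Hypothetical helper: parse date strings in a metadata dict into datetime objects."""
    converted = dict(metadata)  # copy so the caller's dict is left unchanged
    for key in ("start_datetime", "end_datetime"):  # assumed key names
        converted[key] = datetime.strptime(converted[key], date_format)
    return converted


# Toy metadata (invented for this example).
metadata = {
    "station_id": "DE_00310",
    "start_datetime": "2006010100",
    "end_datetime": "2010123123",
}
converted = convert_metadata_dates(metadata)
print(converted["start_datetime"], converted["end_datetime"])
```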

rainfallqc.utils.data_readers.get_paths_using_gauge_ids(gauge_ids, dir_path, file_format, time_res=None)[source]

Get the data path for each gauge ID.

Parameters:
  • gauge_ids (Union[List[str], ndarray[str]]) – Array of gauge IDs

  • dir_path (str) – Path to data directory

  • file_format (str) – Format of files in directory.

  • time_res (str) – Time resolution (e.g. ‘mw’ or ‘tw’)

Return type:

dict

Returns:

gauge_paths – Dictionary of gauge ID and path
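
As a rough illustration of the ID-to-path mapping that is returned, the sketch below builds such a dictionary by hand. The file naming pattern (`<time_res>_<gauge_id><file_format>`) and the helper name are assumptions for this example and are not confirmed by the documentation above.

```python
import os


def build_gauge_paths(gauge_ids, dir_path, file_format, time_res=None):
    """Hypothetical helper: map each gauge ID to a file path inside dir_path."""
    gauge_paths = {}
    for gauge_id in gauge_ids:
        # Assumed naming pattern: optional time-resolution prefix, then ID, then extension.
        prefix = f"{time_res}_" if time_res else ""
        gauge_paths[gauge_id] = os.path.join(dir_path, f"{prefix}{gauge_id}{file_format}")
    return gauge_paths


paths = build_gauge_paths(["310", "480"], "data/gpcc", ".zip", time_res="tw")
for gauge_id, path in paths.items():
    print(gauge_id, path)
```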

rainfallqc.utils.data_readers.load_etccdi_data(etccdi_var, path_to_etccdi=None)[source]

Load ETCCDI data.

Parameters:
  • etccdi_var (str) – variable to load from ETCCDI

  • path_to_etccdi (str) – path to ETCCDI data (default is location of data in tests)

Return type:

Dataset

Returns:

etccdi_data – Loaded data

rainfallqc.utils.data_readers.load_gpcc_gauge_network_metadata(path_to_gpcc_dir, time_res, gpcc_file_format='.dat')[source]

Load metadata from GPCC gauges from a directory.

Parameters:
  • path_to_gpcc_dir (str) – Path to directory with GPCC gauges

  • time_res (str) – Time resolution (e.g. ‘mw’ or ‘tw’)

  • gpcc_file_format (str) – Format of file (default is .dat)

Return type:

DataFrame

Returns:

all_station_metadata – All GPCC gauges metadata as one dataframe.

rainfallqc.utils.data_readers.load_gsdr_gauge_network_metadata(path_to_gsdr_dir, file_format='.txt')[source]

Load metadata from GSDR gauges from a directory.

Parameters:
  • path_to_gsdr_dir (str) – Path to directory with GSDR gauges

  • file_format (str) – Format of file (default is .txt)

Return type:

DataFrame

Returns:

all_station_metadata – All GSDR gauges metadata as one dataframe.

rainfallqc.utils.data_readers.read_gpcc_data_from_zip(data_path, gpcc_file_name, target_gauge_col, time_res, hour_offset=7, missing_val=-999)[source]

Read the specific format and header of Global Precipitation Climatology Centre (GPCC) files.

Parameters:
  • data_path (str) – path to GPCC zip file

  • gpcc_file_name (str) – Name of GPCC file within zip

  • target_gauge_col (str) – Name of rainfall column

  • time_res (str) – ‘daily’ or ‘monthly’

  • hour_offset (int) – Hours to offset grouped data by (default is 7)

  • missing_val (int | float) – Missing value (default: -999)

Return type:

DataFrame

Returns:

gpcc_data – Data from GPCC file
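
The general pattern of reading a named file out of a zip archive and masking a missing-value sentinel can be sketched with the standard library alone. The toy file layout (one header line, then `date value` rows), the helper name, and the use of `None` in place of NaN are all assumptions for this example; real GPCC files have their own header and column format, and the function above returns a DataFrame.

```python
import io
import zipfile


def read_values_from_zip(zip_source, member_name, header_rows, missing_val=-999):
    """Hypothetical helper: read (timestamp, value) rows from a text file inside a zip."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as handle:
            lines = handle.read().decode("utf-8").splitlines()
    records = []
    for line in lines[header_rows:]:  # skip the header block
        if not line.strip():
            continue
        timestamp, raw_value = line.split()
        value = float(raw_value)
        # Replace the missing-value sentinel with None (stand-in for NaN).
        records.append((timestamp, None if value == missing_val else value))
    return records


# Build a toy archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "tw_310.dat",
        "toy header line\n2000-01-01 1.2\n2000-01-02 -999\n2000-01-03 0.0\n",
    )
buffer.seek(0)

records = read_values_from_zip(buffer, "tw_310.dat", header_rows=1)
print(records)
```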

rainfallqc.utils.data_readers.read_gpcc_metadata_from_zip(data_path, time_res, gpcc_file_format='.dat')[source]

Read GPCC metadata from zip file.

Parameters:
  • data_path (str) – path to GPCC zip file.

  • time_res (str) – Time resolution of data (e.g. daily or monthly)

  • gpcc_file_format (str) – Default GPCC file format (default: .dat)

Return type:

dict

Returns:

metadata – Metadata from GPCC file

rainfallqc.utils.data_readers.read_gsdr_data_from_file(data_path, raw_data_time_res, rain_col_prefix=None, rain_col_suffix=None, suffix_only=False, gsdr_header_rows=20)[source]

Read GSDR data from file.

Note: this was developed on the GSDR data available from IntenseQC, so it expects a number of header rows at the top of each data file (see gsdr_header_rows).

Parameters:
  • data_path (str) – Path to GSDR data file

  • raw_data_time_res (str) – Time resolution of data record i.e. ‘hourly’ or ‘daily’

  • rain_col_prefix (str) – Prefix for column for target_gauge_col (set as None by default)

  • rain_col_suffix (str) – Suffix for column name for target_gauge_col (set as None by default)

  • suffix_only (bool) – Override to include only the suffix (e.g. if the column name is the ID)

  • gsdr_header_rows (int) – Number of rows to skip in the header of the GSDR data (default=20)

Return type:

DataFrame

Returns:

gsdr_data – GSDR data as Pandas DataFrame

rainfallqc.utils.data_readers.read_gsdr_metadata(data_path)[source]

Read the specific format and header of Global Sub-Daily Rainfall (GSDR) files.

Parameters:

data_path (str) – path to GSDR data file (.txt)

Return type:

dict

Returns:

metadata – Metadata from GSDR file
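
To give a sense of what parsing a header into a metadata dictionary might look like, here is a minimal sketch. It assumes the header consists of `Key: value` lines and normalises keys to lower-case with underscores; the sample lines, key names, and helper name are invented for illustration and may not match the real GSDR header or the keys rainfallqc produces.

```python
def parse_header(lines, header_rows=20):
    """Hypothetical helper: turn 'Key: value' header lines into a metadata dict."""
    metadata = {}
    for line in lines[:header_rows]:
        key, separator, value = line.partition(":")
        if not separator:
            continue  # skip lines that are not 'Key: value' pairs
        metadata[key.strip().lower().replace(" ", "_")] = value.strip()
    return metadata


# Toy header lines (invented for this example).
sample_lines = [
    "Station ID: DE_00310",
    "Country: Germany",
    "Start datetime: 2006010100",
    "End datetime: 2010123123",
    "Other: 0.1",
]
metadata = parse_header(sample_lines, header_rows=4)
print(metadata)
```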

Classes

GPCCNetworkReader(path_to_gpcc_dir, time_res)

GPCC rain gauge network reader.

GSDRNetworkReader(path_to_gsdr_dir[, ...])

GSDR rain gauge network reader.

GaugeNetworkReader(path_to_gauge_network)

Base class for reading rain gauge networks.

Functions

add_datetime_to_gsdr_data(gsdr_data, ...)

Add datetime column to GSDR gauge data using metadata from that gauge.

convert_gsdr_metadata_dates_to_datetime(...)

Convert GSDR metadata date string column to datetime.

get_paths_using_gauge_ids(gauge_ids, ...[, ...])

Get the data path for each gauge ID.

load_etccdi_data(etccdi_var[, path_to_etccdi])

Load ETCCDI data.

load_gpcc_gauge_network_metadata(...[, ...])

Load metadata from GPCC gauges from a directory.

load_gsdr_gauge_network_metadata(...[, ...])

Load metadata from GSDR gauges from a directory.

read_gpcc_data_from_zip(data_path, ...[, ...])

Read the specific format and header of Global Precipitation Climatology Centre (GPCC) files.

read_gpcc_metadata_from_zip(data_path, time_res)

Read GPCC metadata from zip file.

read_gsdr_data_from_file(data_path, ...[, ...])

Read GSDR data from file.

read_gsdr_metadata(data_path)

Read the specific format and header of Global Sub-Daily Rainfall (GSDR) files.