rainfallqc.utils.data_readers
Data loading tools.
Classes for reading rain gauge network data are at the bottom of the file.
- class rainfallqc.utils.data_readers.GPCCNetworkReader(path_to_gpcc_dir, time_res, file_format='.zip', unzipped_file_format='.dat')
Bases: GaugeNetworkReader
GPCC rain gauge network reader.
Methods
get_nearest_overlapping_neighbours_to_target(...) – Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.
load_network_data(data_paths, target_gauge_col) – Load GPCC network data based on file paths.
- Parameters:
path_to_gpcc_dir (str)
time_res (str)
file_format (str)
unzipped_file_format (str)
- load_network_data(data_paths, target_gauge_col, missing_val=-999.9)
Load GPCC network data based on file paths.
- Parameters:
data_paths (Union[List[str], ndarray[str]]) – Paths to load network data from.
target_gauge_col (str) – Rainfall data column.
missing_val (int | float) – Missing value (default: -999.9).
- Return type:
DataFrame
- Returns:
network_data – Dataframe of GPCC gauges.
- class rainfallqc.utils.data_readers.GSDRNetworkReader(path_to_gsdr_dir, file_format='.txt')
Bases: GaugeNetworkReader
GSDR rain gauge network reader.
Methods
get_nearest_overlapping_neighbours_to_target(...) – Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.
load_network_data(rain_col_prefix, data_paths) – Load GSDR network data based on file paths.
- Parameters:
path_to_gsdr_dir (str)
file_format (str)
- load_network_data(rain_col_prefix, data_paths, suffix_only=False, gsdr_header_rows=20)
Load GSDR network data based on file paths.
- Parameters:
data_paths (Union[List[str], ndarray[str]]) – Paths to load network data from.
rain_col_prefix (str) – Prefix for rain column name (e.g. ‘rain’).
suffix_only (bool) – Override to only include the suffix (e.g. if the column name is the ID).
gsdr_header_rows (int) – Number of rows to skip in the header of the GSDR data (default: 20).
- Return type:
DataFrame
- Returns:
network_data – Dataframe of GSDR gauges.
- class rainfallqc.utils.data_readers.GaugeNetworkReader(path_to_gauge_network)
Bases: ABC
Base class for reading rain gauge networks.
Methods
get_nearest_overlapping_neighbours_to_target(...) – Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.
- Parameters:
path_to_gauge_network (str)
- get_nearest_overlapping_neighbours_to_target(target_id, distance_threshold, n_closest, min_overlap_days)
Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.
- Parameters:
target_id (str) – Target gauge to get neighbour IDs of.
distance_threshold (int | float) – Distance threshold to check for neighbours.
n_closest (int) – Number of nearest neighbours to return.
min_overlap_days (int) – Minimum time overlap between neighbours to return.
- Return type:
set
- Returns:
neighbouring_gauge_id – IDs of neighbouring gauges within a given distance of the target and with at least the minimum number of overlapping days.
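The selection described above (distance filter, minimum overlap, then the closest few) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the rainfallqc implementation: the `gauges` dictionary layout, the helper names, the use of kilometres for `distance_threshold`, and the order in which the filters are applied are all hypothetical.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def overlap_days(start_a, end_a, start_b, end_b):
    """Number of days shared by two inclusive date ranges (0 if disjoint)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max((earliest_end - latest_start).days + 1, 0)


def nearest_overlapping_neighbours(gauges, target_id, distance_threshold, n_closest, min_overlap_days):
    """Return the IDs of the closest gauges that also overlap the target in time."""
    target = gauges[target_id]
    candidates = []
    for gauge_id, gauge in gauges.items():
        if gauge_id == target_id:
            continue
        dist = haversine_km(target["lat"], target["lon"], gauge["lat"], gauge["lon"])
        shared = overlap_days(target["start"], target["end"], gauge["start"], gauge["end"])
        # Assumption: both filters are applied before taking the n closest.
        if dist <= distance_threshold and shared >= min_overlap_days:
            candidates.append((dist, gauge_id))
    return {gauge_id for _, gauge_id in sorted(candidates)[:n_closest]}


# Hypothetical network: B is near and overlaps, C is near but barely overlaps, D is far away.
gauges = {
    "A": {"lat": 51.0, "lon": 0.0, "start": date(2000, 1, 1), "end": date(2010, 12, 31)},
    "B": {"lat": 51.1, "lon": 0.0, "start": date(2005, 1, 1), "end": date(2015, 12, 31)},
    "C": {"lat": 51.2, "lon": 0.0, "start": date(2010, 12, 1), "end": date(2020, 12, 31)},
    "D": {"lat": 55.0, "lon": 0.0, "start": date(2000, 1, 1), "end": date(2010, 12, 31)},
}
neighbours = nearest_overlapping_neighbours(
    gauges, "A", distance_threshold=50, n_closest=2, min_overlap_days=365
)
```

Returning a set mirrors the documented return type; it also means the caller should not rely on the neighbours being ordered by distance.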
- rainfallqc.utils.data_readers.add_datetime_to_gsdr_data(gsdr_data, gsdr_metadata, multiplying_factor)
Add datetime column to GSDR gauge data using metadata from that gauge.
NOTE: This could be extended to look up the metadata when it is not provided.
- Parameters:
gsdr_data (DataFrame) – GSDR data.
gsdr_metadata (dict) – Metadata from GSDR file.
multiplying_factor (int | float) – Factor to multiply the data by.
- Return type:
DataFrame
- Returns:
gsdr_data – GSDR data with datetime column added.
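As a rough illustration of attaching timestamps to a gauge record from its metadata, the sketch below pairs each value with a timestamp generated from a start time and a fixed step. The function name, the plain-list data layout, and the hourly step are assumptions; the role of `multiplying_factor` is not modelled here because the documentation above does not specify it further.

```python
from datetime import datetime, timedelta


def attach_datetimes(values, start, step):
    """Pair each value with a timestamp: start, start + step, start + 2 * step, ..."""
    timestamps = [start + i * step for i in range(len(values))]
    return list(zip(timestamps, values))


# Hypothetical hourly record starting at midnight on 1 January 2006.
records = attach_datetimes([0.0, 1.2, 0.4], datetime(2006, 1, 1, 0), timedelta(hours=1))
```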
- rainfallqc.utils.data_readers.convert_gsdr_metadata_dates_to_datetime(gsdr_metadata)
Convert GSDR metadata date string column to datetime.
- Parameters:
gsdr_metadata (dict) – Metadata from GSDR file.
- Return type:
dict
- Returns:
gsdr_metadata (dict) – Metadata from GSDR file with start and end dates converted to datetime.
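A conversion of this kind can be sketched with `datetime.strptime`. The key names `start_datetime` and `end_datetime` and the `YYYYMMDDHH` string format are assumptions made for the example, not confirmed details of the GSDR metadata or of rainfallqc.

```python
from datetime import datetime


def convert_dates(metadata, keys=("start_datetime", "end_datetime"), fmt="%Y%m%d%H"):
    """Return a copy of metadata with the given date strings parsed to datetime."""
    converted = dict(metadata)  # copy so the caller's dictionary is left untouched
    for key in keys:
        converted[key] = datetime.strptime(converted[key], fmt)
    return converted


# Hypothetical metadata dictionary with dates stored as strings.
meta = convert_dates(
    {"station_id": "X1", "start_datetime": "2006010100", "end_datetime": "2010123123"}
)
```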
- rainfallqc.utils.data_readers.get_paths_using_gauge_ids(gauge_ids, dir_path, file_format, time_res=None)
Get data path of Gauge IDs.
- Parameters:
gauge_ids (Union[List[str], ndarray[str]]) – Array of gauge IDs.
dir_path (str) – Path to data directory.
file_format (str) – Format of files in directory.
time_res (str) – Time resolution (e.g. ‘mw’ or ‘tw’).
- Return type:
dict
- Returns:
gauge_paths – Dictionary of gauge ID and path.
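The returned mapping of gauge ID to path can be pictured with a short sketch. The file naming scheme used here (`<time_res>_<id><file_format>`, or `<id><file_format>` when no time resolution is given) is a guess for illustration only, and `paths_for_gauge_ids` is a hypothetical stand-in rather than the library function.

```python
import os


def paths_for_gauge_ids(gauge_ids, dir_path, file_format, time_res=None):
    """Map each gauge ID to a file path inside dir_path."""
    # Assumption: files are prefixed with the time resolution when one is given.
    prefix = f"{time_res}_" if time_res else ""
    return {gid: os.path.join(dir_path, f"{prefix}{gid}{file_format}") for gid in gauge_ids}


gauge_paths = paths_for_gauge_ids(["310", "6303"], "data/gpcc", ".zip", time_res="tw")
```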
- rainfallqc.utils.data_readers.load_etccdi_data(etccdi_var, path_to_etccdi=None)
Load ETCCDI data.
- Parameters:
etccdi_var (str) – Variable to load from ETCCDI.
path_to_etccdi (str) – Path to ETCCDI data (if None, defaults to the location of the data in tests).
- Return type:
Dataset
- Returns:
etccdi_data – Loaded data.
- rainfallqc.utils.data_readers.load_gpcc_gauge_network_metadata(path_to_gpcc_dir, time_res, gpcc_file_format='.dat')
Load metadata from GPCC gauges from a directory.
- Parameters:
path_to_gpcc_dir (str) – Path to directory with GPCC gauges.
time_res (str) – Time resolution (e.g. ‘mw’ or ‘tw’).
gpcc_file_format (str) – Format of file (default: .dat).
- Return type:
DataFrame
- Returns:
all_station_metadata – All GPCC gauges metadata as one dataframe.
- rainfallqc.utils.data_readers.load_gsdr_gauge_network_metadata(path_to_gsdr_dir, file_format='.txt')
Load metadata from GSDR gauges from a directory.
- Parameters:
path_to_gsdr_dir (str) – Path to directory with GSDR gauges.
file_format (str) – Format of file (default: .txt).
- Return type:
DataFrame
- Returns:
all_station_metadata – All GSDR gauges metadata as one dataframe.
- rainfallqc.utils.data_readers.read_gpcc_data_from_zip(data_path, gpcc_file_name, target_gauge_col, time_res, hour_offset=7, missing_val=-999)
Read the specific format and header of Global Precipitation Climatology Centre (GPCC) files.
- Parameters:
data_path (str) – Path to GPCC zip file.
gpcc_file_name (str) – Name of GPCC file within zip.
target_gauge_col (str) – Name of rainfall column.
time_res (str) – ‘daily’ or ‘monthly’.
hour_offset (int) – Hours to offset grouped data by (default: 7).
missing_val (int | float) – Missing value (default: -999).
- Return type:
DataFrame
- Returns:
gpcc_data – Data from GPCC file.
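Reading one named file out of a zip archive and masking a missing-value sentinel can be sketched with the standard `zipfile` module. The two-column `date value` layout, the member name `tw_310.dat`, and the helper `read_member_rows` are hypothetical; the real GPCC files have their own header and column format, and `hour_offset` is not modelled here.

```python
import io
import zipfile


def read_member_rows(zip_bytes, member_name, missing_val=-999):
    """Read whitespace-separated 'date value' rows from one file inside a zip archive."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as member:
            for line in io.TextIOWrapper(member, encoding="utf-8"):
                if not line.strip():
                    continue  # skip blank lines
                day, value = line.split()
                value = float(value)
                # Replace the sentinel with None so it cannot be mistaken for rainfall.
                rows.append((day, None if value == missing_val else value))
    return rows


# Build a tiny archive in memory so the example runs without any data files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("tw_310.dat", "2006-01-01 1.5\n2006-01-02 -999\n2006-01-03 0.0\n")

rows = read_member_rows(buffer.getvalue(), "tw_310.dat")
```

Masking the sentinel at read time keeps values such as -999 out of any later sums or means.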
- rainfallqc.utils.data_readers.read_gpcc_metadata_from_zip(data_path, time_res, gpcc_file_format='.dat')
Read GPCC metadata from zip file.
- Parameters:
data_path (str) – Path to GPCC zip file.
time_res (str) – Time resolution of data (e.g. daily or monthly).
gpcc_file_format (str) – Default GPCC file format (default: .dat).
- Return type:
dict
- Returns:
metadata – Metadata from GPCC file.
- rainfallqc.utils.data_readers.read_gsdr_data_from_file(data_path, raw_data_time_res, rain_col_prefix=None, rain_col_suffix=None, suffix_only=False, gsdr_header_rows=20)
Read GSDR data from file.
Note: this was developed against the GSDR data available from IntenseQC, so it expects a number of header rows in the data (see gsdr_header_rows).
- Parameters:
data_path (str) – Path to GSDR data file.
raw_data_time_res (str) – Time resolution of data record, i.e. ‘hourly’ or ‘daily’.
rain_col_prefix (str) – Prefix for column name for target_gauge_col (default: None).
rain_col_suffix (str) – Suffix for column name for target_gauge_col (default: None).
suffix_only (bool) – Override to only include the suffix (e.g. if the column name is the ID).
gsdr_header_rows (int) – Number of rows to skip in the header of the GSDR data (default: 20).
- Return type:
DataFrame
- Returns:
gsdr_data – GSDR data as Pandas DataFrame.
- rainfallqc.utils.data_readers.read_gsdr_metadata(data_path)
Read the specific format and header of Global Sub-Daily Rainfall (GSDR) files.
- Parameters:
data_path (str) – Path to GSDR data file (.txt).
- Return type:
dict
- Returns:
metadata – Metadata from GSDR file.
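A header of `Key: value` lines followed by data rows can be turned into a metadata dictionary along the following lines. The header keys in the sample, the key normalisation (lower case, underscores), and the helper name are assumptions for illustration; they are not taken from the GSDR specification or from rainfallqc.

```python
def read_header_metadata(lines, header_rows):
    """Parse the first header_rows lines of 'Key: value' text into a dictionary."""
    metadata = {}
    for line in lines[:header_rows]:
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            metadata[key.strip().lower().replace(" ", "_")] = value.strip()
    return metadata


# Hypothetical file content: four header rows followed by two data rows.
sample = ["Station ID: X1", "Latitude: 51.5", "Start datetime: 2006010100", "Other:", "0.0", "1.2"]
metadata = read_header_metadata(sample, header_rows=4)
```

All values stay as strings at this stage; converting dates or coordinates would be a separate step.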
Classes
GPCCNetworkReader | GPCC rain gauge network reader.
GSDRNetworkReader | GSDR rain gauge network reader.
GaugeNetworkReader | Base class for reading rain gauge networks.
Functions
add_datetime_to_gsdr_data | Add datetime column to GSDR gauge data using metadata from that gauge.
convert_gsdr_metadata_dates_to_datetime | Convert GSDR metadata date string column to datetime.
get_paths_using_gauge_ids | Get data path of Gauge IDs.
load_etccdi_data | Load ETCCDI data.
load_gpcc_gauge_network_metadata | Load metadata from GPCC gauges from a directory.
load_gsdr_gauge_network_metadata | Load metadata from GSDR gauges from a directory.
read_gpcc_data_from_zip | Read the specific format and header of Global Precipitation Climatology Centre (GPCC) files.
read_gpcc_metadata_from_zip | Read GPCC metadata from zip file.
read_gsdr_data_from_file | Read GSDR data from file.
read_gsdr_metadata | Read the specific format and header of Global Sub-Daily Rainfall (GSDR) files.