TimeFrame

class time_stream.TimeFrame(df, time_name, resolution=None, offset=None, periodicity=None, time_anchor=TimeAnchor.START, on_duplicates=DuplicateOption.ERROR)

A class representing a time series data model, with data held in a Polars DataFrame.

Parameters:
  • df (DataFrame) – The polars.DataFrame containing the time-series data.

  • time_name (str) – The name of the time column in df.

  • resolution (Period | str | None) – Sampling interval for the timeseries: the time step allowed between consecutive data points. Accepts a Period or an ISO-8601 duration string (e.g. "PT15M", "P1D", "P1Y"). If None, defaults to a one-microsecond step (PT0.000001S), which effectively allows any set of datetime values.

  • offset (str | None) – Offset applied from the natural boundary of resolution to position the datetime values along the timeline. For example, you may have daily data (resolution="P1D"), but all the values are measured at 9:00am, an offset of 9 hours ("+T9H") from the natural boundary of midnight 00:00. Accepts an offset string, following the principles of ISO-8601 but replacing the “P” with a “+” (e.g. "+T9H", "+9MT9H"). If None, no offset is applied.

  • periodicity (Period | str | None) – Defines the allowed “frequency” of datetimes in your timeseries, i.e., how many datetime entries are allowed within a given period of time. For example, you may have an annual maximum timeseries, where the individual data points are considered to be at daily resolution (resolution="P1D"), but are limited to only one data point per year (periodicity="P1Y"). Accepts a Period or ISO-8601 duration string (e.g. "PT15M", "P1D", "P1Y") with an optional offset syntax (e.g. "P1D+T9H", "P1Y+9MT9H"). If None, it defaults to the period defined by resolution + offset.

  • time_anchor (TimeAnchor | str) –

    Defines the window of time that a given timestamp refers to. In the descriptions below, “t” is the time value and “r” stands for a single unit of the resolution of the data:

    • POINT: The timestamp is anchored to the instant of time “t”. A value at “t” is considered valid only for the instant of time “t”.

    • START: The timestamp is anchored starting at “t”. A value at “t” is considered valid starting at “t” (inclusive) and ending at “t+r” (exclusive).

    • END: The timestamp is anchored ending at “t”. A value at “t” is considered valid starting at “t-r” (exclusive) and ending at “t” (inclusive).

  • on_duplicates (DuplicateOption | str) –

    What to do if duplicate rows are found in the data:

    • ERROR (default): Raise an error.

    • KEEP_FIRST: Keep the first row of any duplicate groups.

    • KEEP_LAST: Keep the last row of any duplicate groups.

    • DROP: Drop all duplicate rows.

    • MERGE: Merge duplicate rows using coalesce (the first non-null value for each column takes precedence).
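
The on_duplicates options above can be illustrated with a small pure-Python sketch. This is not the library's implementation: it assumes rows are plain dictionaries with identical keys, that rows count as duplicates when they share the same value in the time column, and that DROP discards every row of a duplicated group. The helper name resolve_duplicates is hypothetical.

```python
def resolve_duplicates(rows, time_name, on_duplicates="ERROR"):
    """Hypothetical helper mimicking the documented on_duplicates options.

    Assumption: rows are duplicates when they share the same time value.
    """
    # Group rows by their time value, preserving first-seen order.
    groups = {}
    for row in rows:
        groups.setdefault(row[time_name], []).append(row)

    result = []
    for time_value, group in groups.items():
        if len(group) == 1:
            result.append(group[0])
        elif on_duplicates == "ERROR":
            raise ValueError(f"Duplicate time value: {time_value!r}")
        elif on_duplicates == "KEEP_FIRST":
            result.append(group[0])
        elif on_duplicates == "KEEP_LAST":
            result.append(group[-1])
        elif on_duplicates == "DROP":
            continue  # assumed meaning: discard the whole duplicated group
        elif on_duplicates == "MERGE":
            # Coalesce: take the first non-null value for each column.
            merged = {
                key: next((r[key] for r in group if r[key] is not None), None)
                for key in group[0]
            }
            result.append(merged)
        else:
            raise ValueError(f"Unknown option: {on_duplicates!r}")
    return result


rows = [
    {"timestamp": "2024-01-01", "flow": 1.0, "level": None},
    {"timestamp": "2024-01-01", "flow": None, "level": 5.0},
    {"timestamp": "2024-01-02", "flow": 2.0, "level": 6.0},
]
merged = resolve_duplicates(rows, "timestamp", "MERGE")
dropped = resolve_duplicates(rows, "timestamp", "DROP")
```

Here merged holds one combined row for 2024-01-01 (flow from the first duplicate, level from the second), while dropped keeps only the 2024-01-02 row.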

Examples

>>> from time_stream import TimeFrame
>>> # `df` is assumed to be a polars.DataFrame with a "timestamp" column.
>>> # Simple 15 minute timeseries:
>>> tf = TimeFrame(
...     df, "timestamp", resolution="PT15M"
... )
>>> print(
...     "resolution=", tf.resolution,
...     " alignment=", tf.alignment,
...     " periodicity=", tf.periodicity,
...     sep="",
... )
resolution=PT15M alignment=PT15M periodicity=PT15M
>>> # Daily water day (09:00 to 09:00) with default uniqueness per water day:
>>> tf = TimeFrame(
...     df, "timestamp", resolution="P1D", offset="+T9H"
... )
>>> print(
...     "resolution=", tf.resolution,
...     " alignment=", tf.alignment,
...     " periodicity=", tf.periodicity,
...     sep="",
... )
resolution=P1D alignment=P1D+T9H periodicity=P1D+T9H
>>> # Daily timestamps but uniqueness per water-year:
>>> tf = TimeFrame(
...     df, "timestamp", resolution="P1D", offset="+T9H", periodicity="P1Y+9MT9H"
... )
>>> print(
...     "resolution=", tf.resolution,
...     " alignment=", tf.alignment,
...     " periodicity=", tf.periodicity,
...     sep="",
... )
resolution=P1D alignment=P1D+T9H periodicity=P1Y+9MT9H
>>> # Annual series stored directly on water-year boundary:
>>> tf = TimeFrame(
...     df, "timestamp", resolution="P1Y", offset="+9MT9H"
... )
>>> print(
...     "resolution=", tf.resolution,
...     " alignment=", tf.alignment,
...     " periodicity=", tf.periodicity,
...     sep="",
... )
resolution=P1Y alignment=P1Y+9MT9H periodicity=P1Y+9MT9H
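
To make the relationship between resolution, offset and periodicity in the water-year examples more concrete, the following standard-library sketch checks the same two rules by hand. It does not call time_stream, and the helper names are hypothetical. It assumes a water year that begins on 1 October at 09:00, which is one reading of the "+9MT9H" offset (9 months and 9 hours after 1 January 00:00).

```python
from datetime import datetime, timedelta


def is_aligned_daily(ts, offset=timedelta(hours=9)):
    """True if ts sits on a daily boundary shifted by offset (P1D+T9H)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return ts - midnight == offset


def water_year_start(ts):
    """Start of the assumed P1Y+9MT9H period containing ts (1 Oct, 09:00)."""
    start = datetime(ts.year, 10, 1, 9)
    return start if ts >= start else datetime(ts.year - 1, 10, 1, 9)


def one_per_period(timestamps, period_start):
    """True if no two timestamps fall within the same period."""
    starts = [period_start(ts) for ts in timestamps]
    return len(starts) == len(set(starts))


# An annual-maximum style series: daily resolution at 09:00,
# with at most one value per water year.
annual_maxima = [
    datetime(2021, 2, 14, 9),   # water year starting 2020-10-01 09:00
    datetime(2021, 12, 3, 9),   # water year starting 2021-10-01 09:00
    datetime(2023, 1, 20, 9),   # water year starting 2022-10-01 09:00
]

aligned = all(is_aligned_daily(ts) for ts in annual_maxima)
unique_per_year = one_per_period(annual_maxima, water_year_start)

# A second value in the water year starting 2020-10-01 breaks the periodicity rule.
with_clash = annual_maxima + [datetime(2021, 9, 30, 9)]
clash_ok = one_per_period(with_clash, water_year_start)
```

In this sketch aligned and unique_per_year are both True, while clash_ok is False because two timestamps share a water year.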

Attributes

df

The underlying Polars DataFrame containing the timeseries data.

resolution

The resolution of the timeseries data within the TimeFrame.

offset

The offset of the time steps within the TimeFrame.

alignment

The alignment of the time steps within the TimeFrame, i.e. the resolution combined with the offset (e.g. P1D+T9H).

periodicity

The periodicity of the timeseries data within the TimeFrame.

time_anchor

The time anchor of the timeseries data within the TimeFrame.

time_name

The name of the primary datetime column in the underlying TimeFrame DataFrame.

columns

All column labels of the DataFrame within the TimeFrame.

flag_columns

Only the labels for any flag columns within the TimeFrame.

data_columns

Only the labels for the data columns within the TimeFrame.

metadata

TimeFrame-level metadata.

column_metadata

Per-column metadata.
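
The time_anchor attribute determines which interval of time each timestamp represents. As a rough illustration of the POINT, START and END definitions given in the parameter list, the sketch below computes the window for a timestamp "t" and resolution "r" using only the standard library. The function names are hypothetical, anchors are passed as plain strings rather than TimeAnchor members, and the resolution is assumed to be a fixed-length timedelta.

```python
from datetime import datetime, timedelta


def valid_window(t, r, time_anchor):
    """Return (start, end, start_inclusive, end_inclusive) for timestamp t."""
    if time_anchor == "POINT":
        return (t, t, True, True)          # valid only at the instant t
    if time_anchor == "START":
        return (t, t + r, True, False)     # [t, t + r)
    if time_anchor == "END":
        return (t - r, t, False, True)     # (t - r, t]
    raise ValueError(f"Unknown time anchor: {time_anchor!r}")


def covers(t, r, time_anchor, instant):
    """True if the value stamped at t is valid at the given instant."""
    start, end, start_inc, end_inc = valid_window(t, r, time_anchor)
    after_start = instant >= start if start_inc else instant > start
    before_end = instant <= end if end_inc else instant < end
    return after_start and before_end


t = datetime(2024, 1, 1, 9, 0)
r = timedelta(minutes=15)

start_covers_t = covers(t, r, "START", t)            # inclusive lower bound
start_covers_next = covers(t, r, "START", t + r)     # exclusive upper bound
end_covers_t = covers(t, r, "END", t)                # inclusive upper bound
end_covers_prev = covers(t, r, "END", t - r)         # exclusive lower bound
```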

Methods

Builders

with_df

Return a new TimeFrame with a new DataFrame, checking that the integrity of the time values hasn't been compromised between the old and new TimeFrame.

with_periodicity

Return a new TimeFrame, with a new periodicity registered.

with_metadata

Return a new TimeFrame with TimeFrame-level metadata.

with_column_metadata

Return a new TimeFrame with column-level metadata.

with_flag_system

Return a new TimeFrame, with a flag system registered.

General

sort_time

Sort the TimeFrame DataFrame by the time column.

pad

Pad the time series with missing datetime rows, filling in NULLs for missing values.

select

Return a new TimeFrame instance that includes only the specified columns.
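
As an illustration of what padding means, the sketch below fills gaps in a 15-minute series with rows whose values are None. It is a standard-library approximation rather than the pad method itself: the helper pad_rows is hypothetical, rows are plain dictionaries, the step is a fixed-length timedelta, and padding is assumed to cover only the range between the earliest and latest existing timestamps.

```python
from datetime import datetime, timedelta


def pad_rows(rows, time_name, step):
    """Hypothetical helper: add None-valued rows for missing time steps."""
    by_time = {row[time_name]: row for row in rows}
    value_columns = [name for name in rows[0] if name != time_name]

    current, last = min(by_time), max(by_time)
    padded = []
    while current <= last:
        # Use the existing row if present, otherwise a row of nulls.
        empty_row = {time_name: current, **{name: None for name in value_columns}}
        padded.append(by_time.get(current, empty_row))
        current += step
    return padded


rows = [
    {"timestamp": datetime(2024, 1, 1, 0, 0), "flow": 1.0},
    {"timestamp": datetime(2024, 1, 1, 0, 15), "flow": 2.0},
    {"timestamp": datetime(2024, 1, 1, 1, 0), "flow": 3.0},
]
padded = pad_rows(rows, "timestamp", timedelta(minutes=15))
```

The result has five rows: the three originals plus None-valued rows at 00:30 and 00:45.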

Operations

aggregate

Apply an aggregation function to a column in this TimeFrame, check that the aggregation satisfies user requirements, and return a new derived TimeFrame containing the aggregated data.

infill

Apply an infilling method to a column in the TimeFrame to fill in missing data.

qc_check

Apply a quality control check to the TimeFrame.

Flagging

register_flag_system

Register a named flag system with the internal flag manager.

get_flag_system

Return a registered flag system.

register_flag_column

Mark the specified existing column as a flag column.

init_flag_column

Add a new column to the TimeFrame DataFrame, setting it as a Flag Column.

get_flag_column

Look up a registered flag column by name.

add_flag

Add a flag value to a flag column (if not already present) for rows where the expression is True.

remove_flag

Remove a flag value from a flag column (if present).