Quick start

Create and work with timeseries data using the Time-Stream package.

import time_stream as ts

Create a TimeFrame

Create sample data in a Polars DataFrame:

from datetime import datetime, timedelta

import polars as pl

dates = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(10)]
temperatures = [20.5, 21.0, 19.1, 26.0, 24.2, 26.6, 28.4, 30.9, 31.0, 29.1]
precipitation = [0.0, 0.0, 5.1, 10.2, 2.0, 0.2, 0.0, 3.0, 1.6, 0.0]

df = pl.DataFrame({"time": dates, "temperature": temperatures, "precipitation": precipitation})

Now wrap the Polars DataFrame in a TimeFrame, which adds specialized functionality for time series operations:

tf = ts.TimeFrame(
    df=df,
    time_name="time",  # Specify which column contains the primary datetime values
)

With Time Properties

The TimeFrame object lets you configure important properties of the time dimension of your data. More information about these properties and concepts can be found on the concepts page.

Here, we will show some basic usage of these time properties.

Periodicity, Resolution and Time Anchor

If resolution and periodicity are not specified, both default to 1 microsecond, which accommodates any set of datetime values. The time anchor defaults to start:

print(tf.resolution)
print(tf.periodicity)
print(tf.time_anchor)
PT0.000001S
PT0.000001S
TimeAnchor.START

Although the 1 microsecond default accommodates any datetime values, for more control over certain time series functionality you should specify the actual resolution and periodicity where they are known. These properties can be provided as an ISO 8601 duration string such as P1D (1 day) or PT15M (15 minutes).

The time anchor property can be set to start, end, or point.

Again, see the concepts page for more detail on all of these properties.

tf = ts.TimeFrame(
    df=df,
    time_name="time",
    resolution="P1D",  # Each timestamp is at day precision
    periodicity="P1D",  # Data points are spaced 1 day apart
    time_anchor="end",
)

print(tf.resolution)
print(tf.periodicity)
print(tf.time_anchor)
print(tf)
P1D
P1D
TimeAnchor.END
shape: (10, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ time                ┆ temperature ┆ precipitation │
│ ---                 ┆ ---         ┆ ---           │
│ datetime[μs]        ┆ f64         ┆ f64           │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20.5        ┆ 0.0           │
│ 2023-01-02 00:00:00 ┆ 21.0        ┆ 0.0           │
│ 2023-01-03 00:00:00 ┆ 19.1        ┆ 5.1           │
│ 2023-01-04 00:00:00 ┆ 26.0        ┆ 10.2          │
│ 2023-01-05 00:00:00 ┆ 24.2        ┆ 2.0           │
│ 2023-01-06 00:00:00 ┆ 26.6        ┆ 0.2           │
│ 2023-01-07 00:00:00 ┆ 28.4        ┆ 0.0           │
│ 2023-01-08 00:00:00 ┆ 30.9        ┆ 3.0           │
│ 2023-01-09 00:00:00 ┆ 31.0        ┆ 1.6           │
│ 2023-01-10 00:00:00 ┆ 29.1        ┆ 0.0           │
└─────────────────────┴─────────────┴───────────────┘

Duplicate Detection

TimeFrame automatically checks the specified time column for duplicate values, and you control what happens when it finds them. Consider this DataFrame, which contains duplicate time values:

shape: (10, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ time                ┆ temperature ┆ precipitation │
│ ---                 ┆ ---         ┆ ---           │
│ datetime[μs]        ┆ i64         ┆ i64           │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20          ┆ null          │
│ 2023-01-01 00:00:00 ┆ null        ┆ 0             │
│ 2023-02-01 00:00:00 ┆ 19          ┆ 5             │
│ 2023-03-01 00:00:00 ┆ 26          ┆ 10            │
│ 2023-04-01 00:00:00 ┆ 24          ┆ 2             │
│ 2023-05-01 00:00:00 ┆ 26          ┆ 0             │
│ 2023-06-01 00:00:00 ┆ 28          ┆ null          │
│ 2023-06-01 00:00:00 ┆ 30          ┆ 3             │
│ 2023-06-01 00:00:00 ┆ null        ┆ 4             │
│ 2023-07-01 00:00:00 ┆ 29          ┆ 0             │
└─────────────────────┴─────────────┴───────────────┘

The following strategies are available to use with the on_duplicates argument:

  1. Error (default): on_duplicates="error"

Raises an error when duplicate rows are found. This is the default behavior to ensure data integrity.

ts.TimeFrame(df, "time", on_duplicates="error")
Warning: Duplicate time values found. A TimeFrame must have unique time values. Options for dealing with duplicate rows include: ['DROP', 'KEEP_FIRST', 'KEEP_LAST', 'ERROR', 'MERGE'].
  2. Keep First: on_duplicates="keep_first"

For a given group of rows with the same time value, keeps only the first row and discards the others.

tf = ts.TimeFrame(df, "time", on_duplicates="keep_first")
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ time                ┆ temperature ┆ precipitation │
│ ---                 ┆ ---         ┆ ---           │
│ datetime[μs]        ┆ i64         ┆ i64           │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20          ┆ null          │
│ 2023-02-01 00:00:00 ┆ 19          ┆ 5             │
│ 2023-03-01 00:00:00 ┆ 26          ┆ 10            │
│ 2023-04-01 00:00:00 ┆ 24          ┆ 2             │
│ 2023-05-01 00:00:00 ┆ 26          ┆ 0             │
│ 2023-06-01 00:00:00 ┆ 28          ┆ null          │
│ 2023-07-01 00:00:00 ┆ 29          ┆ 0             │
└─────────────────────┴─────────────┴───────────────┘
  3. Keep Last: on_duplicates="keep_last"

For a given group of rows with the same time value, keeps only the last row and discards the others.

tf = ts.TimeFrame(df, "time", on_duplicates="keep_last")
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ time                ┆ temperature ┆ precipitation │
│ ---                 ┆ ---         ┆ ---           │
│ datetime[μs]        ┆ i64         ┆ i64           │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ null        ┆ 0             │
│ 2023-02-01 00:00:00 ┆ 19          ┆ 5             │
│ 2023-03-01 00:00:00 ┆ 26          ┆ 10            │
│ 2023-04-01 00:00:00 ┆ 24          ┆ 2             │
│ 2023-05-01 00:00:00 ┆ 26          ┆ 0             │
│ 2023-06-01 00:00:00 ┆ null        ┆ 4             │
│ 2023-07-01 00:00:00 ┆ 29          ┆ 0             │
└─────────────────────┴─────────────┴───────────────┘
  4. Drop: on_duplicates="drop"

Removes every row that shares its time value with another row, including the first occurrence. This strategy is appropriate when you are unsure of the integrity of duplicate rows and want to keep only unique, unambiguous data.

tf = ts.TimeFrame(df, "time", on_duplicates="drop")
shape: (5, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ time                ┆ temperature ┆ precipitation │
│ ---                 ┆ ---         ┆ ---           │
│ datetime[μs]        ┆ i64         ┆ i64           │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-02-01 00:00:00 ┆ 19          ┆ 5             │
│ 2023-03-01 00:00:00 ┆ 26          ┆ 10            │
│ 2023-04-01 00:00:00 ┆ 24          ┆ 2             │
│ 2023-05-01 00:00:00 ┆ 26          ┆ 0             │
│ 2023-07-01 00:00:00 ┆ 29          ┆ 0             │
└─────────────────────┴─────────────┴───────────────┘
  5. Merge: on_duplicates="merge"

For a given group of rows with the same time value, merges the rows into one. Values are combined top-down: for each column, the first non-null value in the group is kept.

tf = ts.TimeFrame(df, "time", on_duplicates="merge")
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ time                ┆ temperature ┆ precipitation │
│ ---                 ┆ ---         ┆ ---           │
│ datetime[μs]        ┆ i64         ┆ i64           │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20          ┆ 0             │
│ 2023-02-01 00:00:00 ┆ 19          ┆ 5             │
│ 2023-03-01 00:00:00 ┆ 26          ┆ 10            │
│ 2023-04-01 00:00:00 ┆ 24          ┆ 2             │
│ 2023-05-01 00:00:00 ┆ 26          ┆ 0             │
│ 2023-06-01 00:00:00 ┆ 28          ┆ 3             │
│ 2023-07-01 00:00:00 ┆ 29          ┆ 0             │
└─────────────────────┴─────────────┴───────────────┘

With Metadata

The TimeFrame object can hold metadata to describe your data. This can be metadata about the time series dataset as a whole, or about the individual columns. Keeping the metadata and the data together in one object like this can help simplify downstream processes, such as derivation functions, infilling routines and plotting.

Dataset-level metadata can be set with the with_metadata() method:

metadata = {"location": "UKCEH Wallingford", "station_id": "ABC123"}

tf = tf.with_metadata(metadata)

Column-level metadata can be set with the with_column_metadata() method:

column_metadata = {
    "temperature": {"units": "°C", "description": "Average temperature"},
    "precipitation": {
        "units": "mm",
        "description": "Precipitation amount",
        "instrument_type": "Tipping bucket",
        # Note that metadata keys are not required to be the same for all columns
    },
}

tf = tf.with_column_metadata(column_metadata)

Metadata can be accessed via the metadata (dataset-level) and column_metadata (column-level) attributes:

print("Dataset-level metadata:")
print("")
print("All: ", tf.metadata)
print("Specific key: ", tf.metadata["location"])
print("")
print("Column-level metadata:")
print("")
print("All: ", tf.column_metadata)
print("Specific column: ", tf.column_metadata["temperature"])
print("Specific column key: ", tf.column_metadata["temperature"]["units"])
Dataset-level metadata:

All:  {'location': 'UKCEH Wallingford', 'station_id': 'ABC123'}
Specific key:  UKCEH Wallingford

Column-level metadata:

All:  {'temperature': {'units': '°C', 'description': 'Average temperature'}, 'precipitation': {'units': 'mm', 'description': 'Precipitation amount', 'instrument_type': 'Tipping bucket'}, 'time': {}}
Specific column:  {'units': '°C', 'description': 'Average temperature'}
Specific column key:  °C
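As an example of the downstream use mentioned above, the sketch below builds plot axis labels from column-level metadata. The axis_label helper is hypothetical and works on a plain dictionary shaped like the column_metadata output above; it is not part of Time-Stream.

```python
# Column-level metadata in the same shape as tf.column_metadata above
column_metadata: dict[str, dict[str, str]] = {
    "temperature": {"units": "°C", "description": "Average temperature"},
    "precipitation": {
        "units": "mm",
        "description": "Precipitation amount",
        "instrument_type": "Tipping bucket",
    },
    "time": {},
}


def axis_label(column: str, metadata: dict[str, dict[str, str]]) -> str:
    """Return 'name (units)', or just the column name when no units are recorded."""
    units = metadata.get(column, {}).get("units")
    return f"{column} ({units})" if units else column


for name in ("time", "temperature", "precipitation"):
    print(axis_label(name, column_metadata))
```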

Data Access and Update

Data Selection

The underlying Polars DataFrame is accessed via the df property:

tf.df

You can create a new TimeFrame from a selection of columns, using either the select() method or indexing syntax:

# Select a list of columns as a new TimeFrame
selected_tf = tf.select(["temperature"])
# or
selected_tf = tf[["temperature"]]
print("Type: ", type(selected_tf))
print(selected_tf)
Type:  <class 'time_stream.base.TimeFrame'>
shape: (10, 2)
┌─────────────────────┬─────────────┐
│ time                ┆ temperature │
│ ---                 ┆ ---         │
│ datetime[μs]        ┆ f64         │
╞═════════════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 20.5        │
│ 2023-01-02 00:00:00 ┆ 21.0        │
│ 2023-01-03 00:00:00 ┆ 19.1        │
│ 2023-01-04 00:00:00 ┆ 26.0        │
│ 2023-01-05 00:00:00 ┆ 24.2        │
│ 2023-01-06 00:00:00 ┆ 26.6        │
│ 2023-01-07 00:00:00 ┆ 28.4        │
│ 2023-01-08 00:00:00 ┆ 30.9        │
│ 2023-01-09 00:00:00 ┆ 31.0        │
│ 2023-01-10 00:00:00 ┆ 29.1        │
└─────────────────────┴─────────────┘

Note

The primary time column is automatically maintained in any selection.

Data Update

If you need to make changes to the underlying Polars DataFrame, use the with_df() method. This checks that the integrity of the time data has been maintained in the new DataFrame, and returns a new TimeFrame object with the updated data.

# Update the DataFrame by adding a new column
new_df = tf.df.with_columns((pl.col("temperature") * 1.8 + 32).alias("temperature_f"))

tf = tf.with_df(new_df)
print(tf)
shape: (10, 4)
┌─────────────────────┬─────────────┬───────────────┬───────────────┐
│ time                ┆ temperature ┆ precipitation ┆ temperature_f │
│ ---                 ┆ ---         ┆ ---           ┆ ---           │
│ datetime[μs]        ┆ f64         ┆ f64           ┆ f64           │
╞═════════════════════╪═════════════╪═══════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20.5        ┆ 0.0           ┆ 68.9          │
│ 2023-01-02 00:00:00 ┆ 21.0        ┆ 0.0           ┆ 69.8          │
│ 2023-01-03 00:00:00 ┆ 19.1        ┆ 5.1           ┆ 66.38         │
│ 2023-01-04 00:00:00 ┆ 26.0        ┆ 10.2          ┆ 78.8          │
│ 2023-01-05 00:00:00 ┆ 24.2        ┆ 2.0           ┆ 75.56         │
│ 2023-01-06 00:00:00 ┆ 26.6        ┆ 0.2           ┆ 79.88         │
│ 2023-01-07 00:00:00 ┆ 28.4        ┆ 0.0           ┆ 83.12         │
│ 2023-01-08 00:00:00 ┆ 30.9        ┆ 3.0           ┆ 87.62         │
│ 2023-01-09 00:00:00 ┆ 31.0        ┆ 1.6           ┆ 87.8          │
│ 2023-01-10 00:00:00 ┆ 29.1        ┆ 0.0           ┆ 84.38         │
└─────────────────────┴─────────────┴───────────────┴───────────────┘