TimeSeries Basics
This guide covers the fundamentals of working with the TimeSeries class, the core data structure in the Time Series Package.
Creating a TimeSeries
A TimeSeries wraps a Polars DataFrame and adds specialised functionality for time series operations.
Basic Creation
from datetime import datetime, timedelta
import polars as pl
from time_stream import TimeSeries, Period
from time_stream.aggregation import Mean, Min, Max
# Create sample data
dates = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(10)]
temperatures = [20, 21, 19, 26, 24, 26, 28, 30, 31, 29]
precipitation = [0, 0, 5, 10, 2, 0, 0, 3, 1, 0]
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation
})
# Create a simple TimeSeries
ts = TimeSeries(
df=df,
time_name="timestamp" # Specify which column contains the primary datetime values
)
Without specifying resolution and periodicity, the default initialisation sets these properties to “1 microsecond”, to account for any set of datetime values:
print(ts.resolution)
print(ts.periodicity)
PT0.000001S
PT0.000001S
With Resolution and Periodicity
Although the default of 1 microsecond accommodates any datetime values, it is important to specify the actual resolution and periodicity (if known) for more control over certain time series functionality:
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),  # Each timestamp is at day precision
periodicity=Period.of_days(1)  # Data points are spaced 1 day apart
)
print(ts.resolution)
print(ts.periodicity)
P1D
P1D
With Metadata
TimeSeries can be initialised with metadata to describe your data. This can be metadata about the time series as a whole, or about the individual columns.
Keeping the metadata and the data together in one object like this can help simplify downstream processes, such as derivation functions, running infilling routines, plotting data, etc.
# TimeSeries metadata
ts_metadata = {
"location": "River Thames",
"elevation": 100,
"station_id": "ABC123"
}
# Column metadata
col_metadata = {
"temperature": {
"units": "°C",
"description": "Average temperature"
},
"precipitation": {
"units": "mm",
"description": "Precipitation amount",
"instrument_type": "Tipping bucket"
# Note that metadata keys are not required to be the same for all columns
}
}
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),
periodicity=Period.of_days(1),
metadata=ts_metadata,
column_metadata=col_metadata
)
Time series-level metadata can be accessed via the .metadata() method:
print("All metadata: ", ts.metadata())
print("Specific keys: ", ts.metadata(["location", "elevation"]))
All metadata: {'location': 'River Thames', 'elevation': 100, 'station_id': 'ABC123'}
Specific keys: {'location': 'River Thames', 'elevation': 100}
Column-level metadata can be accessed via attributes on the column itself, or via the column.metadata() method:
# Access via attributes:
print(ts.temperature.units)
print(ts.precipitation.description)
# Or via metadata method:
print(ts.precipitation.metadata(["units", "instrument_type"]))
°C
Precipitation amount
{'units': 'mm', 'instrument_type': 'Tipping bucket'}
Time Validation
The TimeSeries class performs validation on timestamps:
Resolution Validation
The resolution defines how precise the timestamps should be:
# This will raise a warning because some timestamps don't align to midnight (00:00:00),
# as required by daily resolution
timestamps = [
datetime(2023, 1, 1, 0, 0, 0), # Aligned to midnight
datetime(2023, 1, 2, 0, 0, 0), # Aligned to midnight
datetime(2023, 1, 3, 12, 0, 0), # Not aligned (noon)
]
df = pl.DataFrame({"timestamp": timestamps, "value": [1, 2, 3]})
# This will raise a UserWarning about resolution alignment
try:
    ts = TimeSeries(
        df=df,
        time_name="timestamp",
        resolution=Period.of_days(1)
    )
except UserWarning as w:
    print(f"Warning: {w}")
Warning: Values in time field: "timestamp" are not aligned to resolution: P1D
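Conceptually, the resolution check tests whether each timestamp sits exactly on a period boundary. A minimal pure-Python sketch of that idea for daily resolution (an illustration only, not the library's implementation):

```python
from datetime import datetime, time

timestamps = [
    datetime(2023, 1, 1, 0, 0, 0),
    datetime(2023, 1, 2, 0, 0, 0),
    datetime(2023, 1, 3, 12, 0, 0),  # noon: not on a day boundary
]

# A timestamp aligns to daily (P1D) resolution if its time-of-day is midnight
misaligned = [t for t in timestamps if t.time() != time(0, 0)]
print(misaligned)  # [datetime.datetime(2023, 1, 3, 12, 0)]
```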
Periodicity Validation
The periodicity defines how frequently data points should appear:
# This will raise a warning because we have two points within the same day
timestamps = [
datetime(2023, 1, 1, 0, 0, 0),
datetime(2023, 1, 1, 12, 0, 0), # Same day as above
datetime(2023, 1, 2, 0, 0, 0),
]
df = pl.DataFrame({"timestamp": timestamps, "value": [1, 2, 3]})
# This will raise a UserWarning about periodicity
try:
    ts = TimeSeries(
        df=df,
        time_name="timestamp",
        periodicity=Period.of_days(1)
    )
except UserWarning as w:
    print(f"Warning: {w}")
Warning: Values in time field: "timestamp" do not conform to periodicity: P1D
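Periodicity, by contrast, is about how many points fall within each period. The same kind of check in plain Python, counting points per calendar day (illustrative only, not the library's code):

```python
from collections import Counter
from datetime import datetime

timestamps = [
    datetime(2023, 1, 1, 0, 0, 0),
    datetime(2023, 1, 1, 12, 0, 0),  # second point in the same day
    datetime(2023, 1, 2, 0, 0, 0),
]

# Periodicity P1D allows at most one data point per calendar day
per_day = Counter(t.date() for t in timestamps)
violations = [day for day, count in per_day.items() if count > 1]
print(violations)  # [datetime.date(2023, 1, 1)]
```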
Duplicate Detection
The TimeSeries class automatically checks for rows with duplicate values in the specified time column, and you can control what happens when duplicates are detected. Consider this DataFrame with duplicate time values:
# Create sample data
dates = [
datetime(2023, 1, 1),
datetime(2023, 1, 1),
datetime(2023, 2, 1),
datetime(2023, 3, 1),
datetime(2023, 4, 1),
datetime(2023, 5, 1),
datetime(2023, 6, 1),
datetime(2023, 6, 1),
datetime(2023, 6, 1),
datetime(2023, 7, 1),
]
temperatures = [20, None, 19, 26, 24, 26, 28, 30, None, 29]
precipitation = [None, 0, 5, 10, 2, 0, None, 3, 4, 0]
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation
})
print(df)
shape: (10, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ null │
│ 2023-01-01 00:00:00 ┆ null ┆ 0 │
│ 2023-02-01 00:00:00 ┆ 19 ┆ 5 │
│ 2023-03-01 00:00:00 ┆ 26 ┆ 10 │
│ 2023-04-01 00:00:00 ┆ 24 ┆ 2 │
│ 2023-05-01 00:00:00 ┆ 26 ┆ 0 │
│ 2023-06-01 00:00:00 ┆ 28 ┆ null │
│ 2023-06-01 00:00:00 ┆ 30 ┆ 3 │
│ 2023-06-01 00:00:00 ┆ null ┆ 4 │
│ 2023-07-01 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
The following strategies are available via the on_duplicates argument:
1. Error Strategy (Default): on_duplicates="error"
Raises an error when duplicate rows are found. This is the default behaviour, to ensure data integrity.
# Raises an error if duplicate timestamps exist. This is the default if `on_duplicate` is not specified.
try:
    ts = TimeSeries(
        df=df,
        time_name="timestamp",
        on_duplicates="error"
    )
except ValueError as e:
    print(f"Error: {e}")
Error: Duplicate time values found: [datetime.datetime(2023, 1, 1, 0, 0), datetime.datetime(2023, 6, 1, 0, 0)]
2. Keep First Strategy: on_duplicates="keep_first"
For a given group of rows with the same time value, keeps only the first row and discards the others.
# Keeps the first row found in groups of duplicate rows
ts = TimeSeries(
df=df,
time_name="timestamp",
on_duplicates="keep_first"
)
print(ts)
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ null │
│ 2023-02-01 00:00:00 ┆ 19 ┆ 5 │
│ 2023-03-01 00:00:00 ┆ 26 ┆ 10 │
│ 2023-04-01 00:00:00 ┆ 24 ┆ 2 │
│ 2023-05-01 00:00:00 ┆ 26 ┆ 0 │
│ 2023-06-01 00:00:00 ┆ 28 ┆ null │
│ 2023-07-01 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
3. Keep Last Strategy: on_duplicates="keep_last"
For a given group of rows with the same time value, keeps only the last row and discards the others.
# Keeps the last row found in groups of duplicate rows
ts = TimeSeries(
df=df,
time_name="timestamp",
on_duplicates="keep_last"
)
print(ts)
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ null ┆ 0 │
│ 2023-02-01 00:00:00 ┆ 19 ┆ 5 │
│ 2023-03-01 00:00:00 ┆ 26 ┆ 10 │
│ 2023-04-01 00:00:00 ┆ 24 ┆ 2 │
│ 2023-05-01 00:00:00 ┆ 26 ┆ 0 │
│ 2023-06-01 00:00:00 ┆ null ┆ 4 │
│ 2023-07-01 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
4. Drop Strategy: on_duplicates="drop"
Removes all rows that have duplicate timestamps. This strategy is appropriate when you are unsure of the integrity of duplicate rows and only want unique, unambiguous data.
# Drops all duplicate rows
ts = TimeSeries(
df=df,
time_name="timestamp",
on_duplicates="drop"
)
print(ts)
shape: (5, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-02-01 00:00:00 ┆ 19 ┆ 5 │
│ 2023-03-01 00:00:00 ┆ 26 ┆ 10 │
│ 2023-04-01 00:00:00 ┆ 24 ┆ 2 │
│ 2023-05-01 00:00:00 ┆ 26 ┆ 0 │
│ 2023-07-01 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
5. Merge Strategy: on_duplicates="merge"
For a given group of rows with the same time value, performs a merge of all rows. This combines values with a top-down approach that preserves the first non-null value for each column.
# Merges groups of duplicate rows
ts = TimeSeries(
df=df,
time_name="timestamp",
on_duplicates="merge"
)
print(ts)
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ 0 │
│ 2023-02-01 00:00:00 ┆ 19 ┆ 5 │
│ 2023-03-01 00:00:00 ┆ 26 ┆ 10 │
│ 2023-04-01 00:00:00 ┆ 24 ┆ 2 │
│ 2023-05-01 00:00:00 ┆ 26 ┆ 0 │
│ 2023-06-01 00:00:00 ┆ 28 ┆ 3 │
│ 2023-07-01 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
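The top-down, first-non-null behaviour of the merge strategy can be sketched in plain Python. This is a conceptual illustration of the semantics, not the library's code; it reproduces the merged 2023-06-01 row from the output above:

```python
# Within a group of rows sharing a timestamp, keep the first non-null value
# seen for each column, scanning top-down.
def merge_rows(rows):
    merged = {}
    for row in rows:
        for key, value in row.items():
            if merged.get(key) is None and value is not None:
                merged[key] = value
    return merged

# The three 2023-06-01 rows from the example DataFrame
group = [
    {"temperature": 28, "precipitation": None},
    {"temperature": 30, "precipitation": 3},
    {"temperature": None, "precipitation": 4},
]
print(merge_rows(group))  # {'temperature': 28, 'precipitation': 3}
```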
Missing Rows
The TimeSeries class provides functionality to automatically pad missing time points within a time series, ensuring complete temporal coverage without gaps. Consider daily data with some missing days:
# Create sample data with gaps
dates = [
datetime(2023, 1, 1),
datetime(2023, 1, 3),
datetime(2023, 1, 4),
datetime(2023, 1, 5),
datetime(2023, 1, 7),
]
temperatures = [20, 19, 26, 24, 26]
precipitation = [0, 5, 10, 2, 0]
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation
})
print(df)
shape: (5, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 19 ┆ 5 │
│ 2023-01-04 00:00:00 ┆ 26 ┆ 10 │
│ 2023-01-05 00:00:00 ┆ 24 ┆ 2 │
│ 2023-01-07 00:00:00 ┆ 26 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
Padding is controlled by the pad parameter during TimeSeries initialisation:
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),
periodicity=Period.of_days(1),
pad=True # Enable padding
)
print(ts)
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ null ┆ null │
│ 2023-01-03 00:00:00 ┆ 19 ┆ 5 │
│ 2023-01-04 00:00:00 ┆ 26 ┆ 10 │
│ 2023-01-05 00:00:00 ┆ 24 ┆ 2 │
│ 2023-01-06 00:00:00 ┆ null ┆ null │
│ 2023-01-07 00:00:00 ┆ 26 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
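For this simple daily case, the padding logic amounts to generating the complete range of expected timestamps and inserting null rows for any that are absent. A pure-Python sketch of finding the gaps (illustrative, not the library's implementation):

```python
from datetime import datetime, timedelta

observed = {
    datetime(2023, 1, 1), datetime(2023, 1, 3), datetime(2023, 1, 4),
    datetime(2023, 1, 5), datetime(2023, 1, 7),
}

# Build the complete daily range, then identify the timestamps to pad
start, end = min(observed), max(observed)
full_range = [start + timedelta(days=i) for i in range((end - start).days + 1)]
missing = [t for t in full_range if t not in observed]
print(missing)  # 2023-01-02 and 2023-01-06
```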
The padding functionality respects the resolution and periodicity of your data. The example above was simple: missing daily data was filled in at the datetimes of the missing days. Things get more complex when a time series has a different resolution to its periodicity. For example, consider a time series of the "annual maximum of 15-minute river flow data in a given UK water-year". The resolution would be 15 minutes, but the periodicity would be P1Y+9MT9H, because a water-year starts at 9am on 1st October:
# Create sample water-year data with gaps
dates = [
datetime(2023, 5, 16, 10, 15, 0), # Water year 2022-2023
datetime(2023, 10, 3, 19, 30, 0), # Water year 2023-2024
# Missing water year 2024-2025
datetime(2025, 11, 30, 0, 0, 0), # Water year 2025-2026
]
max_flow = [20, 25, 30]
df = pl.DataFrame({
"timestamp": dates,
"max_flow": max_flow,
})
print(df)
shape: (3, 2)
┌─────────────────────┬──────────┐
│ timestamp ┆ max_flow │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════════╡
│ 2023-05-16 10:15:00 ┆ 20 │
│ 2023-10-03 19:30:00 ┆ 25 │
│ 2025-11-30 00:00:00 ┆ 30 │
└─────────────────────┴──────────┘
The padding takes this resolution and periodicity into account, and sets missing rows to the start of the period:
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_minutes(15),
periodicity=Period.of_years(1).with_month_offset(9).with_hour_offset(9),
pad=True # Enable padding
)
print(ts)
shape: (4, 2)
┌─────────────────────┬──────────┐
│ timestamp ┆ max_flow │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════════╡
│ 2023-05-16 10:15:00 ┆ 20 │
│ 2023-10-03 19:30:00 ┆ 25 │
│ 2024-10-01 09:00:00 ┆ null │
│ 2025-11-30 00:00:00 ┆ 30 │
└─────────────────────┴──────────┘
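The padded row lands at 2024-10-01 09:00:00 because that is the start of the missing water-year period. A small pure-Python sketch of mapping a timestamp to its water-year start (illustrative only):

```python
from datetime import datetime

def water_year_start(t):
    # A UK water-year starts at 09:00 on 1st October
    start = datetime(t.year, 10, 1, 9, 0, 0)
    return start if t >= start else datetime(t.year - 1, 10, 1, 9, 0, 0)

print(water_year_start(datetime(2023, 5, 16, 10, 15)))  # 2022-10-01 09:00:00 (water year 2022-2023)
print(water_year_start(datetime(2023, 10, 3, 19, 30)))  # 2023-10-01 09:00:00 (water year 2023-2024)
```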
Warning
It is very important to set the periodicity and resolution parameters if you want to pad your data. Otherwise, the padding process will use the default period of 1 microsecond and try to pad your entire dataset with microsecond data, which will almost certainly result in a memory error!
Accessing Data
There are multiple ways to access data from a TimeSeries:
Accessing the DataFrame
# Get the full DataFrame
df = ts.df
This gives the underlying Polars DataFrame, with which you can carry out normal Polars functionality:
# Select specific columns from the DataFrame
temp_precip_df = ts.df.select(["timestamp", "temperature", "precipitation"])
# Filter the DataFrame
rainy_days_df = ts.df.filter(pl.col("precipitation") > 0)
Accessing Columns
The TimeSeries class provides other ways to access data within the time series, whilst maintaining the core link to the primary datetime column.
# Access column as a TimeSeriesColumn object
temperature_col = ts.temperature
print("Type temperature_col: ", type(temperature_col))
# Get the underlying data from a column
temperature_data = ts.temperature.data
print("Type temperature_data: ", type(temperature_data))
# Access column properties
temperature_units = ts.temperature.units
print("Temperature units: ", temperature_units)
# Get column as a TimeSeries
temperature_ts = ts["temperature"]
print("Type temperature_ts: ", type(temperature_ts))
print(temperature_ts)
# Select multiple columns as a TimeSeries
selected_ts = ts.select(["temperature", "precipitation"])
# or
selected_ts = ts[["temperature", "precipitation"]]
print("Type selected_ts: ", type(selected_ts))
print(selected_ts)
Type temperature_col: <class 'time_stream.columns.DataColumn'>
Type temperature_data: <class 'polars.dataframe.frame.DataFrame'>
Temperature units: °C
Type temperature_ts: <class 'time_stream.base.TimeSeries'>
shape: (10, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 20 │
│ 2023-01-02 00:00:00 ┆ 21 │
│ 2023-01-03 00:00:00 ┆ 19 │
│ 2023-01-04 00:00:00 ┆ 26 │
│ 2023-01-05 00:00:00 ┆ 24 │
│ 2023-01-06 00:00:00 ┆ 26 │
│ 2023-01-07 00:00:00 ┆ 28 │
│ 2023-01-08 00:00:00 ┆ 30 │
│ 2023-01-09 00:00:00 ┆ 31 │
│ 2023-01-10 00:00:00 ┆ 29 │
└─────────────────────┴─────────────┘
Type selected_ts: <class 'time_stream.base.TimeSeries'>
shape: (10, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ 21 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 19 ┆ 5 │
│ 2023-01-04 00:00:00 ┆ 26 ┆ 10 │
│ 2023-01-05 00:00:00 ┆ 24 ┆ 2 │
│ 2023-01-06 00:00:00 ┆ 26 ┆ 0 │
│ 2023-01-07 00:00:00 ┆ 28 ┆ 0 │
│ 2023-01-08 00:00:00 ┆ 30 ┆ 3 │
│ 2023-01-09 00:00:00 ┆ 31 ┆ 1 │
│ 2023-01-10 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
Updating a TimeSeries
You can update the underlying DataFrame (while preserving column settings), as long as the primary datetime column remains unchanged.
# Update the DataFrame by adding a new column
ts.df = ts.df.with_columns(
(pl.col("temperature") * 1.8 + 32).alias("temperature_f")
)
# The new column will be available as a DataColumn
print("New temperature column in fahrenheit: ", ts[["temperature", "temperature_f"]])
New temperature column in fahrenheit: shape: (10, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ temperature_f │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ f64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ 68.0 │
│ 2023-01-02 00:00:00 ┆ 21 ┆ 69.8 │
│ 2023-01-03 00:00:00 ┆ 19 ┆ 66.2 │
│ 2023-01-04 00:00:00 ┆ 26 ┆ 78.8 │
│ 2023-01-05 00:00:00 ┆ 24 ┆ 75.2 │
│ 2023-01-06 00:00:00 ┆ 26 ┆ 78.8 │
│ 2023-01-07 00:00:00 ┆ 28 ┆ 82.4 │
│ 2023-01-08 00:00:00 ┆ 30 ┆ 86.0 │
│ 2023-01-09 00:00:00 ┆ 31 ┆ 87.8 │
│ 2023-01-10 00:00:00 ┆ 29 ┆ 84.2 │
└─────────────────────┴─────────────┴───────────────┘
If an update to the DataFrame results in a change to the primary datetime values, resolution or periodicity, then an error will be raised. A new TimeSeries object should be created.
# Try and update the DataFrame by filtering columns
# (which inherently removes some of the time series)
try:
    ts.df = ts.df.filter(pl.col("precipitation") > 0)
except ValueError as e:
    print(f"Error: {e}")
Error: Time column has mutated.
Working with Columns
The TimeSeries class provides methods for working with different column types:
Column Types
There are four column types:
- Primary Time Column: the datetime column
- Data Columns: regular data columns (the default type)
- Supplementary Columns: metadata or contextual information
- Flag Columns: flag markers giving specific information about data points
Creating Supplementary Columns
Supplementary columns can be specified on initialisation of the TimeSeries object:
# Create sample data
dates = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(10)]
temperatures = [20, 21, 1, 26, 24, 26, 28, 41, 51, None]
precipitation = [0, 0, 5, 10, 2, 0, 0, 3, 1, 0]
observer_comments = ["", "", "Power cut between 8am and 1pm", "", "", "",
"Agricultural work in adjacent field", "", "", "Tree felling"]
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation,
"observer_comments": observer_comments
})
# Create a TimeSeries
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),
periodicity=Period.of_days(1),
supplementary_columns=["observer_comments"]
)
print("Data columns: ", ts.data_columns)
print("Supplementary columns: ", ts.supplementary_columns)
print("Observer comments column: ", ts.observer_comments)
Data columns: {'temperature': DataColumn('temperature'), 'precipitation': DataColumn('precipitation')}
Supplementary columns: {'observer_comments': SupplementaryColumn('observer_comments')}
Observer comments column: shape: (10, 2)
┌─────────────────────┬─────────────────────────────────┐
│ timestamp ┆ observer_comments │
│ --- ┆ --- │
│ datetime[μs] ┆ str │
╞═════════════════════╪═════════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ │
│ 2023-01-02 00:00:00 ┆ │
│ 2023-01-03 00:00:00 ┆ Power cut between 8am and 1pm │
│ 2023-01-04 00:00:00 ┆ │
│ 2023-01-05 00:00:00 ┆ │
│ 2023-01-06 00:00:00 ┆ │
│ 2023-01-07 00:00:00 ┆ Agricultural work in adjacent … │
│ 2023-01-08 00:00:00 ┆ │
│ 2023-01-09 00:00:00 ┆ │
│ 2023-01-10 00:00:00 ┆ Tree felling │
└─────────────────────┴─────────────────────────────────┘
Existing data columns can be converted to supplementary columns:
# Convert an existing column to supplementary
ts.set_supplementary_column("precipitation")
print("Data columns: ", ts.data_columns)
print("Supplementary columns: ", ts.supplementary_columns)
Data columns: {'temperature': DataColumn('temperature')}
Supplementary columns: {'precipitation': SupplementaryColumn('precipitation'), 'observer_comments': SupplementaryColumn('observer_comments')}
Or a completely new column can be initialised as a supplementary column:
# Add a new supplementary column, with new data
new_data = [12, 15, 6, 12, 10, 14, 19, 17, 16, 13]
ts.init_supplementary_column("battery_voltage", new_data)
print("Data columns: ", ts.data_columns)
print("Supplementary columns: ", ts.supplementary_columns)
print("Battery voltage column: ", ts.battery_voltage)
Data columns: {'temperature': DataColumn('temperature')}
Supplementary columns: {'precipitation': SupplementaryColumn('precipitation'), 'observer_comments': SupplementaryColumn('observer_comments'), 'battery_voltage': SupplementaryColumn('battery_voltage')}
Battery voltage column: shape: (10, 2)
┌─────────────────────┬─────────────────┐
│ timestamp ┆ battery_voltage │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪═════════════════╡
│ 2023-01-01 00:00:00 ┆ 12 │
│ 2023-01-02 00:00:00 ┆ 15 │
│ 2023-01-03 00:00:00 ┆ 6 │
│ 2023-01-04 00:00:00 ┆ 12 │
│ 2023-01-05 00:00:00 ┆ 10 │
│ 2023-01-06 00:00:00 ┆ 14 │
│ 2023-01-07 00:00:00 ┆ 19 │
│ 2023-01-08 00:00:00 ┆ 17 │
│ 2023-01-09 00:00:00 ┆ 16 │
│ 2023-01-10 00:00:00 ┆ 13 │
└─────────────────────┴─────────────────┘
Creating Flag Columns
Flag columns are inherently linked to a flag system. The flag system sets out the meanings of values that can be added to the flag column.
If they already exist, flag columns and their associated flag systems can be specified on initialisation of the TimeSeries object:
# Create sample data
dates = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(10)]
temperatures = [20, 21, 1, 26, 24, 26, 28, 41, 51, None]
precipitation = [0, 0, 5, 10, 2, 0, 0, 3, 1, 0]
temperature_qc_flags = [0, 0, 2, 0, 0, 0, 0, 0, 3, 8]
flag_systems = {"quality_control_checks": {"OUT_OF_RANGE": 1, "SPIKE": 2, "LOW_VOLTAGE": 4, "MISSING": 8}}
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation,
"temperature_qc_flags": temperature_qc_flags,
})
# Create a TimeSeries
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),
periodicity=Period.of_days(1),
flag_systems=flag_systems,
flag_columns={"temperature_qc_flags": "quality_control_checks"}
)
print("Data columns: ", ts.data_columns)
print("Flag columns: ", ts.flag_columns)
print("Flag systems: ", ts.flag_systems)
print("Temperature flag column: ", ts.temperature_qc_flags)
Data columns: {'temperature': DataColumn('temperature'), 'precipitation': DataColumn('precipitation')}
Flag columns: {'temperature_qc_flags': FlagColumn('temperature_qc_flags')}
Flag systems: {'quality_control_checks': <quality_control_checks (OUT_OF_RANGE=1, SPIKE=2, LOW_VOLTAGE=4, MISSING=8)>}
Temperature flag column: shape: (10, 2)
┌─────────────────────┬──────────────────────┐
│ timestamp ┆ temperature_qc_flags │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════════════════════╡
│ 2023-01-01 00:00:00 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 2 │
│ 2023-01-04 00:00:00 ┆ 0 │
│ 2023-01-05 00:00:00 ┆ 0 │
│ 2023-01-06 00:00:00 ┆ 0 │
│ 2023-01-07 00:00:00 ┆ 0 │
│ 2023-01-08 00:00:00 ┆ 0 │
│ 2023-01-09 00:00:00 ┆ 3 │
│ 2023-01-10 00:00:00 ┆ 8 │
└─────────────────────┴──────────────────────┘
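Note that the flag values in quality_control_checks are powers of two, so one integer can encode several flags at once: the value 3 on 2023-01-09 is OUT_OF_RANGE (1) plus SPIKE (2). Decoding such a combined value needs only plain bitwise arithmetic, sketched here independently of the library:

```python
flag_system = {"OUT_OF_RANGE": 1, "SPIKE": 2, "LOW_VOLTAGE": 4, "MISSING": 8}

def decode(value):
    # A flag is set if its bit is present in the combined value
    return [name for name, bit in flag_system.items() if value & bit]

print(decode(3))  # ['OUT_OF_RANGE', 'SPIKE']
print(decode(8))  # ['MISSING']
```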
Otherwise, flag columns can be initialised dynamically on the TimeSeries object:
# Create sample data
dates = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(10)]
temperatures = [20, 21, 1, 26, 24, 26, 28, 41, 51, None]
precipitation = [0, 0, 5, 10, 2, 0, 0, 3, 1, 0]
flag_systems = {"quality_control_checks": {"OUT_OF_RANGE": 1, "SPIKE": 2, "LOW_VOLTAGE": 4, "MISSING": 8}}
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation
})
# Create a TimeSeries
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),
periodicity=Period.of_days(1),
flag_systems=flag_systems
)
# Add a flag column for temperature data, which will use the quality_control_checks flag system
ts.init_flag_column("quality_control_checks", "temperature_qc_flags")
Methods are available to add flags to (or remove flags from) a flag column:
# Add flags
ts.add_flag("temperature_qc_flags", "OUT_OF_RANGE", pl.col("temperature") > 40)
ts.add_flag("temperature_qc_flags", "MISSING", pl.col("temperature").is_null())
print(ts.temperature_qc_flags)
# Remove a flag
ts.remove_flag("temperature_qc_flags", "OUT_OF_RANGE", pl.col("temperature") <= 45)
print(ts.temperature_qc_flags)
shape: (10, 2)
┌─────────────────────┬──────────────────────┐
│ timestamp ┆ temperature_qc_flags │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════════════════════╡
│ 2023-01-01 00:00:00 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 0 │
│ 2023-01-04 00:00:00 ┆ 0 │
│ 2023-01-05 00:00:00 ┆ 0 │
│ 2023-01-06 00:00:00 ┆ 0 │
│ 2023-01-07 00:00:00 ┆ 0 │
│ 2023-01-08 00:00:00 ┆ 1 │
│ 2023-01-09 00:00:00 ┆ 1 │
│ 2023-01-10 00:00:00 ┆ 8 │
└─────────────────────┴──────────────────────┘
shape: (10, 2)
┌─────────────────────┬──────────────────────┐
│ timestamp ┆ temperature_qc_flags │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════════════════════╡
│ 2023-01-01 00:00:00 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 0 │
│ 2023-01-04 00:00:00 ┆ 0 │
│ 2023-01-05 00:00:00 ┆ 0 │
│ 2023-01-06 00:00:00 ┆ 0 │
│ 2023-01-07 00:00:00 ┆ 0 │
│ 2023-01-08 00:00:00 ┆ 0 │
│ 2023-01-09 00:00:00 ┆ 1 │
│ 2023-01-10 00:00:00 ┆ 8 │
└─────────────────────┴──────────────────────┘
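Behind add_flag and remove_flag, the arithmetic on a single flag value presumably amounts to setting and clearing bits. A standalone sketch of that mechanism (an assumption about the mechanics, not the library's code):

```python
OUT_OF_RANGE, MISSING = 1, 8  # bit values from the flag system above

value = 0
value |= OUT_OF_RANGE   # add a flag: set its bit
value |= MISSING        # add another flag
assert value == 9       # both bits set

value &= ~OUT_OF_RANGE  # remove a flag: clear its bit
assert value == 8       # only MISSING remains
```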
Column Relationships
You can define relationships between columns that are linked together in some way. Data columns can be given a relationship to both supplementary and flag columns, though supplementary and flag columns cannot be given a relationship to each other.
# Starting with an example TimeSeries, with supplementary and flag columns:
print(ts)
print("Data columns: ", ts.data_columns)
print("Supplementary columns: ", ts.supplementary_columns)
print("Flag columns: ", ts.flag_columns)
# Create a relationship between the supplementary column and data columns (using method on the TimeSeries object)
ts.add_column_relationship("battery_voltage", ["temperature", "precipitation"])
# Create a relationship between temperature and its flags (using method on the Column object)
ts.temperature.add_relationship("temperature_qc_flags")
print("")
print("Temperature column relationships: ", ts.temperature.get_relationships())
print("Battery voltage column relationships: ", ts.battery_voltage.get_relationships())
shape: (10, 6)
┌─────────────────┬─────────────┬───────────────┬────────────────┬────────────────┬────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ observer_comme ┆ battery_voltag ┆ temperature_qc │
│ --- ┆ --- ┆ --- ┆ nts ┆ e ┆ _flags │
│ datetime[μs] ┆ i64 ┆ i64 ┆ --- ┆ --- ┆ --- │
│ ┆ ┆ ┆ str ┆ i64 ┆ i64 │
╞═════════════════╪═════════════╪═══════════════╪════════════════╪════════════════╪════════════════╡
│ 2023-01-01 ┆ 20 ┆ 0 ┆ ┆ 12 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-02 ┆ 21 ┆ 0 ┆ ┆ 15 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-03 ┆ 1 ┆ 5 ┆ Power cut ┆ 6 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ between 8am ┆ ┆ │
│ ┆ ┆ ┆ and 1pm ┆ ┆ │
│ 2023-01-04 ┆ 26 ┆ 10 ┆ ┆ 12 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-05 ┆ 24 ┆ 2 ┆ ┆ 10 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-06 ┆ 26 ┆ 0 ┆ ┆ 14 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-07 ┆ 28 ┆ 0 ┆ Agricultural ┆ 19 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ work in ┆ ┆ │
│ ┆ ┆ ┆ adjacent … ┆ ┆ │
│ 2023-01-08 ┆ 41 ┆ 3 ┆ ┆ 17 ┆ 1 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-09 ┆ 51 ┆ 1 ┆ ┆ 16 ┆ 1 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-10 ┆ null ┆ 0 ┆ Tree felling ┆ 13 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
└─────────────────┴─────────────┴───────────────┴────────────────┴────────────────┴────────────────┘
Data columns: {'temperature': DataColumn('temperature'), 'precipitation': DataColumn('precipitation')}
Supplementary columns: {'observer_comments': SupplementaryColumn('observer_comments'), 'battery_voltage': SupplementaryColumn('battery_voltage')}
Flag columns: {'temperature_qc_flags': FlagColumn('temperature_qc_flags')}
Temperature column relationships: [Relationship('temperature - battery_voltage'), Relationship('temperature - temperature_qc_flags')]
Battery voltage column relationships: [Relationship('temperature - battery_voltage'), Relationship('precipitation - battery_voltage')]
Relationships can be removed:
ts.remove_column_relationship("temperature", "battery_voltage")
print("Temperature column relationships: ", ts.temperature.get_relationships())
Temperature column relationships: [Relationship('temperature - temperature_qc_flags')]
The relationship also defines what happens when a column is removed. For example, if a Data Column is dropped, this cascades to any linked Flag Columns. Any linked Supplementary Columns are not dropped, but the relationship is removed:
ts.df = ts.df.drop("temperature")
# Note that temperature and temperature_qc_flags are removed, but battery_voltage remains.
print(ts)
shape: (10, 4)
┌─────────────────────┬───────────────┬─────────────────────────────────┬─────────────────┐
│ timestamp ┆ precipitation ┆ observer_comments ┆ battery_voltage │
│ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ str ┆ i64 │
╞═════════════════════╪═══════════════╪═════════════════════════════════╪═════════════════╡
│ 2023-01-01 00:00:00 ┆ 0 ┆ ┆ 12 │
│ 2023-01-02 00:00:00 ┆ 0 ┆ ┆ 15 │
│ 2023-01-03 00:00:00 ┆ 5 ┆ Power cut between 8am and 1pm ┆ 6 │
│ 2023-01-04 00:00:00 ┆ 10 ┆ ┆ 12 │
│ 2023-01-05 00:00:00 ┆ 2 ┆ ┆ 10 │
│ 2023-01-06 00:00:00 ┆ 0 ┆ ┆ 14 │
│ 2023-01-07 00:00:00 ┆ 0 ┆ Agricultural work in adjacent … ┆ 19 │
│ 2023-01-08 00:00:00 ┆ 3 ┆ ┆ 17 │
│ 2023-01-09 00:00:00 ┆ 1 ┆ ┆ 16 │
│ 2023-01-10 00:00:00 ┆ 0 ┆ Tree felling ┆ 13 │
└─────────────────────┴───────────────┴─────────────────────────────────┴─────────────────┘
Aggregating Data
The TimeSeries class provides powerful aggregation capabilities. Given a year's worth of minute data:
# The following TimeSeries has 1-year's worth of 1-minute resolution random temperature data:
print(ts)
shape: (525_600, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 15.099343 │
│ 2023-01-01 00:01:00 ┆ 14.994283 │
│ 2023-01-01 00:02:00 ┆ 15.173409 │
│ 2023-01-01 00:03:00 ┆ 15.370413 │
│ 2023-01-01 00:04:00 ┆ 15.04091 │
│ … ┆ … │
│ 2023-12-31 23:55:00 ┆ 14.826904 │
│ 2023-12-31 23:56:00 ┆ 15.06587 │
│ 2023-12-31 23:57:00 ┆ 15.07932 │
│ 2023-12-31 23:58:00 ┆ 15.054764 │
│ 2023-12-31 23:59:00 ┆ 15.175625 │
└─────────────────────┴─────────────┘
We can aggregate this data to various new resolutions.
This example shows an aggregation to monthly mean temperatures. The aggregation function can be specified either as an imported class or as a string (upper or lower case). Note that this returns a new TimeSeries object, as the primary time attributes have changed.
The returned TimeSeries provides additional context columns:
- Expected count: the number of data points expected if the aggregation period were full
- Actual count: the number of data points found in the data for the given aggregation period
- For Max and Min: the datetime of the max/min data point within the given aggregation period
# Import the required aggregation function
from time_stream.aggregation import Mean
# Create a monthly aggregation of the minute data, either by importing the aggregation function
# or by using a string
monthly_mean_temp = ts.aggregate(Period.of_months(1), Mean, "temperature")
monthly_mean_temp = ts.aggregate(Period.of_months(1), "mean", "temperature")
print(monthly_mean_temp)
shape: (12, 5)
┌─────────────────────┬──────────────────┬───────────────────┬──────────────────────────┬───────┐
│ timestamp ┆ mean_temperature ┆ count_temperature ┆ expected_count_timestamp ┆ valid │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ u32 ┆ i64 ┆ bool │
╞═════════════════════╪══════════════════╪═══════════════════╪══════════════════════════╪═══════╡
│ 2023-01-01 00:00:00 ┆ 17.605512 ┆ 44640 ┆ 44640 ┆ true │
│ 2023-02-01 00:00:00 ┆ 21.927644 ┆ 40320 ┆ 40320 ┆ true │
│ 2023-03-01 00:00:00 ┆ 24.474063 ┆ 44640 ┆ 44640 ┆ true │
│ 2023-04-01 00:00:00 ┆ 24.613575 ┆ 43200 ┆ 43200 ┆ true │
│ 2023-05-01 00:00:00 ┆ 22.150193 ┆ 44640 ┆ 44640 ┆ true │
│ … ┆ … ┆ … ┆ … ┆ … │
│ 2023-08-01 00:00:00 ┆ 8.087156 ┆ 44640 ┆ 44640 ┆ true │
│ 2023-09-01 00:00:00 ┆ 5.474087 ┆ 43200 ┆ 43200 ┆ true │
│ 2023-10-01 00:00:00 ┆ 5.433785 ┆ 44640 ┆ 44640 ┆ true │
│ 2023-11-01 00:00:00 ┆ 7.960447 ┆ 43200 ┆ 43200 ┆ true │
│ 2023-12-01 00:00:00 ┆ 12.393433 ┆ 44640 ┆ 44640 ┆ true │
└─────────────────────┴──────────────────┴───────────────────┴──────────────────────────┴───────┘
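The expected_count_timestamp values above follow directly from calendar arithmetic: a full month of 1-minute data contains days-in-month × 1440 points. As a quick sanity check, independent of the library:

```python
import calendar

MINUTES_PER_DAY = 24 * 60  # 1440

# Expected number of 1-minute data points for some months of 2023
for month in (1, 2, 4):
    days = calendar.monthrange(2023, month)[1]
    print(month, days * MINUTES_PER_DAY)
# 1 44640  (31 days)
# 2 40320  (28 days)
# 4 43200  (30 days)
```

These match the expected_count_timestamp column in the monthly output above.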
By default, the data will be aggregated regardless of how many data points are missing from the period. For example, if a given day contains only two 1-minute data points, a mean aggregation would return the mean of those 2 values, even though a full day would contain 1440 values.
You can specify criteria for a valid aggregation using the missing_criteria argument:
{"missing": 30} - Aggregation is valid if there are no more than 30 values missing in the period.
{"available": 30} - Aggregation is valid if there are at least 30 input values in the period.
{"percent": 30} - Aggregation is valid if the data in the period is at least 30 percent complete (accepts integers or floats).
If no missing_criteria are specified, the valid column will be set to True.
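The three criteria can be understood with a plain-Python sketch of the rules just described. This is illustrative only, not the library's actual implementation; the function name `is_valid` is hypothetical:

```python
def is_valid(actual_count, expected_count, missing_criteria=None):
    """Sketch of the missing_criteria rules: decide whether an
    aggregation period counts as valid."""
    if not missing_criteria:
        return True  # no criteria specified: valid is always True
    key, threshold = next(iter(missing_criteria.items()))
    if key == "missing":
        # valid if no more than `threshold` values are missing
        return (expected_count - actual_count) <= threshold
    if key == "available":
        # valid if at least `threshold` input values are present
        return actual_count >= threshold
    if key == "percent":
        # valid if the period is at least `threshold` percent complete
        return (actual_count / expected_count) * 100 >= threshold
    raise ValueError(f"Unknown criterion: {key}")

# A day of 1-minute data with only 1300 of the expected 1440 points:
print(is_valid(1300, 1440, {"missing": 30}))    # False: 140 missing > 30
print(is_valid(1300, 1440, {"available": 30}))  # True: 1300 >= 30
print(is_valid(1300, 1440, {"percent": 95}))    # False: ~90.3% < 95
```

Passing, for example, missing_criteria={"percent": 95} to the aggregate call would then mark under-filled periods with valid = False in the output.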
Some more aggregation examples:
# Make sure the Min and Max aggregation functions are imported
from time_stream.aggregation import Min, Max

# Calculate monthly minimum temperature
monthly_min_temp = ts.aggregate(Period.of_months(1), Min, "temperature")
print(monthly_min_temp)
# Calculate monthly maximum temperature
monthly_max_temp = ts.aggregate(Period.of_months(1), "Max", "temperature")
print(monthly_max_temp)
# Use it with other periods
daily_mean_temp = ts.aggregate(Period.of_days(1), Mean, "temperature")
print(daily_mean_temp)
annual_max_temp = ts.aggregate(Period.of_years(1), Max, "temperature")
print(annual_max_temp)
shape: (12, 6)
┌──────────────┬───────────────────┬─────────────────┬──────────────────┬──────────────────┬───────┐
│ timestamp ┆ timestamp_of_min ┆ min_temperature ┆ count_temperatur ┆ expected_count_t ┆ valid │
│ --- ┆ --- ┆ --- ┆ e ┆ imestamp ┆ --- │
│ datetime[μs] ┆ datetime[μs] ┆ f64 ┆ --- ┆ --- ┆ bool │
│ ┆ ┆ ┆ u32 ┆ i64 ┆ │
╞══════════════╪═══════════════════╪═════════════════╪══════════════════╪══════════════════╪═══════╡
│ 2023-01-01 ┆ 2023-01-01 ┆ 9.573338 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 18:21:00 ┆ ┆ ┆ ┆ │
│ 2023-02-01 ┆ 2023-02-01 ┆ 14.778971 ┆ 40320 ┆ 40320 ┆ true │
│ 00:00:00 ┆ 17:22:00 ┆ ┆ ┆ ┆ │
│ 2023-03-01 ┆ 2023-03-02 ┆ 18.099118 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 17:43:00 ┆ ┆ ┆ ┆ │
│ 2023-04-01 ┆ 2023-04-30 ┆ 18.384095 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 17:09:00 ┆ ┆ ┆ ┆ │
│ 2023-05-01 ┆ 2023-05-31 ┆ 14.498269 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 17:55:00 ┆ ┆ ┆ ┆ │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 2023-08-01 ┆ 2023-08-31 ┆ 0.838074 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 18:40:00 ┆ ┆ ┆ ┆ │
│ 2023-09-01 ┆ 2023-09-28 ┆ -0.572529 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 18:17:00 ┆ ┆ ┆ ┆ │
│ 2023-10-01 ┆ 2023-10-03 ┆ -0.562019 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 17:37:00 ┆ ┆ ┆ ┆ │
│ 2023-11-01 ┆ 2023-11-01 ┆ 0.875222 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 18:14:00 ┆ ┆ ┆ ┆ │
│ 2023-12-01 ┆ 2023-12-01 ┆ 4.600979 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 18:22:00 ┆ ┆ ┆ ┆ │
└──────────────┴───────────────────┴─────────────────┴──────────────────┴──────────────────┴───────┘
shape: (12, 6)
┌──────────────┬───────────────────┬─────────────────┬──────────────────┬──────────────────┬───────┐
│ timestamp ┆ timestamp_of_max ┆ max_temperature ┆ count_temperatur ┆ expected_count_t ┆ valid │
│ --- ┆ --- ┆ --- ┆ e ┆ imestamp ┆ --- │
│ datetime[μs] ┆ datetime[μs] ┆ f64 ┆ --- ┆ --- ┆ bool │
│ ┆ ┆ ┆ u32 ┆ i64 ┆ │
╞══════════════╪═══════════════════╪═════════════════╪══════════════════╪══════════════════╪═══════╡
│ 2023-01-01 ┆ 2023-01-31 ┆ 25.551551 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 06:39:00 ┆ ┆ ┆ ┆ │
│ 2023-02-01 ┆ 2023-02-28 ┆ 28.81754 ┆ 40320 ┆ 40320 ┆ true │
│ 00:00:00 ┆ 05:32:00 ┆ ┆ ┆ ┆ │
│ 2023-03-01 ┆ 2023-03-25 ┆ 30.577439 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 06:10:00 ┆ ┆ ┆ ┆ │
│ 2023-04-01 ┆ 2023-04-03 ┆ 30.624377 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 05:42:00 ┆ ┆ ┆ ┆ │
│ 2023-05-01 ┆ 2023-05-01 ┆ 29.24943 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 06:36:00 ┆ ┆ ┆ ┆ │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 2023-08-01 ┆ 2023-08-01 ┆ 15.550731 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 05:44:00 ┆ ┆ ┆ ┆ │
│ 2023-09-01 ┆ 2023-09-02 ┆ 11.939817 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 06:17:00 ┆ ┆ ┆ ┆ │
│ 2023-10-01 ┆ 2023-10-30 ┆ 11.756357 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 06:19:00 ┆ ┆ ┆ ┆ │
│ 2023-11-01 ┆ 2023-11-30 ┆ 15.295177 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 06:05:00 ┆ ┆ ┆ ┆ │
│ 2023-12-01 ┆ 2023-12-31 ┆ 20.211176 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 06:29:00 ┆ ┆ ┆ ┆ │
└──────────────┴───────────────────┴─────────────────┴──────────────────┴──────────────────┴───────┘
shape: (365, 5)
┌─────────────────────┬──────────────────┬───────────────────┬──────────────────────────┬───────┐
│ timestamp ┆ mean_temperature ┆ count_temperature ┆ expected_count_timestamp ┆ valid │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ u32 ┆ i32 ┆ bool │
╞═════════════════════╪══════════════════╪═══════════════════╪══════════════════════════╪═══════╡
│ 2023-01-01 00:00:00 ┆ 15.093978 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-01-02 00:00:00 ┆ 15.263434 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-01-03 00:00:00 ┆ 15.422625 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-01-04 00:00:00 ┆ 15.596555 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-01-05 00:00:00 ┆ 15.766958 ┆ 1440 ┆ 1440 ┆ true │
│ … ┆ … ┆ … ┆ … ┆ … │
│ 2023-12-27 00:00:00 ┆ 14.225196 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-12-28 00:00:00 ┆ 14.392217 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-12-29 00:00:00 ┆ 14.565449 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-12-30 00:00:00 ┆ 14.740761 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-12-31 00:00:00 ┆ 14.907083 ┆ 1440 ┆ 1440 ┆ true │
└─────────────────────┴──────────────────┴───────────────────┴──────────────────────────┴───────┘
shape: (1, 6)
┌──────────────┬───────────────────┬─────────────────┬──────────────────┬──────────────────┬───────┐
│ timestamp ┆ timestamp_of_max ┆ max_temperature ┆ count_temperatur ┆ expected_count_t ┆ valid │
│ --- ┆ --- ┆ --- ┆ e ┆ imestamp ┆ --- │
│ datetime[μs] ┆ datetime[μs] ┆ f64 ┆ --- ┆ --- ┆ bool │
│ ┆ ┆ ┆ u32 ┆ i64 ┆ │
╞══════════════╪═══════════════════╪═════════════════╪══════════════════╪══════════════════╪═══════╡
│ 2023-01-01 ┆ 2023-04-03 ┆ 30.624377 ┆ 525600 ┆ 525600 ┆ true │
│ 00:00:00 ┆ 05:42:00 ┆ ┆ ┆ ┆ │
└──────────────┴───────────────────┴─────────────────┴──────────────────┴──────────────────┴───────┘
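The timestamp_of_max column above records when the maximum occurred within the aggregation window. Conceptually this is an argmax over the window, which can be sketched in plain Python (illustrative only, with made-up data; not the library's implementation):

```python
from datetime import datetime

# (timestamp, value) pairs within one aggregation window
window = [
    (datetime(2023, 1, 1, 0, 0), 15.1),
    (datetime(2023, 4, 3, 5, 42), 30.6),
    (datetime(2023, 12, 31, 23, 59), 15.2),
]

# argmax over the window: the maximum value and the timestamp it occurred at
timestamp_of_max, max_value = max(window, key=lambda pair: pair[1])
print(timestamp_of_max, max_value)  # 2023-04-03 05:42:00 30.6
```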
For more details on aggregation, see the dedicated aggregation guide.
Best Practices
Always specify resolution and periodicity when creating a TimeSeries to ensure proper validation
Use appropriate column types:
- Use data columns for core measurements
- Use supplementary columns for metadata or context
- Use flag columns for quality control
Define relationships between related columns
Add metadata to enhance understanding of your data
Next Steps
Now that you understand the basics of the TimeSeries
class, explore:
Working with Periods - Learn more about working with time periods
Aggregation - Dive deeper into aggregation capabilities
flagging - Master the flagging control system
column_relationships - Understand column relationships in detail