Rolling Aggregation¶
Sliding windows over your time series, with the same completeness tracking as standard aggregation.
What is rolling aggregation?¶
Rolling (or sliding-window) aggregation computes a summary statistic for each timestamp from the observations in a fixed-size window positioned relative to that timestamp. Unlike Aggregation, which reduces the resolution of the data (e.g., 15-minute values aggregated to daily values), rolling aggregation preserves the original timestamps and resolution: the output has the same number of rows as the input.
| Property | Aggregation | Rolling aggregation |
|---|---|---|
| Output rows | One per aggregation period | Same as input (one per observation) |
| Output resolution | Aggregation period | Original resolution |
| Timestamps | Period labels (e.g., midnight each day) | Original timestamps preserved |
| Typical use | Daily/monthly summaries | Smoothing, rolling statistics |
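The difference in output shape can be illustrated without Time-Stream at all. The following is a minimal standard-library sketch, not the library's implementation: the helper names `daily_mean` and `trailing_mean` and the sample values are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Six hourly observations spanning two calendar days (made-up values).
times = [datetime(2024, 1, 1, 22) + timedelta(hours=i) for i in range(6)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def daily_mean(times, values):
    """Standard aggregation: one output row per day."""
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[t.date()].append(v)
    return {day: sum(vs) / len(vs) for day, vs in sorted(groups.items())}


def trailing_mean(times, values, window):
    """Rolling aggregation: one output row per input timestamp."""
    out = []
    for t in times:
        in_window = [v for u, v in zip(times, values) if t - window < u <= t]
        out.append(sum(in_window) / len(in_window))
    return out


daily = daily_mean(times, values)
rolling = trailing_mean(times, values, timedelta(hours=3))
print(len(daily))    # 2: one row per day
print(len(rolling))  # 6: same as the input
```

The daily result collapses six rows into two period labels, while the rolling result keeps one value per original timestamp.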
One-liner¶
To calculate a 3-hour rolling mean of flow data:
tf.rolling_aggregate("PT3H", "mean", "flow")
That’s it: a single line with clear intent, “I want a 3-hour rolling mean of my flow data.”
All aggregation functions supported by aggregate() are equally supported here.
In more detail¶
The rolling_aggregate() method is the entry point for rolling aggregations of time series data in Time-Stream. It works similarly to the standard aggregate() method, combining Polars performance with TimeFrame semantics.
Example
Using the 15-minute flow example data:
tf_rolling = tf.rolling_aggregate("PT3H", "mean", "flow")
shape: (110_977, 5)
┌─────────────────────┬───────────┬────────────┬─────────────────────┬────────────┐
│ time                ┆ mean_flow ┆ count_flow ┆ expected_count_time ┆ valid_flow │
│ ---                 ┆ ---       ┆ ---        ┆ ---                 ┆ ---        │
│ datetime[ns]        ┆ f64       ┆ u32        ┆ u32                 ┆ bool       │
╞═════════════════════╪═══════════╪════════════╪═════════════════════╪════════════╡
│ 2020-09-01 00:00:00 ┆ 92.860538 ┆ 1          ┆ 12                  ┆ true       │
│ 2020-09-01 00:15:00 ┆ 92.860538 ┆ 1          ┆ 12                  ┆ true       │
│ 2020-09-01 00:30:00 ┆ 95.48182  ┆ 2          ┆ 12                  ┆ true       │
│ 2020-09-01 00:45:00 ┆ 95.48182  ┆ 2          ┆ 12                  ┆ true       │
│ 2020-09-01 01:00:00 ┆ 95.48182  ┆ 2          ┆ 12                  ┆ true       │
│ …                   ┆ …         ┆ …          ┆ …                   ┆ …          │
│ 2023-10-31 23:00:00 ┆ 81.813845 ┆ 11         ┆ 12                  ┆ true       │
│ 2023-10-31 23:15:00 ┆ 82.180762 ┆ 11         ┆ 12                  ┆ true       │
│ 2023-10-31 23:30:00 ┆ 82.526896 ┆ 10         ┆ 12                  ┆ true       │
│ 2023-10-31 23:45:00 ┆ 82.526896 ┆ 10         ┆ 12                  ┆ true       │
│ 2023-11-01 00:00:00 ┆ 82.81015  ┆ 10         ┆ 12                  ┆ true       │
└─────────────────────┴───────────┴────────────┴─────────────────────┴────────────┘
The rolling aggregation method takes some specific parameters, explained below.
Note
The resulting TimeFrame contains the same data completeness information as standard aggregation, this time based on the completeness of each rolling window. See Data completeness below.
Window size¶
The size of the time window over which to perform the rolling aggregation. This can be specified as an ISO-8601 duration string, and can be combined with the window alignment (see below) to fine-tune the rolling window.
Common examples:
- "P1D": 1-day window
- "PT3H": 3-hour window
- "PT15M": 15-minute window
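To make the duration strings concrete, here is a small standard-library sketch that parses the fixed-length subset of ISO-8601 durations (days, hours, minutes, seconds) and derives how many observations a full window would hold at a given data resolution. The helpers `parse_fixed_duration` and `expected_count` are hypothetical names for illustration and are not part of Time-Stream. The sketch also assumes that the expected count is simply the window length divided by the resolution, which is consistent with the `expected_count_time` of 12 shown for a "PT3H" window over 15-minute data.

```python
import re
from datetime import timedelta

# Fixed-length units only; calendar units (months, years) are deliberately
# rejected because they do not map onto a constant timedelta.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_fixed_duration(text):
    """Parse a day/hour/minute/second ISO-8601 duration into a timedelta."""
    match = _DURATION.match(text)
    parts = match.groupdict() if match else {}
    if not any(parts.values()) or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{name: int(value) for name, value in parts.items() if value})


def expected_count(window, resolution):
    """Number of observations in a full window (assumed: window / resolution)."""
    return parse_fixed_duration(window) // parse_fixed_duration(resolution)


print(parse_fixed_duration("PT3H"))     # 3:00:00
print(expected_count("PT3H", "PT15M"))  # 12
print(expected_count("P1D", "PT15M"))   # 96
```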
Window alignment¶
The alignment parameter controls where the rolling window is positioned relative to each timestamp.
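| Alignment | Window | Edge effects |
|---|---|---|
| "trailing" (default) | Looks backward from each timestamp | At the start of the series (first rows see partial windows) |
| "leading" | Looks forward from each timestamp | At the end of the series (last rows see partial windows) |
| "center" | Centered on each timestamp | At both ends of the series |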
Trailing (default)¶
The window looks backward from each timestamp. Each output value summarises the current observation and the preceding ones within the window. This is the conventional default for rolling statistics.
# 3-hour trailing mean: each output reflects the current hour and the two hours before it.
tf_trailing = tf.rolling_aggregate("PT3H", "mean", "flow")
# equivalently:
tf_trailing = tf.rolling_aggregate("PT3H", "mean", "flow", alignment="trailing")
For hourly data with a 3-hour trailing window, the first output row has only 1 observation in its window, the second has 2, and all subsequent rows have 3 (the full window).
Leading¶
The window looks forward from each timestamp. Each output value summarises the current observation and the following ones within the window.
# 3-hour leading mean: each output reflects the current hour and the two hours after it.
tf_leading = tf.rolling_aggregate("PT3H", "mean", "flow", alignment="leading")
For hourly data with a 3-hour leading window, the last two output rows see partial windows.
Center¶
The window is centered on each timestamp, looking equally backward and forward. Edge effects (where the window contains partial data) appear at both the start and end of the series.
# 3-hour centered mean: each output reflects the 1.5 hours before and after the current timestamp.
tf_center = tf.rolling_aggregate("PT3H", "mean", "flow", alignment="center")
Note
CENTER alignment is not supported for calendar-based window sizes (months, years) because
they have variable length and cannot be halved to a fixed offset.
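The edge effects of the three alignments can be reproduced with a short standard-library sketch that counts how many observations fall in each timestamp's window. This is an illustration only, not Time-Stream's implementation: `in_window` and `window_counts` are hypothetical helpers, and the choice of which window edge is inclusive is an assumption picked to match the counts described above for hourly data.

```python
from datetime import datetime, timedelta


def in_window(u, t, window, alignment="trailing"):
    """Return True if observation time u falls in the window belonging to t.

    Which window edge is inclusive is an assumption made for this sketch.
    """
    if alignment == "trailing":
        return t - window < u <= t
    if alignment == "leading":
        return t <= u < t + window
    if alignment == "center":
        half = window / 2
        return t - half <= u < t + half
    raise ValueError(f"unknown alignment: {alignment!r}")


def window_counts(times, window, alignment):
    """Count the observations available to each timestamp's window."""
    return [sum(in_window(u, t, window, alignment) for u in times) for t in times]


# Six hourly timestamps and a 3-hour window.
times = [datetime(2024, 1, 1) + timedelta(hours=i) for i in range(6)]
window = timedelta(hours=3)

trailing = window_counts(times, window, "trailing")
leading = window_counts(times, window, "leading")
center = window_counts(times, window, "center")
print(trailing)  # [1, 2, 3, 3, 3, 3]
print(leading)   # [3, 3, 3, 3, 2, 1]
print(center)    # [2, 3, 3, 3, 3, 2]
```

Partial windows appear at the start for trailing, at the end for leading, and at both ends for center.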
Data completeness¶
Rolling aggregation tracks data completeness in the same way as standard aggregation. The output always includes:
- expected_count_<time>: the number of observations expected in a full window.
- count_<column>: the number of observations actually present.
- valid_<column>: whether the result passes the completeness check.
When the window extends beyond the edges of the data (edge effects), the actual count will be less than the expected
count. You can use missing_criteria to flag or filter these rows.
For example, to mark a result as invalid unless the window contains at least 3 observations:
tf_rolling = tf.rolling_aggregate(
    "PT3H",
    "mean",
    "flow",
    missing_criteria=("available", 3),
)
shape: (110_977, 5)
┌─────────────────────┬───────────┬────────────┬─────────────────────┬────────────┐
│ time                ┆ mean_flow ┆ count_flow ┆ expected_count_time ┆ valid_flow │
│ ---                 ┆ ---       ┆ ---        ┆ ---                 ┆ ---        │
│ datetime[ns]        ┆ f64       ┆ u32        ┆ u32                 ┆ bool       │
╞═════════════════════╪═══════════╪════════════╪═════════════════════╪════════════╡
│ 2020-09-01 00:00:00 ┆ 92.860538 ┆ 1          ┆ 12                  ┆ false      │
│ 2020-09-01 00:15:00 ┆ 92.860538 ┆ 1          ┆ 12                  ┆ false      │
│ 2020-09-01 00:30:00 ┆ 95.48182  ┆ 2          ┆ 12                  ┆ false      │
│ 2020-09-01 00:45:00 ┆ 95.48182  ┆ 2          ┆ 12                  ┆ false      │
│ 2020-09-01 01:00:00 ┆ 95.48182  ┆ 2          ┆ 12                  ┆ false      │
│ …                   ┆ …         ┆ …          ┆ …                   ┆ …          │
│ 2023-10-31 23:00:00 ┆ 81.813845 ┆ 11         ┆ 12                  ┆ true       │
│ 2023-10-31 23:15:00 ┆ 82.180762 ┆ 11         ┆ 12                  ┆ true       │
│ 2023-10-31 23:30:00 ┆ 82.526896 ┆ 10         ┆ 12                  ┆ true       │
│ 2023-10-31 23:45:00 ┆ 82.526896 ┆ 10         ┆ 12                  ┆ true       │
│ 2023-11-01 00:00:00 ┆ 82.81015  ┆ 10         ┆ 12                  ┆ true       │
└─────────────────────┴───────────┴────────────┴─────────────────────┴────────────┘
Rows where count_flow < 3 have valid_flow = false, as visible at the start of the series where the trailing
window has not yet accumulated enough observations.
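The logic behind the valid_flow column in the two example outputs can be sketched in plain Python. This is a minimal illustration, assuming that an ("available", n) criterion means "at least n observations in the window" and that omitting the criterion marks every row as valid, which is what the outputs above show. The `passes` helper is hypothetical and is not part of Time-Stream; only the "available" criterion is modelled.

```python
def passes(count, missing_criteria=None):
    """Return True if a window with `count` observations is considered valid."""
    if missing_criteria is None:
        # No criterion supplied: every row is flagged valid.
        return True
    name, threshold = missing_criteria
    if name == "available":
        return count >= threshold
    raise ValueError(f"criterion not covered by this sketch: {name!r}")


# count_flow values from the first and last rows of the example output.
counts = [1, 1, 2, 2, 2, 11, 11, 10, 10, 10]

default_flags = [passes(c) for c in counts]
strict_flags = [passes(c, ("available", 3)) for c in counts]

# Keep only the counts whose window passed the stricter check.
kept = [c for c, ok in zip(counts, strict_flags) if ok]
print(strict_flags)
print(kept)
```

Filtering on the flag afterwards, rather than dropping rows during aggregation, keeps the output aligned one-to-one with the input timestamps.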
See Missing data criteria in the aggregation guide for the full list of available criteria.