Quick Start
This guide walks you through creating and working with time series data using the Time Series Package.
Creating a Time Series
First, import the necessary modules:
from datetime import datetime, timedelta
import polars as pl
from time_stream import TimeSeries, Period
Create a simple dataframe with a datetime column and a value column:
dates = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(31)]
values = [10, 12, 15, 14, 13, 17, 19, 21, 18, 17,
          5, 9, 0, 1, 5, 11, 12, 10, 21, 16,
          10, 11, 8, 6, 14, 17, 12, 10, 10, 8, 5]
df = pl.DataFrame({
    "timestamp": dates,
    "temperature": values
})
Then create a Time Series object:
# Specify the resolution and periodicity of the data
resolution = Period.of_days(1)
periodicity = Period.of_days(1)
# Build the time series object
ts = TimeSeries(
    df=df,
    time_name="timestamp",
    resolution=resolution,
    periodicity=periodicity
)
shape: (31, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 10 │
│ 2023-01-02 00:00:00 ┆ 12 │
│ 2023-01-03 00:00:00 ┆ 15 │
│ 2023-01-04 00:00:00 ┆ 14 │
│ 2023-01-05 00:00:00 ┆ 13 │
│ … ┆ … │
│ 2023-01-27 00:00:00 ┆ 12 │
│ 2023-01-28 00:00:00 ┆ 10 │
│ 2023-01-29 00:00:00 ┆ 10 │
│ 2023-01-30 00:00:00 ┆ 8 │
│ 2023-01-31 00:00:00 ┆ 5 │
└─────────────────────┴─────────────┘
Note
More information about resolution and periodicity can be found in the concepts page.
Aggregating Data
Aggregating time series data is straightforward:
from time_stream.aggregation import Mean
# Aggregate temperature data by month
monthly_period = Period.of_months(1)
monthly_temp = ts.aggregate(monthly_period, Mean, "temperature")
shape: (1, 5)
┌─────────────────────┬──────────────────┬───────────────────┬──────────────────────────┬───────┐
│ timestamp ┆ mean_temperature ┆ count_temperature ┆ expected_count_timestamp ┆ valid │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ u32 ┆ i64 ┆ bool │
╞═════════════════════╪══════════════════╪═══════════════════╪══════════════════════════╪═══════╡
│ 2023-01-01 00:00:00 ┆ 11.516129 ┆ 31 ┆ 31 ┆ true │
└─────────────────────┴──────────────────┴───────────────────┴──────────────────────────┴───────┘
By default, the aggregation is carried out regardless of how many data points are missing in the period. For example, if we have only two 1-minute data points on a given day, a mean aggregation would return the mean of those two values, even though we’d expect 1440 values for a full day.
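As a quick sanity check, the monthly mean in the table above and the expected count of 1440 can be reproduced with plain Python. This is a minimal sketch using only the standard library; it does not call the package, and the variable names are illustrative.

```python
from datetime import timedelta

# The 31 daily temperature values used throughout this guide
values = [10, 12, 15, 14, 13, 17, 19, 21, 18, 17,
          5, 9, 0, 1, 5, 11, 12, 10, 21, 16,
          10, 11, 8, 6, 14, 17, 12, 10, 10, 8, 5]

# Mean of all January values, matching the mean_temperature column
monthly_mean = sum(values) / len(values)
print(round(monthly_mean, 6))  # 11.516129

# Number of 1-minute data points expected in one full day
expected_per_day = timedelta(days=1) // timedelta(minutes=1)
print(expected_per_day)  # 1440
```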
You can specify criteria for a valid aggregation using the missing_criteria argument.
{"missing": 30}
Aggregation is valid if there are no more than 30 values missing in the period.
{"available": 30}
Aggregation is valid if there are at least 30 input values in the period.
{"percent": 30}
Aggregation is valid if the data in the period is at least 30 percent complete (accepts integers or floats).
If no missing_criteria is specified, the valid column will be set to True.
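To make the three criteria concrete, the sketch below expresses them as a small standalone function. Note that is_valid is a hypothetical helper written for illustration only; it is not part of the package, and the library's own handling of edge cases may differ.

```python
def is_valid(count, expected_count, missing_criteria=None):
    """Illustrative check of whether a period passes the given criteria.

    count: number of values actually present in the period
    expected_count: number of values a complete period would contain
    missing_criteria: None, or a single-entry dict such as {"missing": 30}
    """
    if missing_criteria is None:
        # No criteria specified: every aggregation counts as valid
        return True

    (kind, threshold), = missing_criteria.items()
    if kind == "missing":
        # No more than `threshold` values missing
        return expected_count - count <= threshold
    if kind == "available":
        # At least `threshold` values present
        return count >= threshold
    if kind == "percent":
        # At least `threshold` percent complete
        return 100 * count / expected_count >= threshold
    raise ValueError(f"Unknown criteria: {kind}")


# A complete January of daily data passes every criterion
print(is_valid(31, 31, {"available": 30}))   # True

# Two 1-minute values in a day (1440 expected) fail all three,
# but are still "valid" when no criteria are given
print(is_valid(2, 1440, {"missing": 30}))    # False
print(is_valid(2, 1440, {"percent": 30}))    # False
print(is_valid(2, 1440))                     # True
```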
Adding Flags for Quality Control
The Time Series object contains functionality for adding data “flags” that provide detail about specific data points. One example use is to record what quality control has been carried out on the data.
Create a “flagging system” as a dictionary and pass it when initialising the Time Series:
quality_flags = {"MISSING": 1, "SUSPICIOUS": 2, "ESTIMATED": 4}
ts = TimeSeries(
    df=df,
    time_name="timestamp",
    flag_systems={"quality": quality_flags}
)
print(ts.flag_systems)
{'quality': <quality (MISSING=1, SUSPICIOUS=2, ESTIMATED=4)>}
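The flag values here are powers of two, which suggests they can be combined bitwise so that one integer holds several flags at once. That is an assumption rather than something this guide states. The sketch below illustrates the idea with the standard library's IntFlag; the Quality class is a hypothetical stand-in, not the package's own flag type.

```python
from enum import IntFlag


class Quality(IntFlag):
    # Same names and values as the quality_flags dictionary
    MISSING = 1
    SUSPICIOUS = 2
    ESTIMATED = 4


# Combine two flags into a single integer with bitwise OR
combined = Quality.SUSPICIOUS | Quality.ESTIMATED
print(int(combined))                   # 6

# Test which individual flags are set
print(Quality.SUSPICIOUS in combined)  # True
print(Quality.MISSING in combined)     # False
```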
Now we can use this flagging system to add information to our data points:
# Create a flag column
ts.init_flag_column("quality", "temperature_qc_flags")
# Flag suspicious values
ts.add_flag("temperature_qc_flags", "SUSPICIOUS", pl.col("temperature") > 15)
shape: (31, 3)
┌─────────────────────┬─────────────┬──────────────────────┐
│ timestamp ┆ temperature ┆ temperature_qc_flags │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪══════════════════════╡
│ 2023-01-01 00:00:00 ┆ 10 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ 12 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 15 ┆ 0 │
│ 2023-01-04 00:00:00 ┆ 14 ┆ 0 │
│ 2023-01-05 00:00:00 ┆ 13 ┆ 0 │
│ 2023-01-06 00:00:00 ┆ 17 ┆ 2 │
│ 2023-01-07 00:00:00 ┆ 19 ┆ 2 │
│ 2023-01-08 00:00:00 ┆ 21 ┆ 2 │
│ 2023-01-09 00:00:00 ┆ 18 ┆ 2 │
│ 2023-01-10 00:00:00 ┆ 17 ┆ 2 │
│ 2023-01-11 00:00:00 ┆ 5 ┆ 0 │
│ 2023-01-12 00:00:00 ┆ 9 ┆ 0 │
│ 2023-01-13 00:00:00 ┆ 0 ┆ 0 │
│ 2023-01-14 00:00:00 ┆ 1 ┆ 0 │
│ 2023-01-15 00:00:00 ┆ 5 ┆ 0 │
│ 2023-01-16 00:00:00 ┆ 11 ┆ 0 │
│ 2023-01-17 00:00:00 ┆ 12 ┆ 0 │
│ 2023-01-18 00:00:00 ┆ 10 ┆ 0 │
│ 2023-01-19 00:00:00 ┆ 21 ┆ 2 │
│ 2023-01-20 00:00:00 ┆ 16 ┆ 2 │
│ 2023-01-21 00:00:00 ┆ 10 ┆ 0 │
│ 2023-01-22 00:00:00 ┆ 11 ┆ 0 │
│ 2023-01-23 00:00:00 ┆ 8 ┆ 0 │
│ 2023-01-24 00:00:00 ┆ 6 ┆ 0 │
│ 2023-01-25 00:00:00 ┆ 14 ┆ 0 │
│ 2023-01-26 00:00:00 ┆ 17 ┆ 2 │
│ 2023-01-27 00:00:00 ┆ 12 ┆ 0 │
│ 2023-01-28 00:00:00 ┆ 10 ┆ 0 │
│ 2023-01-29 00:00:00 ┆ 10 ┆ 0 │
│ 2023-01-30 00:00:00 ┆ 8 ┆ 0 │
│ 2023-01-31 00:00:00 ┆ 5 ┆ 0 │
└─────────────────────┴─────────────┴──────────────────────┘
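The flag column above can be reproduced without the package to confirm which days were flagged. This is a minimal sketch in plain Python; it assumes that adding a flag sets its value with a bitwise OR on a column initialised to 0, which matches the output shown but is not confirmed by this guide.

```python
values = [10, 12, 15, 14, 13, 17, 19, 21, 18, 17,
          5, 9, 0, 1, 5, 11, 12, 10, 21, 16,
          10, 11, 8, 6, 14, 17, 12, 10, 10, 8, 5]

SUSPICIOUS = 2  # value taken from the quality_flags dictionary

# Start with an empty flag column (all zeros)
flags = [0] * len(values)

# Assumed behaviour: set the SUSPICIOUS bit where temperature > 15
flags = [f | SUSPICIOUS if v > 15 else f for f, v in zip(flags, values)]

# Days of January (1-based) that carry the SUSPICIOUS flag
flagged_days = [day for day, f in enumerate(flags, start=1) if f & SUSPICIOUS]
print(flagged_days)  # [6, 7, 8, 9, 10, 19, 20, 26]
```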