TimeSeries Basics
This guide covers the fundamentals of working with the TimeSeries class, the core data structure in the Time Series Package.
Creating a TimeSeries
A TimeSeries wraps a Polars DataFrame and adds specialised functionality for time series operations.
Basic Creation
from datetime import datetime, timedelta
import polars as pl
from time_stream import TimeSeries, Period
from time_stream.aggregation import Mean, Min, Max
# Create sample data
dates = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(10)]
temperatures = [20, 21, 19, 26, 24, 26, 28, 30, 31, 29]
precipitation = [0, 0, 5, 10, 2, 0, 0, 3, 1, 0]
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation
})
# Create a simple TimeSeries
ts = TimeSeries(
df=df,
time_name="timestamp" # Specify which column contains the primary datetime values
)
Without specifying resolution and periodicity, the default initialisation sets these properties to “1 microsecond”, to account for any set of datetime values:
print(ts.resolution)
print(ts.periodicity)
PT0.000001S
PT0.000001S
With Resolution and Periodicity
Although the default of 1 microsecond accommodates any datetime values, it is important to specify the actual resolution and periodicity (if known) for more control over certain time series functionality:
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),  # Each timestamp is at day precision
periodicity=Period.of_days(1)  # Data points are spaced 1 day apart
)
print(ts.resolution)
print(ts.periodicity)
P1D
P1D
With Metadata
TimeSeries can be initialised with metadata to describe your data. This can be metadata about the time series as a whole, or about the individual columns.
Keeping the metadata and the data together in one object like this can help simplify downstream processes, such as derivation functions, running infilling routines, plotting data, etc.
# TimeSeries metadata
ts_metadata = {
"location": "River Thames",
"elevation": 100,
"station_id": "ABC123"
}
# Column metadata
col_metadata = {
"temperature": {
"units": "°C",
"description": "Average temperature"
},
"precipitation": {
"units": "mm",
"description": "Precipitation amount",
"instrument_type": "Tipping bucket"
# Note that metadata keys are not required to be the same for all columns
}
}
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),
periodicity=Period.of_days(1),
metadata=ts_metadata,
column_metadata=col_metadata
)
Time series-level metadata can be accessed via the .metadata() method:
print("All metadata: ", ts.metadata())
print("Specific keys: ", ts.metadata(["location", "elevation"]))
All metadata: {'location': 'River Thames', 'elevation': 100, 'station_id': 'ABC123'}
Specific keys: {'location': 'River Thames', 'elevation': 100}
Column-level metadata can be accessed via attributes on the column itself, or via the column.metadata() method:
# Access via attributes:
print(ts.temperature.units)
print(ts.precipitation.description)
# Or via metadata method:
print(ts.precipitation.metadata(["units", "instrument_type"]))
°C
Precipitation amount
{'units': 'mm', 'instrument_type': 'Tipping bucket'}
Time Validation
The TimeSeries class performs validation on timestamps:
Resolution Validation
The resolution defines how precise the timestamps should be:
# This will raise a warning because some timestamps don't align to midnight (00:00:00),
# as required by daily resolution
timestamps = [
datetime(2023, 1, 1, 0, 0, 0), # Aligned to midnight
datetime(2023, 1, 2, 0, 0, 0), # Aligned to midnight
datetime(2023, 1, 3, 12, 0, 0), # Not aligned (noon)
]
df = pl.DataFrame({"timestamp": timestamps, "value": [1, 2, 3]})
# This will raise a UserWarning about resolution alignment
try:
    ts = TimeSeries(
        df=df,
        time_name="timestamp",
        resolution=Period.of_days(1)
    )
except UserWarning as w:
    print(f"Warning: {w}")
Warning: Values in time field: "timestamp" are not aligned to resolution: P1D
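Conceptually, the resolution check tests whether each timestamp sits exactly on a period boundary. A minimal pure-Python sketch of that idea for daily resolution (an illustration only, not the library's implementation):

```python
from datetime import datetime, time

timestamps = [
    datetime(2023, 1, 1, 0, 0, 0),
    datetime(2023, 1, 2, 0, 0, 0),
    datetime(2023, 1, 3, 12, 0, 0),  # noon: not on a day boundary
]

# A timestamp aligns to daily (P1D) resolution if its time-of-day is midnight
misaligned = [t for t in timestamps if t.time() != time(0, 0)]
print(misaligned)  # [datetime.datetime(2023, 1, 3, 12, 0)]
```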
Periodicity Validation
The periodicity defines how frequently data points should appear:
# This will raise a warning because we have two points within the same day
timestamps = [
datetime(2023, 1, 1, 0, 0, 0),
datetime(2023, 1, 1, 12, 0, 0), # Same day as above
datetime(2023, 1, 2, 0, 0, 0),
]
df = pl.DataFrame({"timestamp": timestamps, "value": [1, 2, 3]})
# This will raise a UserWarning about periodicity
try:
    ts = TimeSeries(
        df=df,
        time_name="timestamp",
        periodicity=Period.of_days(1)
    )
except UserWarning as w:
    print(f"Warning: {w}")
Warning: Values in time field: "timestamp" do not conform to periodicity: P1D
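Periodicity, by contrast, is about how many points fall within each period. The same kind of check in plain Python, counting points per calendar day (illustrative only, not the library's code):

```python
from collections import Counter
from datetime import datetime

timestamps = [
    datetime(2023, 1, 1, 0, 0, 0),
    datetime(2023, 1, 1, 12, 0, 0),  # second point in the same day
    datetime(2023, 1, 2, 0, 0, 0),
]

# Periodicity P1D allows at most one data point per calendar day
per_day = Counter(t.date() for t in timestamps)
violations = [day for day, count in per_day.items() if count > 1]
print(violations)  # [datetime.date(2023, 1, 1)]
```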
Duplicate Detection
The TimeSeries class automatically checks for rows with duplicate values in the specified time column, and you can control what happens when duplicates are detected. Consider this DataFrame with duplicate time values:
# Create sample data
dates = [
datetime(2023, 1, 1),
datetime(2023, 1, 1),
datetime(2023, 2, 1),
datetime(2023, 3, 1),
datetime(2023, 4, 1),
datetime(2023, 5, 1),
datetime(2023, 6, 1),
datetime(2023, 6, 1),
datetime(2023, 6, 1),
datetime(2023, 7, 1),
]
temperatures = [20, None, 19, 26, 24, 26, 28, 30, None, 29]
precipitation = [None, 0, 5, 10, 2, 0, None, 3, 4, 0]
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation
})
print(df)
shape: (10, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ null │
│ 2023-01-01 00:00:00 ┆ null ┆ 0 │
│ 2023-02-01 00:00:00 ┆ 19 ┆ 5 │
│ 2023-03-01 00:00:00 ┆ 26 ┆ 10 │
│ 2023-04-01 00:00:00 ┆ 24 ┆ 2 │
│ 2023-05-01 00:00:00 ┆ 26 ┆ 0 │
│ 2023-06-01 00:00:00 ┆ 28 ┆ null │
│ 2023-06-01 00:00:00 ┆ 30 ┆ 3 │
│ 2023-06-01 00:00:00 ┆ null ┆ 4 │
│ 2023-07-01 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
The following strategies are available via the on_duplicates argument:
1. Error Strategy (Default): on_duplicates="error"
Raises an error when duplicate rows are found. This is the default behaviour, to ensure data integrity.
# Raises an error if duplicate timestamps exist. This is the default if `on_duplicate` is not specified.
try:
    ts = TimeSeries(
        df=df,
        time_name="timestamp",
        on_duplicates="error"
    )
except ValueError as e:
    print(f"Error: {e}")
Error: Duplicate time values found: [datetime.datetime(2023, 1, 1, 0, 0), datetime.datetime(2023, 6, 1, 0, 0)]
2. Keep First Strategy: on_duplicates="keep_first"
For a given group of rows with the same time value, keeps only the first row and discards the others.
# Keeps the first row found in groups of duplicate rows
ts = TimeSeries(
df=df,
time_name="timestamp",
on_duplicates="keep_first"
)
print(ts)
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ null │
│ 2023-02-01 00:00:00 ┆ 19 ┆ 5 │
│ 2023-03-01 00:00:00 ┆ 26 ┆ 10 │
│ 2023-04-01 00:00:00 ┆ 24 ┆ 2 │
│ 2023-05-01 00:00:00 ┆ 26 ┆ 0 │
│ 2023-06-01 00:00:00 ┆ 28 ┆ null │
│ 2023-07-01 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
3. Keep Last Strategy: on_duplicates="keep_last"
For a given group of rows with the same time value, keeps only the last row and discards the others.
# Keeps the last row found in groups of duplicate rows
ts = TimeSeries(
df=df,
time_name="timestamp",
on_duplicates="keep_last"
)
print(ts)
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ null ┆ 0 │
│ 2023-02-01 00:00:00 ┆ 19 ┆ 5 │
│ 2023-03-01 00:00:00 ┆ 26 ┆ 10 │
│ 2023-04-01 00:00:00 ┆ 24 ┆ 2 │
│ 2023-05-01 00:00:00 ┆ 26 ┆ 0 │
│ 2023-06-01 00:00:00 ┆ null ┆ 4 │
│ 2023-07-01 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
4. Drop Strategy: on_duplicates="drop"
Removes all rows that have duplicate timestamps. This strategy is appropriate when you are unsure of the integrity of duplicate rows and only want unique, unambiguous data.
# Drops all duplicate rows
ts = TimeSeries(
df=df,
time_name="timestamp",
on_duplicates="drop"
)
print(ts)
shape: (5, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-02-01 00:00:00 ┆ 19 ┆ 5 │
│ 2023-03-01 00:00:00 ┆ 26 ┆ 10 │
│ 2023-04-01 00:00:00 ┆ 24 ┆ 2 │
│ 2023-05-01 00:00:00 ┆ 26 ┆ 0 │
│ 2023-07-01 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
5. Merge Strategy: on_duplicates="merge"
For a given group of rows with the same time value, performs a merge of all rows. This combines values with a top-down approach that preserves the first non-null value for each column.
# Merges groups of duplicate rows
ts = TimeSeries(
df=df,
time_name="timestamp",
on_duplicates="merge"
)
print(ts)
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ 0 │
│ 2023-02-01 00:00:00 ┆ 19 ┆ 5 │
│ 2023-03-01 00:00:00 ┆ 26 ┆ 10 │
│ 2023-04-01 00:00:00 ┆ 24 ┆ 2 │
│ 2023-05-01 00:00:00 ┆ 26 ┆ 0 │
│ 2023-06-01 00:00:00 ┆ 28 ┆ 3 │
│ 2023-07-01 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
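The top-down, first-non-null behaviour of the merge strategy can be sketched in plain Python. This is a conceptual illustration of the semantics, not the library's code; it reproduces the merged 2023-06-01 row from the output above:

```python
# Within a group of rows sharing a timestamp, keep the first non-null value
# seen for each column, scanning top-down.
def merge_rows(rows):
    merged = {}
    for row in rows:
        for key, value in row.items():
            if merged.get(key) is None and value is not None:
                merged[key] = value
    return merged

# The three 2023-06-01 rows from the example DataFrame
group = [
    {"temperature": 28, "precipitation": None},
    {"temperature": 30, "precipitation": 3},
    {"temperature": None, "precipitation": 4},
]
print(merge_rows(group))  # {'temperature': 28, 'precipitation': 3}
```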
Missing Rows
The TimeSeries class provides functionality to automatically pad missing time points within a time series, ensuring complete temporal coverage without gaps. Consider daily data with some missing days:
# Create sample data with gaps
dates = [
datetime(2023, 1, 1),
datetime(2023, 1, 3),
datetime(2023, 1, 4),
datetime(2023, 1, 5),
datetime(2023, 1, 7),
]
temperatures = [20, 19, 26, 24, 26]
precipitation = [0, 5, 10, 2, 0]
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation
})
print(df)
shape: (5, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 19 ┆ 5 │
│ 2023-01-04 00:00:00 ┆ 26 ┆ 10 │
│ 2023-01-05 00:00:00 ┆ 24 ┆ 2 │
│ 2023-01-07 00:00:00 ┆ 26 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
Padding is controlled by the pad parameter during TimeSeries initialisation:
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),
periodicity=Period.of_days(1),
pad=True # Enable padding
)
print(ts)
shape: (7, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ null ┆ null │
│ 2023-01-03 00:00:00 ┆ 19 ┆ 5 │
│ 2023-01-04 00:00:00 ┆ 26 ┆ 10 │
│ 2023-01-05 00:00:00 ┆ 24 ┆ 2 │
│ 2023-01-06 00:00:00 ┆ null ┆ null │
│ 2023-01-07 00:00:00 ┆ 26 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
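For this simple daily case, the padding logic amounts to generating the complete range of expected timestamps and inserting null rows for any that are absent. A pure-Python sketch of finding the gaps (illustrative, not the library's implementation):

```python
from datetime import datetime, timedelta

observed = {
    datetime(2023, 1, 1), datetime(2023, 1, 3), datetime(2023, 1, 4),
    datetime(2023, 1, 5), datetime(2023, 1, 7),
}

# Build the complete daily range, then identify the timestamps to pad
start, end = min(observed), max(observed)
full_range = [start + timedelta(days=i) for i in range((end - start).days + 1)]
missing = [t for t in full_range if t not in observed]
print(missing)  # 2023-01-02 and 2023-01-06
```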
The padding functionality respects the resolution and periodicity of your data. The example above was simple: missing daily data was filled in at the datetimes of the missing days. Things get more complex when a time series has a different resolution to its periodicity. For example, consider a time series of the "annual maximum of 15-minute river flow data in a given UK water-year". The resolution would be 15 minutes, but the periodicity would be P1Y+9MT9H, because a water-year starts at 9am on 1st October:
# Create sample water-year data with gaps
dates = [
datetime(2023, 5, 16, 10, 15, 0), # Water year 2022-2023
datetime(2023, 10, 3, 19, 30, 0), # Water year 2023-2024
# Missing water year 2024-2025
datetime(2025, 11, 30, 0, 0, 0), # Water year 2025-2026
]
max_flow = [20, 25, 30]
df = pl.DataFrame({
"timestamp": dates,
"max_flow": max_flow,
})
print(df)
shape: (3, 2)
┌─────────────────────┬──────────┐
│ timestamp ┆ max_flow │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════════╡
│ 2023-05-16 10:15:00 ┆ 20 │
│ 2023-10-03 19:30:00 ┆ 25 │
│ 2025-11-30 00:00:00 ┆ 30 │
└─────────────────────┴──────────┘
The padding takes this resolution and periodicity into account, and sets missing rows to the start of the period:
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_minutes(15),
periodicity=Period.of_years(1).with_month_offset(9).with_hour_offset(9),
pad=True # Enable padding
)
print(ts)
shape: (4, 2)
┌─────────────────────┬──────────┐
│ timestamp ┆ max_flow │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════════╡
│ 2023-05-16 10:15:00 ┆ 20 │
│ 2023-10-03 19:30:00 ┆ 25 │
│ 2024-10-01 09:00:00 ┆ null │
│ 2025-11-30 00:00:00 ┆ 30 │
└─────────────────────┴──────────┘
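The padded row lands at 2024-10-01 09:00:00 because that is the start of the missing water-year period. A small pure-Python sketch of mapping a timestamp to its water-year start (illustrative only):

```python
from datetime import datetime

def water_year_start(t):
    # A UK water-year starts at 09:00 on 1st October
    start = datetime(t.year, 10, 1, 9, 0, 0)
    return start if t >= start else datetime(t.year - 1, 10, 1, 9, 0, 0)

print(water_year_start(datetime(2023, 5, 16, 10, 15)))  # 2022-10-01 09:00:00 (water year 2022-2023)
print(water_year_start(datetime(2023, 10, 3, 19, 30)))  # 2023-10-01 09:00:00 (water year 2023-2024)
```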
Warning
It is very important to set the periodicity and resolution parameters if you want to pad your data. Otherwise, the padding process will use the default period of 1 microsecond and try to pad your entire dataset with microsecond data, which will almost certainly result in a memory error!
Accessing Data
There are multiple ways to access data from a TimeSeries:
Accessing the DataFrame
# Get the full DataFrame
df = ts.df
This gives the underlying Polars DataFrame, with which you can carry out normal Polars functionality:
# Select specific columns from the DataFrame
temp_precip_df = ts.df.select(["timestamp", "temperature", "precipitation"])
# Filter the DataFrame
rainy_days_df = ts.df.filter(pl.col("precipitation") > 0)
Accessing Columns
The TimeSeries class provides other ways to access data within the time series, whilst maintaining the core link to the primary datetime column.
# Access column as a TimeSeriesColumn object
temperature_col = ts.temperature
print("Type temperature_col: ", type(temperature_col))
# Get the underlying data from a column
temperature_data = ts.temperature.data
print("Type temperature_data: ", type(temperature_data))
# Access column properties
temperature_units = ts.temperature.units
print("Temperature units: ", temperature_units)
# Get column as a TimeSeries
temperature_ts = ts["temperature"]
print("Type temperature_ts: ", type(temperature_ts))
print(temperature_ts)
# Select multiple columns as a TimeSeries
selected_ts = ts.select(["temperature", "precipitation"])
# or
selected_ts = ts[["temperature", "precipitation"]]
print("Type selected_ts: ", type(selected_ts))
print(selected_ts)
Type temperature_col: <class 'time_stream.columns.DataColumn'>
Type temperature_data: <class 'polars.dataframe.frame.DataFrame'>
Temperature units: °C
Type temperature_ts: <class 'time_stream.base.TimeSeries'>
shape: (10, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 20 │
│ 2023-01-02 00:00:00 ┆ 21 │
│ 2023-01-03 00:00:00 ┆ 19 │
│ 2023-01-04 00:00:00 ┆ 26 │
│ 2023-01-05 00:00:00 ┆ 24 │
│ 2023-01-06 00:00:00 ┆ 26 │
│ 2023-01-07 00:00:00 ┆ 28 │
│ 2023-01-08 00:00:00 ┆ 30 │
│ 2023-01-09 00:00:00 ┆ 31 │
│ 2023-01-10 00:00:00 ┆ 29 │
└─────────────────────┴─────────────┘
Type selected_ts: <class 'time_stream.base.TimeSeries'>
shape: (10, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ precipitation │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ 21 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 19 ┆ 5 │
│ 2023-01-04 00:00:00 ┆ 26 ┆ 10 │
│ 2023-01-05 00:00:00 ┆ 24 ┆ 2 │
│ 2023-01-06 00:00:00 ┆ 26 ┆ 0 │
│ 2023-01-07 00:00:00 ┆ 28 ┆ 0 │
│ 2023-01-08 00:00:00 ┆ 30 ┆ 3 │
│ 2023-01-09 00:00:00 ┆ 31 ┆ 1 │
│ 2023-01-10 00:00:00 ┆ 29 ┆ 0 │
└─────────────────────┴─────────────┴───────────────┘
Updating a TimeSeries
You can update the underlying DataFrame (while preserving column settings), as long as the primary datetime column remains unchanged.
# Update the DataFrame by adding a new column
ts.df = ts.df.with_columns(
(pl.col("temperature") * 1.8 + 32).alias("temperature_f")
)
# The new column will be available as a DataColumn
print("New temperature column in fahrenheit: ", ts[["temperature", "temperature_f"]])
New temperature column in fahrenheit: shape: (10, 3)
┌─────────────────────┬─────────────┬───────────────┐
│ timestamp ┆ temperature ┆ temperature_f │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ f64 │
╞═════════════════════╪═════════════╪═══════════════╡
│ 2023-01-01 00:00:00 ┆ 20 ┆ 68.0 │
│ 2023-01-02 00:00:00 ┆ 21 ┆ 69.8 │
│ 2023-01-03 00:00:00 ┆ 19 ┆ 66.2 │
│ 2023-01-04 00:00:00 ┆ 26 ┆ 78.8 │
│ 2023-01-05 00:00:00 ┆ 24 ┆ 75.2 │
│ 2023-01-06 00:00:00 ┆ 26 ┆ 78.8 │
│ 2023-01-07 00:00:00 ┆ 28 ┆ 82.4 │
│ 2023-01-08 00:00:00 ┆ 30 ┆ 86.0 │
│ 2023-01-09 00:00:00 ┆ 31 ┆ 87.8 │
│ 2023-01-10 00:00:00 ┆ 29 ┆ 84.2 │
└─────────────────────┴─────────────┴───────────────┘
If an update to the DataFrame results in a change to the primary datetime values, resolution or periodicity, then an error will be raised. A new TimeSeries object should be created.
# Try and update the DataFrame by filtering columns
# (which inherently removes some of the time series)
try:
    ts.df = ts.df.filter(pl.col("precipitation") > 0)
except ValueError as e:
    print(f"Error: {e}")
Error: Time column has mutated.
Working with Columns
The TimeSeries class provides methods for working with different column types:
Column Types
There are four column types:
- Primary Time Column: the datetime column
- Data Columns: regular data columns (the default type)
- Supplementary Columns: metadata or contextual information
- Flag Columns: flag markers giving specific information about data points
Creating Supplementary Columns
Supplementary columns can be specified on initialisation of the TimeSeries object:
# Create sample data
dates = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(10)]
temperatures = [20, 21, 1, 26, 24, 26, 28, 41, 51, None]
precipitation = [0, 0, 5, 10, 2, 0, 0, 3, 1, 0]
observer_comments = ["", "", "Power cut between 8am and 1pm", "", "", "",
"Agricultural work in adjacent field", "", "", "Tree felling"]
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation,
"observer_comments": observer_comments
})
# Create a TimeSeries
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),
periodicity=Period.of_days(1),
supplementary_columns=["observer_comments"]
)
print("Data columns: ", ts.data_columns)
print("Supplementary columns: ", ts.supplementary_columns)
print("Observer comments column: ", ts.observer_comments)
Data columns: {'temperature': DataColumn('temperature'), 'precipitation': DataColumn('precipitation')}
Supplementary columns: {'observer_comments': SupplementaryColumn('observer_comments')}
Observer comments column: shape: (10, 2)
┌─────────────────────┬─────────────────────────────────┐
│ timestamp ┆ observer_comments │
│ --- ┆ --- │
│ datetime[μs] ┆ str │
╞═════════════════════╪═════════════════════════════════╡
│ 2023-01-01 00:00:00 ┆ │
│ 2023-01-02 00:00:00 ┆ │
│ 2023-01-03 00:00:00 ┆ Power cut between 8am and 1pm │
│ 2023-01-04 00:00:00 ┆ │
│ 2023-01-05 00:00:00 ┆ │
│ 2023-01-06 00:00:00 ┆ │
│ 2023-01-07 00:00:00 ┆ Agricultural work in adjacent … │
│ 2023-01-08 00:00:00 ┆ │
│ 2023-01-09 00:00:00 ┆ │
│ 2023-01-10 00:00:00 ┆ Tree felling │
└─────────────────────┴─────────────────────────────────┘
Existing data columns can be converted to supplementary columns:
# Convert an existing column to supplementary
ts.set_supplementary_column("precipitation")
print("Data columns: ", ts.data_columns)
print("Supplementary columns: ", ts.supplementary_columns)
Data columns: {'temperature': DataColumn('temperature')}
Supplementary columns: {'precipitation': SupplementaryColumn('precipitation'), 'observer_comments': SupplementaryColumn('observer_comments')}
Or a completely new column can be initialised as a supplementary column:
# Add a new supplementary column, with new data
new_data = [12, 15, 6, 12, 10, 14, 19, 17, 16, 13]
ts.init_supplementary_column("battery_voltage", new_data)
print("Data columns: ", ts.data_columns)
print("Supplementary columns: ", ts.supplementary_columns)
print("Battery voltage column: ", ts.battery_voltage)
Data columns: {'temperature': DataColumn('temperature')}
Supplementary columns: {'precipitation': SupplementaryColumn('precipitation'), 'observer_comments': SupplementaryColumn('observer_comments'), 'battery_voltage': SupplementaryColumn('battery_voltage')}
Battery voltage column: shape: (10, 2)
┌─────────────────────┬─────────────────┐
│ timestamp ┆ battery_voltage │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪═════════════════╡
│ 2023-01-01 00:00:00 ┆ 12 │
│ 2023-01-02 00:00:00 ┆ 15 │
│ 2023-01-03 00:00:00 ┆ 6 │
│ 2023-01-04 00:00:00 ┆ 12 │
│ 2023-01-05 00:00:00 ┆ 10 │
│ 2023-01-06 00:00:00 ┆ 14 │
│ 2023-01-07 00:00:00 ┆ 19 │
│ 2023-01-08 00:00:00 ┆ 17 │
│ 2023-01-09 00:00:00 ┆ 16 │
│ 2023-01-10 00:00:00 ┆ 13 │
└─────────────────────┴─────────────────┘
Creating Flag Columns
Flag columns are inherently linked to a flag system. The flag system sets out the meanings of values that can be added to the flag column.
If they already exist, flag columns and their associated flag systems can be specified on initialisation of the TimeSeries object:
# Create sample data
dates = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(10)]
temperatures = [20, 21, 1, 26, 24, 26, 28, 41, 51, None]
precipitation = [0, 0, 5, 10, 2, 0, 0, 3, 1, 0]
temperature_qc_flags = [0, 0, 2, 0, 0, 0, 0, 0, 3, 8]
flag_systems = {"quality_control_checks": {"OUT_OF_RANGE": 1, "SPIKE": 2, "LOW_VOLTAGE": 4, "MISSING": 8}}
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation,
"temperature_qc_flags": temperature_qc_flags,
})
# Create a TimeSeries
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),
periodicity=Period.of_days(1),
flag_systems=flag_systems,
flag_columns={"temperature_qc_flags": "quality_control_checks"}
)
print("Data columns: ", ts.data_columns)
print("Flag columns: ", ts.flag_columns)
print("Flag systems: ", ts.flag_systems)
print("Temperature flag column: ", ts.temperature_qc_flags)
Data columns: {'temperature': DataColumn('temperature'), 'precipitation': DataColumn('precipitation')}
Flag columns: {'temperature_qc_flags': FlagColumn('temperature_qc_flags')}
Flag systems: {'quality_control_checks': <quality_control_checks (OUT_OF_RANGE=1, SPIKE=2, LOW_VOLTAGE=4, MISSING=8)>}
Temperature flag column: shape: (10, 2)
┌─────────────────────┬──────────────────────┐
│ timestamp ┆ temperature_qc_flags │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════════════════════╡
│ 2023-01-01 00:00:00 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 2 │
│ 2023-01-04 00:00:00 ┆ 0 │
│ 2023-01-05 00:00:00 ┆ 0 │
│ 2023-01-06 00:00:00 ┆ 0 │
│ 2023-01-07 00:00:00 ┆ 0 │
│ 2023-01-08 00:00:00 ┆ 0 │
│ 2023-01-09 00:00:00 ┆ 3 │
│ 2023-01-10 00:00:00 ┆ 8 │
└─────────────────────┴──────────────────────┘
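Note that the flag values in quality_control_checks are powers of two, so one integer can encode several flags at once: the value 3 on 2023-01-09 is OUT_OF_RANGE (1) plus SPIKE (2). Decoding such a combined value needs only plain bitwise arithmetic, sketched here independently of the library:

```python
flag_system = {"OUT_OF_RANGE": 1, "SPIKE": 2, "LOW_VOLTAGE": 4, "MISSING": 8}

def decode(value):
    # A flag is set if its bit is present in the combined value
    return [name for name, bit in flag_system.items() if value & bit]

print(decode(3))  # ['OUT_OF_RANGE', 'SPIKE']
print(decode(8))  # ['MISSING']
```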
Otherwise, flag columns can be initialised dynamically on the TimeSeries object:
# Create sample data
dates = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(10)]
temperatures = [20, 21, 1, 26, 24, 26, 28, 41, 51, None]
precipitation = [0, 0, 5, 10, 2, 0, 0, 3, 1, 0]
flag_systems = {"quality_control_checks": {"OUT_OF_RANGE": 1, "SPIKE": 2, "LOW_VOLTAGE": 4, "MISSING": 8}}
df = pl.DataFrame({
"timestamp": dates,
"temperature": temperatures,
"precipitation": precipitation
})
# Create a TimeSeries
ts = TimeSeries(
df=df,
time_name="timestamp",
resolution=Period.of_days(1),
periodicity=Period.of_days(1),
flag_systems=flag_systems
)
# Add a flag column for temperature data, which will use the quality_control_checks flag system
ts.init_flag_column("quality_control_checks", "temperature_qc_flags")
Methods are available to add flags to (or remove flags from) a flag column:
# Add flags
ts.add_flag("temperature_qc_flags", "OUT_OF_RANGE", pl.col("temperature") > 40)
ts.add_flag("temperature_qc_flags", "MISSING", pl.col("temperature").is_null())
print(ts.temperature_qc_flags)
# Remove a flag
ts.remove_flag("temperature_qc_flags", "OUT_OF_RANGE", pl.col("temperature") <= 45)
print(ts.temperature_qc_flags)
shape: (10, 2)
┌─────────────────────┬──────────────────────┐
│ timestamp ┆ temperature_qc_flags │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════════════════════╡
│ 2023-01-01 00:00:00 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 0 │
│ 2023-01-04 00:00:00 ┆ 0 │
│ 2023-01-05 00:00:00 ┆ 0 │
│ 2023-01-06 00:00:00 ┆ 0 │
│ 2023-01-07 00:00:00 ┆ 0 │
│ 2023-01-08 00:00:00 ┆ 1 │
│ 2023-01-09 00:00:00 ┆ 1 │
│ 2023-01-10 00:00:00 ┆ 8 │
└─────────────────────┴──────────────────────┘
shape: (10, 2)
┌─────────────────────┬──────────────────────┐
│ timestamp ┆ temperature_qc_flags │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════════════════════╡
│ 2023-01-01 00:00:00 ┆ 0 │
│ 2023-01-02 00:00:00 ┆ 0 │
│ 2023-01-03 00:00:00 ┆ 0 │
│ 2023-01-04 00:00:00 ┆ 0 │
│ 2023-01-05 00:00:00 ┆ 0 │
│ 2023-01-06 00:00:00 ┆ 0 │
│ 2023-01-07 00:00:00 ┆ 0 │
│ 2023-01-08 00:00:00 ┆ 0 │
│ 2023-01-09 00:00:00 ┆ 1 │
│ 2023-01-10 00:00:00 ┆ 8 │
└─────────────────────┴──────────────────────┘
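Behind add_flag and remove_flag, the arithmetic on a single flag value presumably amounts to setting and clearing bits. A standalone sketch of that mechanism (an assumption about the mechanics, not the library's code):

```python
OUT_OF_RANGE, MISSING = 1, 8  # bit values from the flag system above

value = 0
value |= OUT_OF_RANGE   # add a flag: set its bit
value |= MISSING        # add another flag
assert value == 9       # both bits set

value &= ~OUT_OF_RANGE  # remove a flag: clear its bit
assert value == 8       # only MISSING remains
```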
Column Relationships
You can define relationships between columns that are linked together in some way. Data columns can be given a relationship to both supplementary and flag columns, though supplementary and flag columns cannot be given a relationship to each other.
# Starting with an example TimeSeries, with supplementary and flag columns:
print(ts)
print("Data columns: ", ts.data_columns)
print("Supplementary columns: ", ts.supplementary_columns)
print("Flag columns: ", ts.flag_columns)
# Create a relationship between the supplementary column and data columns (using method on the TimeSeries object)
ts.add_column_relationship("battery_voltage", ["temperature", "precipitation"])
# Create a relationship between temperature and its flags (using method on the Column object)
ts.temperature.add_relationship("temperature_qc_flags")
print("")
print("Temperature column relationships: ", ts.temperature.get_relationships())
print("Battery voltage column relationships: ", ts.battery_voltage.get_relationships())
shape: (10, 6)
┌─────────────────┬─────────────┬───────────────┬────────────────┬────────────────┬────────────────┐
│ timestamp ┆ temperature ┆ precipitation ┆ observer_comme ┆ battery_voltag ┆ temperature_qc │
│ --- ┆ --- ┆ --- ┆ nts ┆ e ┆ _flags │
│ datetime[μs] ┆ i64 ┆ i64 ┆ --- ┆ --- ┆ --- │
│ ┆ ┆ ┆ str ┆ i64 ┆ i64 │
╞═════════════════╪═════════════╪═══════════════╪════════════════╪════════════════╪════════════════╡
│ 2023-01-01 ┆ 20 ┆ 0 ┆ ┆ 12 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-02 ┆ 21 ┆ 0 ┆ ┆ 15 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-03 ┆ 1 ┆ 5 ┆ Power cut ┆ 6 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ between 8am ┆ ┆ │
│ ┆ ┆ ┆ and 1pm ┆ ┆ │
│ 2023-01-04 ┆ 26 ┆ 10 ┆ ┆ 12 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-05 ┆ 24 ┆ 2 ┆ ┆ 10 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-06 ┆ 26 ┆ 0 ┆ ┆ 14 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-07 ┆ 28 ┆ 0 ┆ Agricultural ┆ 19 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ work in ┆ ┆ │
│ ┆ ┆ ┆ adjacent … ┆ ┆ │
│ 2023-01-08 ┆ 41 ┆ 3 ┆ ┆ 17 ┆ 1 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-09 ┆ 51 ┆ 1 ┆ ┆ 16 ┆ 1 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
│ 2023-01-10 ┆ null ┆ 0 ┆ Tree felling ┆ 13 ┆ 0 │
│ 00:00:00 ┆ ┆ ┆ ┆ ┆ │
└─────────────────┴─────────────┴───────────────┴────────────────┴────────────────┴────────────────┘
Data columns: {'temperature': DataColumn('temperature'), 'precipitation': DataColumn('precipitation')}
Supplementary columns: {'observer_comments': SupplementaryColumn('observer_comments'), 'battery_voltage': SupplementaryColumn('battery_voltage')}
Flag columns: {'temperature_qc_flags': FlagColumn('temperature_qc_flags')}
Temperature column relationships: [Relationship('temperature - battery_voltage'), Relationship('temperature - temperature_qc_flags')]
Battery voltage column relationships: [Relationship('temperature - battery_voltage'), Relationship('precipitation - battery_voltage')]
Relationships can be removed:
ts.remove_column_relationship("temperature", "battery_voltage")
print("Temperature column relationships: ", ts.temperature.get_relationships())
Temperature column relationships: [Relationship('temperature - temperature_qc_flags')]
The relationship also defines what happens when a column is removed. For example, if a Data Column is dropped, this cascades to any linked Flag Columns. Any linked Supplementary Columns are not dropped, but the relationship is removed:
ts.df = ts.df.drop("temperature")
# Note that temperature and temperature_qc_flags are removed, but battery_voltage remains.
print(ts)
shape: (10, 4)
┌─────────────────────┬───────────────┬─────────────────────────────────┬─────────────────┐
│ timestamp ┆ precipitation ┆ observer_comments ┆ battery_voltage │
│ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ str ┆ i64 │
╞═════════════════════╪═══════════════╪═════════════════════════════════╪═════════════════╡
│ 2023-01-01 00:00:00 ┆ 0 ┆ ┆ 12 │
│ 2023-01-02 00:00:00 ┆ 0 ┆ ┆ 15 │
│ 2023-01-03 00:00:00 ┆ 5 ┆ Power cut between 8am and 1pm ┆ 6 │
│ 2023-01-04 00:00:00 ┆ 10 ┆ ┆ 12 │
│ 2023-01-05 00:00:00 ┆ 2 ┆ ┆ 10 │
│ 2023-01-06 00:00:00 ┆ 0 ┆ ┆ 14 │
│ 2023-01-07 00:00:00 ┆ 0 ┆ Agricultural work in adjacent … ┆ 19 │
│ 2023-01-08 00:00:00 ┆ 3 ┆ ┆ 17 │
│ 2023-01-09 00:00:00 ┆ 1 ┆ ┆ 16 │
│ 2023-01-10 00:00:00 ┆ 0 ┆ Tree felling ┆ 13 │
└─────────────────────┴───────────────┴─────────────────────────────────┴─────────────────┘
Aggregating Data
The TimeSeries class provides powerful aggregation capabilities. Given a year's worth of minute data:
# The following TimeSeries has 1-year's worth of 1-minute resolution random temperature data:
print(ts)
shape: (525_600, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪═════════════╡
│ 2023-01-01 00:00:00 ┆ 15.099343 │
│ 2023-01-01 00:01:00 ┆ 14.994283 │
│ 2023-01-01 00:02:00 ┆ 15.173409 │
│ 2023-01-01 00:03:00 ┆ 15.370413 │
│ 2023-01-01 00:04:00 ┆ 15.04091 │
│ … ┆ … │
│ 2023-12-31 23:55:00 ┆ 14.826904 │
│ 2023-12-31 23:56:00 ┆ 15.06587 │
│ 2023-12-31 23:57:00 ┆ 15.07932 │
│ 2023-12-31 23:58:00 ┆ 15.054764 │
│ 2023-12-31 23:59:00 ┆ 15.175625 │
└─────────────────────┴─────────────┘
We can aggregate this data to various new resolutions.
This example shows an aggregation to monthly mean temperatures. The aggregation function can be specified either as an imported class or as a string (upper or lower case). Note that this returns a new TimeSeries object, as the primary time attributes have changed.
The returned TimeSeries provides additional context columns:
- Expected count: the number of data points expected if the aggregation period were full
- Actual count: the number of data points found in the data for the given aggregation period
- For Max and Min: the datetime of the max/min data point within the given aggregation period
# Import the required aggregation function
from time_stream.aggregation import Mean
# Create a monthly aggregation of the minute data, either by importing the aggregation function
# or by using a string
monthly_mean_temp = ts.aggregate(Period.of_months(1), Mean, "temperature")
monthly_mean_temp = ts.aggregate(Period.of_months(1), "mean", "temperature")
print(monthly_mean_temp)
shape: (12, 5)
┌─────────────────────┬──────────────────┬───────────────────┬──────────────────────────┬───────┐
│ timestamp ┆ mean_temperature ┆ count_temperature ┆ expected_count_timestamp ┆ valid │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ u32 ┆ i64 ┆ bool │
╞═════════════════════╪══════════════════╪═══════════════════╪══════════════════════════╪═══════╡
│ 2023-01-01 00:00:00 ┆ 17.605512 ┆ 44640 ┆ 44640 ┆ true │
│ 2023-02-01 00:00:00 ┆ 21.927644 ┆ 40320 ┆ 40320 ┆ true │
│ 2023-03-01 00:00:00 ┆ 24.474063 ┆ 44640 ┆ 44640 ┆ true │
│ 2023-04-01 00:00:00 ┆ 24.613575 ┆ 43200 ┆ 43200 ┆ true │
│ 2023-05-01 00:00:00 ┆ 22.150193 ┆ 44640 ┆ 44640 ┆ true │
│ … ┆ … ┆ … ┆ … ┆ … │
│ 2023-08-01 00:00:00 ┆ 8.087156 ┆ 44640 ┆ 44640 ┆ true │
│ 2023-09-01 00:00:00 ┆ 5.474087 ┆ 43200 ┆ 43200 ┆ true │
│ 2023-10-01 00:00:00 ┆ 5.433785 ┆ 44640 ┆ 44640 ┆ true │
│ 2023-11-01 00:00:00 ┆ 7.960447 ┆ 43200 ┆ 43200 ┆ true │
│ 2023-12-01 00:00:00 ┆ 12.393433 ┆ 44640 ┆ 44640 ┆ true │
└─────────────────────┴──────────────────┴───────────────────┴──────────────────────────┴───────┘
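The expected_count_timestamp values above follow directly from calendar arithmetic: a full month of 1-minute data contains days-in-month × 1440 points. As a quick sanity check, independent of the library:

```python
import calendar

MINUTES_PER_DAY = 24 * 60  # 1440

# Expected number of 1-minute data points for some months of 2023
for month in (1, 2, 4):
    days = calendar.monthrange(2023, month)[1]
    print(month, days * MINUTES_PER_DAY)
# 1 44640  (31 days)
# 2 40320  (28 days)
# 4 43200  (30 days)
```

These match the expected_count_timestamp column in the monthly output above.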
By default, the data will be aggregated regardless of how many data points are missing from the period. For example, if a given day contains only two 1-minute data points, a mean aggregation would return the mean of those 2 values, even though a full day would contain 1440 values.
You can specify criteria for a valid aggregation using the missing_criteria argument:
{"missing": 30} - Aggregation is valid if there are no more than 30 values missing in the period.
{"available": 30} - Aggregation is valid if there are at least 30 input values in the period.
{"percent": 30} - Aggregation is valid if the data in the period is at least 30 percent complete (accepts integers or floats).
If no missing_criteria are specified, the valid column will be set to True.
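The three criteria can be understood with a plain-Python sketch of the rules just described. This is illustrative only, not the library's actual implementation; the function name `is_valid` is hypothetical:

```python
def is_valid(actual_count, expected_count, missing_criteria=None):
    """Sketch of the missing_criteria rules: decide whether an
    aggregation period counts as valid."""
    if not missing_criteria:
        return True  # no criteria specified: valid is always True
    key, threshold = next(iter(missing_criteria.items()))
    if key == "missing":
        # valid if no more than `threshold` values are missing
        return (expected_count - actual_count) <= threshold
    if key == "available":
        # valid if at least `threshold` input values are present
        return actual_count >= threshold
    if key == "percent":
        # valid if the period is at least `threshold` percent complete
        return (actual_count / expected_count) * 100 >= threshold
    raise ValueError(f"Unknown criterion: {key}")

# A day of 1-minute data with only 1300 of the expected 1440 points:
print(is_valid(1300, 1440, {"missing": 30}))    # False: 140 missing > 30
print(is_valid(1300, 1440, {"available": 30}))  # True: 1300 >= 30
print(is_valid(1300, 1440, {"percent": 95}))    # False: ~90.3% < 95
```

Passing, for example, missing_criteria={"percent": 95} to the aggregate call would then mark under-filled periods with valid = False in the output.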
Some more aggregation examples:
# Make sure the Min and Max aggregation functions are imported
from time_stream.aggregation import Min, Max

# Calculate monthly minimum temperature
monthly_min_temp = ts.aggregate(Period.of_months(1), Min, "temperature")
print(monthly_min_temp)
# Calculate monthly maximum temperature
monthly_max_temp = ts.aggregate(Period.of_months(1), "Max", "temperature")
print(monthly_max_temp)
# Use it with other periods
daily_mean_temp = ts.aggregate(Period.of_days(1), Mean, "temperature")
print(daily_mean_temp)
annual_max_temp = ts.aggregate(Period.of_years(1), Max, "temperature")
print(annual_max_temp)
shape: (12, 6)
┌──────────────┬───────────────────┬─────────────────┬──────────────────┬──────────────────┬───────┐
│ timestamp ┆ timestamp_of_min ┆ min_temperature ┆ count_temperatur ┆ expected_count_t ┆ valid │
│ --- ┆ --- ┆ --- ┆ e ┆ imestamp ┆ --- │
│ datetime[μs] ┆ datetime[μs] ┆ f64 ┆ --- ┆ --- ┆ bool │
│ ┆ ┆ ┆ u32 ┆ i64 ┆ │
╞══════════════╪═══════════════════╪═════════════════╪══════════════════╪══════════════════╪═══════╡
│ 2023-01-01 ┆ 2023-01-01 ┆ 9.573338 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 18:21:00 ┆ ┆ ┆ ┆ │
│ 2023-02-01 ┆ 2023-02-01 ┆ 14.778971 ┆ 40320 ┆ 40320 ┆ true │
│ 00:00:00 ┆ 17:22:00 ┆ ┆ ┆ ┆ │
│ 2023-03-01 ┆ 2023-03-02 ┆ 18.099118 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 17:43:00 ┆ ┆ ┆ ┆ │
│ 2023-04-01 ┆ 2023-04-30 ┆ 18.384095 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 17:09:00 ┆ ┆ ┆ ┆ │
│ 2023-05-01 ┆ 2023-05-31 ┆ 14.498269 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 17:55:00 ┆ ┆ ┆ ┆ │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 2023-08-01 ┆ 2023-08-31 ┆ 0.838074 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 18:40:00 ┆ ┆ ┆ ┆ │
│ 2023-09-01 ┆ 2023-09-28 ┆ -0.572529 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 18:17:00 ┆ ┆ ┆ ┆ │
│ 2023-10-01 ┆ 2023-10-03 ┆ -0.562019 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 17:37:00 ┆ ┆ ┆ ┆ │
│ 2023-11-01 ┆ 2023-11-01 ┆ 0.875222 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 18:14:00 ┆ ┆ ┆ ┆ │
│ 2023-12-01 ┆ 2023-12-01 ┆ 4.600979 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 18:22:00 ┆ ┆ ┆ ┆ │
└──────────────┴───────────────────┴─────────────────┴──────────────────┴──────────────────┴───────┘
shape: (12, 6)
┌──────────────┬───────────────────┬─────────────────┬──────────────────┬──────────────────┬───────┐
│ timestamp ┆ timestamp_of_max ┆ max_temperature ┆ count_temperatur ┆ expected_count_t ┆ valid │
│ --- ┆ --- ┆ --- ┆ e ┆ imestamp ┆ --- │
│ datetime[μs] ┆ datetime[μs] ┆ f64 ┆ --- ┆ --- ┆ bool │
│ ┆ ┆ ┆ u32 ┆ i64 ┆ │
╞══════════════╪═══════════════════╪═════════════════╪══════════════════╪══════════════════╪═══════╡
│ 2023-01-01 ┆ 2023-01-31 ┆ 25.551551 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 06:39:00 ┆ ┆ ┆ ┆ │
│ 2023-02-01 ┆ 2023-02-28 ┆ 28.81754 ┆ 40320 ┆ 40320 ┆ true │
│ 00:00:00 ┆ 05:32:00 ┆ ┆ ┆ ┆ │
│ 2023-03-01 ┆ 2023-03-25 ┆ 30.577439 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 06:10:00 ┆ ┆ ┆ ┆ │
│ 2023-04-01 ┆ 2023-04-03 ┆ 30.624377 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 05:42:00 ┆ ┆ ┆ ┆ │
│ 2023-05-01 ┆ 2023-05-01 ┆ 29.24943 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 06:36:00 ┆ ┆ ┆ ┆ │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 2023-08-01 ┆ 2023-08-01 ┆ 15.550731 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 05:44:00 ┆ ┆ ┆ ┆ │
│ 2023-09-01 ┆ 2023-09-02 ┆ 11.939817 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 06:17:00 ┆ ┆ ┆ ┆ │
│ 2023-10-01 ┆ 2023-10-30 ┆ 11.756357 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 06:19:00 ┆ ┆ ┆ ┆ │
│ 2023-11-01 ┆ 2023-11-30 ┆ 15.295177 ┆ 43200 ┆ 43200 ┆ true │
│ 00:00:00 ┆ 06:05:00 ┆ ┆ ┆ ┆ │
│ 2023-12-01 ┆ 2023-12-31 ┆ 20.211176 ┆ 44640 ┆ 44640 ┆ true │
│ 00:00:00 ┆ 06:29:00 ┆ ┆ ┆ ┆ │
└──────────────┴───────────────────┴─────────────────┴──────────────────┴──────────────────┴───────┘
shape: (365, 5)
┌─────────────────────┬──────────────────┬───────────────────┬──────────────────────────┬───────┐
│ timestamp ┆ mean_temperature ┆ count_temperature ┆ expected_count_timestamp ┆ valid │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ u32 ┆ i32 ┆ bool │
╞═════════════════════╪══════════════════╪═══════════════════╪══════════════════════════╪═══════╡
│ 2023-01-01 00:00:00 ┆ 15.093978 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-01-02 00:00:00 ┆ 15.263434 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-01-03 00:00:00 ┆ 15.422625 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-01-04 00:00:00 ┆ 15.596555 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-01-05 00:00:00 ┆ 15.766958 ┆ 1440 ┆ 1440 ┆ true │
│ … ┆ … ┆ … ┆ … ┆ … │
│ 2023-12-27 00:00:00 ┆ 14.225196 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-12-28 00:00:00 ┆ 14.392217 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-12-29 00:00:00 ┆ 14.565449 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-12-30 00:00:00 ┆ 14.740761 ┆ 1440 ┆ 1440 ┆ true │
│ 2023-12-31 00:00:00 ┆ 14.907083 ┆ 1440 ┆ 1440 ┆ true │
└─────────────────────┴──────────────────┴───────────────────┴──────────────────────────┴───────┘
shape: (1, 6)
┌──────────────┬───────────────────┬─────────────────┬──────────────────┬──────────────────┬───────┐
│ timestamp ┆ timestamp_of_max ┆ max_temperature ┆ count_temperatur ┆ expected_count_t ┆ valid │
│ --- ┆ --- ┆ --- ┆ e ┆ imestamp ┆ --- │
│ datetime[μs] ┆ datetime[μs] ┆ f64 ┆ --- ┆ --- ┆ bool │
│ ┆ ┆ ┆ u32 ┆ i64 ┆ │
╞══════════════╪═══════════════════╪═════════════════╪══════════════════╪══════════════════╪═══════╡
│ 2023-01-01 ┆ 2023-04-03 ┆ 30.624377 ┆ 525600 ┆ 525600 ┆ true │
│ 00:00:00 ┆ 05:42:00 ┆ ┆ ┆ ┆ │
└──────────────┴───────────────────┴─────────────────┴──────────────────┴──────────────────┴───────┘
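The timestamp_of_max column above records when the maximum occurred within the aggregation window. Conceptually this is an argmax over the window, which can be sketched in plain Python (illustrative only, with made-up data; not the library's implementation):

```python
from datetime import datetime

# (timestamp, value) pairs within one aggregation window
window = [
    (datetime(2023, 1, 1, 0, 0), 15.1),
    (datetime(2023, 4, 3, 5, 42), 30.6),
    (datetime(2023, 12, 31, 23, 59), 15.2),
]

# argmax over the window: the maximum value and the timestamp it occurred at
timestamp_of_max, max_value = max(window, key=lambda pair: pair[1])
print(timestamp_of_max, max_value)  # 2023-04-03 05:42:00 30.6
```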
For more details on aggregation, see the dedicated aggregation guide.
Best Practices
Always specify resolution and periodicity when creating a TimeSeries to ensure proper validation
Use appropriate column types:
- Use data columns for core measurements
- Use supplementary columns for metadata or context
- Use flag columns for quality control
Define relationships between related columns
Add metadata to enhance understanding of your data
Next Steps
Now that you understand the basics of the TimeSeries
class, explore:
Working with Periods - Learn more about working with time periods
Aggregation - Dive deeper into aggregation capabilities
flagging - Master the flagging control system
column_relationships - Understand column relationships in detail