Infilling

Missing data happens. Fill the gaps with precision and care.

Why use Time-Stream?

Real-world monitoring data inevitably has gaps, whether from communications outages, sensor swaps or power cuts. With Time-Stream, you can fill those missing values with a robust infilling procedure that draws on the time properties of your data.

One-liner

With Time-Stream you state intent, not mechanics:

tf.infill("linear", "flow", max_gap_size=3)

That’s it. A single line states the intent: “I want to use the linear infill method on my flow data, but only for gaps ≤ 3 steps”.

Complex example

Let’s take our example 15-minute river flow data that contains a few short outages. You might want to:

  • Fill only gaps up to 3 consecutive steps (≤45 minutes).

  • Use linear interpolation to infill tiny gaps (1 step) and a more complex interpolator (e.g. PCHIP) for ≥2 step gaps.

Input:

15-minute river flow timeseries, including some missing data.

shape: (110_977, 2)
┌─────────────────────┬───────────┐
│ time                ┆ flow      │
│ ---                 ┆ ---       │
│ datetime[ns]        ┆ f64       │
╞═════════════════════╪═══════════╡
│ 2020-09-01 00:00:00 ┆ 92.860538 │
│ 2020-09-01 00:15:00 ┆ null      │
│ 2020-09-01 00:30:00 ┆ 98.103103 │
│ 2020-09-01 00:45:00 ┆ null      │
│ 2020-09-01 01:00:00 ┆ null      │
│ 2020-09-01 01:15:00 ┆ null      │
│ 2020-09-01 01:30:00 ┆ null      │
│ 2020-09-01 01:45:00 ┆ 92.085242 │
│ …                   ┆ …         │
│ 2023-10-31 22:15:00 ┆ 84.677897 │
│ 2023-10-31 22:30:00 ┆ 86.0179   │
│ 2023-10-31 22:45:00 ┆ 83.122459 │
│ 2023-10-31 23:00:00 ┆ 76.928613 │
│ 2023-10-31 23:15:00 ┆ 83.320365 │
│ 2023-10-31 23:30:00 ┆ null      │
│ 2023-10-31 23:45:00 ┆ null      │
│ 2023-11-01 00:00:00 ┆ 84.721752 │
└─────────────────────┴───────────┘

Code:

import time_stream as ts

# Wrap the input DataFrame shown above (df) in a TimeFrame object
tf = ts.TimeFrame(df, "time", resolution="PT15M", periodicity="PT15M")

# Infill gaps
tf_infill = tf.infill(
    "linear", "flow", max_gap_size=1
).infill(
    "pchip", "flow", max_gap_size=3
)

Output:

<time_stream.TimeFrame> Size (estimated): 1.71 MB
Time properties:
    Time column : time  [2020-09-01 00:00:00, ..., 2023-11-01 00:00:00]
    Type        : Datetime(time_unit='ns', time_zone=None)
    Resolution  : PT15M
    Offset      : None
    Alignment   : PT15M
    Periodicity : PT15M
    Anchor      : TimeAnchor.START
Columns:
    flow        : Float64  880.56 KB  [92.86053821660515, ..., 84.72175228556347]
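To see which gaps each pass would touch, the sketch below measures the null runs in the first eight rows of the input table and assigns each run to a pass. It is plain Python for illustration only, not Time-Stream's implementation; gap_lengths and plan_for are hypothetical helpers, and the assignment simply follows the max_gap_size limits used in the code above.

```python
from itertools import groupby

# First eight flow values from the input table above (None marks a null).
flow = [92.860538, None, 98.103103, None, None, None, None, 92.085242]


def gap_lengths(values):
    """Return the length of each run of consecutive missing values, in order."""
    return [
        sum(1 for _ in run)
        for is_missing, run in groupby(values, key=lambda v: v is None)
        if is_missing
    ]


def plan_for(length):
    """Hypothetical helper: name the pass expected to fill a gap of this length."""
    if length <= 1:
        return "linear"  # first pass, max_gap_size=1
    if length <= 3:
        return "pchip"  # second pass, max_gap_size=3
    return "left as null"  # longer than either limit


lengths = gap_lengths(flow)
plans = [plan_for(n) for n in lengths]
print(lengths, plans)
```

By the same logic, the two-step gap at the end of the table (23:30 and 23:45 on 2023-10-31) would fall to the PCHIP pass, while the four-step gap starting at 00:45 exceeds both limits.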

Key benefits

  • Conservative: You set the rules; the library enforces them.

  • Time aware: Honours the resolution and periodicity properties of your data.

  • Simple code: One call conveys the method, scope, and policy.

In more detail

The infill() method is the entry point for infilling your timeseries data in Time-Stream. Various infill methods are available, from using alternative data from another source to delegating to well-established methods from the SciPy data science library. All methods are combined with the time-integrity of your TimeFrame.

Let’s look at the method in more detail:

TimeFrame.infill(infill_method, column_name, observation_interval=None, max_gap_size=None, **kwargs)

Apply an infilling method to a column in the TimeFrame to fill in missing data.

Parameters:
  • infill_method (Union[str, Type[InfillMethod], InfillMethod]) – The method to use for infilling

  • column_name (str) – The column to infill

  • observation_interval (tuple[datetime, datetime | None] | None) – Optional time interval to limit the infilling to.

  • max_gap_size (int | None) – The maximum size of consecutive null gaps that should be filled. Any gap larger than this will not be infilled and will remain as null.

  • **kwargs – Parameters specific to the infill method.

Return type:

Self

Returns:

A TimeFrame containing the infilled data.

Infill methods

The infill_method parameter lets you choose how missing values are estimated by passing a method name as a string. Each method has its strengths, depending on your data. The currently available methods are:

Simple infilling techniques

  • "alt_data" - infill using data from an alternative source.

    Either another column in your TimeFrame, or data from a different DataFrame entirely.

Polynomial interpolation

  • "linear" - straight-line interpolation between neighbouring points.

    Simple and neutral; best for short gaps.

  • "quadratic" - second-order polynomial curve.

    Captures gentle curvature; suitable when changes aren’t linear.

  • "cubic" - third-order polynomial curve.

    Smooth transitions; can be useful for variables with cyclical patterns.

  • "bspline" - B-spline interpolation (configurable order).

    Flexible piecewise polynomials; you choose the order.

Shape-preserving methods

  • "pchip" - Piecewise Cubic Hermite Interpolating Polynomial.

    Preserves monotonicity and avoids overshoot; can help to avoid unrealistic fluctuations between values.

  • "akima" - Akima spline.

    A smooth curve fit for data with significant local variations and potential outliers.

Note

All methods honour the maximum gap limit: they will only fill runs of missing values up to your chosen length, leaving longer gaps as null.

Note

For infill methods using interpolation techniques, null values at the very beginning and very end of a timeseries will remain null; there is no preceding or following data to constrain the infilling method.
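The two notes above can be sketched in plain Python. This is a minimal illustration of the rules as described, not the library's code: linear_infill is a hypothetical function, None stands in for a missing value, and only linear interpolation is shown.

```python
def linear_infill(values, max_gap_size=None):
    """Linearly fill interior runs of None no longer than max_gap_size."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        # Locate this run of missing values: indices [start, stop).
        start = i
        while i < n and filled[i] is None:
            i += 1
        stop = i
        gap = stop - start
        at_edge = start == 0 or stop == n  # no neighbour on one side
        too_long = max_gap_size is not None and gap > max_gap_size
        if at_edge or too_long:
            continue  # leave the run as missing
        # Step evenly from the left neighbour towards the right neighbour.
        left, right = filled[start - 1], filled[stop]
        step = (right - left) / (gap + 1)
        for k in range(gap):
            filled[start + k] = left + step * (k + 1)
    return filled


series = [None, 1.0, None, 3.0, None, None, None, 7.0, None]
limited = linear_infill(series, max_gap_size=2)
unlimited = linear_infill(series)
print(limited)
print(unlimited)
```

With max_gap_size=2 only the single-step gap is filled; without a limit the three-step gap is filled too, but the leading and trailing missing values stay missing in both cases.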

Column selection

The column_name parameter lets you specify which column to infill; only this column will be used by the infill function.

Observation interval

The observation_interval parameter restricts infilling to a specific time window. This is useful when:

  • You only want to work with a subset of data (e.g. one hydrological year).

  • You want to fill recent gaps without touching the historical record.

  • You need to use different methods for different parts of your timeseries.

Example:

from datetime import datetime

tf_recent = tf.infill(
    "linear",
    "flow",
    observation_interval=(datetime(2024, 1, 1), datetime(2024, 12, 31)),
)

This will only attempt infilling between 1 January 2024 and 31 December 2024; gaps outside that interval remain untouched.
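As a rough mental model of the window, the sketch below tests which timestamps fall inside an interval. It assumes both bounds are inclusive and that an end of None means "no upper bound" (the type hint permits None, but the exact bound semantics are an assumption here); in_interval is a hypothetical helper, not part of Time-Stream.

```python
from datetime import datetime

interval = (datetime(2024, 1, 1), datetime(2024, 12, 31))


def in_interval(t, interval):
    """Hypothetical helper: True if t lies within [start, end]; end may be None."""
    start, end = interval
    return start <= t and (end is None or t <= end)


timestamps = [
    datetime(2023, 12, 31, 23, 45),
    datetime(2024, 6, 1, 0, 0),
    datetime(2024, 12, 31, 0, 0),
    datetime(2024, 12, 31, 12, 0),
]
eligible = [in_interval(t, interval) for t in timestamps]
print(eligible)

# Open-ended window: everything from the start onwards is eligible.
open_ended = in_interval(datetime(2030, 1, 1), (datetime(2024, 1, 1), None))
print(open_ended)
```

Note that datetime(2024, 12, 31) is midnight at the start of that day, so under these assumptions a timestamp at midday on 31 December lies outside the window. If the whole day should be covered, a later end bound may be needed.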

Max gap size

Use the max_gap_size parameter to prevent over-eager interpolation. Only gaps of this size or smaller (measured in consecutive missing steps) will be infilled.

Example:

# Fill single-step gaps only (≤ 15 minutes at 15-min resolution)
tf1 = tf.infill("linear", "flow", max_gap_size=1)

# Fill gaps up to 2 steps (≤ 30 minutes)
tf2 = tf.infill("akima", "flow", max_gap_size=2)

Note

The duration that a given “gap size” represents depends on the TimeFrame resolution. At 15-minute resolution, max_gap_size=2 corresponds to 30 minutes; at daily resolution, max_gap_size=2 corresponds to 2 days.
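The conversion between steps and duration is simple arithmetic, sketched here with the standard library. The helper names are hypothetical; Time-Stream itself takes max_gap_size as a step count.

```python
from datetime import timedelta


def gap_duration(max_gap_size, resolution):
    """Longest gap, as a duration, that a given step limit allows."""
    return max_gap_size * resolution


def steps_for(duration, resolution):
    """Largest whole number of steps that fits within a target duration."""
    return duration // resolution


fifteen_min = timedelta(minutes=15)
daily = timedelta(days=1)

print(gap_duration(2, fifteen_min))  # 30 minutes
print(gap_duration(2, daily))  # 2 days
print(steps_for(timedelta(hours=1), fifteen_min))  # 4 steps
```

Working backwards from a duration, as in the last line, is a convenient way to pick max_gap_size when the limit is specified in hours or days.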

Examples

Alternative data infilling

The "alt_data" infill method allows you to fill missing values in a column using data from an alternative source.

You can specify the alternative data in two ways:

  1. From a column within the same TimeFrame: If the alternative data is already present as a column in your current TimeFrame object, you can directly reference it.

  2. From a separate DataFrame: You can provide an entirely separate Polars DataFrame containing the alternative data.

In both cases, you can also apply a correction_factor to the alternative data before it’s used for infilling.

Infilling from a separate DataFrame

Let’s say you have a primary dataset with missing “flow” values, and a separate DataFrame, alt_df, with an “alt_flow” column that can be used to infill these gaps.

Input:

shape: (110_977, 2)
┌─────────────────────┬───────────┐
│ time                ┆ flow      │
│ ---                 ┆ ---       │
│ datetime[ns]        ┆ f64       │
╞═════════════════════╪═══════════╡
│ 2020-09-01 00:00:00 ┆ 92.860538 │
│ 2020-09-01 00:15:00 ┆ null      │
│ 2020-09-01 00:30:00 ┆ 98.103103 │
│ 2020-09-01 00:45:00 ┆ null      │
│ 2020-09-01 01:00:00 ┆ null      │
│ …                   ┆ …         │
│ 2023-10-31 23:00:00 ┆ 76.928613 │
│ 2023-10-31 23:15:00 ┆ 83.320365 │
│ 2023-10-31 23:30:00 ┆ null      │
│ 2023-10-31 23:45:00 ┆ null      │
│ 2023-11-01 00:00:00 ┆ 84.721752 │
└─────────────────────┴───────────┘

shape: (110_977, 2)
┌─────────────────────┬────────────┐
│ time                ┆ alt_flow   │
│ ---                 ┆ ---        │
│ datetime[ns]        ┆ f64        │
╞═════════════════════╪════════════╡
│ 2020-09-01 00:00:00 ┆ 116.075673 │
│ 2020-09-01 00:15:00 ┆ 124.726315 │
│ 2020-09-01 00:30:00 ┆ 122.628878 │
│ 2020-09-01 00:45:00 ┆ 125.585763 │
│ 2020-09-01 01:00:00 ┆ 116.101802 │
│ …                   ┆ …          │
│ 2023-10-31 23:00:00 ┆ 96.160767  │
│ 2023-10-31 23:15:00 ┆ 104.150457 │
│ 2023-10-31 23:30:00 ┆ 105.125655 │
│ 2023-10-31 23:45:00 ┆ 100.174939 │
│ 2023-11-01 00:00:00 ┆ 105.90219  │
└─────────────────────┴────────────┘

Code:

tf_infill = tf.infill(
    "alt_data",
    "flow",
    alt_df=alt_df,
    correction_factor=0.75,
    alt_data_column="alt_flow",
)

Output:

shape: (110_977, 2)
┌─────────────────────┬───────────┐
│ time                ┆ flow      │
│ ---                 ┆ ---       │
│ datetime[ns]        ┆ f64       │
╞═════════════════════╪═══════════╡
│ 2020-09-01 00:00:00 ┆ 92.860538 │
│ 2020-09-01 00:15:00 ┆ 93.544737 │
│ 2020-09-01 00:30:00 ┆ 98.103103 │
│ 2020-09-01 00:45:00 ┆ 94.189322 │
│ 2020-09-01 01:00:00 ┆ 87.076351 │
│ …                   ┆ …         │
│ 2023-10-31 23:00:00 ┆ 76.928613 │
│ 2023-10-31 23:15:00 ┆ 83.320365 │
│ 2023-10-31 23:30:00 ┆ 78.844241 │
│ 2023-10-31 23:45:00 ┆ 75.131204 │
│ 2023-11-01 00:00:00 ┆ 84.721752 │
└─────────────────────┴───────────┘
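The output above is consistent with a multiplicative correction factor: each null is replaced by the alternative value times 0.75, and existing values are left alone. The sketch below reproduces the first five rows in plain Python; alt_data_infill is a hypothetical helper, and the multiplicative interpretation is inferred from the numbers shown rather than from the library source.

```python
# First five rows of the two input tables above (None marks a null).
flow = [92.860538, None, 98.103103, None, None]
alt_flow = [116.075673, 124.726315, 122.628878, 125.585763, 116.101802]
correction_factor = 0.75


def alt_data_infill(values, alt_values, factor=1.0):
    """Replace each missing value with the scaled alternative value."""
    return [
        alt * factor if value is None else value
        for value, alt in zip(values, alt_values)
    ]


filled = alt_data_infill(flow, alt_flow, correction_factor)
print([round(v, 6) for v in filled])
```

The filled values agree with the output table to within the rounding of the displayed inputs.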

Visualisation of interpolation methods

It can be useful to compare the results of the different interpolation infill methods side by side. Bear in mind that this is a very simple example; the right method depends on your data, so research which is most appropriate before choosing one.

shape: (16, 7)
┌─────────────────────┬──────────┬──────────┬───────────┬──────────┬──────────┬──────────┐
│ time                ┆ original ┆ linear   ┆ quadratic ┆ cubic    ┆ pchip    ┆ akima    │
│ ---                 ┆ ---      ┆ ---      ┆ ---       ┆ ---      ┆ ---      ┆ ---      │
│ datetime[μs]        ┆ f64      ┆ f64      ┆ f64       ┆ f64      ┆ f64      ┆ f64      │
╞═════════════════════╪══════════╪══════════╪═══════════╪══════════╪══════════╪══════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428  ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471  ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ null     ┆ 1.259424 ┆ 0.546161  ┆ 0.365671 ┆ 0.889524 ┆ 1.000306 │
│ 2024-01-04 00:00:00 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377  ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606  ┆ 4.54606  ┆ 4.54606   ┆ 4.54606  ┆ 4.54606  ┆ 4.54606  │
│ 2024-01-06 00:00:00 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693  ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726  ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null     ┆ 3.407293 ┆ 3.727706  ┆ 4.038041 ┆ 3.404058 ┆ 3.527325 │
│ 2024-01-09 00:00:00 ┆ null     ┆ 4.782859 ┆ 5.658006  ┆ 5.671975 ┆ 5.239764 ┆ 5.265495 │
│ 2024-01-10 00:00:00 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426  ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869  ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051  ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null     ┆ 3.692068 ┆ 1.96472   ┆ 1.197094 ┆ 3.10049  ┆ 2.34322  │
│ 2024-01-14 00:00:00 ┆ null     ┆ 4.323086 ┆ 2.019954  ┆ 0.375071 ┆ 3.37656  ┆ 2.68997  │
│ 2024-01-15 00:00:00 ┆ null     ┆ 4.954103 ┆ 3.226754  ┆ 1.527056 ┆ 4.125893 ┆ 3.853278 │
│ 2024-01-16 00:00:00 ┆ 5.58512  ┆ 5.58512  ┆ 5.58512   ┆ 5.58512  ┆ 5.58512  ┆ 5.58512  │
└─────────────────────┴──────────┴──────────┴───────────┴──────────┴──────────┴──────────┘
Figure: plot of the results from all infill methods (../_images/examples_infilling_plot_all_infills.svg)
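One way to read the table is to check, for a single gap, which methods keep their estimates between the two observed values that bound it. The sketch below does this for the three-day gap from 2024-01-13 to 2024-01-15, using the values printed above. It is only a reading aid for this one example and says nothing about how the methods behave on other data.

```python
# Observed values either side of the 2024-01-13 to 2024-01-15 gap.
left, right = 3.061051, 5.58512

# Infilled values for that gap, copied from the table above.
infilled = {
    "linear": [3.692068, 4.323086, 4.954103],
    "quadratic": [1.96472, 2.019954, 3.226754],
    "cubic": [1.197094, 0.375071, 1.527056],
    "pchip": [3.10049, 3.37656, 4.125893],
    "akima": [2.34322, 2.68997, 3.853278],
}

low, high = min(left, right), max(left, right)

# True when every infilled value lies between the bounding observations.
stays_within = {
    method: all(low <= v <= high for v in values)
    for method, values in infilled.items()
}
print(stays_within)
```

For this gap, the linear and PCHIP estimates stay between the neighbouring observations, while the quadratic, cubic and Akima estimates dip below the lower one.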