Infilling

Missing data happens. Fill the gaps with precision and care.

Why use Time-Stream?

It is inevitable that real-world monitoring data will have gaps, whether from communications outages, sensor swaps or power cuts. With Time-Stream, you can fill those missing values with a robust infilling procedure that understands the time properties of your data.

One-liner

With Time-Stream you state intent, not mechanics:

tf.infill("linear", "flow", max_gap_size=3)

That’s it. A single line states the intent clearly: “I want to use the linear infill method on my flow data, but only for gaps of up to 3 steps.”

Complex example

Let’s take our example 15-minute river flow data that contains a few short outages. You might want to:

  • Fill only gaps up to 3 consecutive steps (≤45 minutes).

  • Use linear interpolation to infill single-step gaps, and a more complex interpolator (e.g. PCHIP) for gaps of 2–3 steps.

Input:

15-minute river flow timeseries, including some missing data.

shape: (110_977, 2)
┌─────────────────────┬───────────┐
│ time                ┆ flow      │
│ ---                 ┆ ---       │
│ datetime[ns]        ┆ f64       │
╞═════════════════════╪═══════════╡
│ 2020-09-01 00:00:00 ┆ 35.744844 │
│ 2020-09-01 00:15:00 ┆ null      │
│ 2020-09-01 00:30:00 ┆ 79.728137 │
│ 2020-09-01 00:45:00 ┆ null      │
│ 2020-09-01 01:00:00 ┆ null      │
│ 2020-09-01 01:15:00 ┆ null      │
│ 2020-09-01 01:30:00 ┆ null      │
│ 2020-09-01 01:45:00 ┆ 17.576324 │
│ …                   ┆ …         │
│ 2023-10-31 22:15:00 ┆ 66.126464 │
│ 2023-10-31 22:30:00 ┆ 79.396944 │
│ 2023-10-31 22:45:00 ┆ 54.537908 │
│ 2023-10-31 23:00:00 ┆ -0.017411 │
│ 2023-10-31 23:15:00 ┆ 58.686884 │
│ 2023-10-31 23:30:00 ┆ null      │
│ 2023-10-31 23:45:00 ┆ null      │
│ 2023-11-01 00:00:00 ┆ 74.769198 │
└─────────────────────┴───────────┘

Code:

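import time_stream as ts

# Wrap the DataFrame (df, the 15-minute flow data shown above) in a TimeFrame object
tf = ts.TimeFrame(df, "time", resolution="PT15M", periodicity="PT15M")

# Infill gaps in two passes:
#   1. linear interpolation for single-step gaps
#   2. PCHIP for the remaining gaps of up to 3 steps
tf_infill = tf.infill(
    "linear", "flow", max_gap_size=1
).infill(
    "pchip", "flow", max_gap_size=3
)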

Output:

shape: (110_977, 2)
┌─────────────────────┬───────────┐
│ time                ┆ flow      │
│ ---                 ┆ ---       │
│ datetime[ns]        ┆ f64       │
╞═════════════════════╪═══════════╡
│ 2020-09-01 00:00:00 ┆ 35.744844 │
│ 2020-09-01 00:15:00 ┆ 57.736491 │
│ 2020-09-01 00:30:00 ┆ 79.728137 │
│ 2020-09-01 00:45:00 ┆ null      │
│ 2020-09-01 01:00:00 ┆ null      │
│ 2020-09-01 01:15:00 ┆ null      │
│ 2020-09-01 01:30:00 ┆ null      │
│ 2020-09-01 01:45:00 ┆ 17.576324 │
│ …                   ┆ …         │
│ 2023-10-31 22:15:00 ┆ 66.126464 │
│ 2023-10-31 22:30:00 ┆ 79.396944 │
│ 2023-10-31 22:45:00 ┆ 54.537908 │
│ 2023-10-31 23:00:00 ┆ -0.017411 │
│ 2023-10-31 23:15:00 ┆ 58.686884 │
│ 2023-10-31 23:30:00 ┆ 67.926355 │
│ 2023-10-31 23:45:00 ┆ 73.1347   │
│ 2023-11-01 00:00:00 ┆ 74.769198 │
└─────────────────────┴───────────┘
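Here is a quick check of the output. The single-step gap at 00:15 lies midway between two observations, so linear interpolation places it at their mean. The four-step run from 00:45 to 01:30 is longer than max_gap_size=3, so it stays null. The plain-Python arithmetic below reproduces both results. It does not call Time-Stream, and linear_step is a hypothetical helper.

```python
# Observations either side of the single-step gap at 00:15
before, after = 35.744844, 79.728137

def linear_step(y0, y1, k, n):
    """Linear estimate for the k-th of n missing steps between y0 and y1."""
    return y0 + (y1 - y0) * k / (n + 1)

filled = linear_step(before, after, k=1, n=1)
print(filled)  # midpoint of the two neighbours, matching the output table to 6 d.p.

# The run 00:45..01:30 is 4 steps long, which exceeds max_gap_size=3
gap_length, max_gap_size = 4, 3
print(gap_length <= max_gap_size)  # False, so the gap is left null
```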

Key benefits

  • Conservative: You set the rules; the library enforces them.

  • Time aware: Honours the resolution and periodicity properties of your data.

  • Simple code: One call conveys the method, scope, and policy.

In more detail

The infill() method is the entry point for infilling timeseries data in Time-Stream. It delegates the interpolation itself to well-established routines from the SciPy scientific computing library, and combines them with the time integrity (resolution and periodicity) of your TimeFrame.

Infill methods

Choose how missing values are estimated by passing a method name as a string. Each method has its strengths, depending on your data.

Polynomial interpolation

  • "linear" - straight-line interpolation between neighbouring points.

    Simple and neutral; best for very short gaps (1–2 steps).

  • "quadratic" - second-order polynomial curve.

    Captures gentle curvature; suitable when changes aren’t linear.

  • "cubic" - third-order polynomial curve.

    Smooth transitions; can be useful for variables with cyclical patterns.

  • "bspline" - B-spline interpolation (configurable order).

    Flexible piecewise polynomials; you choose the spline order.

Shape-preserving methods

  • "pchip" - Piecewise Cubic Hermite Interpolating Polynomial.

    Preserves monotonicity and avoids overshoot. Infilled values stay between the observations on either side of the gap, which helps to avoid unrealistic fluctuations.

  • "akima" - Akima spline.

    A smooth curve fit for data with significant local variations and potential outliers.
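The minimal pure-Python sketch below shows a PCHIP-style interpolator, to illustrate why PCHIP avoids overshoot. It is not Time-Stream's code, since the library delegates to SciPy. Interior slopes use the Fritsch–Butland weighted harmonic mean, and the slope is set to zero at local peaks, troughs and flat sections. The end slopes are simple one-sided secants, which is a simplifying assumption. The function names pchip_slopes and pchip_eval are hypothetical.

```python
import bisect

def pchip_slopes(xs, ys):
    """Knot derivatives that keep each cubic piece monotone (needs >= 2 knots)."""
    h = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    m = [(ys[i + 1] - ys[i]) / h[i] for i in range(len(h))]  # secant slopes
    d = [0.0] * len(xs)
    d[0], d[-1] = m[0], m[-1]  # simplified end slopes (assumption)
    for k in range(1, len(xs) - 1):
        if m[k - 1] * m[k] <= 0:
            d[k] = 0.0  # peak, trough or flat section: flat tangent, no overshoot
        else:
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / m[k - 1] + w2 / m[k])  # weighted harmonic mean
    return d

def pchip_eval(xs, ys, d, x):
    """Evaluate the cubic Hermite piece containing x (xs strictly increasing)."""
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * ys[i] + h10 * h * d[i] + h01 * ys[i + 1] + h11 * h * d[i + 1]

# Increasing data stays increasing between the knots
xs = [0, 1, 2, 3, 4]
ys = [1.0, 2.0, 4.0, 8.0, 9.0]
d = pchip_slopes(xs, ys)
samples = [pchip_eval(xs, ys, d, k / 10) for k in range(41)]

# Step-like data: interpolated values stay within the [0, 1] range of the data
step_xs, step_ys = [0, 1, 2, 3], [0.0, 0.0, 1.0, 1.0]
step_d = pchip_slopes(step_xs, step_ys)
step_samples = [pchip_eval(step_xs, step_ys, step_d, k / 10) for k in range(31)]
```

The zero slope at flat or turning points is what stops the curve from swinging past its neighbours. An unconstrained cubic has no such rule.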

Note

All methods honour the maximum gap limit: they will only fill runs of missing values up to your chosen length, leaving longer gaps as null.

Note

Missing values at the very beginning and very end of a timeseries will remain missing, because there is no data before or after them to constrain the infilling method.
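Both rules above can be sketched in plain Python using linear interpolation: only gaps up to the maximum size are filled, and gaps at the edges are never filled. This is an illustration of the logic under stated assumptions, not Time-Stream's implementation. None stands in for a missing value, values are assumed evenly spaced, and find_gaps and linear_infill are hypothetical names.

```python
def find_gaps(values):
    """Return (start, length) for each run of consecutive None values."""
    gaps, start = [], None
    for i, v in enumerate(values):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:
        gaps.append((start, len(values) - start))  # run reaching the end
    return gaps

def linear_infill(values, max_gap_size):
    """Linearly fill interior gaps of up to max_gap_size steps."""
    out = list(values)
    for start, length in find_gaps(values):
        end = start + length  # index of the first value after the gap
        if start == 0 or end == len(values):
            continue  # leading/trailing gap: no data on one side, leave missing
        if length > max_gap_size:
            continue  # gap too long: leave missing
        y0, y1 = values[start - 1], values[end]
        for k in range(1, length + 1):
            out[start + k - 1] = y0 + (y1 - y0) * k / (length + 1)
    return out

series = [None, 1.0, None, 3.0, None, None, None, 7.0, None]
print(linear_infill(series, max_gap_size=2))
# [None, 1.0, 2.0, 3.0, None, None, None, 7.0, None]
```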

Column selection

Specify which column to infill; only this column will be used by the infill function.

Observation interval

Specify an observation interval to restrict infilling to a specific time window. This is useful when:

  • You only want to work with a subset of data (e.g. one hydrological year).

  • You want to fill recent gaps without touching the historical record.

  • You need to use different methods for different parts of your timeseries.

Example:

from datetime import datetime

tf_recent = tf.infill(
    "linear",
    "flow",
    observation_interval=(datetime(2024, 1, 1), datetime(2024, 12, 31)),
)

This will only attempt infilling between 1 January and 31 December 2024; gaps outside that interval remain untouched.
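One detail is worth checking in the example above. datetime(2024, 12, 31) is midnight at the start of 31 December, so whether the rest of that day is covered depends on how the interval end is treated. The plain-Python sketch below is not Time-Stream code. It assumes inclusive bounds (in_interval is a hypothetical helper) and shows which 15-minute timestamps fall inside such a window.

```python
from datetime import datetime, timedelta

start, end = datetime(2024, 1, 1), datetime(2024, 12, 31)
print(end)  # 2024-12-31 00:00:00

def in_interval(t, start, end):
    """Inclusive bounds are an assumption here, not the library's documented rule."""
    return start <= t <= end

# 15-minute timestamps from 30 Dec 2024 to the end of 1 Jan 2025
step = timedelta(minutes=15)
times = [datetime(2024, 12, 30) + i * step for i in range(3 * 96)]
inside = [t for t in times if in_interval(t, start, end)]
print(inside[-1])  # 2024-12-31 00:00:00; later timestamps on 31 Dec fall outside
```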

Max gap size

Use the maximum gap size to prevent over-eager interpolation. Only gaps of up to this many consecutive missing steps will be infilled.

Example:

# Fill single-step gaps only (≤ 15 minutes at 15-min resolution)
tf1 = tf.infill("linear", "flow", max_gap_size=1)

# Fill gaps up to 2 steps (≤ 30 minutes)
tf2 = tf.infill("akima", "flow", max_gap_size=2)

Note

Gap size is counted in steps, so the duration it represents depends on the TimeFrame resolution. At 15-minute resolution, max_gap_size=2 means 30 minutes; at daily resolution, max_gap_size=2 means 2 days.
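Because max_gap_size counts steps, converting between steps and a duration is a multiplication, or a floor division, by the resolution. The small plain-Python sketch below writes the resolution as a timedelta rather than an ISO 8601 string such as "PT15M". The helper max_gap_duration is hypothetical.

```python
from datetime import timedelta

def max_gap_duration(resolution, max_gap_size):
    """Longest fillable gap expressed as a duration."""
    return resolution * max_gap_size

print(max_gap_duration(timedelta(minutes=15), 2))  # 0:30:00
print(max_gap_duration(timedelta(days=1), 2))      # 2 days, 0:00:00

# The reverse: a 45-minute limit at 15-minute resolution is 3 steps
steps = timedelta(minutes=45) // timedelta(minutes=15)
print(steps)  # 3
```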

Visualisation of methods

A quick visualisation of the results from the different infill methods can be useful. However, bear in mind that this is a very simplistic example and the correct method depends on your data, so research which method is most appropriate for your variable.

shape: (16, 7)
┌─────────────────────┬──────────┬──────────┬───────────┬──────────┬──────────┬──────────┐
│ time                ┆ original ┆ linear   ┆ quadratic ┆ cubic    ┆ pchip    ┆ akima    │
│ ---                 ┆ ---      ┆ ---      ┆ ---       ┆ ---      ┆ ---      ┆ ---      │
│ datetime[μs]        ┆ f64      ┆ f64      ┆ f64       ┆ f64      ┆ f64      ┆ f64      │
╞═════════════════════╪══════════╪══════════╪═══════════╪══════════╪══════════╪══════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428  ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471  ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ null     ┆ 1.259424 ┆ 0.546161  ┆ 0.365671 ┆ 0.889524 ┆ 1.000306 │
│ 2024-01-04 00:00:00 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377  ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606  ┆ 4.54606  ┆ 4.54606   ┆ 4.54606  ┆ 4.54606  ┆ 4.54606  │
│ 2024-01-06 00:00:00 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693  ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726  ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null     ┆ 3.407293 ┆ 3.727706  ┆ 4.038041 ┆ 3.404058 ┆ 3.527325 │
│ 2024-01-09 00:00:00 ┆ null     ┆ 4.782859 ┆ 5.658006  ┆ 5.671975 ┆ 5.239764 ┆ 5.265495 │
│ 2024-01-10 00:00:00 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426  ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869  ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051  ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null     ┆ 3.692068 ┆ 1.96472   ┆ 1.197094 ┆ 3.10049  ┆ 2.34322  │
│ 2024-01-14 00:00:00 ┆ null     ┆ 4.323086 ┆ 2.019954  ┆ 0.375071 ┆ 3.37656  ┆ 2.68997  │
│ 2024-01-15 00:00:00 ┆ null     ┆ 4.954103 ┆ 3.226754  ┆ 1.527056 ┆ 4.125893 ┆ 3.853278 │
│ 2024-01-16 00:00:00 ┆ 5.58512  ┆ 5.58512  ┆ 5.58512   ┆ 5.58512  ┆ 5.58512  ┆ 5.58512  │
└─────────────────────┴──────────┴──────────┴───────────┴──────────┴──────────┴──────────┘
Figure: plot of the infill results for all methods.