Infilling¶
Missing data happens. Fill the gaps with precision and care.
Why use Time-Stream?¶
Real-world monitoring data inevitably has gaps, whether from communications outages, sensor swaps or power cuts. With Time-Stream, you can fill those missing values with a robust infilling procedure that makes use of the time properties of your data.
One-liner¶
With Time-Stream you state intent, not mechanics:
tf.infill("linear", "flow", max_gap_size=3)
That’s it. A single line states the intent: “I want to use the linear infill method on my flow data, but only for gaps ≤ 3 steps”.
Complex example¶
Let’s take our example 15-minute river flow data that contains a few short outages. You might want to:
Fill only gaps up to 3 consecutive steps (≤45 minutes).
Use linear interpolation to infill tiny gaps (1 step) and a more complex interpolator (e.g. PCHIP) for gaps of 2 to 3 steps.
Input:
15-minute river flow timeseries, including some missing data.
shape: (110_977, 2)
┌─────────────────────┬───────────┐
│ time ┆ flow │
│ --- ┆ --- │
│ datetime[ns] ┆ f64 │
╞═════════════════════╪═══════════╡
│ 2020-09-01 00:00:00 ┆ 35.744844 │
│ 2020-09-01 00:15:00 ┆ null │
│ 2020-09-01 00:30:00 ┆ 79.728137 │
│ 2020-09-01 00:45:00 ┆ null │
│ 2020-09-01 01:00:00 ┆ null │
│ 2020-09-01 01:15:00 ┆ null │
│ 2020-09-01 01:30:00 ┆ null │
│ 2020-09-01 01:45:00 ┆ 17.576324 │
│ … ┆ … │
│ 2023-10-31 22:15:00 ┆ 66.126464 │
│ 2023-10-31 22:30:00 ┆ 79.396944 │
│ 2023-10-31 22:45:00 ┆ 54.537908 │
│ 2023-10-31 23:00:00 ┆ -0.017411 │
│ 2023-10-31 23:15:00 ┆ 58.686884 │
│ 2023-10-31 23:30:00 ┆ null │
│ 2023-10-31 23:45:00 ┆ null │
│ 2023-11-01 00:00:00 ┆ 74.769198 │
└─────────────────────┴───────────┘
Code:
import time_stream as ts
# Wrap the DataFrame in a TimeFrame object
tf = ts.TimeFrame(df, "time", resolution="PT15M", periodicity="PT15M")
# Infill gaps: linear for single-step gaps first,
# then PCHIP for the remaining gaps of up to 3 steps
tf_infill = tf.infill(
    "linear", "flow", max_gap_size=1
).infill(
    "pchip", "flow", max_gap_size=3
)
Output:
shape: (110_977, 2)
┌─────────────────────┬───────────┐
│ time ┆ flow │
│ --- ┆ --- │
│ datetime[ns] ┆ f64 │
╞═════════════════════╪═══════════╡
│ 2020-09-01 00:00:00 ┆ 35.744844 │
│ 2020-09-01 00:15:00 ┆ 57.736491 │
│ 2020-09-01 00:30:00 ┆ 79.728137 │
│ 2020-09-01 00:45:00 ┆ null │
│ 2020-09-01 01:00:00 ┆ null │
│ 2020-09-01 01:15:00 ┆ null │
│ 2020-09-01 01:30:00 ┆ null │
│ 2020-09-01 01:45:00 ┆ 17.576324 │
│ … ┆ … │
│ 2023-10-31 22:15:00 ┆ 66.126464 │
│ 2023-10-31 22:30:00 ┆ 79.396944 │
│ 2023-10-31 22:45:00 ┆ 54.537908 │
│ 2023-10-31 23:00:00 ┆ -0.017411 │
│ 2023-10-31 23:15:00 ┆ 58.686884 │
│ 2023-10-31 23:30:00 ┆ 67.926355 │
│ 2023-10-31 23:45:00 ┆ 73.1347 │
│ 2023-11-01 00:00:00 ┆ 74.769198 │
└─────────────────────┴───────────┘
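To see why the four-step gap stays empty while the one-step and two-step gaps are filled, it can help to list the runs of missing values and check each against the limits of the two chained calls. The following is a minimal plain-Python sketch of that reasoning, not Time-Stream's implementation; `missing_runs` and `plan_passes` are hypothetical helper names introduced only for illustration.

```python
from itertools import groupby


def missing_runs(values):
    """Return (start_index, length) for each run of consecutive None values."""
    runs = []
    index = 0
    for is_missing, group in groupby(values, key=lambda v: v is None):
        length = len(list(group))
        if is_missing:
            runs.append((index, length))
        index += length
    return runs


def plan_passes(values, passes):
    """Label each gap with the first pass whose limit covers it (hypothetical helper).

    `passes` is a list of (method_name, max_gap_size) pairs in call order.
    Gaps longer than every limit are labelled None, i.e. left unfilled.
    """
    plan = []
    for start, length in missing_runs(values):
        method = next((name for name, limit in passes if length <= limit), None)
        plan.append((start, length, method))
    return plan


# First eight rows and last four rows of the example input above
head = [35.744844, None, 79.728137, None, None, None, None, 17.576324]
tail = [58.686884, None, None, 74.769198]

passes = [("linear", 1), ("pchip", 3)]
head_plan = plan_passes(head, passes)
tail_plan = plan_passes(tail, passes)
print(head_plan)
print(tail_plan)
```

In this sketch the single missing step at 00:15 falls to the linear pass, the two-step gap at the end of the record falls to the PCHIP pass, and the four-step gap exceeds both limits, which matches the output table.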
Key benefits¶
Conservative: You set the rules; the library enforces them.
Time aware: Honours the resolution and periodicity properties of your data.
Simple code: One call conveys the method, scope, and policy.
In more detail¶
The infill() method is the entry point for infilling your timeseries data in Time-Stream. It delegates to well-established methods from the SciPy data science library, combined with the time-integrity of your TimeFrame.
Infill methods¶
Choose how missing values are estimated by passing a method name as a string. Each method has its strengths, depending on your data.
Polynomial interpolation¶
"linear" - straight-line interpolation between neighbouring points. Simple and neutral; best for very short gaps (1–2 steps).
"quadratic" - second-order polynomial curve. Captures gentle curvature; suitable when changes aren’t linear.
"cubic" - third-order polynomial curve. Smooth transitions; can be useful for variables with cyclical patterns.
"bspline" - B-spline interpolation (configurable order). Flexible piecewise polynomials; you choose the order.
Shape-preserving methods¶
"pchip" - Piecewise Cubic Hermite Interpolating Polynomial. Preserves monotonicity and avoids overshoot; can help to avoid unrealistic fluctuations between values.
"akima" - Akima spline. A smooth curve fit for data with significant local variations and potential outliers.
Note
All methods honour the maximum gap limit: they will only fill runs of missing values up to your chosen length, leaving longer gaps as NaN.
Note
NaN values at the very beginning and very end of a timeseries will remain NaN; there is no pre- or post- data to constrain the infilling method.
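The two notes above can be illustrated with a small plain-Python sketch of gap-limited linear infilling. This is an assumption-laden illustration of the rules only (interior gaps up to the limit are filled, longer gaps and leading or trailing gaps are left alone); it is not how Time-Stream is implemented, and `linear_infill` is a hypothetical function that uses None to mark missing values.

```python
def linear_infill(values, max_gap_size):
    """Linearly fill interior runs of None that are no longer than max_gap_size.

    Runs touching the start or end of the series are left untouched because
    there is no value on one side to interpolate from.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        # Walk to the end of this run of missing values
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first present value after the run, or n
        length = end - start
        has_neighbours = start > 0 and end < n
        if has_neighbours and length <= max_gap_size:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (length + 1)
            for k in range(length):
                filled[start + k] = left + step * (k + 1)
    return filled


series = [None, 1.0, None, 3.0, None, None, None, 7.0, None]
print(linear_infill(series, max_gap_size=2))
print(linear_infill(series, max_gap_size=3))
```

With a limit of 2, only the single missing step between 1.0 and 3.0 is filled; with a limit of 3, the three-step gap is filled as well. In both cases the first and last entries stay missing.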
Column selection¶
Specify which column to infill; only this column will be used by the infill function.
Observation interval¶
Specify an observation interval to restrict infilling to a specific time window. This is useful when:
You only want to work with a subset of data (e.g. one hydrological year).
You want to fill recent gaps without touching the historical record.
You need to use different methods for different parts of your timeseries.
Example:
from datetime import datetime
tf_recent = tf.infill(
    "linear",
    "flow",
    observation_interval=(datetime(2024, 1, 1), datetime(2024, 12, 31)),
)
This will only attempt infilling between 2024-01-01 00:00 and 2024-12-31 00:00; gaps outside that interval remain untouched.
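One detail worth checking when you build an interval from bare dates is that `datetime(2024, 12, 31)` means midnight at the start of 31 December. The sketch below uses only the standard library to show which timestamps fall inside such a window. It assumes the interval bounds are inclusive, which is an assumption for illustration rather than documented Time-Stream behaviour; `in_interval` is a hypothetical helper.

```python
from datetime import datetime

start, end = datetime(2024, 1, 1), datetime(2024, 12, 31)


def in_interval(ts, start, end):
    """Closed-interval membership test (inclusive bounds are an assumption)."""
    return start <= ts <= end


# Timestamps of some hypothetical missing values
gap_times = [
    datetime(2023, 12, 31, 23, 45),  # before the window
    datetime(2024, 6, 1, 0, 15),     # inside the window
    datetime(2024, 12, 31, 0, 0),    # exactly on the end bound
    datetime(2024, 12, 31, 12, 0),   # later on 31 December: outside
]

eligible = [ts for ts in gap_times if in_interval(ts, start, end)]
print(eligible)
```

Under that assumption, a gap at midday on 31 December is not eligible. If the whole day should be covered at 15-minute resolution, an end bound such as `datetime(2024, 12, 31, 23, 45)` would be the safer choice.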
Max gap size¶
Use the maximum gap size to prevent over-eager interpolation. Only gaps of this size or smaller (measured in consecutive missing steps) will be infilled.
Example:
# Fill single-step gaps only (≤ 15 minutes at 15-min resolution)
tf1 = tf.infill("linear", "flow", max_gap_size=1)
# Fill gaps up to 2 steps (≤ 30 minutes)
tf2 = tf.infill("akima", "flow", max_gap_size=2)
Note
The definition of “gap size” depends on the TimeFrame resolution. At 15-minute resolution, max_gap_size=2 is 30 minutes; at daily resolution, max_gap_size=2 is 2 days.
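The arithmetic behind this note is simply the resolution multiplied by the number of steps. A small standard-library sketch makes the conversion explicit; `max_gap_duration` is a hypothetical helper, not part of Time-Stream.

```python
from datetime import timedelta


def max_gap_duration(resolution, max_gap_size):
    """Longest gap, as a duration, that a given max_gap_size allows."""
    return resolution * max_gap_size


print(max_gap_duration(timedelta(minutes=15), 2))  # 0:30:00
print(max_gap_duration(timedelta(days=1), 2))      # 2 days, 0:00:00
```

Working in durations like this can be a useful sanity check when the same `max_gap_size` is reused across datasets with different resolutions.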
Visualisation of methods¶
The table below compares the results of the different infill methods on a small daily timeseries. Bear in mind that this is a very simplistic example: the correct method depends on your data, so you should research which is most appropriate.
shape: (16, 7)
┌─────────────────────┬──────────┬──────────┬───────────┬──────────┬──────────┬──────────┐
│ time ┆ original ┆ linear ┆ quadratic ┆ cubic ┆ pchip ┆ akima │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════════════════╪══════════╪══════════╪═══════════╪══════════╪══════════╪══════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ null ┆ 1.259424 ┆ 0.546161 ┆ 0.365671 ┆ 0.889524 ┆ 1.000306 │
│ 2024-01-04 00:00:00 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 │
│ 2024-01-06 00:00:00 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null ┆ 3.407293 ┆ 3.727706 ┆ 4.038041 ┆ 3.404058 ┆ 3.527325 │
│ 2024-01-09 00:00:00 ┆ null ┆ 4.782859 ┆ 5.658006 ┆ 5.671975 ┆ 5.239764 ┆ 5.265495 │
│ 2024-01-10 00:00:00 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null ┆ 3.692068 ┆ 1.96472 ┆ 1.197094 ┆ 3.10049 ┆ 2.34322 │
│ 2024-01-14 00:00:00 ┆ null ┆ 4.323086 ┆ 2.019954 ┆ 0.375071 ┆ 3.37656 ┆ 2.68997 │
│ 2024-01-15 00:00:00 ┆ null ┆ 4.954103 ┆ 3.226754 ┆ 1.527056 ┆ 4.125893 ┆ 3.853278 │
│ 2024-01-16 00:00:00 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 │
└─────────────────────┴──────────┴──────────┴───────────┴──────────┴──────────┴──────────┘