Infilling
The infill module of the Time Stream library provides a flexible framework for filling missing values in your time series data. Missing data is a common challenge in time series analysis, whether due to sensor failures, network outages, data transmission errors, or scheduled maintenance periods. The module lets you define and apply infilling to individual columns of a TimeSeries object.
Applying an Infilling Procedure
To apply infilling, call the TimeSeries.infill method on a TimeSeries object. This method allows you to:
- Specify the infill method (see below for available built-in methods)
- Choose the column to infill
- Optionally limit the infilling to a time observation window
- Optionally limit the infilling to a maximum gap size, to avoid unrealistic estimates across large missing periods
Built-in Infilling Methods
Several built-in infilling methods are available. They are built on well-established interpolation routines from the SciPy library:
- Polynomial Interpolation
  - Linear: Simple straight-line interpolation between points
  - Quadratic: Smooth curves using second-order polynomials
  - Cubic: Natural-looking curves using third-order polynomials
  - B-Spline: Flexible piecewise polynomials with configurable order
- Shape-Preserving Methods
  - PCHIP: Preserves monotonicity and avoids overshoots
  - Akima: Reduces oscillations in data with rapid changes
Each method supports configuration through parameters specific to that method.
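To give a feel for how the underlying SciPy routines differ, the following is a minimal sketch that calls SciPy directly rather than going through Time Stream. The sample values are made up for illustration, and the mapping from method name to SciPy routine follows the class descriptions in the reference section of this page; how Time Stream prepares the data before calling SciPy is not shown here.

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator, PchipInterpolator, make_interp_spline

# Illustrative observations on an integer time axis; position 2 is missing.
x_known = np.array([0.0, 1.0, 3.0, 4.0, 5.0, 6.0])
y_known = np.array([1.0, 0.2, 2.3, 4.5, 1.5, 2.0])
x_missing = np.array([2.0])

# Evaluate each interpolator at the missing position.
estimates = {
    "linear": make_interp_spline(x_known, y_known, k=1)(x_missing),
    "quadratic": make_interp_spline(x_known, y_known, k=2)(x_missing),
    "cubic": make_interp_spline(x_known, y_known, k=3)(x_missing),
    "pchip": PchipInterpolator(x_known, y_known)(x_missing),
    "akima": Akima1DInterpolator(x_known, y_known)(x_missing),
}

for name, value in estimates.items():
    print(f"{name:>9}: {value[0]:.4f}")
```

The linear estimate is simply the midpoint of the two neighbouring observations, while the PCHIP estimate is guaranteed to stay between them; the quadratic and cubic splines carry no such guarantee.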
The examples given below all use this TimeSeries object:
from datetime import datetime

import numpy as np
import polars as pl
# Assumed import path for the Time Stream classes; adjust to match your installation
from time_stream import Period, TimeSeries

np.random.seed(42)

# Set up a daily time series with varying gaps
dates = [
    datetime(2024, 1, 1), datetime(2024, 1, 2),  # followed by a one-day gap
    datetime(2024, 1, 4), datetime(2024, 1, 5), datetime(2024, 1, 6),
    datetime(2024, 1, 7),  # followed by a two-day gap
    datetime(2024, 1, 10), datetime(2024, 1, 11), datetime(2024, 1, 12),
    # followed by a three-day gap
    datetime(2024, 1, 16),
]

# Create example random column data
df = pl.DataFrame({
    "timestamp": dates,
    "temperature": np.arange(len(dates)) * 0.5 + np.random.normal(0, 2, len(dates)),
})

ts = TimeSeries(
    df=df,
    time_name="timestamp",
    resolution=Period.of_days(1),
    periodicity=Period.of_days(1),
    pad=True,
)
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ null │
│ 2024-01-04 00:00:00 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606 │
│ 2024-01-06 00:00:00 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null │
│ 2024-01-09 00:00:00 ┆ null │
│ 2024-01-10 00:00:00 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null │
│ 2024-01-14 00:00:00 ┆ null │
│ 2024-01-15 00:00:00 ┆ null │
│ 2024-01-16 00:00:00 ┆ 5.58512 │
└─────────────────────┴─────────────┘
Examples
# Infill the same column with each method and collect the results side by side
methods = ["linear", "quadratic", "cubic", "pchip", "akima"]
result = ts.df.clone()
for method in methods:
    ts_infilled = ts.infill(method, "temperature")
    # Name the infilled column after the method that produced it
    ts_infilled.df = ts_infilled.df.rename({"temperature": method})
    result = result.join(ts_infilled.df, on="timestamp", how="full").drop("timestamp_right")
shape: (16, 7)
┌─────────────────────┬─────────────┬──────────┬───────────┬──────────┬──────────┬──────────┐
│ timestamp ┆ temperature ┆ linear ┆ quadratic ┆ cubic ┆ pchip ┆ akima │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════════════════╪═════════════╪══════════╪═══════════╪══════════╪══════════╪══════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ null ┆ 1.259424 ┆ 0.546161 ┆ 0.365671 ┆ 0.889524 ┆ 1.000306 │
│ 2024-01-04 00:00:00 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 │
│ 2024-01-06 00:00:00 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null ┆ 3.407293 ┆ 3.727706 ┆ 4.038041 ┆ 3.404058 ┆ 3.527325 │
│ 2024-01-09 00:00:00 ┆ null ┆ 4.782859 ┆ 5.658006 ┆ 5.671975 ┆ 5.239764 ┆ 5.265495 │
│ 2024-01-10 00:00:00 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null ┆ 3.692068 ┆ 1.96472 ┆ 1.197094 ┆ 3.10049 ┆ 2.34322 │
│ 2024-01-14 00:00:00 ┆ null ┆ 4.323086 ┆ 2.019954 ┆ 0.375071 ┆ 3.37656 ┆ 2.68997 │
│ 2024-01-15 00:00:00 ┆ null ┆ 4.954103 ┆ 3.226754 ┆ 1.527056 ┆ 4.125893 ┆ 3.853278 │
│ 2024-01-16 00:00:00 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 │
└─────────────────────┴─────────────┴──────────┴───────────┴──────────┴──────────┴──────────┘
Specifying Maximum Gap Size
The infill method accepts a max_gap_size parameter, which sets the maximum number of consecutive null values in a gap that should be filled. Any gap larger than this will not be infilled and will remain as null.
# The gap of 3 missing dates will not be infilled
ts_infilled = ts.infill("linear", "temperature", max_gap_size=2)
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ 1.259424 │
│ 2024-01-04 00:00:00 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606 │
│ 2024-01-06 00:00:00 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ 3.407293 │
│ 2024-01-09 00:00:00 ┆ 4.782859 │
│ 2024-01-10 00:00:00 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null │
│ 2024-01-14 00:00:00 ┆ null │
│ 2024-01-15 00:00:00 ┆ null │
│ 2024-01-16 00:00:00 ┆ 5.58512 │
└─────────────────────┴─────────────┘
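The gap-size rule can be sketched with plain NumPy. This is an illustration of the idea only, not Time Stream's implementation: the helper name fillable_mask is hypothetical, NaN stands in for null, and the treatment of gaps at the very start or end of a series is an assumption that may differ from the library.

```python
import numpy as np


def fillable_mask(values: np.ndarray, max_gap_size: int) -> np.ndarray:
    """Mark NaNs that sit in a run of at most max_gap_size consecutive NaNs.

    Hypothetical helper for illustration; not part of Time Stream.
    """
    is_missing = np.isnan(values)
    mask = np.zeros(len(values), dtype=bool)
    i = 0
    while i < len(values):
        if not is_missing[i]:
            i += 1
            continue
        # Walk to the end of this run of missing values.
        j = i
        while j < len(values) and is_missing[j]:
            j += 1
        if j - i <= max_gap_size:
            mask[i:j] = True
        i = j
    return mask


values = np.array([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0])
mask = fillable_mask(values, max_gap_size=2)

# Linearly interpolate only the positions that passed the gap-size check.
idx = np.arange(len(values))
known = ~np.isnan(values)
filled = values.copy()
filled[mask] = np.interp(idx[mask], idx[known], values[known])
print(filled)
```

Here the single missing value is filled, while the run of three stays missing because it exceeds the limit of two.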
Applying Infilling Within a Specific Time Range
The observation_interval argument can be used to constrain the infilling to a chunk of your time series.
# Only infill data between specific dates
start_date = datetime(2024, 1, 1)
end_date = datetime(2024, 1, 5)
ts_infilled = ts.infill("linear", "temperature", observation_interval=(start_date, end_date))
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ 1.259424 │
│ 2024-01-04 00:00:00 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606 │
│ 2024-01-06 00:00:00 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null │
│ 2024-01-09 00:00:00 ┆ null │
│ 2024-01-10 00:00:00 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null │
│ 2024-01-14 00:00:00 ┆ null │
│ 2024-01-15 00:00:00 ┆ null │
│ 2024-01-16 00:00:00 ┆ 5.58512 │
└─────────────────────┴─────────────┘
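The effect of an observation window can be sketched in the same spirit with NumPy alone. This is a rough illustration under stated assumptions rather than the library's logic: both interval bounds are treated as inclusive, NaN stands in for null, and known points outside the window are still allowed to inform the interpolation. Time Stream may differ on each of these points.

```python
import numpy as np

# Ten daily timestamps starting on 2024-01-01, with illustrative values.
timestamps = np.arange(np.datetime64("2024-01-01"), np.datetime64("2024-01-11"))
values = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, np.nan, np.nan, 10.0])

# Assumed inclusive window.
start, end = np.datetime64("2024-01-01"), np.datetime64("2024-01-05")
in_window = (timestamps >= start) & (timestamps <= end)

missing = np.isnan(values)
target = missing & in_window  # only missing values inside the window are filled
known = ~missing

# Interpolate on the numeric day count of each timestamp.
x = timestamps.astype("int64")
filled = values.copy()
filled[target] = np.interp(x[target], x[known], values[known])
print(filled)
```

The missing value on 3 January falls inside the window and is filled; the missing values on 8 and 9 January fall outside it and are left untouched.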
Linear Interpolation
Name: "linear"
- class time_stream.infill.LinearInterpolation(**kwargs)[source]
Linear spline interpolation (Convenience wrapper around B-spline with order=1). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html
Initialize linear interpolation.
Quadratic Interpolation
Name: "quadratic"
- class time_stream.infill.QuadraticInterpolation(**kwargs)[source]
Quadratic spline interpolation (Convenience wrapper around B-spline with order=2). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html
Initialize quadratic interpolation.
Cubic Interpolation
Name: "cubic"
- class time_stream.infill.CubicInterpolation(**kwargs)[source]
Cubic spline interpolation (Convenience wrapper around B-spline with order=3). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html
Initialize cubic interpolation.
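The linear, quadratic and cubic methods are described here as B-splines of order 1, 2 and 3 built on SciPy's make_interp_spline. The sketch below calls that SciPy function directly (it does not use the Time Stream classes) to show what the order means in practice: order 1 matches ordinary piecewise-linear interpolation, and a spline of order k reproduces data sampled from a polynomial of degree k. The sample points are made up for illustration.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

x = np.array([0.0, 1.0, 2.0, 4.0, 5.0, 7.0])  # known positions
x_new = np.array([3.0, 6.0])                  # positions to estimate

# Order 1 is the same as piecewise-linear interpolation.
y = np.array([1.0, 0.5, 2.0, 4.0, 3.0, 5.0])
linear = make_interp_spline(x, y, k=1)(x_new)
print("k=1 spline:", linear, " np.interp:", np.interp(x_new, x, y))

# A spline of order k recovers data drawn from the polynomial x**k.
reproduced = {k: make_interp_spline(x, x**k, k=k)(x_new) for k in (1, 2, 3)}
for k, estimate in reproduced.items():
    print(f"k={k}: estimate={estimate}, exact={x_new**k}")
```

Higher orders give smoother curves, but they can swing further away from the neighbouring observations across wide gaps, which is one reason to combine them with a maximum gap size.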
Akima Interpolation
Name: "akima"
- class time_stream.infill.AkimaInterpolation(**kwargs)[source]
Akima interpolation using scipy (good for avoiding oscillations). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.Akima1DInterpolator.html
Initialize a scipy interpolation method.
- Parameters:
**kwargs – Additional parameters passed to scipy interpolator method.
PCHIP Interpolation
Name: "pchip"
- class time_stream.infill.PchipInterpolation(**kwargs)[source]
PCHIP interpolation using scipy (preserves monotonicity). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.PchipInterpolator.html
Initialize a scipy interpolation method.
- Parameters:
**kwargs – Additional parameters passed to scipy interpolator method.
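As an illustration of both the shape-preserving behaviour and the kind of keyword argument that could be forwarded, the sketch below uses SciPy's PchipInterpolator directly on a step-like series and compares it with a cubic spline. The extrapolate keyword is a documented PchipInterpolator parameter; whether and how Time Stream forwards it through **kwargs is an assumption here.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator, make_interp_spline

# A step-like series: flat at 0, then flat at 1.
x = np.arange(6, dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
x_dense = np.linspace(0.0, 5.0, 101)

# extrapolate=False makes the interpolator return NaN outside the data range.
pchip = PchipInterpolator(x, y, extrapolate=False)
pchip_curve = pchip(x_dense)
cubic_curve = make_interp_spline(x, y, k=3)(x_dense)
outside = pchip(np.array([6.0]))

print("PCHIP range:", pchip_curve.min(), pchip_curve.max())
print("Cubic range:", cubic_curve.min(), cubic_curve.max())
print("PCHIP beyond the last point:", outside)
```

The PCHIP curve stays within the range of the data, whereas the cubic spline dips below zero near the step.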