Infilling
The infill module of the Time Stream library provides a flexible framework for filling missing values in your time series data. Missing data is a common challenge in time series analysis, whether it is caused by sensor failures, network outages, data transmission errors, or scheduled maintenance periods. The module lets you define and apply infilling to individual columns of a TimeSeries object.
Applying an Infilling Procedure
To apply infilling, call the TimeSeries.infill method on a TimeSeries object. This method allows you to:
- Specify the infill method (see below for available built-in methods)
- Choose the column to infill
- Optionally limit the infilling to a time observation window
- Optionally limit the infilling to a maximum gap window size, to avoid unrealistic estimates across large missing periods
Note
Nulls at the beginning and end of the time series will remain null, as there is no preceding or following data to constrain the infilling method.
Built-in Infilling Methods
Several built-in infilling methods are available. They are built on well-established interpolation methods from the SciPy library:
- Polynomial Interpolation
  - Linear: Simple straight-line interpolation between points
  - Quadratic: Smooth curves using second-order polynomials
  - Cubic: Natural-looking curves using third-order polynomials
  - B-Spline: Flexible piecewise polynomials with configurable order
- Shape-Preserving Methods
  - PCHIP: Preserves monotonicity and avoids overshoots
  - Akima: Reduces oscillations in data with rapid changes
Each method supports configuration through parameters specific to that method.
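The difference between the polynomial and shape-preserving families is easiest to see on step-like data. The following is a minimal sketch that calls SciPy directly rather than going through Time Stream; the data values are made up for illustration, and the way Time Stream wires these interpolators internally is not shown here.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator, make_interp_spline

# Step-like, monotonically non-decreasing data (illustrative values only).
x = np.arange(6, dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Dense grid on which to compare the two interpolants.
x_dense = np.linspace(x[0], x[-1], 501)

cubic_vals = make_interp_spline(x, y, k=3)(x_dense)  # third-order B-spline
pchip_vals = PchipInterpolator(x, y)(x_dense)        # shape-preserving

# A monotonic interpolant never decreases between consecutive grid points.
pchip_is_monotonic = bool(np.all(np.diff(pchip_vals) >= -1e-12))
cubic_is_monotonic = bool(np.all(np.diff(cubic_vals) >= -1e-12))

print("PCHIP monotonic:", pchip_is_monotonic)
print("Cubic monotonic:", cubic_is_monotonic)
print("Cubic value range:", cubic_vals.min(), cubic_vals.max())
```

On data like this the cubic spline has to bend around the step, so it cannot stay monotonic, whereas PCHIP stays within the range of the input. This is the usual reason to prefer a shape-preserving method for quantities that must not overshoot.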
The examples given below all use this TimeSeries object:
from datetime import datetime

import numpy as np
import polars as pl
# Import path assumed here; adjust it to match your installation
from time_stream import Period, TimeSeries

np.random.seed(42)

# Set up a daily time series with varying gaps
dates = [
    datetime(2024, 1, 1), datetime(2024, 1, 2),
    # One-day gap
    datetime(2024, 1, 4), datetime(2024, 1, 5), datetime(2024, 1, 6),
    datetime(2024, 1, 7),
    # Two-day gap
    datetime(2024, 1, 10), datetime(2024, 1, 11), datetime(2024, 1, 12),
    # Three-day gap
    datetime(2024, 1, 16),
]

# Create example random column data
df = pl.DataFrame({
    "timestamp": dates,
    "temperature": np.arange(len(dates)) * 0.5 + np.random.normal(0, 2, len(dates)),
})

# pad=True inserts null rows for the missing dates
ts = TimeSeries(
    df=df,
    time_name="timestamp",
    resolution=Period.of_days(1),
    periodicity=Period.of_days(1),
    pad=True,
)
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ null │
│ 2024-01-04 00:00:00 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606 │
│ 2024-01-06 00:00:00 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null │
│ 2024-01-09 00:00:00 ┆ null │
│ 2024-01-10 00:00:00 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null │
│ 2024-01-14 00:00:00 ┆ null │
│ 2024-01-15 00:00:00 ┆ null │
│ 2024-01-16 00:00:00 ┆ 5.58512 │
└─────────────────────┴─────────────┘
Examples
methods = ["linear", "quadratic", "cubic", "pchip", "akima"]
result = ts.df.clone()
for method in methods:
    ts_infilled = ts.infill(method, "temperature")
    ts_infilled.df = ts_infilled.df.rename({"temperature": method})
    result = result.join(ts_infilled.df, on="timestamp", how="full").drop("timestamp_right")
shape: (16, 7)
┌─────────────────────┬─────────────┬──────────┬───────────┬──────────┬──────────┬──────────┐
│ timestamp ┆ temperature ┆ linear ┆ quadratic ┆ cubic ┆ pchip ┆ akima │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════════════════╪═════════════╪══════════╪═══════════╪══════════╪══════════╪══════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ null ┆ 1.259424 ┆ 0.546161 ┆ 0.365671 ┆ 0.889524 ┆ 1.000306 │
│ 2024-01-04 00:00:00 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 ┆ 4.54606 │
│ 2024-01-06 00:00:00 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null ┆ 3.407293 ┆ 3.727706 ┆ 4.038041 ┆ 3.404058 ┆ 3.527325 │
│ 2024-01-09 00:00:00 ┆ null ┆ 4.782859 ┆ 5.658006 ┆ 5.671975 ┆ 5.239764 ┆ 5.265495 │
│ 2024-01-10 00:00:00 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null ┆ 3.692068 ┆ 1.96472 ┆ 1.197094 ┆ 3.10049 ┆ 2.34322 │
│ 2024-01-14 00:00:00 ┆ null ┆ 4.323086 ┆ 2.019954 ┆ 0.375071 ┆ 3.37656 ┆ 2.68997 │
│ 2024-01-15 00:00:00 ┆ null ┆ 4.954103 ┆ 3.226754 ┆ 1.527056 ┆ 4.125893 ┆ 3.853278 │
│ 2024-01-16 00:00:00 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 ┆ 5.58512 │
└─────────────────────┴─────────────┴──────────┴───────────┴──────────┴──────────┴──────────┘
Specifying Maximum Gap Size
The infill method accepts a max_gap_size parameter, which sets the maximum number of consecutive nulls that will be filled. Any gap larger than this will not be infilled and will remain null.
# The gap of 3 missing dates will not be infilled
ts_infilled = ts.infill("linear", "temperature", max_gap_size=2)
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ 1.259424 │
│ 2024-01-04 00:00:00 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606 │
│ 2024-01-06 00:00:00 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ 3.407293 │
│ 2024-01-09 00:00:00 ┆ 4.782859 │
│ 2024-01-10 00:00:00 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null │
│ 2024-01-14 00:00:00 ┆ null │
│ 2024-01-15 00:00:00 ┆ null │
│ 2024-01-16 00:00:00 ┆ 5.58512 │
└─────────────────────┴─────────────┘
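The gap-size rule can be illustrated without the library. The sketch below is a plain-Python approximation of the behaviour described above for an evenly spaced series; linear_infill is a hypothetical helper written for this illustration, not part of Time Stream, and the library's actual implementation may differ.

```python
from typing import Optional


def linear_infill(
    values: list[Optional[float]], max_gap_size: Optional[int] = None
) -> list[Optional[float]]:
    """Linearly fill interior runs of None in an evenly spaced series."""
    filled = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        # values[i:j] is one run of consecutive nulls.
        j = i
        while j < n and values[j] is None:
            j += 1
        gap = j - i
        is_interior = i > 0 and j < n  # a known value exists on both sides
        small_enough = max_gap_size is None or gap <= max_gap_size
        if is_interior and small_enough:
            left, right = values[i - 1], values[j]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                filled[i + k] = left + step * (k + 1)
        i = j
    return filled


# The padded temperature column from the example TimeSeries.
temperature = [
    0.993428, 0.223471, None, 2.295377, 4.54606, 1.531693, 2.031726,
    None, None, 6.158426, 5.034869, 3.061051, None, None, None, 5.58512,
]

result = linear_infill(temperature, max_gap_size=2)
for value in result:
    print(value)
```

With max_gap_size=2 the one-day and two-day gaps are filled, while the three-day gap stays null, which mirrors the table above.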
Applying Infilling During a Specific Time Range
The observation_interval argument can be used to constrain the infilling to a chunk of your time series.
# Only infill data between specific dates
start_date = datetime(2024, 1, 1)
end_date = datetime(2024, 1, 5)
ts_infilled = ts.infill("linear", "temperature", observation_interval=(start_date, end_date))
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ 1.259424 │
│ 2024-01-04 00:00:00 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606 │
│ 2024-01-06 00:00:00 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null │
│ 2024-01-09 00:00:00 ┆ null │
│ 2024-01-10 00:00:00 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null │
│ 2024-01-14 00:00:00 ┆ null │
│ 2024-01-15 00:00:00 ┆ null │
│ 2024-01-16 00:00:00 ┆ 5.58512 │
└─────────────────────┴─────────────┘
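One way to picture the effect of the observation interval is as a mask over the infilled values: a filled value is kept only when its timestamp falls inside the interval. The sketch below assumes inclusive bounds and uses a hypothetical helper, restrict_to_interval; it is an illustration of the observable result, not a description of how Time Stream implements it.

```python
from datetime import datetime, timedelta
from typing import Optional

# Daily timestamps and the padded temperature column from the example.
timestamps = [datetime(2024, 1, 1) + timedelta(days=d) for d in range(16)]
original = [
    0.993428, 0.223471, None, 2.295377, 4.54606, 1.531693, 2.031726,
    None, None, 6.158426, 5.034869, 3.061051, None, None, None, 5.58512,
]
# The fully infilled "linear" column from the Examples table.
infilled = [
    0.993428, 0.223471, 1.259424, 2.295377, 4.54606, 1.531693, 2.031726,
    3.407293, 4.782859, 6.158426, 5.034869, 3.061051,
    3.692068, 4.323086, 4.954103, 5.58512,
]


def restrict_to_interval(
    times: list[datetime],
    old: list[Optional[float]],
    new: list[Optional[float]],
    start: datetime,
    end: datetime,
) -> list[Optional[float]]:
    """Keep infilled values only for timestamps inside [start, end] (assumed inclusive)."""
    return [n if start <= t <= end else o for t, o, n in zip(times, old, new)]


windowed = restrict_to_interval(
    timestamps, original, infilled, datetime(2024, 1, 1), datetime(2024, 1, 5)
)
for t, value in zip(timestamps, windowed):
    print(t.date(), value)
```

Only the null on 2024-01-03 lies inside the interval, so it is the only one that receives a value.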
Nulls at the start and end are maintained
# Set gaps at the start and end
ts.df = ts.df.with_columns(
    pl.when(pl.col("timestamp").is_in([datetime(2024, 1, 1), datetime(2024, 1, 16)]))
    .then(None)
    .otherwise(pl.col("temperature"))
    .alias("temperature")
)
ts_infilled = ts.infill("linear", "temperature")
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp ┆ temperature │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ null │
│ 2024-01-02 00:00:00 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ 1.259424 │
│ 2024-01-04 00:00:00 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606 │
│ 2024-01-06 00:00:00 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ 3.407293 │
│ 2024-01-09 00:00:00 ┆ 4.782859 │
│ 2024-01-10 00:00:00 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null │
│ 2024-01-14 00:00:00 ┆ null │
│ 2024-01-15 00:00:00 ┆ null │
│ 2024-01-16 00:00:00 ┆ null │
└─────────────────────┴─────────────┘
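The result above follows from a simple rule: a null can only be interpolated if there is a known value somewhere before it and somewhere after it. A minimal sketch of that rule, using a hypothetical helper named fillable_indices that is not part of Time Stream:

```python
from typing import Optional, Sequence


def fillable_indices(values: Sequence[Optional[float]]) -> list[int]:
    """Return indices of nulls that have a known value on both sides."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return []
    first, last = known[0], known[-1]
    # Nulls before `first` or after `last` would need extrapolation.
    return [i for i in range(first, last + 1) if values[i] is None]


# Temperature column after nulling 2024-01-01 and 2024-01-16.
temperature = [
    None, 0.223471, None, 2.295377, 4.54606, 1.531693, 2.031726,
    None, None, 6.158426, 5.034869, 3.061051, None, None, None, None,
]

print(fillable_indices(temperature))
```

Once 2024-01-16 is null, the three-day gap before it is no longer bounded on the right, so it becomes part of the trailing run and is left untouched along with the first row.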
Linear Interpolation
Name: "linear"
- class time_stream.infill.LinearInterpolation(**kwargs)
Linear spline interpolation (Convenience wrapper around B-spline with order=1). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html
Initialize linear interpolation.
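To see what a B-spline of order 1 amounts to, the sketch below calls SciPy's make_interp_spline directly on the known points of the example series, with timestamps expressed as day offsets from 2024-01-01. It only demonstrates the SciPy side; how LinearInterpolation prepares its inputs internally is an assumption not covered here.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Known points of the example series as day offsets from 2024-01-01.
known_days = np.array([0, 1, 3, 4, 5, 6, 9, 10, 11, 15], dtype=float)
known_temps = np.array([
    0.993428, 0.223471, 2.295377, 4.54606, 1.531693,
    2.031726, 6.158426, 5.034869, 3.061051, 5.58512,
])
# Day offsets of the missing rows.
missing_days = np.array([2, 7, 8, 12, 13, 14], dtype=float)

# A B-spline with k=1 is piecewise linear between the known points.
spline = make_interp_spline(known_days, known_temps, k=1)
spline_vals = spline(missing_days)

# Plain linear interpolation for comparison.
interp_vals = np.interp(missing_days, known_days, known_temps)

print(np.round(spline_vals, 6))
print(np.allclose(spline_vals, interp_vals))
```

The values agree with the "linear" column in the Examples table, which is what you would expect from straight-line interpolation between neighbouring known points.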
Quadratic Interpolation
Name: "quadratic"
- class time_stream.infill.QuadraticInterpolation(**kwargs)
Quadratic spline interpolation (Convenience wrapper around B-spline with order=2). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html
Initialize quadratic interpolation.
Cubic Interpolation
Name: "cubic"
- class time_stream.infill.CubicInterpolation(**kwargs)
Cubic spline interpolation (Convenience wrapper around B-spline with order=3). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html
Initialize cubic interpolation.
Akima Interpolation
Name: "akima"
- class time_stream.infill.AkimaInterpolation(**kwargs)
Akima interpolation using scipy (good for avoiding oscillations). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.Akima1DInterpolator.html
Initialize a scipy interpolation method.
- Parameters:
**kwargs – Additional parameters passed to scipy interpolator method.
PCHIP Interpolation
Name: "pchip"
- class time_stream.infill.PchipInterpolation(**kwargs)
PCHIP interpolation using scipy (preserves monotonicity). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.PchipInterpolator.html
Initialize a scipy interpolation method.
- Parameters:
**kwargs – Additional parameters passed to scipy interpolator method.