Infilling

The infill module of the Time Stream library provides various methods for filling missing values in your time series data. Missing data is a common challenge in time series analysis, whether due to sensor failures, network outages, data transmission errors, or scheduled maintenance periods.

The framework is flexible: infilling is defined and applied to individual columns of a TimeSeries object, so each column can be handled separately.

Applying an Infilling Procedure

To apply infilling, call the TimeSeries.infill method on a TimeSeries object. This method allows you to:

  • Specify the infill method (see below for available built-in methods)

  • Choose the column to infill

  • Optionally limit the infilling to an observation interval (a specific time window)

  • Optionally limit the infilling to a maximum gap size, to avoid unrealistic estimates across long missing periods

Built-in Infilling Methods

Several built-in infilling methods are available. They are built on well-established interpolation routines from the SciPy library:

Polynomial Interpolation
  • Linear: Simple straight-line interpolation between points

  • Quadratic: Smooth curves using second-order polynomials

  • Cubic: Natural-looking curves using third-order polynomials

  • B-Spline: Flexible piecewise polynomials with configurable order

Shape-Preserving Methods
  • PCHIP: Preserves monotonicity and avoids overshoots

  • Akima: Reduces oscillations in data with rapid changes

Each method supports configuration through parameters specific to that method.

The examples given below all use this TimeSeries object:

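from datetime import datetime

import numpy as np
import polars as pl
from time_stream import Period, TimeSeries  # import path assumed; adjust to your installation

np.random.seed(42)

# Set up a daily time series with varying gaps
dates = [
    datetime(2024, 1, 1), datetime(2024, 1, 2),
    # One-day gap (3 January missing)
    datetime(2024, 1, 4), datetime(2024, 1, 5), datetime(2024, 1, 6),
    datetime(2024, 1, 7),
    # Two-day gap (8-9 January missing)
    datetime(2024, 1, 10), datetime(2024, 1, 11), datetime(2024, 1, 12),
    # Three-day gap (13-15 January missing)
    datetime(2024, 1, 16)
]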

# Create example random column data
df = pl.DataFrame({
    "timestamp": dates,
    "temperature": np.arange(len(dates)) * 0.5 + np.random.normal(0, 2, len(dates)),
})

ts = TimeSeries(
    df=df,
    time_name="timestamp",
    resolution=Period.of_days(1),
    periodicity=Period.of_days(1),
    pad=True
)
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp           ┆ temperature │
│ ---                 ┆ ---         │
│ datetime[μs]        ┆ f64         │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428    │
│ 2024-01-02 00:00:00 ┆ 0.223471    │
│ 2024-01-03 00:00:00 ┆ null        │
│ 2024-01-04 00:00:00 ┆ 2.295377    │
│ 2024-01-05 00:00:00 ┆ 4.54606     │
│ 2024-01-06 00:00:00 ┆ 1.531693    │
│ 2024-01-07 00:00:00 ┆ 2.031726    │
│ 2024-01-08 00:00:00 ┆ null        │
│ 2024-01-09 00:00:00 ┆ null        │
│ 2024-01-10 00:00:00 ┆ 6.158426    │
│ 2024-01-11 00:00:00 ┆ 5.034869    │
│ 2024-01-12 00:00:00 ┆ 3.061051    │
│ 2024-01-13 00:00:00 ┆ null        │
│ 2024-01-14 00:00:00 ┆ null        │
│ 2024-01-15 00:00:00 ┆ null        │
│ 2024-01-16 00:00:00 ┆ 5.58512     │
└─────────────────────┴─────────────┘
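
Before choosing infill settings, it can help to list where the gaps are and how long each one is. The following is a minimal standard-library sketch and not part of Time Stream; the helper name find_gaps is hypothetical, and it assumes a regular daily step between timestamps.

```python
from datetime import datetime, timedelta

# Observed timestamps from the example above (daily resolution).
dates = [
    datetime(2024, 1, 1), datetime(2024, 1, 2),
    datetime(2024, 1, 4), datetime(2024, 1, 5), datetime(2024, 1, 6),
    datetime(2024, 1, 7),
    datetime(2024, 1, 10), datetime(2024, 1, 11), datetime(2024, 1, 12),
    datetime(2024, 1, 16),
]


def find_gaps(timestamps, step=timedelta(days=1)):
    """Return (first_missing_timestamp, n_missing) for each gap (hypothetical helper)."""
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        # Number of expected timestamps strictly between two observations.
        n_missing = (current - previous) // step - 1
        if n_missing > 0:
            gaps.append((previous + step, n_missing))
    return gaps


gaps = find_gaps(dates)
for first_missing, n_missing in gaps:
    print(first_missing.date(), n_missing)
```

The gap lengths found this way are the quantities that a maximum gap size is compared against.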

Examples

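# Infill the "temperature" column with each method and collect the results side by side
methods = ["linear", "quadratic", "cubic", "pchip", "akima"]
result = ts.df.clone()
for method in methods:
    ts_infilled = ts.infill(method, "temperature")
    # Name the infilled column after its method so the joined columns do not clash
    ts_infilled.df = ts_infilled.df.rename({"temperature": method})
    # Join on timestamp and drop the duplicate key column produced by the full join
    result = result.join(ts_infilled.df, on="timestamp", how="full").drop("timestamp_right")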
shape: (16, 7)
┌─────────────────────┬─────────────┬──────────┬───────────┬──────────┬──────────┬──────────┐
│ timestamp           ┆ temperature ┆ linear   ┆ quadratic ┆ cubic    ┆ pchip    ┆ akima    │
│ ---                 ┆ ---         ┆ ---      ┆ ---       ┆ ---      ┆ ---      ┆ ---      │
│ datetime[μs]        ┆ f64         ┆ f64      ┆ f64       ┆ f64      ┆ f64      ┆ f64      │
╞═════════════════════╪═════════════╪══════════╪═══════════╪══════════╪══════════╪══════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428    ┆ 0.993428 ┆ 0.993428  ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471    ┆ 0.223471 ┆ 0.223471  ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ null        ┆ 1.259424 ┆ 0.546161  ┆ 0.365671 ┆ 0.889524 ┆ 1.000306 │
│ 2024-01-04 00:00:00 ┆ 2.295377    ┆ 2.295377 ┆ 2.295377  ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606     ┆ 4.54606  ┆ 4.54606   ┆ 4.54606  ┆ 4.54606  ┆ 4.54606  │
│ 2024-01-06 00:00:00 ┆ 1.531693    ┆ 1.531693 ┆ 1.531693  ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726    ┆ 2.031726 ┆ 2.031726  ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null        ┆ 3.407293 ┆ 3.727706  ┆ 4.038041 ┆ 3.404058 ┆ 3.527325 │
│ 2024-01-09 00:00:00 ┆ null        ┆ 4.782859 ┆ 5.658006  ┆ 5.671975 ┆ 5.239764 ┆ 5.265495 │
│ 2024-01-10 00:00:00 ┆ 6.158426    ┆ 6.158426 ┆ 6.158426  ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869    ┆ 5.034869 ┆ 5.034869  ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051    ┆ 3.061051 ┆ 3.061051  ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null        ┆ 3.692068 ┆ 1.96472   ┆ 1.197094 ┆ 3.10049  ┆ 2.34322  │
│ 2024-01-14 00:00:00 ┆ null        ┆ 4.323086 ┆ 2.019954  ┆ 0.375071 ┆ 3.37656  ┆ 2.68997  │
│ 2024-01-15 00:00:00 ┆ null        ┆ 4.954103 ┆ 3.226754  ┆ 1.527056 ┆ 4.125893 ┆ 3.853278 │
│ 2024-01-16 00:00:00 ┆ 5.58512     ┆ 5.58512  ┆ 5.58512   ┆ 5.58512  ┆ 5.58512  ┆ 5.58512  │
└─────────────────────┴─────────────┴──────────┴───────────┴──────────┴──────────┴──────────┘
Figure: plot of the temperature series infilled by each method (examples_infilling_plot_all_infills.svg).

Specifying Maximum Gap Size

The infill method accepts a max_gap_size parameter, which sets the maximum number of consecutive null values in a gap that will be filled. Any gap longer than this is not infilled and remains null.

# The gap of 3 missing dates will not be infilled
ts_infilled = ts.infill("linear", "temperature", max_gap_size=2)
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp           ┆ temperature │
│ ---                 ┆ ---         │
│ datetime[μs]        ┆ f64         │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428    │
│ 2024-01-02 00:00:00 ┆ 0.223471    │
│ 2024-01-03 00:00:00 ┆ 1.259424    │
│ 2024-01-04 00:00:00 ┆ 2.295377    │
│ 2024-01-05 00:00:00 ┆ 4.54606     │
│ 2024-01-06 00:00:00 ┆ 1.531693    │
│ 2024-01-07 00:00:00 ┆ 2.031726    │
│ 2024-01-08 00:00:00 ┆ 3.407293    │
│ 2024-01-09 00:00:00 ┆ 4.782859    │
│ 2024-01-10 00:00:00 ┆ 6.158426    │
│ 2024-01-11 00:00:00 ┆ 5.034869    │
│ 2024-01-12 00:00:00 ┆ 3.061051    │
│ 2024-01-13 00:00:00 ┆ null        │
│ 2024-01-14 00:00:00 ┆ null        │
│ 2024-01-15 00:00:00 ┆ null        │
│ 2024-01-16 00:00:00 ┆ 5.58512     │
└─────────────────────┴─────────────┘
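
The gap-size rule can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation: the function name linear_infill is hypothetical, and it assumes evenly spaced timestamps (as in the padded daily series) and leaves leading or trailing gaps untouched.

```python
def linear_infill(values, max_gap_size=None):
    """Linearly fill interior runs of None, skipping runs longer than max_gap_size."""
    filled = list(values)
    i, n = 0, len(values)
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        start = i  # index of the first None in this run
        while i < n and values[i] is None:
            i += 1
        end = i  # index just past the run
        gap = end - start
        has_neighbours = start > 0 and end < n
        if has_neighbours and (max_gap_size is None or gap <= max_gap_size):
            left, right = values[start - 1], values[end]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                filled[start + k] = left + step * (k + 1)
    return filled


# The padded "temperature" column from the example series.
temperature = [
    0.993428, 0.223471, None, 2.295377, 4.54606, 1.531693, 2.031726,
    None, None, 6.158426, 5.034869, 3.061051, None, None, None, 5.58512,
]
result = linear_infill(temperature, max_gap_size=2)
print(result)
```

With max_gap_size=2 the one-day and two-day gaps are filled, while the three-day gap is left as None, mirroring the table above.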

Applying Infilling During a Specific Time Range

The observation_interval argument constrains the infilling to a specific portion of your time series. Missing values outside the interval are left as null.

# Only infill data between specific dates
start_date = datetime(2024, 1, 1)
end_date = datetime(2024, 1, 5)

ts_infilled = ts.infill("linear", "temperature", observation_interval=(start_date, end_date))
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp           ┆ temperature │
│ ---                 ┆ ---         │
│ datetime[μs]        ┆ f64         │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428    │
│ 2024-01-02 00:00:00 ┆ 0.223471    │
│ 2024-01-03 00:00:00 ┆ 1.259424    │
│ 2024-01-04 00:00:00 ┆ 2.295377    │
│ 2024-01-05 00:00:00 ┆ 4.54606     │
│ 2024-01-06 00:00:00 ┆ 1.531693    │
│ 2024-01-07 00:00:00 ┆ 2.031726    │
│ 2024-01-08 00:00:00 ┆ null        │
│ 2024-01-09 00:00:00 ┆ null        │
│ 2024-01-10 00:00:00 ┆ 6.158426    │
│ 2024-01-11 00:00:00 ┆ 5.034869    │
│ 2024-01-12 00:00:00 ┆ 3.061051    │
│ 2024-01-13 00:00:00 ┆ null        │
│ 2024-01-14 00:00:00 ┆ null        │
│ 2024-01-15 00:00:00 ┆ null        │
│ 2024-01-16 00:00:00 ┆ 5.58512     │
└─────────────────────┴─────────────┘
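
One way to picture the effect of an observation interval is as a mask: candidate infill values are kept only where the original value is missing and the timestamp lies inside the window. The sketch below is an assumption about the behaviour, not the library's code. The helper name infill_within is hypothetical, the interval bounds are treated as inclusive, and the candidate values are copied from the "linear" column of the comparison table earlier on this page.

```python
from datetime import datetime, timedelta


def infill_within(timestamps, values, candidates, interval):
    """Use a candidate value only where the original is None and the timestamp is in the interval."""
    start, end = interval
    return [
        c if v is None and start <= t <= end else v
        for t, v, c in zip(timestamps, values, candidates)
    ]


timestamps = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(16)]
temperature = [
    0.993428, 0.223471, None, 2.295377, 4.54606, 1.531693, 2.031726,
    None, None, 6.158426, 5.034869, 3.061051, None, None, None, 5.58512,
]
# Linear infill of every gap, taken from the "linear" column shown earlier.
linear = [
    0.993428, 0.223471, 1.259424, 2.295377, 4.54606, 1.531693, 2.031726,
    3.407293, 4.782859, 6.158426, 5.034869, 3.061051, 3.692068, 4.323086,
    4.954103, 5.58512,
]

result = infill_within(
    timestamps, temperature, linear,
    interval=(datetime(2024, 1, 1), datetime(2024, 1, 5)),
)
print(result)
```

Only 3 January falls inside the interval, so it is the only missing value that gets filled.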

Linear Interpolation

Name: "linear"

class time_stream.infill.LinearInterpolation(**kwargs)

Linear spline interpolation (convenience wrapper around a B-spline with order=1). See https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html

Initialize linear interpolation.

Quadratic Interpolation

Name: "quadratic"

class time_stream.infill.QuadraticInterpolation(**kwargs)

Quadratic spline interpolation (convenience wrapper around a B-spline with order=2). See https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html

Initialize quadratic interpolation.

Cubic Interpolation

Name: "cubic"

class time_stream.infill.CubicInterpolation(**kwargs)

Cubic spline interpolation (convenience wrapper around a B-spline with order=3). See https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html

Initialize cubic interpolation.
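
The linear, quadratic and cubic methods are all described as B-splines that differ only in order. The sketch below calls scipy.interpolate.make_interp_spline directly on the non-null points of the example series, with timestamps converted to day offsets. It illustrates the underlying SciPy routine only; how Time Stream prepares the data or sets boundary conditions is not shown here, so the quadratic and cubic values may not match the table above exactly.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Day offsets and values of the observed (non-null) points in the example series.
x = np.array([0, 1, 3, 4, 5, 6, 9, 10, 11, 15], dtype=float)
y = np.array([0.993428, 0.223471, 2.295377, 4.54606, 1.531693,
              2.031726, 6.158426, 5.034869, 3.061051, 5.58512])

# Day offsets of the missing timestamps (3, 8-9 and 13-15 January).
x_missing = np.array([2, 7, 8, 12, 13, 14], dtype=float)

# k is the spline order: 1 = linear, 2 = quadratic, 3 = cubic.
estimates = {k: make_interp_spline(x, y, k=k)(x_missing) for k in (1, 2, 3)}
for k, values in estimates.items():
    print(k, np.round(values, 6))
```

Every order passes through the observed points exactly; the orders differ only in how the curve behaves between them.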

Akima Interpolation

Name: "akima"

class time_stream.infill.AkimaInterpolation(**kwargs)

Akima interpolation using SciPy (good for avoiding oscillations). See https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.Akima1DInterpolator.html

Initialize a scipy interpolation method.

Parameters:

**kwargs – Additional parameters passed to scipy interpolator method.

PCHIP Interpolation

Name: "pchip"

class time_stream.infill.PchipInterpolation(**kwargs)

PCHIP interpolation using SciPy (preserves monotonicity). See https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.PchipInterpolator.html

Initialize a scipy interpolation method.

Parameters:

**kwargs – Additional parameters passed to scipy interpolator method.
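
To see what "shape-preserving" means in practice, the sketch below compares the SciPy interpolators directly on hypothetical step-like data (flat, a sharp rise, then flat again). The data are invented for illustration and the code uses SciPy rather than the Time Stream classes. On data like this a cubic spline leaves the range of the observations, whereas PCHIP stays within it.

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator, PchipInterpolator, make_interp_spline

# Hypothetical step-like data: flat, a sharp rise, then flat again.
x = np.arange(6, dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
grid = np.linspace(x[0], x[-1], 501)

cubic = make_interp_spline(x, y, k=3)(grid)
pchip = PchipInterpolator(x, y)(grid)
akima = Akima1DInterpolator(x, y)(grid)

# Compare the range of each interpolated curve with the data range [0, 1].
for name, values in [("cubic", cubic), ("pchip", pchip), ("akima", akima)]:
    print(f"{name}: min={values.min():.4f}, max={values.max():.4f}")
```

For series that must not exceed physical bounds (for example, non-negative quantities), this difference is a common reason to prefer PCHIP over a plain cubic spline.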