Infilling

The infill module of the Time Stream library provides various methods for filling missing values in your time series data. Missing data is a common challenge in time series analysis, whether due to sensor failures, network outages, data transmission errors, or scheduled maintenance periods.

Infilling is applied per column: you choose a method and apply it to an individual column of a TimeSeries object.

Applying an Infilling Procedure

To apply infilling, call the TimeSeries.infill method on a TimeSeries object. This method allows you to:

  • Specify the infill method (see below for available built-in methods)

  • Choose the column to infill

  • Optionally limit the infilling to a time observation window

  • Optionally limit the infilling to a maximum gap window size, to avoid unrealistic estimates across large missing periods

Note

Nulls at the beginning and end of the time series will remain null, as there is no preceding or following data to constrain the infilling method.

Built-in Infilling Methods

Several built-in infilling methods are available. They are built on well-established interpolation routines from the SciPy library:

Polynomial Interpolation
  • Linear: Simple straight-line interpolation between points

  • Quadratic: Smooth curves using second-order polynomials

  • Cubic: Natural-looking curves using third-order polynomials

  • B-Spline: Flexible piecewise polynomials with configurable order

Shape-Preserving Methods
  • PCHIP: Preserves monotonicity and avoids overshoots

  • Akima: Reduces oscillations in data with rapid changes

Each method supports configuration through parameters specific to that method.

The examples given below all use this TimeSeries object:

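from datetime import datetime

import numpy as np
import polars as pl
from time_stream import Period, TimeSeries  # import path assumed; adjust to your installed version

np.random.seed(42)

# Set up a daily time series with gaps of increasing length
dates = [
    datetime(2024, 1, 1), datetime(2024, 1, 2),
    # One-day gap (Jan 3)
    datetime(2024, 1, 4), datetime(2024, 1, 5), datetime(2024, 1, 6),
    datetime(2024, 1, 7),
    # Two-day gap (Jan 8-9)
    datetime(2024, 1, 10), datetime(2024, 1, 11), datetime(2024, 1, 12),
    # Three-day gap (Jan 13-15)
    datetime(2024, 1, 16),
]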

# Create example random column data
df = pl.DataFrame({
    "timestamp": dates,
    "temperature": np.arange(len(dates)) * 0.5 + np.random.normal(0, 2, len(dates)),
})

ts = TimeSeries(
    df=df,
    time_name="timestamp",
    resolution=Period.of_days(1),
    periodicity=Period.of_days(1),
    pad=True
)
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp           ┆ temperature │
│ ---                 ┆ ---         │
│ datetime[μs]        ┆ f64         │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428    │
│ 2024-01-02 00:00:00 ┆ 0.223471    │
│ 2024-01-03 00:00:00 ┆ null        │
│ 2024-01-04 00:00:00 ┆ 2.295377    │
│ 2024-01-05 00:00:00 ┆ 4.54606     │
│ 2024-01-06 00:00:00 ┆ 1.531693    │
│ 2024-01-07 00:00:00 ┆ 2.031726    │
│ 2024-01-08 00:00:00 ┆ null        │
│ 2024-01-09 00:00:00 ┆ null        │
│ 2024-01-10 00:00:00 ┆ 6.158426    │
│ 2024-01-11 00:00:00 ┆ 5.034869    │
│ 2024-01-12 00:00:00 ┆ 3.061051    │
│ 2024-01-13 00:00:00 ┆ null        │
│ 2024-01-14 00:00:00 ┆ null        │
│ 2024-01-15 00:00:00 ┆ null        │
│ 2024-01-16 00:00:00 ┆ 5.58512     │
└─────────────────────┴─────────────┘

Examples

methods = ["linear", "quadratic", "cubic", "pchip", "akima"]
result = ts.df.clone()
for method in methods:
    ts_infilled = ts.infill(method, "temperature")
    ts_infilled.df = ts_infilled.df.rename({"temperature": method})
    result = result.join(ts_infilled.df, on="timestamp", how="full").drop("timestamp_right")
shape: (16, 7)
┌─────────────────────┬─────────────┬──────────┬───────────┬──────────┬──────────┬──────────┐
│ timestamp           ┆ temperature ┆ linear   ┆ quadratic ┆ cubic    ┆ pchip    ┆ akima    │
│ ---                 ┆ ---         ┆ ---      ┆ ---       ┆ ---      ┆ ---      ┆ ---      │
│ datetime[μs]        ┆ f64         ┆ f64      ┆ f64       ┆ f64      ┆ f64      ┆ f64      │
╞═════════════════════╪═════════════╪══════════╪═══════════╪══════════╪══════════╪══════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428    ┆ 0.993428 ┆ 0.993428  ┆ 0.993428 ┆ 0.993428 ┆ 0.993428 │
│ 2024-01-02 00:00:00 ┆ 0.223471    ┆ 0.223471 ┆ 0.223471  ┆ 0.223471 ┆ 0.223471 ┆ 0.223471 │
│ 2024-01-03 00:00:00 ┆ null        ┆ 1.259424 ┆ 0.546161  ┆ 0.365671 ┆ 0.889524 ┆ 1.000306 │
│ 2024-01-04 00:00:00 ┆ 2.295377    ┆ 2.295377 ┆ 2.295377  ┆ 2.295377 ┆ 2.295377 ┆ 2.295377 │
│ 2024-01-05 00:00:00 ┆ 4.54606     ┆ 4.54606  ┆ 4.54606   ┆ 4.54606  ┆ 4.54606  ┆ 4.54606  │
│ 2024-01-06 00:00:00 ┆ 1.531693    ┆ 1.531693 ┆ 1.531693  ┆ 1.531693 ┆ 1.531693 ┆ 1.531693 │
│ 2024-01-07 00:00:00 ┆ 2.031726    ┆ 2.031726 ┆ 2.031726  ┆ 2.031726 ┆ 2.031726 ┆ 2.031726 │
│ 2024-01-08 00:00:00 ┆ null        ┆ 3.407293 ┆ 3.727706  ┆ 4.038041 ┆ 3.404058 ┆ 3.527325 │
│ 2024-01-09 00:00:00 ┆ null        ┆ 4.782859 ┆ 5.658006  ┆ 5.671975 ┆ 5.239764 ┆ 5.265495 │
│ 2024-01-10 00:00:00 ┆ 6.158426    ┆ 6.158426 ┆ 6.158426  ┆ 6.158426 ┆ 6.158426 ┆ 6.158426 │
│ 2024-01-11 00:00:00 ┆ 5.034869    ┆ 5.034869 ┆ 5.034869  ┆ 5.034869 ┆ 5.034869 ┆ 5.034869 │
│ 2024-01-12 00:00:00 ┆ 3.061051    ┆ 3.061051 ┆ 3.061051  ┆ 3.061051 ┆ 3.061051 ┆ 3.061051 │
│ 2024-01-13 00:00:00 ┆ null        ┆ 3.692068 ┆ 1.96472   ┆ 1.197094 ┆ 3.10049  ┆ 2.34322  │
│ 2024-01-14 00:00:00 ┆ null        ┆ 4.323086 ┆ 2.019954  ┆ 0.375071 ┆ 3.37656  ┆ 2.68997  │
│ 2024-01-15 00:00:00 ┆ null        ┆ 4.954103 ┆ 3.226754  ┆ 1.527056 ┆ 4.125893 ┆ 3.853278 │
│ 2024-01-16 00:00:00 ┆ 5.58512     ┆ 5.58512  ┆ 5.58512   ┆ 5.58512  ┆ 5.58512  ┆ 5.58512  │
└─────────────────────┴─────────────┴──────────┴───────────┴──────────┴──────────┴──────────┘
Figure: plot comparing all infill methods (examples_infilling_plot_all_infills.svg)

Specifying Maximum Gap Size

The infill method accepts a max_gap_size parameter, which sets the maximum number of consecutive nulls in a gap that will be filled. Any gap longer than this is not infilled and remains null.

# The gap of 3 missing dates will not be infilled
ts_infilled = ts.infill("linear", "temperature", max_gap_size=2)
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp           ┆ temperature │
│ ---                 ┆ ---         │
│ datetime[μs]        ┆ f64         │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428    │
│ 2024-01-02 00:00:00 ┆ 0.223471    │
│ 2024-01-03 00:00:00 ┆ 1.259424    │
│ 2024-01-04 00:00:00 ┆ 2.295377    │
│ 2024-01-05 00:00:00 ┆ 4.54606     │
│ 2024-01-06 00:00:00 ┆ 1.531693    │
│ 2024-01-07 00:00:00 ┆ 2.031726    │
│ 2024-01-08 00:00:00 ┆ 3.407293    │
│ 2024-01-09 00:00:00 ┆ 4.782859    │
│ 2024-01-10 00:00:00 ┆ 6.158426    │
│ 2024-01-11 00:00:00 ┆ 5.034869    │
│ 2024-01-12 00:00:00 ┆ 3.061051    │
│ 2024-01-13 00:00:00 ┆ null        │
│ 2024-01-14 00:00:00 ┆ null        │
│ 2024-01-15 00:00:00 ┆ null        │
│ 2024-01-16 00:00:00 ┆ 5.58512     │
└─────────────────────┴─────────────┘
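
The gap-size rule described above can be illustrated without the library. The following is a minimal pure-Python sketch, not Time Stream's own implementation: the helper names gap_runs and fillable_indices are hypothetical, and the list simply mirrors the temperature column of the example TimeSeries. It only classifies gaps by length and does not treat leading or trailing nulls specially.

```python
from itertools import groupby

# Temperature column from the example, with None standing in for null
temperature = [
    0.993428, 0.223471, None, 2.295377, 4.54606, 1.531693, 2.031726,
    None, None, 6.158426, 5.034869, 3.061051, None, None, None, 5.58512,
]


def gap_runs(values):
    """Return (start_index, length) for each run of consecutive None values."""
    runs = []
    index = 0
    for is_null, group in groupby(values, key=lambda v: v is None):
        length = len(list(group))
        if is_null:
            runs.append((index, length))
        index += length
    return runs


def fillable_indices(values, max_gap_size=None):
    """Indices of nulls that sit in gaps no longer than max_gap_size."""
    keep = set()
    for start, length in gap_runs(values):
        if max_gap_size is None or length <= max_gap_size:
            keep.update(range(start, start + length))
    return keep


print(gap_runs(temperature))                                  # [(2, 1), (7, 2), (12, 3)]
print(sorted(fillable_indices(temperature, max_gap_size=2)))  # [2, 7, 8]
```

With max_gap_size=2, only the one-day and two-day gaps qualify, which matches the output table above where 13 to 15 January stay null.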

Applying Infilling During a Specific Time Range

The observation_interval argument restricts infilling to a specific time range, given as a (start, end) pair. Nulls outside the interval are left untouched.

# Only infill data between specific dates
start_date = datetime(2024, 1, 1)
end_date = datetime(2024, 1, 5)

ts_infilled = ts.infill("linear", "temperature", observation_interval=(start_date, end_date))
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp           ┆ temperature │
│ ---                 ┆ ---         │
│ datetime[μs]        ┆ f64         │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ 0.993428    │
│ 2024-01-02 00:00:00 ┆ 0.223471    │
│ 2024-01-03 00:00:00 ┆ 1.259424    │
│ 2024-01-04 00:00:00 ┆ 2.295377    │
│ 2024-01-05 00:00:00 ┆ 4.54606     │
│ 2024-01-06 00:00:00 ┆ 1.531693    │
│ 2024-01-07 00:00:00 ┆ 2.031726    │
│ 2024-01-08 00:00:00 ┆ null        │
│ 2024-01-09 00:00:00 ┆ null        │
│ 2024-01-10 00:00:00 ┆ 6.158426    │
│ 2024-01-11 00:00:00 ┆ 5.034869    │
│ 2024-01-12 00:00:00 ┆ 3.061051    │
│ 2024-01-13 00:00:00 ┆ null        │
│ 2024-01-14 00:00:00 ┆ null        │
│ 2024-01-15 00:00:00 ┆ null        │
│ 2024-01-16 00:00:00 ┆ 5.58512     │
└─────────────────────┴─────────────┘
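
As a rough illustration of how an observation interval narrows down which nulls are candidates for infilling, here is a small pure-Python sketch. It is not the library's code: in_interval is a hypothetical helper, and treating both bounds as inclusive is an assumption made for the sketch.

```python
from datetime import datetime, timedelta


def in_interval(timestamps, start, end):
    """Boolean mask: True where start <= timestamp <= end (inclusive bounds assumed)."""
    return [start <= t <= end for t in timestamps]


# The 16 daily timestamps of the padded example series
timestamps = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(16)]

# Days of January that are null in the example series
null_days = {3, 8, 9, 13, 14, 15}

mask = in_interval(timestamps, datetime(2024, 1, 1), datetime(2024, 1, 5))

# Nulls that fall inside the interval are the only candidates for infilling
eligible = [t.day for t, ok in zip(timestamps, mask) if ok and t.day in null_days]
print(eligible)  # [3]
```

Only 3 January is both null and inside the interval, which is consistent with the table above where the later gaps remain null.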

Nulls at the Start and End Are Maintained

# Set gaps at the start and end
ts.df = ts.df.with_columns(
    pl.when(pl.col("timestamp").is_in([datetime(2024, 1, 1), datetime(2024, 1, 16)]))
    .then(None)
    .otherwise(pl.col("temperature"))
    .alias("temperature")
)

ts_infilled = ts.infill("linear", "temperature")
shape: (16, 2)
┌─────────────────────┬─────────────┐
│ timestamp           ┆ temperature │
│ ---                 ┆ ---         │
│ datetime[μs]        ┆ f64         │
╞═════════════════════╪═════════════╡
│ 2024-01-01 00:00:00 ┆ null        │
│ 2024-01-02 00:00:00 ┆ 0.223471    │
│ 2024-01-03 00:00:00 ┆ 1.259424    │
│ 2024-01-04 00:00:00 ┆ 2.295377    │
│ 2024-01-05 00:00:00 ┆ 4.54606     │
│ 2024-01-06 00:00:00 ┆ 1.531693    │
│ 2024-01-07 00:00:00 ┆ 2.031726    │
│ 2024-01-08 00:00:00 ┆ 3.407293    │
│ 2024-01-09 00:00:00 ┆ 4.782859    │
│ 2024-01-10 00:00:00 ┆ 6.158426    │
│ 2024-01-11 00:00:00 ┆ 5.034869    │
│ 2024-01-12 00:00:00 ┆ 3.061051    │
│ 2024-01-13 00:00:00 ┆ null        │
│ 2024-01-14 00:00:00 ┆ null        │
│ 2024-01-15 00:00:00 ┆ null        │
│ 2024-01-16 00:00:00 ┆ null        │
└─────────────────────┴─────────────┘
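
The edge behaviour follows naturally from how interpolation works: a value can only be estimated between two known neighbours. The sketch below shows this in pure Python. It is an illustration only, not Time Stream's implementation; linear_infill is a hypothetical name, and the index-based weights assume evenly spaced timestamps, as in the daily example.

```python
def linear_infill(values):
    """Linearly fill interior runs of None; leading and trailing None stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of neighbouring known points and fill what lies between
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Example column with the first and last values set to null
temperature = [
    None, 0.223471, None, 2.295377, 4.54606, 1.531693, 2.031726,
    None, None, 6.158426, 5.034869, 3.061051, None, None, None, None,
]

result = linear_infill(temperature)
print(result[0], result[-1])                      # None None
print(round(result[2], 6), round(result[7], 6))   # 1.259424 3.407293
```

Because 16 January is now null, the run from 13 to 16 January has no known point on its right-hand side, so the whole run stays null, just as in the table above.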

Linear Interpolation

Name: "linear"

class time_stream.infill.LinearInterpolation(**kwargs)

Linear spline interpolation (Convenience wrapper around B-spline with order=1). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html

Initialize linear interpolation.
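
For reference, linear interpolation estimates a missing value at time t from the nearest known points (t_0, y_0) and (t_1, y_1) on either side. This is the standard textbook formula rather than anything specific to Time Stream; the second line is a worked check against the example data (7 January = 2.031726, 10 January = 6.158426), with time measured in days.

```latex
y(t) = y_0 + \frac{t - t_0}{t_1 - t_0}\,(y_1 - y_0), \qquad t_0 \le t \le t_1

y(\text{8 Jan}) = 2.031726 + \frac{1}{3}\,(6.158426 - 2.031726) \approx 3.407293
```

The result agrees with the linear column in the comparison table in the Examples section.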

Quadratic Interpolation

Name: "quadratic"

class time_stream.infill.QuadraticInterpolation(**kwargs)

Quadratic spline interpolation (Convenience wrapper around B-spline with order=2). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html

Initialize quadratic interpolation.

Cubic Interpolation

Name: "cubic"

class time_stream.infill.CubicInterpolation(**kwargs)

Cubic spline interpolation (Convenience wrapper around B-spline with order=3). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.make_interp_spline.html

Initialize cubic interpolation.

Akima Interpolation

Name: "akima"

class time_stream.infill.AkimaInterpolation(**kwargs)

Akima interpolation using scipy (good for avoiding oscillations). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.Akima1DInterpolator.html

Initialize a scipy interpolation method.

Parameters:

**kwargs – Additional parameters passed to scipy interpolator method.

PCHIP Interpolation

Name: "pchip"

class time_stream.infill.PchipInterpolation(**kwargs)

PCHIP interpolation using scipy (preserves monotonicity). https://docs.scipy.org/doc/scipy-1.16.1/reference/generated/scipy.interpolate.PchipInterpolator.html

Initialize a scipy interpolation method.

Parameters:

**kwargs – Additional parameters passed to scipy interpolator method.