Quality control checks

The quality control (QC) system in the Time Stream library provides a flexible framework for flagging potential issues in time series data. You can apply QC checks to individual columns of a TimeSeries object. Each check returns a boolean result per row (true where the value is flagged), which you can then use to update flag columns, to inspect suspect values, or to filter your data.
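As a rough illustration of what can happen downstream of a check, the sketch below uses plain Python lists in place of polars columns. The qc_result list is the output of the first Range Check example on this page. The OUT_OF_RANGE flag value and the integer bitwise flag column are hypothetical stand-ins, not the library's own flag system.

```python
# Plain-Python stand-ins for a data column and a QC result column.
# True means "flagged by the check".
temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
qc_result = [False, False, True, False, False, False, False, True, True, False]

# 1. Filtering: keep only the values the check did not flag.
clean = [value for value, flagged in zip(temperature, qc_result) if not flagged]

# 2. Inspection: list the row positions that need a closer look.
suspect_rows = [i for i, flagged in enumerate(qc_result) if flagged]

# 3. Flag column update: OR a bit into an integer flag column.
# OUT_OF_RANGE is a hypothetical flag value chosen for this sketch.
OUT_OF_RANGE = 1
flags = [0] * len(temperature)
flags = [f | OUT_OF_RANGE if flagged else f for f, flagged in zip(flags, qc_result)]
```

Keeping the check result separate from the flag column, as here, means several checks can contribute different bits to the same flag column.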

Applying a QC Check

To apply a QC check, call the TimeSeries.qc_check method on a TimeSeries object. This method allows you to:

  • Specify the check type (see below for available built-in quality control checks)

  • Choose the column to evaluate

  • Optionally restrict the check to an observation time window (the observation_interval argument)

Built-in Quality Control Checks

Several built-in QC checks are available. Each check encapsulates a validation rule and supports configuration through parameters specific to that check.
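One way to picture "a validation rule configured through parameters" is a small class per check, looked up by the string name passed to qc_check. The sketch below is hypothetical: the registry, the QCCheck base class, run_check and the "greater_than" check are illustrative names, not part of the Time Stream API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Type

# Hypothetical registry mapping check names to check classes.
CHECKS: Dict[str, Type["QCCheck"]] = {}


def register(name: str) -> Callable[[Type["QCCheck"]], Type["QCCheck"]]:
    """Class decorator that records a check class under a string name."""
    def decorator(cls: Type["QCCheck"]) -> Type["QCCheck"]:
        CHECKS[name] = cls
        return cls
    return decorator


class QCCheck(ABC):
    """One validation rule; its parameters are fixed when it is constructed."""

    @abstractmethod
    def evaluate(self, values: List[float]) -> List[Optional[bool]]:
        """Return True for each value that should be flagged."""


@register("greater_than")  # hypothetical check name for this sketch
class GreaterThanCheck(QCCheck):
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def evaluate(self, values: List[float]) -> List[Optional[bool]]:
        return [v > self.threshold for v in values]


def run_check(name: str, values: List[float], **params) -> List[Optional[bool]]:
    """Look up a check by name, configure it with params, and apply it."""
    check = CHECKS[name](**params)
    return check.evaluate(values)
```

For example, run_check("greater_than", [24, 50, 52], threshold=49) returns [False, True, True]. Separating configuration (constructor) from evaluation keeps each rule reusable across columns.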

The examples given below all use this TimeSeries object:

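from datetime import date, datetime, time, timedelta

import polars as pl
from time_stream import TimeSeries  # assumed top-level export of the package

dates = [datetime(2023, 1, 1) + timedelta(hours=i) for i in range(10)]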
temperatures = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
precipitation = [-3, 0, 5, 10, 2, 0, 0, 3, 1, 0]
sensor_codes = [992, 1, 1, 1, 1, 1, 1, 991, 995, 1]

df = pl.DataFrame({
    "timestamp": dates,
    "temperature": temperatures,
    "precipitation": precipitation,
    "sensor_codes": sensor_codes
})

ts = TimeSeries(
    df=df,
    time_name="timestamp"
)

Comparison Check

Compares values in the TimeSeries with a constant value using a specified operator.

Name: "comparison"

class time_stream.qc.ComparisonCheck(compare_to, operator, flag_na=False)

Compares values against a given value using a comparison operator.

Initialize comparison check.

Parameters:
  • compare_to (float | List) – The value to compare against, or a list of values when operator is ‘is_in’.

  • operator (str) – Comparison operator. One of: ‘>’, ‘>=’, ‘<’, ‘<=’, ‘==’, ‘!=’, ‘is_in’.

  • flag_na (bool | None) – If True, also flag NaN/null values as failing the check. Defaults to False.

The is_in operator is a special case: compare_to must be a list of values, and the check flags each value in the TimeSeries that appears in this list.
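The operator semantics can be sketched in plain Python with the standard operator module. This is not the library's implementation: the function name comparison_check is hypothetical, the operator argument is renamed op to avoid shadowing the module, and the handling of missing values when flag_na=False (left as None, "unknown") is an assumption.

```python
import math
import operator
from typing import Any, Callable, Dict, List, Optional, Sequence

# The documented operator strings mapped onto Python's comparison functions.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def comparison_check(
    values: Sequence[Any], compare_to: Any, op: str, flag_na: bool = False
) -> List[Optional[bool]]:
    """Return True where a value satisfies the comparison, i.e. is flagged."""
    if op != "is_in" and op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    results: List[Optional[bool]] = []
    for value in values:
        if is_missing(value):
            # Assumption: a missing value stays unknown (None) unless flag_na=True.
            results.append(True if flag_na else None)
        elif op == "is_in":
            # is_in: flag the value if it appears in the compare_to collection.
            results.append(value in compare_to)
        else:
            results.append(OPERATORS[op](value, compare_to))
    return results
```

Applied to the example temperature column with compare_to=50 and op=">=", this flags only the 50 and 52 readings, matching the first example below.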

Examples

1. Temperature greater than or equal to 50

ts.df = ts.df.with_columns(
    ts.qc_check(
        "comparison",
        check_column="temperature",
        compare_to=50,
        operator=">="
    ).alias("qc_result")
)

print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---       │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ bool      │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ false     │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ false     │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ false     │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ false     │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ true      │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ true      │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ false     │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘

2. Precipitation less than 0

ts.df = ts.df.with_columns(
    ts.qc_check(
        "comparison",
        check_column="precipitation",
        compare_to=0,
        operator="<"
    ).alias("qc_result")
)

print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---       │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ bool      │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ true      │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ false     │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ false     │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ false     │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ false     │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ false     │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ false     │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘

3. Sensor codes within a list

ts.df = ts.df.with_columns(
    ts.qc_check(
        "comparison",
        check_column="sensor_codes",
        compare_to=[991, 992, 993, 994, 995],
        operator="is_in"
    ).alias("qc_result")
)

print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---       │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ bool      │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ true      │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ false     │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ false     │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ false     │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ true      │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ true      │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ false     │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘

Range Check

Flags values in the TimeSeries outside or within a specified value range.

Name: "range"

class time_stream.qc.RangeCheck(min_value, max_value, closed='both', within=True)

Check that values fall within an acceptable range.

Initialize range check.

Parameters:
  • min_value (float | time | date | datetime) – Minimum of the range.

  • max_value (float | time | date | datetime) – Maximum of the range.

  • closed (str | ClosedInterval | None) – Which sides of the interval are closed (inclusive): one of ‘both’, ‘left’, ‘right’, ‘none’ (default ‘both’).

  • within (bool | None) – If True (the default), flag values inside the range; if False, flag values outside it.
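The interaction between closed and within is easiest to see in a small plain-Python sketch. The function names in_interval and range_check are illustrative, not the library's code; the sketch only mirrors the parameter meanings described above.

```python
from typing import Any, List, Sequence

VALID_CLOSED = ("both", "left", "right", "none")


def in_interval(value: Any, lo: Any, hi: Any, closed: str = "both") -> bool:
    """Test lo <(=) value <(=) hi, with inclusivity chosen by `closed`."""
    above_lo = value >= lo if closed in ("both", "left") else value > lo
    below_hi = value <= hi if closed in ("both", "right") else value < hi
    return above_lo and below_hi


def range_check(
    values: Sequence[Any], min_value: Any, max_value: Any,
    closed: str = "both", within: bool = True,
) -> List[bool]:
    """Flag values inside the range (within=True) or outside it (within=False)."""
    if closed not in VALID_CLOSED:
        raise ValueError(f"invalid closed value: {closed!r}")
    return [in_interval(v, min_value, max_value, closed) == within for v in values]
```

With closed="none" and within=False, a value exactly equal to a bound is treated as outside the range and therefore flagged; this is why 50 is flagged in the first example below. Because the sketch only uses comparison operators, the same logic works for time, date and datetime bounds.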

Examples

1. Temperatures outside the min and max range (at or below -10, or at or above 50)

ts.df = ts.df.with_columns(
    ts.qc_check(
        "range",
        check_column="temperature",
        min_value=-10,
        max_value=50,
        closed="none",  # Range excludes min and max, so -10 and 50 count as outside
        within=False,  # Flag values outside of this range
    ).alias("qc_result")
)

print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---       │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ bool      │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ false     │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ true      │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ false     │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ false     │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ true      │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ true      │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ false     │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘

2. Precipitation values between -3 and 1

ts.df = ts.df.with_columns(
    ts.qc_check(
        "range",
        check_column="precipitation",
        min_value=-3,
        max_value=1,
        closed="both",  # Range is inclusive of min and max value
        within=True,  # Flag values inside of this range
    ).alias("qc_result")
)

print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---       │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ bool      │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ true      │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ true      │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ false     │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ false     │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ false     │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ true      │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ true      │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ false     │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ true      │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ true      │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘

Time Range Check

Flags rows whose value in the TimeSeries primary time column falls within (or outside) a specified time range.

The min and max values can be of type:

  • datetime.time : Useful for scenarios where there are consistent errors at a certain time of day, e.g., during an automated sensor calibration time.

  • datetime.date : Useful for scenarios where a specific date range is known to be bad, e.g., during a date range of known sensor malfunction.

  • datetime.datetime : As above, but where the bounds of the date range also need a time of day.

Name: "time_range"

Note

This is equivalent to using RangeCheck with check_column set to ts.time_name. The time_range check is a convenience that makes it explicit you are working with the primary time column of the TimeSeries object.
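Since a time range check is a range test on the time column, the main question is which part of each timestamp gets compared. The sketch below picks the part matching the type of the bounds. It is a plain-Python illustration with a hypothetical function name; it assumes both ends are inclusive (the RangeCheck default) and that date bounds compare calendar dates, which the documentation does not state explicitly.

```python
from datetime import date, datetime, time, timedelta
from typing import List, Sequence, Union

Bound = Union[time, date, datetime]


def time_range_check(
    timestamps: Sequence[datetime], min_value: Bound, max_value: Bound, within: bool = True
) -> List[bool]:
    """Flag timestamps inside [min_value, max_value], or outside if within=False."""
    results = []
    for stamp in timestamps:
        # Compare against the part of the timestamp that matches the bound type.
        # datetime is tested first because datetime is a subclass of date.
        if isinstance(min_value, datetime):
            key = stamp
        elif isinstance(min_value, date):
            key = stamp.date()  # Assumption: date bounds compare calendar dates
        elif isinstance(min_value, time):
            key = stamp.time()  # time of day, on any date
        else:
            raise TypeError("bounds must be time, date or datetime")
        inside = min_value <= key <= max_value  # both ends inclusive
        results.append(inside == within)
    return results


# The same hourly timestamps as the example TimeSeries.
timestamps = [datetime(2023, 1, 1) + timedelta(hours=i) for i in range(10)]
by_hour = time_range_check(timestamps, time(1, 0), time(3, 0))
```

Here by_hour flags the 01:00, 02:00 and 03:00 rows, matching the first example below. The time-of-day form ignores the date, so a daily calibration window would be flagged on every day of a longer series.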

Examples

1. Flag values between the hours of 01:00 and 03:00

ts.df = ts.df.with_columns(
    ts.qc_check(
        "time_range",
        check_column="temperature",
        min_value=time(1, 0),
        max_value=time(3, 0)
    ).alias("qc_result")
)

print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---       │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ bool      │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ false     │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ true      │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ true      │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ true      │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ false     │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ false     │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ false     │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ false     │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘

2. Flag values between 03:30 and 09:30 on 1st January

ts.df = ts.df.with_columns(
    ts.qc_check(
        "time_range",
        check_column="temperature",
        min_value=datetime(2023, 1, 1, 3, 30),
        max_value=datetime(2023, 1, 1, 9, 30),
    ).alias("qc_result")
)

print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---       │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ bool      │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ false     │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ false     │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ false     │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ true      │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ true      │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ true      │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ true      │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ true      │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ true      │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘

3. Flag values between 1st January and 2nd January

ts.df = ts.df.with_columns(
    ts.qc_check(
        "time_range",
        check_column="temperature",
        min_value=date(2023, 1, 1),
        max_value=date(2023, 1, 2),
    ).alias("qc_result")
)

print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---       │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ bool      │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ true      │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ true      │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ true      │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ true      │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ true      │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ true      │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ true      │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ true      │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ true      │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ true      │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘

Spike Check

Flags sudden jumps between values based on their differences with adjacent values (both previous and next).

Note

The first and last values in a time series cannot be assessed by the spike check, because it needs a neighbouring value on each side. The result for the first and last rows is therefore null.

Name: "spike"

class time_stream.qc.SpikeCheck(threshold)

Detect spikes by assessing differences with neighboring values.

Initialize spike detection check.

Parameters:

threshold (float) – The spike detection threshold.

Examples

Spike check on temperature

Note

The result does not flag the adjacent high values of 50 and 52. The spike check is designed to detect a single value that jumps away from otherwise consistent neighbours, rather than a sustained shift in level.

ts.df = ts.df.with_columns(
    ts.qc_check(
        "spike",
        check_column="temperature",
        threshold=10.0
    ).alias("qc_result")
)

print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---       │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ bool      │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ null      │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ true      │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ false     │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ false     │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ false     │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ false     │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ null      │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
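This page does not say exactly how threshold is applied. One widely used formulation is the spike test from Argo real-time ocean data QC, which subtracts half the gradient between the neighbours so that a value sitting on a steady ramp is not treated as a spike. The sketch below implements that formula in plain Python; we have not confirmed that SpikeCheck uses it, although with threshold=10 it does reproduce the output above.

```python
from typing import List, Optional, Sequence


def spike_check(values: Sequence[float], threshold: float) -> List[Optional[bool]]:
    """Argo-style spike test: flag a value that jumps away from both neighbours.

    For each interior value v with neighbours prev and nxt:
        spike = |v - (prev + nxt) / 2| - |(nxt - prev) / 2|
    The second term discounts a steady gradient between the neighbours.
    """
    n = len(values)
    results: List[Optional[bool]] = [None] * n  # ends cannot be assessed
    for i in range(1, n - 1):
        prev, value, nxt = values[i - 1], values[i], values[i + 1]
        spike = abs(value - (prev + nxt) / 2) - abs((nxt - prev) / 2)
        results[i] = spike > threshold
    return results


temperature = [24, 22, -35, 26, 24, 26, 28, 50, 52, 29]
result = spike_check(temperature, threshold=10.0)
```

For the -35 reading the spike measure is |-35 - 24| - |2| = 57, well above 10. For the 50 reading it is |50 - 40| - |12| = -2: its neighbours 28 and 52 already span a wide ramp, so it is not flagged, which matches the note above.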

Applying QC checks during a specific time range

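The observation_interval argument restricts the QC check to a portion of your time series, given as a (start, end) tuple. In the example below the interval (5th to 10th January) lies entirely after the sample data, which covers only 1st January, so no rows are flagged: even the -35, 50 and 52 readings that fail the same range check without an interval are returned as false.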

# Only check data from specific dates
start_date = datetime(2023, 1, 5)
end_date = datetime(2023, 1, 10)

ts.df = ts.df.with_columns(
    ts.qc_check(
        "range",
        check_column="temperature",
        min_value=-10,
        max_value=50,
        closed="none",
        within=False,
        observation_interval=(start_date, end_date)
    ).alias("qc_result")
)

print(ts)
shape: (10, 5)
┌─────────────────────┬─────────────┬───────────────┬──────────────┬───────────┐
│ timestamp           ┆ temperature ┆ precipitation ┆ sensor_codes ┆ qc_result │
│ ---                 ┆ ---         ┆ ---           ┆ ---          ┆ ---       │
│ datetime[μs]        ┆ i64         ┆ i64           ┆ i64          ┆ bool      │
╞═════════════════════╪═════════════╪═══════════════╪══════════════╪═══════════╡
│ 2023-01-01 00:00:00 ┆ 24          ┆ -3            ┆ 992          ┆ false     │
│ 2023-01-01 01:00:00 ┆ 22          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 02:00:00 ┆ -35         ┆ 5             ┆ 1            ┆ false     │
│ 2023-01-01 03:00:00 ┆ 26          ┆ 10            ┆ 1            ┆ false     │
│ 2023-01-01 04:00:00 ┆ 24          ┆ 2             ┆ 1            ┆ false     │
│ 2023-01-01 05:00:00 ┆ 26          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 06:00:00 ┆ 28          ┆ 0             ┆ 1            ┆ false     │
│ 2023-01-01 07:00:00 ┆ 50          ┆ 3             ┆ 991          ┆ false     │
│ 2023-01-01 08:00:00 ┆ 52          ┆ 1             ┆ 995          ┆ false     │
│ 2023-01-01 09:00:00 ┆ 29          ┆ 0             ┆ 1            ┆ false     │
└─────────────────────┴─────────────┴───────────────┴──────────────┴───────────┘
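The behaviour in the output above, where rows outside the interval come back false, can be sketched as masking a check's result by timestamp. This is a plain-Python illustration with a hypothetical function name; it assumes both interval bounds are inclusive, and it does not tell you whether the library evaluates the check before or after restricting the data (which could matter for neighbour-based checks such as the spike check).

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence, Tuple


def restrict_to_interval(
    timestamps: Sequence[datetime],
    results: Sequence[Optional[bool]],
    interval: Tuple[datetime, datetime],
) -> List[Optional[bool]]:
    """Keep check results inside the interval and report False outside it.

    Assumption: both ends of the interval are inclusive.
    """
    start, end = interval
    return [r if start <= t <= end else False for t, r in zip(timestamps, results)]


timestamps = [datetime(2023, 1, 1) + timedelta(hours=i) for i in range(10)]
# Result of the unrestricted range check (-10 to 50, closed="none", within=False).
range_result = [False, False, True, False, False, False, False, True, True, False]

# The example interval lies entirely after the data, so nothing is flagged.
outside = restrict_to_interval(
    timestamps, range_result, (datetime(2023, 1, 5), datetime(2023, 1, 10))
)
# A window from 05:00 to 09:00 keeps the flags on the 50 and 52 readings only.
morning = restrict_to_interval(
    timestamps, range_result, (datetime(2023, 1, 1, 5), datetime(2023, 1, 1, 9))
)
```

To see an effect on the sample data, choose an interval that overlaps 1st January, as in the morning window here, which drops the flag on the 02:00 reading of -35.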